Recent Research Updates
Automatically updated daily. Total papers in DB: 4609
No recent papers in the last 7 days.
Mixed LR-$C(α)$-type tests for irregular hypotheses, general criterion functions and misspecified models
This paper studies the robustness properties of \(C(\alpha)\)-type procedures
in an extremum estimation setting.
The test statistic is constructed by applying separate adjustments to the
restricted and unrestricted criterion functions, and is shown to be
asymptotically pivotal under minimal conditions. It features two main
robustness properties. First, unlike standard LR-type statistics, its null
asymptotic distribution remains chi-square even under model misspecification,
where the information matrix equality fails. Second, it accommodates irregular
hypotheses involving constrained parameter spaces, such as boundary parameters,
relying solely on root-\(n\)-consistent estimators for nuisance parameters.
When the model is correctly specified, no boundary constraints are present, and
parameters are estimated by extremum estimators, the proposed test reduces to
the standard LR-type statistic.
Simulations with ARCH models, where volatility parameters are constrained to
be nonnegative, and parametric survival regressions with potentially monotone
increasing hazard functions, demonstrate that our test maintains accurate size
and exhibits good power. An empirical application to a two-way error components
model shows that the proposed test can provide more informative inference than
the conventional \(t\)-test.
arXiv link: http://arxiv.org/abs/2510.17070v1
Equilibrium-Constrained Estimation of Recursive Logit Choice Models
The recursive logit (RL) model is a flexible framework for modeling
sequential decision-making in transportation and choice networks, with
important applications in route choice analysis, multiple discrete choice
problems, and activity-based travel demand modeling. Despite its versatility,
estimation of the RL model typically relies on nested fixed-point (NFXP)
algorithms that are computationally expensive and prone to numerical
instability. We propose a new approach that reformulates the maximum likelihood
estimation problem as an optimization problem with equilibrium constraints,
where both the structural parameters and the value functions are treated as
decision variables. We further show that this formulation can be equivalently
transformed into a conic optimization problem with exponential cones, enabling
efficient solution using modern conic solvers such as MOSEK. Experiments on
synthetic and real-world datasets demonstrate that our convex reformulation
achieves accuracy comparable to traditional methods while offering significant
improvements in computational stability and efficiency, thereby providing a
practical and scalable alternative for recursive logit model estimation.
arXiv link: http://arxiv.org/abs/2510.16886v1
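As context for the nested fixed-point baseline mentioned above, the recursive logit value function solves V(k) = log sum_a exp(u(k, a) + V(a)) with V fixed at zero at the destination. Below is a minimal Python sketch of that inner fixed point; the dense utility matrix, boolean adjacency structure, and plain successive approximation are illustrative assumptions, not the paper's implementation (which replaces the fixed point with an equilibrium-constrained conic program).

```python
import numpy as np

def rl_value_iteration(util, adjacency, dest, tol=1e-10, max_iter=10_000):
    """Inner NFXP step for a recursive logit network:
    solve V(k) = log sum_{a: k->a} exp(util[k, a] + V(a)), with V(dest) = 0.

    util      : (n, n) array of deterministic link utilities (e.g., minus travel costs)
    adjacency : (n, n) boolean array, True where a link k -> a exists
    dest      : index of the absorbing destination node
    Assumes every node has a path to the destination.
    """
    n = util.shape[0]
    V = np.zeros(n)
    for _ in range(max_iter):
        inside = np.where(adjacency, util + V[None, :], -np.inf)
        V_new = np.logaddexp.reduce(inside, axis=1)  # log-sum-exp over outgoing links
        V_new[dest] = 0.0                            # absorbing destination
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    raise RuntimeError("value iteration did not converge")
```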
Local Overidentification and Efficiency Gains in Modern Causal Inference and Data Combination
This paper studies local overidentification, in the sense of Chen and Santos
(2018), and the associated semiparametric efficiency in modern
causal frameworks. We develop a unified approach that begins by translating
structural models with latent variables into their induced statistical models
of observables and then analyzes local overidentification through conditional
moment restrictions. We apply this approach to three leading models: (i) the
general treatment model under unconfoundedness, (ii) the negative control
model, and (iii) the long-term causal inference model under unobserved
confounding. The first design yields a locally just-identified statistical
model, implying that all regular asymptotically linear estimators of the
treatment effect share the same asymptotic variance, equal to the (trivial)
semiparametric efficiency bound. In contrast, the latter two models involve
nonparametric endogeneity and are naturally locally overidentified;
consequently, some doubly robust orthogonal moment estimators of the average
treatment effect are inefficient. Whereas existing work typically imposes
strong conditions to restore just-identification before deriving the efficiency
bound, we relax such assumptions and characterize the general efficiency bound,
along with efficient estimators, in the overidentified models (ii) and (iii).
arXiv link: http://arxiv.org/abs/2510.16683v1
On Quantile Treatment Effects, Rank Similarity, and Variation of Instrumental Variables
This paper studies the identification of distributional treatment effects
under nonseparable endogeneity. We begin by
revisiting the widely adopted rank similarity (RS) assumption and
characterizing it by the relationship it imposes between observed and
counterfactual potential outcome distributions. The characterization highlights
the restrictiveness of RS, motivating a weaker identifying condition. Under
this alternative, we construct identifying bounds on the distributional
treatment effects of interest through a linear semi-infinite programming (SILP)
formulation. Our identification strategy also clarifies how richer exogenous
instrument variation, such as multi-valued or multiple instruments, can further
tighten these bounds. Finally, exploiting the SILP's saddle-point structure and
Karush-Kuhn-Tucker (KKT) conditions, we establish large-sample properties for
the empirical SILP: consistency and asymptotic distribution results for the
estimated bounds and associated solutions.
arXiv link: http://arxiv.org/abs/2510.16681v1
Causal Inference in High-Dimensional Generalized Linear Models with Binary Outcomes
This paper proposes an estimator of causal effects in high-dimensional
generalized linear models with binary outcomes and general
link functions. The estimator augments a regularized regression plug-in with
weights computed from a convex optimization problem that approximately balances
link-derivative-weighted covariates and controls variance; it does not rely on
estimated propensity scores. Under standard conditions, the estimator is
$\sqrt{n}$-consistent and asymptotically normal for dense linear contrasts and
causal parameters. Simulation results show the superior performance of our
approach in comparison to alternatives such as inverse propensity score
estimators and double machine learning estimators in finite samples. In an
application to the National Supported Work training data, our estimates and
confidence intervals are close to the experimental benchmark.
arXiv link: http://arxiv.org/abs/2510.16669v1
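The estimator described above combines a regularized plug-in with weights from a convex optimization problem. The paper's exact program is not reproduced here; the sketch below only shows the generic form of such an approximate-balancing problem, matching (possibly link-derivative-weighted) covariates while penalizing weight dispersion. The tolerance `delta` and the simplex constraints are assumptions of this illustration.

```python
import numpy as np
import cvxpy as cp

def balancing_weights(X_control, x_target, delta=0.05):
    """Generic approximate-balancing weights (illustrative, not the paper's program).

    X_control : (n0, p) control-unit covariates; in the paper these would be
                link-derivative-weighted covariates from a regularized plug-in fit
    x_target  : (p,) covariate profile to be matched (e.g., the treated-group mean)
    delta     : tolerated worst-case imbalance on each covariate
    """
    n0 = X_control.shape[0]
    w = cp.Variable(n0, nonneg=True)
    imbalance = X_control.T @ w - x_target
    problem = cp.Problem(
        cp.Minimize(cp.sum_squares(w)),          # dispersion penalty (variance control)
        [cp.sum(w) == 1,                         # weights form a convex combination
         cp.norm(imbalance, "inf") <= delta],    # approximate covariate balance
    )
    problem.solve()
    return w.value
```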
On the Asymptotics of the Minimax Linear Estimator
Many causal parameters of interest, such as the average treatment effect under
unconfoundedness, can be written as continuous linear functionals of an unknown
regression function. We study a weighting estimator that sets weights by a
minimax procedure: solving a convex optimization problem that trades off
worst-case conditional bias against variance. Despite its growing use, general
root-$n$ theory for this method has been limited. This paper fills that gap.
Under regularity conditions, we show that the minimax linear estimator is
root-$n$ consistent and asymptotically normal, and we derive its asymptotic
variance. These results justify ignoring worst-case bias when forming
large-sample confidence intervals and make inference less sensitive to the
scaling of the function class. With a mild variance condition, the estimator
attains the semiparametric efficiency bound, so an augmentation step commonly
used in the literature is not needed to achieve first-order optimality.
Evidence from simulations and three empirical applications, including
job-training and minimum-wage policies, points to a simple rule: in designs
satisfying our regularity conditions, standard-error confidence intervals
suffice; otherwise, bias-aware intervals remain important.
arXiv link: http://arxiv.org/abs/2510.16661v1
From Reviews to Actionable Insights: An LLM-Based Approach for Attribute and Feature Extraction
This paper develops a large language model (LLM)-based framework for
extracting product and service attributes, features, and associated sentiments
from customer reviews. Grounded in marketing theory, the framework
distinguishes perceptual attributes from actionable features, producing
interpretable and managerially actionable insights. We apply the methodology to
20,000 Yelp reviews of Starbucks stores and evaluate eight prompt variants on a
random subset of reviews. Model performance is assessed through agreement with
human annotations and predictive validity for customer ratings. Results show
high consistency between LLMs and human coders and strong predictive validity,
confirming the reliability of the approach. Human coders required a median of
six minutes per review, whereas the LLM processed each in two seconds,
delivering comparable insights at a scale unattainable through manual coding.
Managerially, the analysis identifies attributes and features that most
strongly influence customer satisfaction and their associated sentiments,
enabling firms to pinpoint "joy points," address "pain points," and design
targeted interventions. We demonstrate how structured review data can power an
actionable marketing dashboard that tracks sentiment over time and across
stores, benchmarks performance, and highlights high-leverage features for
improvement. Simulations indicate that enhancing sentiment for key service
features could yield 1-2% average revenue gains per store.
arXiv link: http://arxiv.org/abs/2510.16551v1
Prediction Intervals for Model Averaging
Model averaging methods have been studied extensively, but
their applications have largely been limited to point prediction, as measuring
prediction uncertainty in general settings remains an open problem. In this
paper we propose prediction intervals for model averaging based on conformal
inference. These intervals cover out-of-sample realizations of the outcome
variable with a pre-specified probability, providing a way to assess predictive
uncertainty beyond point prediction. The framework allows general model
misspecification and applies to averaging across multiple models that can be
nested, disjoint, overlapping, or any combination thereof, with weights that
may depend on the estimation sample. We establish coverage guarantees under two
sets of assumptions: exact finite-sample validity under exchangeability,
relevant for cross-sectional data, and asymptotic validity under stationarity,
relevant for time-series data. We first present a benchmark algorithm and then
introduce a locally adaptive refinement and split-sample procedures that
broaden applicability. The methods are illustrated with a cross-sectional
application to real estate appraisal and a time-series application to equity
premium forecasting.
arXiv link: http://arxiv.org/abs/2510.16224v1
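As a concrete reference point for the benchmark algorithm described above, the split-conformal construction below wraps any model-averaging point predictor in an interval calibrated on held-out residuals. The `fit_avg` interface and the capping of the conformal quantile are assumptions of this sketch; the paper's locally adaptive and split-sample refinements are not shown.

```python
import numpy as np

def split_conformal_interval(fit_avg, X_train, y_train, X_cal, y_cal, x_new, alpha=0.1):
    """Split-conformal interval around a model-averaging point prediction.

    fit_avg : callable(X, y) returning a predict(X) function (the averaging step)
    alpha   : 1 minus the target coverage level
    """
    predict = fit_avg(X_train, y_train)
    scores = np.sort(np.abs(y_cal - predict(X_cal)))   # calibration residuals
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))            # conformal quantile index
    q = scores[min(k, n) - 1]                          # capped at max score in this sketch
    center = predict(np.atleast_2d(x_new))[0]
    return center - q, center + q
```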
Learning Correlated Reward Models: Statistical Barriers and Opportunities
Random utility models (RUMs) are widely used to capture human
preferences and play a key role in reward modeling for Reinforcement Learning
from Human Feedback (RLHF). However, a crucial shortcoming of many of these
techniques is the Independence of Irrelevant Alternatives (IIA) assumption,
which collapses all human preferences to a universal underlying utility
function, yielding a coarse approximation of the range of human preferences. On
the other hand, statistical and computational guarantees for models avoiding
this assumption are scarce. In this paper, we investigate the statistical and
computational challenges of learning a correlated probit model, a
fundamental RUM that avoids the IIA assumption. First, we establish that the
classical data collection paradigm of pairwise preference data is
fundamentally insufficient to learn correlational information,
explaining the lack of statistical and computational guarantees in this
setting. Next, we demonstrate that best-of-three preference data
provably overcomes these shortcomings, and devise a statistically and
computationally efficient estimator with near-optimal performance. These
results highlight the benefits of higher-order preference data in learning
correlated utilities, allowing for more fine-grained modeling of human
preferences. Finally, we validate these theoretical guarantees on several
real-world datasets, demonstrating improved personalization of human
preferences.
arXiv link: http://arxiv.org/abs/2510.15839v1
Dynamic Spatial Treatment Effects as Continuous Functionals: Theory and Evidence from Healthcare Access
This paper develops a framework for dynamic spatial treatment effects as
continuous functionals, grounded in Navier-Stokes partial differential
equations. Rather than discrete
treatment parameters, the framework characterizes treatment intensity as
continuous functions $\tau(x, t)$ over space-time, enabling rigorous
analysis of boundary evolution, spatial gradients, and cumulative exposure.
Empirical validation using 32,520 U.S. ZIP codes demonstrates exponential
spatial decay for healthcare access ($\kappa = 0.002837$ per km, $R^2 =
0.0129$) with detectable boundaries at 37.1 km. The framework successfully
diagnoses when scope conditions hold: positive decay parameters validate
diffusion assumptions near hospitals, while negative parameters correctly
signal urban confounding effects. Heterogeneity analysis reveals 2-13 $\times$
stronger distance effects for elderly populations and substantial education
gradients. Model selection strongly favors logarithmic decay over exponential
($\Delta AIC > 10,000$), representing a middle ground between
exponential and power-law decay. Applications span environmental economics,
banking, and healthcare policy. The continuous functional framework provides
predictive capability ($d^*(t) = \xi^* t$), parameter sensitivity
($\partial d^*/\partial \nu$), and diagnostic tests unavailable in traditional
difference-in-differences approaches.
arXiv link: http://arxiv.org/abs/2510.15324v2
Regression Model Selection Under General Conditions
Proofs showing a model selection criterion is asymptotically optimal are
tailored to the type of model (linear regression, quantile regression,
penalized regression, etc.), the estimation method (linear smoothers, maximum
likelihood, generalized method of moments, etc.), the type of data (i.i.d.,
dependent, high dimensional, etc.), and the type of model selection criterion.
Moreover, assumptions are often restrictive and unrealistic, making it a slow
and winding process for researchers to determine if a model selection criterion
is selecting an optimal model. This paper provides general proofs showing
asymptotic optimality for a wide range of model selection criteria under
general conditions. This paper not only asymptotically justifies model
selection criteria for most situations, but it also unifies and extends a range
of previously disparate results.
arXiv link: http://arxiv.org/abs/2510.14822v1
Evaluating Policy Effects under Network Interference without Network Information: A Transfer Learning Approach
This paper studies how to extrapolate the average total treatment effect
(ATTE) from source data with a fully observed
network to target data whose network is completely unknown. The ATTE represents
the average social impact of a policy that assigns the treatment to every
individual in the dataset. We postulate a covariate-shift type assumption that
both source and target datasets share the same conditional mean outcome.
However, because the target network is unobserved, this assumption alone is not
sufficient to pin down the ATTE for the target data. To address this issue, we
consider a sensitivity analysis based on the uncertainty of the target
network's degree distribution, where the extent of uncertainty is measured by
the Wasserstein distance from a given reference degree distribution. We then
construct bounds on the target ATTE using a linear programming-based estimator.
The limiting distribution of the bound estimator is derived via the functional
delta method, and we develop a wild bootstrap approach to approximate the
distribution. As an empirical illustration, we revisit the social network
experiment on farmers' weather insurance adoption in China by Cai et al.
(2015).
arXiv link: http://arxiv.org/abs/2510.14415v1
Dynamic Spatial Treatment Effect Boundaries: A Continuous Functional Framework from Navier-Stokes Equations
This paper introduces a framework for dynamic spatial treatment effect
boundaries using continuous functional definitions grounded in
Navier-Stokes partial differential equations. Rather than discrete treatment
effect estimators, the framework characterizes treatment intensity as a
continuous function $\tau(x, t)$ over space-time, enabling rigorous
analysis of propagation dynamics, boundary evolution, and cumulative exposure
patterns. Building on exact self-similar solutions expressible through Kummer
confluent hypergeometric and modified Bessel functions, I establish that
treatment effects follow scaling laws $\tau(d, t) = t^{-\alpha} f(d/t^\beta)$
where exponents characterize diffusion mechanisms. Empirical validation using
42 million TROPOMI satellite observations of NO$_2$ pollution from U.S.
coal-fired power plants demonstrates strong exponential spatial decay
($\kappa_s = 0.004$ per km, $R^2 = 0.35$) with detectable boundaries at 572 km.
Monte Carlo simulations confirm superior performance over discrete parametric
methods in boundary detection and false positive avoidance (94% vs 27%
correct rejection). Regional heterogeneity analysis validates diagnostic
capability: positive decay parameters within 100 km confirm coal plant
dominance; negative parameters beyond 100 km correctly signal when urban
sources dominate. The continuous functional perspective unifies spatial
econometrics with mathematical physics, providing theoretically grounded
methods for boundary detection, exposure quantification, and policy evaluation
across environmental economics, banking, and healthcare applications.
arXiv link: http://arxiv.org/abs/2510.14409v2
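The Navier-Stokes entries above report exponential spatial decay rates and detection boundaries. The snippet below is only a reduced-form illustration of estimating such a decay rate kappa and a boundary d*; it does not reproduce the papers' PDE-based derivations, and the noise-floor rule for the boundary is an assumption of this sketch.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_spatial_decay(distance_km, effect, noise_floor):
    """Fit tau(d) = A * exp(-kappa * d) to estimated effects at various distances
    and report the distance d* where the fitted curve drops below a noise floor.
    distance_km, effect : 1-d numpy arrays; noise_floor > 0 (assumed).
    """
    model = lambda d, A, kappa: A * np.exp(-kappa * d)
    (A_hat, kappa_hat), _ = curve_fit(model, distance_km, effect,
                                      p0=(effect.max(), 0.01))
    d_star = np.log(A_hat / noise_floor) / kappa_hat if kappa_hat > 0 else np.inf
    return A_hat, kappa_hat, d_star
```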
Debiased Kernel Estimation of Spot Volatility in the Presence of Infinite Variation Jumps
Estimating spot volatility becomes particularly challenging when jump
activity is high, a phenomenon observed empirically in highly traded financial
securities. In this paper, we revisit the problem of spot volatility
estimation for an Itô semimartingale with jumps of unbounded variation. We
construct truncated kernel-based
estimators and debiased variants that extend the efficiency frontier for spot
volatility estimation in terms of the jump activity index $Y$, raising the
previous bound $Y<4/3$ to $Y<20/11$, thereby covering nearly the entire
admissible range $Y<2$. Compared with earlier work, our approach attains
smaller asymptotic variances through the use of unbounded kernels, is simpler
to implement, and has broader applicability under more flexible model
assumptions. A comprehensive simulation study confirms that our procedures
substantially outperform competing methods in finite samples.
arXiv link: http://arxiv.org/abs/2510.14285v1
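A basic (non-debiased) version of the truncated kernel idea described above: weight squared log-price increments by a kernel centered at the evaluation time and discard increments above a truncation threshold to remove jumps. The Epanechnikov kernel and the ratio normalization are choices of this sketch, not the paper's debiased estimators or unbounded kernels.

```python
import numpy as np

def truncated_kernel_spot_vol(times, log_prices, t0, bandwidth, trunc_level):
    """Basic truncated kernel estimator of spot variance at time t0: kernel-weighted
    sum of squared increments, discarding increments larger than trunc_level
    (to filter jumps), normalized by the kernel-weighted time elapsed.
    """
    dx = np.diff(log_prices)
    mid = 0.5 * (times[1:] + times[:-1])                        # increment midpoints
    u = (mid - t0) / bandwidth
    weights = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)  # Epanechnikov kernel
    keep = np.abs(dx) <= trunc_level                            # jump truncation
    num = np.sum(weights * keep * dx**2)
    den = np.sum(weights * np.diff(times))
    return num / den
```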
Nonparametric Identification of Spatial Treatment Effect Boundaries: Evidence from Bank Branch Consolidation
This paper develops a nonparametric framework for identifying the spatial
boundaries of treatment effects without imposing parametric functional form
restrictions. The
method employs local linear regression with data-driven bandwidth selection to
flexibly estimate spatial decay patterns and detect treatment effect
boundaries. Monte Carlo simulations demonstrate that the nonparametric approach
exhibits lower bias and correctly identifies the absence of boundaries when
none exist, unlike parametric methods that may impose spurious spatial
patterns. I apply this framework to bank branch openings during 2015--2020,
matching 5,743 new branches to 5.9 million mortgage applications across 14,209
census tracts. The analysis reveals that branch proximity significantly affects
loan application volume (8.5% decline per 10 miles) but not approval rates,
consistent with branches stimulating demand through local presence while credit
decisions remain centralized. Examining branch survival during the digital
transformation era (2010--2023), I find a non-monotonic relationship with area
income: high-income areas experience more closures despite conventional wisdom.
This counterintuitive pattern reflects strategic consolidation of redundant
branches in over-banked wealthy urban areas rather than discrimination against
poor neighborhoods. Controlling for branch density, urbanization, and
competition, the direct income effect diminishes substantially, with branch
density emerging as the primary determinant of survival. These findings
demonstrate the necessity of flexible nonparametric methods for detecting
complex spatial patterns that parametric models would miss, and challenge
simplistic narratives about banking deserts by revealing the organizational
complexity underlying spatial consolidation decisions.
arXiv link: http://arxiv.org/abs/2510.13148v2
Beyond Returns: A Candlestick-Based Approach to Spot Covariance Estimation
Spot covariance estimators typically rely on
return data over short time windows, but such approaches face a trade-off
between statistical accuracy and localization. In this paper, I introduce a new
estimation framework using high-frequency candlestick data, which include open,
high, low, and close prices, effectively addressing this trade-off. By
exploiting the information contained in candlesticks, the proposed method
improves estimation accuracy relative to benchmarks while preserving local
structure. I further develop a test for spot covariance inference based on
candlesticks that demonstrates reasonable size control and a notable increase
in power, particularly in small samples. Motivated by recent work in the
finance literature, I empirically test the market neutrality of the iShares
Bitcoin Trust ETF (IBIT) using 1-minute candlestick data for the full year of
2024. The results show systematic deviations from market neutrality, especially
in periods of market stress. An event study around FOMC announcements further
illustrates the new method's ability to detect subtle shifts in response to
relatively mild information events.
arXiv link: http://arxiv.org/abs/2510.12911v1
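For intuition on why candlesticks carry extra information beyond close-to-close returns, the classical Garman-Klass range-based variance estimator below uses all four OHLC fields of a bar. It is shown purely as background; the paper's spot covariance estimator and inference procedure are different.

```python
import numpy as np

def garman_klass_variance(o, h, l, c):
    """Classical Garman-Klass per-bar variance estimate from candlestick
    (open, high, low, close) prices; works elementwise on arrays or scalars."""
    log_hl = np.log(h / l)
    log_co = np.log(c / o)
    return 0.5 * log_hl**2 - (2.0 * np.log(2.0) - 1.0) * log_co**2
```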
Nonparametric Identification and Estimation of Spatial Treatment Effect Boundaries: Evidence from 42 Million Pollution Observations
This paper develops a nonparametric framework for identifying and estimating
the spatial boundaries of treatment effects in settings with geographic
spillovers.
While atmospheric dispersion theory predicts exponential decay of pollution
under idealized assumptions, these assumptions -- steady winds, homogeneous
atmospheres, flat terrain -- are systematically violated in practice. I
establish nonparametric identification of spatial boundaries under weak
smoothness and monotonicity conditions, propose a kernel-based estimator with
data-driven bandwidth selection, and derive asymptotic theory for inference.
Using 42 million satellite observations of NO$_2$ concentrations near coal
plants (2019-2021), I find that nonparametric kernel regression reduces
prediction errors by 1.0 percentage point on average compared to parametric
exponential decay assumptions, with largest improvements at policy-relevant
distances: 2.8 percentage points at 10 km (near-source impacts) and 3.7
percentage points at 100 km (long-range transport). Parametric methods
systematically underestimate near-source concentrations while overestimating
long-range decay. The COVID-19 pandemic provides a natural experiment
validating the framework's temporal sensitivity: NO$_2$ concentrations dropped
4.6% in 2020, then recovered 5.7% in 2021. These results demonstrate that
flexible, data-driven spatial methods substantially outperform restrictive
parametric assumptions in environmental policy applications.
arXiv link: http://arxiv.org/abs/2510.12289v2
Optimal break tests for large linear time series models
This paper develops tests for a structural break at an
unknown date in infinite and growing-order time series regression models, such
as AR($\infty$), linear regression with increasingly many covariates, and
nonparametric regression. Under an auxiliary i.i.d. Gaussian error assumption,
we derive an average power optimal test, establishing a growing-dimensional
analog of the exponential tests of Andrews and Ploberger (1994) to handle
identification failure under the null hypothesis of no break. Relaxing the
i.i.d. Gaussian assumption to a more general dependence structure, we establish
a functional central limit theorem for the underlying stochastic processes,
which features an extra high-order serial dependence term due to the growing
dimension. We robustify our test both against this term and finite sample bias
and illustrate its excellent performance and practical relevance in a Monte
Carlo study and a real data empirical example.
arXiv link: http://arxiv.org/abs/2510.12262v1
L2-relaxation for Economic Prediction
This paper considers settings where the number of predictors may exceed
the sample size, for economic prediction. An underlying latent factor structure
implies a dense regression model with highly correlated covariates. We propose
the L2-relaxation method for estimating the regression coefficients and
extrapolating the out-of-sample (OOS) outcomes. This framework can be applied
to policy evaluation using the panel data approach (PDA), where we further
establish inference for the average treatment effect. In addition, we extend
the traditional single unit setting in PDA to allow for many treated units with
a short post-treatment period. Monte Carlo simulations demonstrate that our
approach exhibits excellent finite sample performance for both OOS prediction
and policy evaluation. We illustrate our method with two empirical examples:
(i) predicting China's producer price index growth rate and evaluating the
effect of real estate regulations, and (ii) estimating the impact of Brexit on
the stock returns of British and European companies.
arXiv link: http://arxiv.org/abs/2510.12183v1
Estimating Variances for Causal Panel Data Estimators
Recent years have seen
a surge in research on panel data models, with a number of new estimators
proposed. However, there has been less attention paid to the quantification of
the precision of these estimators. Of the variance estimators that have been
proposed, their relative merits are not well understood. In this paper we
develop a common framework for comparing some of the proposed variance
estimators for generic point estimators. We reinterpret three commonly used
approaches as targeting different conditional variances under an
exchangeability assumption. We find that the estimators we consider are all
valid on average, but that their performance in terms of power differs
substantially depending on the heteroskedasticity structure of the data.
Building on these insights, we propose a new variance estimator that flexibly
accounts for heteroskedasticity in both the unit and time dimensions, and
delivers superior statistical power in realistic panel data settings.
arXiv link: http://arxiv.org/abs/2510.11841v1
Compositional difference-in-differences for categorical outcomes
When outcomes are categorical,
treatment effects often operate on both total quantities (e.g., voter turnout)
and category shares (e.g., vote distribution across parties). In this context,
linear DiD models can be problematic: they suffer from scale dependence, may
produce negative counterfactual quantities, and are inconsistent with discrete
choice theory. We propose compositional DiD (CoDiD), a new method that
identifies counterfactual categorical quantities, and thus total levels and
shares, under a parallel growths assumption. The assumption states that, absent
treatment, each category's size grows or shrinks at the same proportional rate
in treated and control groups. In a random utility framework, we show that this
implies parallel evolution of relative preferences between any pair of
categories. Analytically, we show that it also means the shares are reallocated
in the same way in both groups in the absence of treatment. Finally,
geometrically, it corresponds to parallel trajectories (or movements) of
probability mass functions of the two groups in the probability simplex under
Aitchison geometry. We extend CoDiD to i) derive bounds under relaxed
assumptions, ii) handle staggered adoption, and iii) propose a synthetic DiD
analog. We illustrate the method's empirical relevance through two
applications: first, we examine how early voting reforms affect voter choice in
U.S. presidential elections; second, we analyze how the Regional Greenhouse Gas
Initiative (RGGI) affected the composition of electricity generation across
sources such as coal, natural gas, nuclear, and renewables.
arXiv link: http://arxiv.org/abs/2510.11659v1
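The parallel growths assumption described above has a direct computational counterpart: apply each category's control-group growth factor to the treated group's pre-period quantities. The sketch below does exactly that for a single treated/control pair; the paper's bounds, staggered-adoption, and synthetic DiD extensions are not reproduced, and strictly positive pre-period counts are assumed.

```python
import numpy as np

def codid_counterfactual(treated_pre, control_pre, control_post):
    """Counterfactual post-period category quantities for the treated group under
    parallel growths: each category grows at the control group's proportional rate.
    All inputs are 1-d arrays of category counts; control_pre must be positive.
    """
    growth = control_post / control_pre             # category-specific growth factors
    counterfactual = treated_pre * growth           # counterfactual treated quantities
    shares = counterfactual / counterfactual.sum()  # implied counterfactual shares
    return counterfactual, shares
```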
A mathematical model for pricing perishable goods for quick-commerce applications
Quick commerce is a rapidly growing industry.
It provides informal employment to approximately 450,000 workers, and it is
estimated to become a USD 200 billion industry by 2026. A significant portion
of this industry deals with perishable goods (e.g., milk, dosa batter, etc.).
These are food items that are consumed relatively fresh by consumers, and
therefore their order volume is high and repetitive even when the average
basket size is relatively small. The fundamental challenge for the retailer is
that increasing the selling price would hamper sales and lead to unsold
inventory, whereas setting the price too low would forgo potential revenue.
This paper proposes a mathematical model that
formalizes this dilemma. The problem statement is not only important for
improving the unit economics of perennially loss-making quick-commerce firms,
but would also lead to a trickle-down effect in improving the conditions of
gig workers, as observed in [4]. The sections below describe the mathematical
formulation; the results from the simulation will be published in a follow-up
study.
arXiv link: http://arxiv.org/abs/2510.11360v1
Superstars or Super-Villains? Productivity Spillovers and Firm Dynamics in Indonesia
This paper investigates the
relationship between the spillovers of superstar firms (those with the top
market share in their industry) and the productivity dynamics in Indonesia.
Employing data on Indonesian manufacturing firms from 2001 to 2015, we find
that superstar exposures in the market raise both the productivity level and
the growth of non-superstar firms through horizontal (within a sector-province)
and vertical (across sectors) channels. When we distinguish by ownership,
foreign superstars consistently encourage productivity except through the
horizontal channel. In contrast, domestic superstars generate positive
spillovers through both horizontal and vertical linkages, indicating that
foreign firms do not solely drive positive externalities. Furthermore, despite
overall productivity growth being positive in 2001-2015, the negative
component of growth is mainly driven by within-group reallocation, evidence of
misallocation among surviving firms, notably domestic superstars. Although
Indonesian
superstar firms are more efficient in their operations, their relatively modest
growth rates suggest a potential stagnation, which can be plausibly attributed
to limited innovation activity or a slow pace of adopting new technologies.
arXiv link: http://arxiv.org/abs/2510.11139v1
Spatial and Temporal Boundaries in Difference-in-Differences: A Framework from Navier-Stokes Equation
This paper provides a framework for characterizing the spatial and temporal
boundaries of treatment effects in difference-in-differences designs. Starting
from fundamental fluid dynamics equations (Navier-Stokes), we derive conditions
under which treatment effects decay exponentially in space and time, enabling
researchers to calculate explicit boundaries beyond which effects become
undetectable. The framework encompasses both linear (pure diffusion) and
nonlinear (advection-diffusion with chemical reactions) regimes, with testable
scope conditions based on dimensionless numbers from physics (Péclet and
Reynolds numbers). We demonstrate the framework's diagnostic capability using
air pollution from coal-fired power plants. Analyzing 791 ground-based
PM$_{2.5}$ monitors and 189,564 satellite-based NO$_2$ grid cells in the
Western United States over 2019-2021, we find striking regional heterogeneity:
within 100 km of coal plants, both pollutants show positive spatial decay
(PM$_{2.5}$: $\kappa_s = 0.00200$, $d^* = 1,153$ km; NO$_2$: $\kappa_s =
0.00112$, $d^* = 2,062$ km), validating the framework. Beyond 100 km, negative
decay parameters correctly signal that urban sources dominate and diffusion
assumptions fail. Ground-level PM$_{2.5}$ decays approximately twice as fast as
satellite column NO$_2$, consistent with atmospheric transport physics. The
framework successfully diagnoses its own validity in four of eight analyzed
regions, providing researchers with physics-based tools to assess whether their
spatial difference-in-differences setting satisfies diffusion assumptions
before applying the estimator. Our results demonstrate that rigorous boundary
detection requires both theoretical derivation from first principles and
empirical validation of underlying physical assumptions.
arXiv link: http://arxiv.org/abs/2510.11013v1
Macroeconomic Forecasting and Machine Learning
This paper studies macroeconomic forecasting with machine learning methods,
systematically integrating three key principles: using high-dimensional data
with appropriate regularization, adopting rigorous out-of-sample validation
procedures, and incorporating nonlinearities. By exploiting the rich
information embedded in a large set of macroeconomic and financial predictors,
we produce accurate predictions of the entire profile of macroeconomic risk in
real time. Our findings show that regularization via shrinkage is essential to
control model complexity, while introducing nonlinearities yields limited
improvements in predictive accuracy. Out-of-sample validation plays a critical
role in selecting model architecture and preventing overfitting.
arXiv link: http://arxiv.org/abs/2510.11008v1
Identifying treatment effects on categorical outcomes in IV models
This paper studies the identification of treatment effects on
categorical outcomes under binary treatment and binary instrument settings. We
decompose the observed joint probability of outcomes and treatment into
marginal probabilities of potential outcomes and treatment, and association
parameters that capture selection bias due to unobserved heterogeneity. Under a
novel identifying assumption, association similarity, which requires the
dependence between unobserved factors and potential outcomes to be invariant
across treatment states, we achieve point identification of the full
distribution of potential outcomes. Recognizing that this assumption may be
strong in some contexts, we propose two weaker alternatives: monotonic
association, which restricts the direction of selection heterogeneity, and
bounded association, which constrains its magnitude. These relaxed assumptions
deliver sharp partial identification bounds that nest point identification as a
special case and facilitate transparent sensitivity analysis. We illustrate the
framework in an empirical application, estimating the causal effect of private
health insurance on health outcomes.
arXiv link: http://arxiv.org/abs/2510.10946v1
Denoised IPW-Lasso for Heterogeneous Treatment Effect Estimation in Randomized Experiments
This paper proposes a method for estimating conditional average treatment
effects (CATE) in randomized experiments. We adopt inverse probability
weighting (IPW) for identification; however, IPW-transformed outcomes are known
to be noisy, even when true propensity scores are used. To address this issue,
we introduce a noise reduction procedure and estimate a linear CATE model using
Lasso, achieving both accuracy and interpretability. We theoretically show that
denoising reduces the prediction error of the Lasso. The method is particularly
effective when treatment effects are small relative to the variability of
outcomes, which is often the case in empirical applications. Applications to
the Get-Out-the-Vote dataset and Criteo Uplift Modeling dataset demonstrate
that our method outperforms fully nonparametric machine learning methods in
identifying individuals with higher treatment effects. Moreover, our method
uncovers informative heterogeneity patterns that are consistent with previous
empirical findings.
arXiv link: http://arxiv.org/abs/2510.10527v1
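As a baseline for the approach described above, the sketch below forms the standard IPW-transformed outcome, whose conditional mean equals the CATE in a randomized experiment with known assignment probability, and fits a linear CATE model by Lasso. The denoising step the paper adds before the Lasso is not included, and the cross-validated penalty choice is an assumption of this sketch.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def ipw_lasso_cate(X, y, t, p_treat):
    """Fit a linear CATE model by Lasso on the IPW-transformed outcome
    y* = (t/p - (1 - t)/(1 - p)) * y, whose conditional mean is the CATE when the
    assignment probability p_treat is known (randomized experiment).
    """
    y_star = (t / p_treat - (1 - t) / (1 - p_treat)) * y
    model = LassoCV(cv=5).fit(X, y_star)
    return model  # model.predict(X_new) estimates CATE at X_new
```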
Ranking Policies Under Loss Aversion and Inequality Aversion
A large body of evidence, including
population surveys, shows that individuals, when evaluating their situations,
pay attention to whether they experience gains or losses, with losses weighing
more heavily than gains. The electorate's loss aversion, in turn, influences
politicians' choices. We propose a new framework for welfare analysis of policy
outcomes that, in addition to the traditional focus on post-policy incomes,
also accounts for individuals' gains and losses resulting from policies. We
develop several bivariate stochastic dominance criteria for ranking policy
outcomes that are sensitive to features of the joint distribution of
individuals' income changes and absolute incomes. The main social objective
assumes that individuals are loss averse with respect to income gains and
losses, inequality averse with respect to absolute incomes, and hold varying
preferences regarding the association between incomes and income changes. We
translate these and other preferences into functional inequalities that can be
tested using sample data. The concepts and methods are illustrated using data
from an income support experiment conducted in Connecticut.
arXiv link: http://arxiv.org/abs/2510.09590v1
Boundary estimation in the regression-discontinuity design: Evidence for a merit- and need-based financial aid program
In the regression-discontinuity (RD) design, the probability
that units receive a treatment changes discontinuously as a function of one
covariate exceeding a threshold or cutoff point. This paper studies an extended
RD design where assignment rules simultaneously involve two or more continuous
covariates. We show that assignment rules with more than one variable allow the
estimation of a more comprehensive set of treatment effects, relaxing in a
research-driven style the local and sometimes limiting nature of univariate RD
designs. We then propose a flexible nonparametric approach to estimate the
multidimensional discontinuity by univariate local linear regression and
compare its performance to existing methods. We present an empirical
application to a large-scale and countrywide financial aid program for
low-income students in Colombia. The program uses a merit-based (academic
achievement) and need-based (wealth index) assignment rule to select students
for the program. We show that our estimation strategy fully exploits the
multidimensional assignment rule and reveals heterogeneous effects along the
treatment boundaries.
arXiv link: http://arxiv.org/abs/2510.09257v1
Flexibility without foresight: the predictive limitations of mixture models
Mixture models, such as mixed logit and latent
class, are generally observed to obtain superior model fit and yield detailed
insights into unobserved preference heterogeneity. Using theoretical arguments
and two case studies on revealed and stated choice data, this paper highlights
that these advantages do not translate into any benefits in forecasting,
whether looking at prediction performance or the recovery of market shares. The
only exception arises when using conditional distributions in making
predictions for the same individuals included in the estimation sample, which
obviously precludes any out-of-sample forecasting.
arXiv link: http://arxiv.org/abs/2510.09185v1
Sensitivity Analysis for Causal ML: A Use Case at Booking.com
Causal machine learning methods are increasingly used for
estimating causal effects from observational data in both industry and
academia. However, causal inference from observational data relies on
untestable assumptions about the data-generating process, such as the absence
of unobserved confounders. When these assumptions are violated, causal effect
estimates may become biased, undermining the validity of research findings. In
these contexts, sensitivity analysis plays a crucial role, by enabling data
scientists to assess the robustness of their findings to plausible violations
of unconfoundedness. This paper introduces sensitivity analysis and
demonstrates its practical relevance through a (simulated) data example based
on a use case at Booking.com. We focus our presentation on a recently proposed
method by Chernozhukov et al. (2023), which derives general non-parametric
bounds on biases due to omitted variables, and is fully compatible with (though
not limited to) modern inferential tools of Causal Machine Learning. By
presenting this use case, we aim to raise awareness of sensitivity analysis and
highlight its importance in real-world scenarios.
arXiv link: http://arxiv.org/abs/2510.09109v1
Sensitivity Analysis for Treatment Effects in Difference-in-Differences Models using Riesz Representation
Difference-in-differences (DiD) models are widely used in
empirical research in economics, political science, and beyond. Identification
in these models is based on the conditional parallel trends assumption: In the
absence of treatment, the average outcomes of the treated and untreated groups
are assumed to evolve in parallel over time, conditional on pre-treatment
covariates. We introduce a novel approach to sensitivity analysis for DiD
models that assesses the robustness of DiD estimates to violations of this
assumption due to unobservable confounders, allowing researchers to
transparently assess and communicate the credibility of their causal estimation
results. Our method focuses on estimation by Double Machine Learning and
extends previous work on sensitivity analysis based on Riesz Representation in
cross-sectional settings. We establish asymptotic bounds for point estimates
and confidence intervals in the canonical $2\times2$ setting and group-time
causal parameters in settings with staggered treatment adoption. Our approach
makes it possible to relate the formulation of parallel trends violation to
empirical evidence from (1) pre-testing, (2) covariate benchmarking and (3)
standard reporting statistics and visualizations. We provide extensive
simulation experiments demonstrating the validity of our sensitivity approach
and diagnostics and apply our approach to two empirical applications.
arXiv link: http://arxiv.org/abs/2510.09064v1
Blackwell without Priors
We consider a setting in which a
decision maker observes the entire distribution of signals generated by a known
experiment under an unknown distribution of the state of the world. One
experiment is robustly more informative than another if the decision maker's
maxmin expected utility after observing the output of the former is always at
least her maxmin expected utility after observing the latter. We show that this
ranking holds if and only if the less informative experiment is a linear
transformation of the more informative experiment; equivalently, the null space
of the more informative experiment is a subset of the null space of the less
informative experiment. Our criterion is implied by Blackwell's order but does
not imply it, and we show by example that our ranking admits strictly more
comparable pairs of experiments than the classical ranking.
arXiv link: http://arxiv.org/abs/2510.08709v2
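The criterion stated above, that the less informative experiment is a linear transformation of the more informative one, is easy to check numerically. The sketch below tests B = M A via a rank comparison; treating rows as signals and columns as states, and the numerical tolerance, are assumptions of this illustration.

```python
import numpy as np

def is_linear_transformation_of(B, A, tol=1e-10):
    """Check whether experiment B equals M @ A for some matrix M, i.e. the row space
    of B lies in the row space of A (equivalently, null(A) is a subset of null(B)).
    Rows are taken to index signals and columns to index states.
    """
    rank_A = np.linalg.matrix_rank(A, tol=tol)
    rank_stacked = np.linalg.matrix_rank(np.vstack([A, B]), tol=tol)
    return rank_stacked == rank_A
```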
Stochastic Volatility-in-mean VARs with Time-Varying Skewness
This paper develops a vector autoregression (VAR) model with stochastic
volatility-in-mean and time-varying skewness. Unlike previous approaches, the
proposed model allows both volatility and skewness to directly affect
macroeconomic variables. We provide a Gibbs sampling algorithm for posterior
inference and apply the model to quarterly data for the US and the UK.
Empirical results show that skewness shocks have economically significant
effects on output, inflation and spreads, often exceeding the impact of
volatility shocks. In a pseudo-real-time forecasting exercise, the proposed
model outperforms existing alternatives in many cases. Moreover, the model
produces sharper measures of tail risk, revealing that standard stochastic
volatility models tend to overstate uncertainty. These findings highlight the
importance of incorporating time-varying skewness for capturing macro-financial
risks and improving forecast performance.
arXiv link: http://arxiv.org/abs/2510.08415v1
Beyond the Oracle Property: Adaptive LASSO in Cointegrating Regressions
This paper analyzes the adaptive LASSO
estimator in cointegrating regression models. We study model selection
probabilities, estimator consistency, and limiting distributions under both
standard and moving-parameter asymptotics. We also derive uniform convergence
rates and the fastest local-to-zero rates that can still be detected by the
estimator, complementing and extending the results of Lee, Shi, and Gao (2022,
Journal of Econometrics, 229, 322--349). Our main findings include that under
conservative tuning, the adaptive LASSO estimator is uniformly $T$-consistent
and the cut-off rate for local-to-zero coefficients that can be detected by the
procedure is $1/T$. Under consistent tuning, however, both rates are slower and
depend on the tuning parameter. The theoretical results are complemented by a
detailed simulation study showing that the finite-sample distribution of the
adaptive LASSO estimator deviates substantially from what is suggested by the
oracle property, whereas the limiting distributions derived under
moving-parameter asymptotics provide much more accurate approximations.
Finally, we show that our results also extend to models with local-to-unit-root
regressors and to predictive regressions with unit-root predictors.
arXiv link: http://arxiv.org/abs/2510.07204v1
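For readers unfamiliar with the estimator studied above, the generic two-step adaptive LASSO can be written as a weighted Lasso via column rescaling, as in the sketch below. This is a textbook cross-sectional version with an OLS first step and a fixed penalty; it does not reproduce the paper's cointegration setting or its conservative versus consistent tuning schemes.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def adaptive_lasso(X, y, lam=0.1, gamma=1.0):
    """Generic two-step adaptive LASSO: penalty weights from an initial OLS fit,
    implemented as a plain Lasso on column-rescaled regressors.
    Assumes X has full column rank so the OLS first step is well defined.
    """
    beta_init = LinearRegression(fit_intercept=False).fit(X, y).coef_
    w = 1.0 / (np.abs(beta_init) ** gamma + 1e-12)   # adaptive penalty weights
    X_scaled = X / w                                  # rescale columns by 1/weight
    fit = Lasso(alpha=lam, fit_intercept=False).fit(X_scaled, y)
    return fit.coef_ / w                              # back to the original scale
```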
Bayesian Portfolio Optimization by Predictive Synthesis
Most portfolio optimization methods require information on the distribution of
returns of the assets that make up the portfolio. However, such distribution
information is usually unknown to investors. Various methods have been proposed
to estimate distribution information, but their accuracy greatly depends on the
uncertainty of the financial markets. Due to this uncertainty, a model that
could well predict the distribution information at one point in time may
perform less accurately compared to another model at a different time. To solve
this problem, we investigate a method for portfolio optimization based on
Bayesian predictive synthesis (BPS), one of the Bayesian ensemble methods for
meta-learning. We assume that investors have access to multiple asset return
prediction models. By using BPS with dynamic linear models to combine these
predictions, we can obtain a Bayesian predictive posterior about the mean
rewards of assets that accommodate the uncertainty of the financial markets. In
this study, we examine how to construct mean-variance portfolios and
quantile-based portfolios based on the predicted distribution information.
arXiv link: http://arxiv.org/abs/2510.07180v1
Robust Inference for Convex Pairwise Difference Estimators
This paper develops robust inference methods
for a broad class of convex pairwise difference estimators. These estimators
minimize a kernel-weighted convex-in-parameter function over observation pairs
that are similar in terms of certain covariates, where the similarity is
governed by a localization (bandwidth) parameter. While classical results
establish asymptotic normality under restrictive bandwidth conditions, we show
that valid Gaussian and bootstrap-based inference remains possible under
substantially weaker assumptions. First, we extend the theory of small
bandwidth asymptotics to convex pairwise estimation settings, deriving robust
Gaussian approximations even when a smaller than standard bandwidth is used.
Second, we employ a debiasing procedure based on generalized jackknifing to
enable inference with larger bandwidths, while preserving convexity of the
objective function. Third, we construct a novel bootstrap method that adjusts
for bandwidth-induced variance distortions, yielding valid inference across a
wide range of bandwidth choices. Our proposed inference methods are
demonstrably more robust, while retaining the practical appeal of convex
pairwise difference estimators.
arXiv link: http://arxiv.org/abs/2510.05991v1
Assessing the Effects of Monetary Shocks on Macroeconomic Stars: A SMUC-IV Framework
This paper develops a structural multivariate unobserved components model
with external instrument (SMUC-IV) to investigate the effects of monetary
policy shocks on key U.S. macroeconomic "stars", namely the level of potential
output, the growth rate of potential output, trend inflation, and the neutral
interest rate. A key feature of our approach is the use of an external
instrument to identify monetary policy shocks within the multivariate
unobserved components modeling framework. We develop an MCMC estimation method
to facilitate posterior inference within our proposed SMUC-IV framework. In
addition, we propose a marginal likelihood estimator to enable model
comparison across alternative specifications. Our empirical analysis shows that
contractionary monetary policy shocks have significant negative effects on the
macroeconomic stars, highlighting the nonzero long-run effects of transitory
monetary policy shocks.
arXiv link: http://arxiv.org/abs/2510.05802v1
Correcting sample selection bias with categorical outcomes
Sample selection is a pervasive problem in empirical research, particularly
when the outcome of interest is categorical, such as occupational choice,
health
status, or field of study. Classical approaches to sample selection rely on
strong parametric distributional assumptions, which may be restrictive in
practice. The recent framework of Chernozhukov et al. (2023) offers
nonparametric identification using a local Gaussian representation (LGR) that
holds for any bivariate joint distribution; however, this approach is limited
to ordered discrete outcomes. We therefore extend it by developing a local
representation that applies to joint probabilities, thereby eliminating the
need to impose an artificial ordering on categories. Our representation
decomposes each joint probability into marginal probabilities and a
category-specific association parameter that captures how selection
differentially affects each outcome. Under exclusion restrictions analogous to
those in the LGR model, we establish nonparametric point identification of the
latent categorical distribution. Building on this identification result, we
introduce a semiparametric multinomial logit model with sample selection,
propose a computationally tractable two-step estimator, and derive its
asymptotic properties. This framework significantly broadens the set of tools
available for analyzing selection in categorical and other discrete outcomes,
offering substantial relevance for empirical work across economics, health
sciences, and social sciences.
arXiv link: http://arxiv.org/abs/2510.05551v1
Can language models boost the power of randomized experiments without statistical bias?
Randomized controlled trials (RCTs) are gold
standards for causal inference, yet cost and sample-size constraints limit
power. Meanwhile, modern RCTs routinely collect rich, unstructured data that
are highly prognostic of outcomes but rarely used in causal analyses. We
introduce CALM (Causal Analysis leveraging Language Models), a statistical
framework that integrates large language model (LLM) predictions with
established causal estimators to increase precision while preserving
statistical validity. CALM treats LLM outputs as auxiliary prognostic
information and corrects their potential bias via a heterogeneous calibration
step that residualizes and optimally reweights predictions. We prove that CALM
remains consistent even when LLM predictions are biased and achieves efficiency
gains over augmented inverse probability weighting estimators for various
causal effects. In particular, CALM develops a few-shot variant that aggregates
predictions across randomly sampled demonstration sets. The resulting
U-statistic-like predictor restores i.i.d. structure and also mitigates
prompt-selection variability. Empirically, in simulations calibrated to a
mobile-app depression RCT, CALM delivers lower variance relative to other
benchmarking methods, is effective in zero- and few-shot settings, and remains
stable across prompt designs. By principled use of LLMs to harness unstructured
data and external knowledge learned during pretraining, CALM provides a
practical path to more precise causal analyses in RCTs.
arXiv link: http://arxiv.org/abs/2510.05545v1
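The idea of using auxiliary predictions to sharpen an RCT estimate without introducing bias can be illustrated with a simple regression adjustment that treats the (possibly biased) LLM prediction as a centered covariate interacted with treatment. The Lin-style adjustment below is only a simplified stand-in for CALM's heterogeneous calibration and reweighting step, not the paper's estimator.

```python
import numpy as np
import statsmodels.api as sm

def prediction_adjusted_ate(y, t, llm_pred):
    """Regression-adjusted ATE in a randomized experiment using an auxiliary outcome
    prediction (e.g., from an LLM) as a centered covariate interacted with treatment.
    Valid in an RCT even if the prediction is biased; simplified relative to CALM.
    """
    f = llm_pred - llm_pred.mean()                       # center the prediction
    X = sm.add_constant(np.column_stack([t, f, t * f]))  # const, t, f, t*f
    fit = sm.OLS(y, X).fit(cov_type="HC2")
    return fit.params[1], fit.bse[1]                     # ATE estimate, robust SE
```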
Estimating Treatment Effects Under Bounded Heterogeneity
Common regression specifications estimate the average
treatment effect under the assumption of constant effects. When treatment
effects are heterogeneous, however, such specifications generally fail to
recover this average effect. Augmenting these specifications with interaction
terms between demeaned covariates and treatment eliminates this bias, but often
leads to imprecise estimates and becomes infeasible under limited overlap. We
propose a generalized ridge regression estimator, $regulaTE$, that
penalizes the coefficients on the interaction terms to achieve an optimal
trade-off between worst-case bias and variance in estimating the average effect
under limited treatment effect heterogeneity. Building on this estimator, we
construct confidence intervals that remain valid under limited overlap and can
also be used to assess sensitivity to violations of the constant effects
assumption. We illustrate the method in empirical applications under
unconfoundedness and staggered adoption, providing a practical approach to
inference under limited overlap.
arXiv link: http://arxiv.org/abs/2510.05454v1
Risk-Adjusted Policy Learning and the Social Cost of Uncertainty: Theory and Evidence from CAP evaluation
This paper develops a risk-adjusted approach to optimal policy
learning (OPL) for observational data by importing Roy's (1952) safety-first
principle into the treatment assignment problem. We formalize a welfare
functional that maximizes the probability that outcomes exceed a socially
required threshold and show that the associated pointwise optimal rule ranks
treatments by the ratio of conditional means to conditional standard
deviations. We implement the framework using microdata from the Italian Farm
Accountancy Data Network to evaluate the allocation of subsidies under the EU
Common Agricultural Policy. Empirically, risk-adjusted optimal policies
systematically dominate the realized allocation across specifications, while
risk aversion lowers overall welfare relative to the risk-neutral benchmark,
making transparent the social cost of insurance against uncertainty. The
results illustrate how safety-first OPL provides an implementable,
interpretable tool for risk-sensitive policy design, quantifying the
efficiency-insurance trade-off that policymakers face when outcomes are
volatile.
arXiv link: http://arxiv.org/abs/2510.05007v1
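A minimal sketch of the pointwise safety-first rule described above, under simplifying assumptions of my own (linear conditional means, homoskedastic errors within each arm, and a normal approximation for the exceedance probability); this is not the paper's implementation.

```python
# Safety-first (Roy-type) assignment sketch: for each unit, pick the treatment with
# the largest (mu_d(x) - c) / sigma_d(x), i.e. the largest estimated probability of
# exceeding a required threshold c under a normal approximation.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 3))
D = rng.binomial(1, 0.5, size=n)
Y = 1.0 + X[:, 0] + D * (0.5 + X[:, 1]) + (1 + 0.5 * D) * rng.normal(size=n)

c = 0.0                                          # socially required outcome threshold
mu_hat, sd_hat = {}, {}
for d in (0, 1):
    mask = D == d
    reg = LinearRegression().fit(X[mask], Y[mask])
    mu_hat[d] = reg.predict(X)
    resid = Y[mask] - reg.predict(X[mask])
    sd_hat[d] = np.full(n, resid.std())          # homoskedastic within arm, for simplicity

score = {d: (mu_hat[d] - c) / sd_hat[d] for d in (0, 1)}
assign = (score[1] > score[0]).astype(int)       # risk-adjusted pointwise rule
print(f"Share assigned to treatment under the safety-first rule: {assign.mean():.2f}")
```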
Identification in Auctions with Truncated Transaction Prices
auctions when transaction prices are truncated by a binding reserve price under
a range of information structures. When the number of potential bidders is
fixed and known across all auctions, if only the transaction price is observed,
the bidders' private-value distribution is identified in second-price auctions
but not in first-price auctions. Identification in first-price auctions can be
achieved if either the number of active bidders or the number of auctions with
no sales is observed. When the number of potential bidders varies across
auctions and is unknown, the bidders' private-value distribution is identified
in first-price auctions but not in second-price auctions, provided that both
the transaction price and the number of active bidders are observed. I derive
analogous results for auctions with entry costs, which face a similar truncation
issue when data on potential bidders who do not enter are missing.
arXiv link: http://arxiv.org/abs/2510.04464v2
Forecasting Inflation Based on Hybrid Integration of the Riemann Zeta Function and the FPAS Model (FPAS + $ζ$): Cyclical Flexibility, Socio-Economic Challenges and Shocks, and Comparative Analysis of Models
macroeconomic modeling, especially when cyclical, structural, and shock factors
act simultaneously. Traditional systems such as FPAS and ARIMA often struggle
with cyclical asymmetry and unexpected fluctuations. This study proposes a
hybrid framework (FPAS + $\zeta$) that integrates a structural macro model
(FPAS) with cyclical components derived from the Riemann zeta function
$\zeta(1/2 + i t)$. Using Georgia's macro data (2005-2024), a nonlinear
argument $t$ is constructed from core variables (e.g., GDP, M3, policy rate),
and the hybrid forecast is calibrated by minimizing RMSE via a modulation
coefficient $\alpha$. Fourier-based spectral analysis and a Hidden Markov Model
(HMM) are employed for cycle/phase identification, and a multi-criteria
AHP-TOPSIS scheme compares FPAS, FPAS + $\zeta$, and ARIMA. Results show lower
RMSE and superior cyclical responsiveness for FPAS + $\zeta$, along with
early-warning capability for shocks and regime shifts, indicating practical
value for policy institutions.
arXiv link: http://arxiv.org/abs/2510.02966v1
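A rough sketch of the hybrid idea (not the paper's FPAS system): add an alpha-scaled cyclical component built from the Riemann zeta function on the critical line to a baseline forecast and pick alpha by RMSE grid search. The baseline series and the mapping from macro variables to the argument t below are synthetic placeholders.

```python
# FPAS + zeta sketch: hybrid forecast = baseline + alpha * |zeta(1/2 + i t)| signal,
# with alpha chosen by minimizing RMSE over a grid. Baseline and t are placeholders.
import numpy as np
import mpmath

rng = np.random.default_rng(3)
T = 80
actual = 5 + np.cumsum(rng.normal(scale=0.3, size=T))   # e.g. an inflation series
baseline = actual + rng.normal(scale=0.6, size=T)       # stand-in for an FPAS forecast

# Nonlinear argument t constructed from standardized "core variables" (placeholder).
core = np.column_stack([actual, np.roll(actual, 1)])
t = 10 + 5 * (core - core.mean(0)).sum(1) / core.std()

zeta_signal = np.array([float(abs(mpmath.zeta(mpmath.mpc(0.5, float(ti))))) for ti in t])
zeta_signal = (zeta_signal - zeta_signal.mean()) / zeta_signal.std()

def rmse(x, y):
    return float(np.sqrt(np.mean((x - y) ** 2)))

alphas = np.linspace(-1.0, 1.0, 41)
best = min(alphas, key=lambda a: rmse(baseline + a * zeta_signal, actual))
print(f"alpha*={best:.3f}, "
      f"RMSE baseline={rmse(baseline, actual):.3f}, "
      f"RMSE hybrid={rmse(baseline + best * zeta_signal, actual):.3f}")
```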
Repeated Matching Games: An Empirical Framework
the static model of Shapley and Shubik (1971). Forward-looking agents have
individual states that evolve with current matches. Each period, a matching
market with market-clearing prices takes place. We prove the existence of an
equilibrium with time-varying distributions of agent types and show it is the
solution to a social planner's problem. We also prove that a stationary
equilibrium exists. We introduce econometric shocks to account for unobserved
heterogeneity in match formation. We propose two algorithms to compute a
stationary equilibrium. We adapt both algorithms for estimation. We estimate a
model of accumulation of job-specific human capital using data on Swedish
engineers.
arXiv link: http://arxiv.org/abs/2510.02737v1
"Post" Pre-Analysis Plans: Valid Inference for Non-Preregistered Specifications
research, but it is nevertheless common to see researchers deviating from their
PAPs to supplement preregistered estimates with non-prespecified findings.
While such ex-post analysis can yield valuable insights, there is broad
uncertainty over how to interpret -- or whether to even acknowledge --
non-preregistered results. In this paper, we consider the case of a
truth-seeking researcher who, after seeing the data, earnestly wishes to report
additional estimates alongside those preregistered in their PAP. We show that,
even absent "nefarious" behavior, conventional confidence intervals and point
estimators are invalid due to the fact that non-preregistered estimates are
only reported in a subset of potential data realizations. We propose inference
procedures that account for this conditional reporting. We apply these
procedures to Bessone et al. (2021), which studies the economic effects of
increased sleep among the urban poor. We demonstrate that, depending on the
reason for deviating, the adjustments from our procedures can range from
negligible to economically significant relative to
conventional practice. Finally, we consider the robustness of our procedure to
certain forms of misspecification, motivating possible heuristic checks and
norms for journals to adopt.
arXiv link: http://arxiv.org/abs/2510.02507v1
Cautions on Tail Index Regressions
the usual full-rank condition can fail because conditioning on extreme outcomes
causes regressors to degenerate to constants. More generally, the conditional
distribution of the covariates in the tails concentrates on the values at which
the tail index is minimized. Away from those points, the conditional density
tends to zero. For local nonparametric tail index regression, the convergence
rate can be very slow. We conclude with practical suggestions for applied work.
arXiv link: http://arxiv.org/abs/2510.01535v1
Generalized Bayes in Conditional Moment Restriction Models
moment restriction models, where the parameter of interest is a nonparametric
structural function of endogenous variables. We establish contraction rates for
a class of Gaussian process priors and provide conditions under which a
Bernstein-von Mises theorem holds for the quasi-Bayes posterior. Consequently,
we show that optimally weighted quasi-Bayes credible sets achieve exact
asymptotic frequentist coverage, extending classical results for parametric GMM
models. As an application, we estimate firm-level production functions using
Chilean plant-level data. Simulations illustrate the favorable performance of
generalized Bayes estimators relative to common alternatives.
arXiv link: http://arxiv.org/abs/2510.01036v1
An alternative bootstrap procedure for factor-augmented regression models
than existing methods for approximating the distribution of the
factor-augmented regression estimator for a rotated parameter vector. The
regression is augmented by $r$ factors extracted from a large panel of $N$
variables observed over $T$ time periods. We consider general weak factor (WF)
models with $r$ signal eigenvalues that may diverge at different rates,
$N^{\alpha _{k}}$, where $0<\alpha _{k}\leq 1$ for $k=1,2,...,r$. We establish
the asymptotic validity of our bootstrap method using not only the conventional
data-dependent rotation matrix $H$, but also an alternative
data-dependent rotation matrix, $H_q$, which typically exhibits smaller
asymptotic bias and achieves a faster convergence rate. Furthermore, we
demonstrate the asymptotic validity of the bootstrap under a purely
signal-dependent rotation matrix, which is unique and can be regarded
as the population analogue of both $H$ and $H_q$. Experimental
results provide compelling evidence that the proposed bootstrap procedure
achieves superior performance relative to the existing procedure.
arXiv link: http://arxiv.org/abs/2510.00947v1
A Unified Framework for Spatial and Temporal Treatment Effect Boundaries: Theory and Identification
estimating boundaries in treatment effects across both spatial and temporal
dimensions. We formalize the concept of treatment effect boundaries as
structural parameters characterizing regime transitions where causal effects
cease to operate. Building on reaction-diffusion models of information
propagation, we establish conditions under which spatial and temporal
boundaries share common dynamics governed by diffusion parameters $(\delta, \lambda)$,
yielding the testable prediction $d^*/\tau^* = 3.32\,\lambda\sqrt{\delta}$
for standard detection thresholds. We derive formal identification results
under staggered treatment adoption and develop a three-stage estimation
procedure implementable with standard panel data. Monte Carlo simulations
demonstrate excellent finite-sample performance, with boundary estimates
achieving RMSE below 10% in realistic configurations. We apply the framework to
two empirical settings: EU broadband diffusion (2006-2021) and US wildfire
economic impacts (2017-2022). The broadband application reveals a scope
limitation -- our framework assumes depreciation dynamics and fails when
effects exhibit increasing returns through network externalities. The wildfire
application provides strong validation: estimated boundaries satisfy $d^* = 198$
km and $\tau^* = 2.7$ years, with the empirical ratio (72.5) exactly matching the
theoretical prediction $3.32\,\lambda\sqrt{\delta} = 72.5$. The framework provides
practical tools for detecting when localized treatments become systemic and
identifying critical thresholds for policy intervention.
arXiv link: http://arxiv.org/abs/2510.00754v2
Persuasion Effects in Regression Discontinuity Designs
regression discontinuity (RD) designs. The RD persuasion rate measures the
probability that individuals at the threshold would take the action if exposed
to a persuasive message, given that they would not take the action without
exposure. We present identification results for both sharp and fuzzy RD
designs, derive sharp bounds under various data scenarios, and extend the
analysis to local compliers. Estimation and inference rely on local polynomial
regression, enabling straightforward implementation with standard RD tools.
Applications to public health and media illustrate its empirical relevance.
arXiv link: http://arxiv.org/abs/2509.26517v1
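For a sharp design with a binary action, one natural sample analogue of the persuasion rate at the threshold is the jump in the action probability divided by one minus the left limit, with both limits estimated by local linear regression. The sketch below reflects my reading of that estimand with toy data and an ad hoc bandwidth; it is not the authors' code.

```python
# Local-linear sketch of a sharp-RD persuasion rate at the cutoff:
# theta = (p_plus - p_minus) / (1 - p_minus).
import numpy as np

rng = np.random.default_rng(4)
n, cutoff, h = 5000, 0.0, 0.3
x = rng.uniform(-1, 1, size=n)                       # running variable
exposed = (x >= cutoff).astype(int)                  # message shown above the cutoff
p_action = 0.2 + 0.1 * x + 0.25 * exposed            # true action probabilities
y = rng.binomial(1, np.clip(p_action, 0, 1))         # binary action

def local_linear_limit(x, y, side):
    """Local linear intercept at the cutoff using a triangular kernel on one side."""
    mask = (x >= cutoff) if side == "right" else (x < cutoff)
    xs, ys = x[mask], y[mask]
    w = np.clip(1 - np.abs(xs - cutoff) / h, 0, None)
    Z = np.column_stack([np.ones_like(xs), xs - cutoff])
    Zw = Z * w[:, None]
    beta = np.linalg.solve(Z.T @ Zw, Zw.T @ ys)      # weighted least squares
    return beta[0]

p_plus = local_linear_limit(x, y, "right")
p_minus = local_linear_limit(x, y, "left")
persuasion_rate = (p_plus - p_minus) / (1 - p_minus)
print(f"Estimated RD persuasion rate: {persuasion_rate:.3f} (truth here: 0.25/0.8 = 0.3125)")
```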
Triadic Network Formation
fixed effects in a nonlinear binary choice logit framework. Dyad-level effects
provide a richer and more realistic representation of heterogeneity across
pairs of dimensions (e.g. importer-exporter, importer-product,
exporter-product), yet their sheer number creates a severe incidental parameter
problem. We propose a novel “hexad logit” estimator and establish its
consistency and asymptotic normality. Identification is achieved through a
conditional likelihood approach that eliminates the fixed effects by
conditioning on sufficient statistics, in the form of hexads -- wirings that
involve two nodes from each part of the network. Our central finding is that
dyad-level heterogeneity fundamentally changes how information accumulates.
Unlike under node-level heterogeneity, where informative wirings automatically
grow with link formation, under dyad-level heterogeneity the network may
generate infinitely many links yet asymptotically zero informative wirings. We
derive explicit sparsity thresholds that determine when consistency holds and
when asymptotic normality is attainable. These results have important practical
implications, as they reveal that there is a limit to how granular or
disaggregate a dataset one can employ under dyad-level heterogeneity.
arXiv link: http://arxiv.org/abs/2509.26420v1
Joint Inference for the Regression Discontinuity Effect and Its External Validity
for informing policy and remains an active research area in econometrics and
statistics. However, we document that only a limited number of empirical
studies explicitly address the external validity of standard RD effects. To
advance empirical practice, we propose a simple joint inference procedure for
the RD effect and its local external validity, building on Calonico, Cattaneo,
and Titiunik (2014, Econometrica) and Dong and Lewbel (2015, Review of
Economics and Statistics). We further introduce a locally linear treatment
effects assumption, which enhances the interpretability of the treatment effect
derivative proposed by Dong and Lewbel. Under this assumption, we establish
identification and derive a uniform confidence band for the extrapolated
treatment effects. Our approaches require no additional covariates or design
features, making them applicable to virtually all RD settings and thereby
enhancing the policy relevance of many empirical RD studies. The usefulness of
the method is demonstrated through an empirical application, highlighting its
complementarity to existing approaches.
arXiv link: http://arxiv.org/abs/2509.26380v1
Leveraging LLMs to Improve Experimental Design: A Generative Stratification Approach
for designing more efficient experiments and increasing the precision of the
experimental estimates. However, when researchers have access to many
covariates at the experiment design stage, they often face challenges in
effectively selecting or weighting covariates when creating their strata. This
paper proposes a Generative Stratification procedure that leverages Large
Language Models (LLMs) to synthesize high-dimensional covariate data to improve
experimental design. We demonstrate the value of this approach by applying it
to a set of experiments and find that our method would have reduced the
variance of the treatment effect estimate by 10%-50% compared to simple
randomization in our empirical applications. When combined with other standard
stratification methods, it can further improve efficiency. Our
results demonstrate that LLM-based simulation is a practical and
easy-to-implement way to improve experimental design in covariate-rich
settings.
arXiv link: http://arxiv.org/abs/2509.25709v1
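A minimal sketch of stratified randomization on a synthesized prognostic score: the column `llm_score` below is a placeholder for an LLM-based summary of high-dimensional covariates (producing that score is the paper's contribution and is not reproduced); strata are quantile bins of the score and treatment is randomized within each stratum.

```python
# Stratified randomization on a (placeholder) LLM prognostic score, then a
# within-stratum difference-in-means estimate.
import numpy as np

rng = np.random.default_rng(5)
n, n_strata = 400, 8
llm_score = rng.normal(size=n)                       # placeholder prognostic score

edges = np.quantile(llm_score, np.linspace(0, 1, n_strata + 1)[1:-1])
strata = np.digitize(llm_score, edges)               # quantile strata 0..n_strata-1

D = np.empty(n, dtype=int)
for s in range(n_strata):
    idx = np.flatnonzero(strata == s)
    treat = rng.permutation(idx)[: len(idx) // 2]    # 1:1 randomization within stratum
    D[idx] = 0
    D[treat] = 1

# Outcome correlated with the score, so stratification reduces estimator variance.
Y = 2.0 * llm_score + 1.0 * D + rng.normal(size=n)
effects = []
for s in range(n_strata):
    g = strata == s
    effects.append(Y[g & (D == 1)].mean() - Y[g & (D == 0)].mean())
print(f"Stratified difference-in-means estimate: {np.mean(effects):.3f}")
```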
Efficient Difference-in-Differences Estimation when Outcomes are Missing at Random
inference, yet its application is often complicated by missing data. Although
recent work has developed robust DiD estimators for complex settings like
staggered treatment adoption, these methods typically assume complete data and
fail to address the critical challenge of outcomes that are missing at random
(MAR) -- a common problem that invalidates standard estimators. We develop a
rigorous framework, rooted in semiparametric theory, for identifying and
efficiently estimating the Average Treatment Effect on the Treated (ATT) when
either pre- or post-treatment (or both) outcomes are missing at random. We
first establish nonparametric identification of the ATT under two minimal sets
of sufficient conditions. For each, we derive the semiparametric efficiency
bound, which provides a formal benchmark for asymptotic optimality. We then
propose novel estimators that are asymptotically efficient, achieving this
theoretical bound. A key feature of our estimators is their multiple
robustness, which ensures consistency even if some nuisance function models are
misspecified. We validate the properties of our estimators and showcase their
broad applicability through an extensive simulation study.
arXiv link: http://arxiv.org/abs/2509.25009v1
Nowcasting and aggregation: Why small Euro area countries matter
growth using mixed data sampling machine learning panel data regressions with
both standard macro releases and daily news data. Using a panel of 19 Euro area
countries, we investigate whether directly nowcasting the Euro area aggregate
is better than weighted individual country nowcasts. Our results highlight the
importance of the information from small- and medium-sized countries,
particularly when including the COVID-19 pandemic period. The empirical
analysis is supplemented by studying the so-called Big Four -- France, Germany,
Italy, and Spain -- and the value added of news data when official statistics
are lagging. From a theoretical perspective, we formally show that the
aggregation of individual components forecasted with pooled panel data
regressions is superior to direct aggregate forecasting due to lower estimation
error.
arXiv link: http://arxiv.org/abs/2509.24780v1
Robust Semiparametric Inference for Bayesian Additive Regression Trees
missing-data settings using a corrected posterior distribution. Our approach is
tailored to Bayesian Additive Regression Trees (BART), which is a powerful
predictive method but whose nonsmoothness complicates asymptotic theory with
multi-dimensional covariates. When using BART combined with Bayesian bootstrap
weights, we establish a new Bernstein-von Mises theorem and show that the limit
distribution generally contains a bias term. To address this, we introduce
RoBART, a posterior bias-correction that robustifies BART for valid inference
on the mean response. Monte Carlo studies support our theory, demonstrating
reduced bias and improved coverage relative to existing procedures using BART.
arXiv link: http://arxiv.org/abs/2509.24634v1
Automatic Order, Bandwidth Selection and Flaws of Eigen Adjustment in HAC Estimation
consistent covariance matrix estimator based on the prewhitened kernel
estimator and a localized leave-one-out frequency domain cross-validation
(FDCV). We adapt the cross-validated log likelihood (CVLL) function to
simultaneously select the order of the prewhitening vector autoregression (VAR)
and the bandwidth. The prewhitening VAR is estimated by the Burg method without
eigen adjustment as we find the eigen adjustment rule of Andrews and Monahan
(1992) can be triggered unnecessarily and harmfully when regressors have
nonzero mean. Through Monte Carlo simulations and three empirical examples, we
illustrate the flaws of eigen adjustment and the reliability of our method.
arXiv link: http://arxiv.org/abs/2509.23256v2
Nonparametric and Semiparametric Estimation of Upward Rank Mobility Curves
intergenerational mobility that captures upward movements across the entire
parental income distribution. Our approach extends Bhattacharya and Mazumder
(2011) by conditioning on a single parental income rank, thereby eliminating
aggregation bias. We show that the measure can be characterized solely by the
copula of parent and child income, and we propose a nonparametric copula-based
estimator with better properties than kernel-based alternatives. For a
conditional version of the measure without such a representation, we develop a
two-step semiparametric estimator based on distribution regression and
establish its asymptotic properties. An application to U.S. data reveals that
whites exhibit significant upward mobility dominance over blacks among
lower-middle-income families.
arXiv link: http://arxiv.org/abs/2509.23174v1
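For intuition, a naive local estimator of an upward mobility measure conditional on a single parental rank is sketched below: the kernel-weighted share of children whose income rank exceeds their parent's rank by at least tau, evaluated around parental rank p. This is not the paper's copula-based or distribution-regression estimator; the kernel, bandwidth, and data are placeholders.

```python
# Naive local estimate of P(child rank > parent rank + tau | parent rank = p).
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(6)
n = 5000
parent = rng.lognormal(size=n)
child = np.exp(0.4 * np.log(parent) + rng.normal(scale=0.8, size=n))

u = rankdata(parent) / (n + 1)          # parental income ranks
v = rankdata(child) / (n + 1)           # child income ranks

def upward_mobility(p, tau=0.0, bw=0.05):
    w = np.exp(-0.5 * ((u - p) / bw) ** 2)          # Gaussian kernel around rank p
    return float(np.sum(w * (v > u + tau)) / np.sum(w))

for p in (0.1, 0.25, 0.5):
    print(f"P(child rank > parent rank | parent rank = {p:.2f}) = {upward_mobility(p):.3f}")
```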
Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
constraints. Classical IVaR methods (like two-stage least squares regression)
rely on solving moment equations that directly use sensitive covariates and
instruments, creating significant risks of privacy leakage and posing
challenges in designing algorithms that are both statistically efficient and
differentially private. We propose a noisy two-stage gradient descent algorithm
that ensures $\rho$-zero-concentrated differential privacy by injecting
carefully calibrated noise into the gradient updates. Our analysis establishes
finite-sample convergence rates for the proposed method, showing that the
algorithm achieves consistency while preserving privacy. In particular, we
derive precise bounds quantifying the trade-off among privacy parameters,
sample size, and iteration complexity. To the best of our knowledge, this is
the first work to provide both privacy guarantees and provable convergence
rates for instrumental variable regression in linear models. We further
validate our theoretical findings with experiments on both synthetic and real
datasets, demonstrating that our method offers practical accuracy-privacy
trade-offs.
arXiv link: http://arxiv.org/abs/2509.22794v1
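A toy sketch of the general idea (not the paper's algorithm or its privacy accounting): run gradient descent on each of the two least-squares stages, clipping per-sample gradients and adding Gaussian noise at every update, with the noise scale set by the textbook Gaussian-mechanism calibration for a per-step zCDP budget given the clipping bound. Clipping bound, step size, and budget below are arbitrary choices.

```python
# Noisy two-stage gradient descent sketch for IV regression with a single instrument.
import numpy as np

rng = np.random.default_rng(7)
n = 5000
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # unobserved confounder
d = 0.8 * z + u + rng.normal(size=n)         # endogenous regressor
y = 1.5 * d + u + rng.normal(size=n)         # outcome, true coefficient 1.5

clip, rho_per_step, lr, steps = 4.0, 0.05, 0.1, 200
# Gaussian mechanism: replacement sensitivity of the averaged clipped gradient is 2*clip/n.
sigma = 2 * clip / (n * np.sqrt(2 * rho_per_step))

def noisy_gd(x, target):
    """Noisy GD for a univariate least-squares coefficient with per-sample clipping."""
    theta = 0.0
    for _ in range(steps):
        g = -2 * x * (target - theta * x)    # per-sample gradients
        g = np.clip(g, -clip, clip)
        # each released update uses a clipped mean gradient plus calibrated Gaussian noise
        theta -= lr * (g.mean() + rng.normal(scale=sigma))
    return theta

pi_hat = noisy_gd(z, d)          # stage 1: first-stage coefficient of d on z
d_hat = pi_hat * z
beta_hat = noisy_gd(d_hat, y)    # stage 2: coefficient of y on the predicted regressor
print(f"Private two-stage estimate of the causal coefficient: {beta_hat:.3f} (truth 1.5)")
```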
Direct Bias-Correction Term Estimation for Propensity Scores and Average Treatment Effect Estimation
For ATE estimation, we estimate the propensity score through direct
bias-correction term estimation. Let $\{(X_i, D_i, Y_i)\}_{i=1}^{n}$ be the
observations, where $X_i \in \mathbb{R}^p$ denotes $p$-dimensional covariates,
$D_i \in \{0, 1\}$ denotes a binary treatment assignment indicator, and $Y_i
\in \mathbb{R}$ is an outcome. In ATE estimation, the bias-correction term
$h_0(X_i, D_i) = \frac{1[D_i = 1]}{e_0(X_i)} - \frac{1[D_i = 0]}{1 - e_0(X_i)}$
plays an important role, where $e_0(X_i)$ is the propensity score, the
probability of being assigned treatment $1$. In this study, we propose
estimating $h_0$ (or equivalently the propensity score $e_0$) by directly
minimizing the prediction error of $h_0$. Since the bias-correction term $h_0$
is essential for ATE estimation, this direct approach is expected to improve
estimation accuracy for the ATE. For example, existing studies often employ
maximum likelihood or covariate balancing to estimate $e_0$, but these
approaches may not be optimal for accurately estimating $h_0$ or the ATE. We
present a general framework for this direct bias-correction term estimation
approach from the perspective of Bregman divergence minimization and conduct
simulation studies to evaluate the effectiveness of the proposed method.
arXiv link: http://arxiv.org/abs/2509.22122v1
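The idea of fitting the bias-correction term directly can be illustrated with the linear-sieve Riesz loss from the automatic debiased machine learning literature, which is the squared-error special case of a Bregman-type objective; this is a closely related construction for intuition, not necessarily the authors' estimator.

```python
# Direct linear-sieve estimate of h_0(X, D) by minimizing the empirical Riesz loss
# E[h(W)^2 - 2(h(1, X) - h(0, X))]; the minimizer is rho = (B'B/n)^{-1} mean(m).
import numpy as np

rng = np.random.default_rng(8)
n = 5000
X = rng.normal(size=(n, 2))
e0 = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))   # true propensity score
D = rng.binomial(1, e0)

phi = np.column_stack([np.ones(n), X, X**2])              # covariate basis
def basis(d, phi):
    return np.column_stack([d[:, None] * phi, (1 - d)[:, None] * phi])

B = basis(D, phi)                                         # b(D_i, X_i)
m = basis(np.ones(n), phi) - basis(np.zeros(n), phi)      # b(1, X_i) - b(0, X_i)

rho = np.linalg.solve(B.T @ B / n, m.mean(axis=0))
h_hat = B @ rho

h_true = D / e0 - (1 - D) / (1 - e0)
print(f"Correlation between fitted and true bias-correction term: "
      f"{np.corrcoef(h_hat, h_true)[0, 1]:.3f}")
```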
Inverse Reinforcement Learning Using Just Classification and a Few Regressions
uncovering an underlying reward. In the maximum-entropy or
Gumbel-shocks-to-reward frameworks, this amounts to fitting a reward function
and a soft value function that together satisfy the soft Bellman consistency
condition and maximize the likelihood of observed actions. While this
perspective has had enormous impact in imitation learning for robotics and
understanding dynamic choices in economics, practical learning algorithms often
involve delicate inner-loop optimization, repeated dynamic programming, or
adversarial training, all of which complicate the use of modern, highly
expressive function approximators like neural nets and boosting. We revisit
softmax IRL and show that the population maximum-likelihood solution is
characterized by a linear fixed-point equation involving the behavior policy.
This observation reduces IRL to two off-the-shelf supervised learning problems:
probabilistic classification to estimate the behavior policy, and iterative
regression to solve the fixed point. The resulting method is simple and modular
across function approximation classes and algorithms. We provide a precise
characterization of the optimal solution, a generic oracle-based algorithm,
finite-sample error bounds, and empirical results showing competitive or
superior performance to MaxEnt IRL.
arXiv link: http://arxiv.org/abs/2509.21172v1
Overidentification testing with weak instruments and heteroskedasticity
overidentification (OID) tests. We discuss the Kleibergen-Paap (KP) rank test
as a heteroskedasticity-robust OID test and compare to the typical J-test. We
derive the heteroskedastic weak-instrument limiting distributions for J and KP
as special cases of the robust score test estimated via 2SLS and LIML
respectively. Monte Carlo simulations show that KP usually performs better than
J, which is prone to severe size distortions. Test size depends on model
parameters not consistently estimable with weak instruments, so a conservative
approach is recommended. This generalises recommendations to use LIML-based OID
tests under homoskedasticity. We then revisit the classic problem of estimating
the elasticity of intertemporal substitution (EIS) in lifecycle consumption
models. Lagged macroeconomic indicators should provide naturally valid but
frequently weak instruments. The literature provides a wide range of estimates
for this parameter. J frequently rejects the null of valid instruments whereas
KP does not; we suggest that J over-rejects, sometimes severely, and argue that
the KP test should be preferred to the J test. We also argue that instrument
invalidity or misspecification is unlikely to be the cause of the wide range of
EIS estimates in the literature.
arXiv link: http://arxiv.org/abs/2509.21096v1
Recidivism and Peer Influence with LLM Text Embeddings in Low Security Correctional Facilities
Language Model (LLM) of 80,000-120,000 written affirmations and correction
exchanges among residents in low-security correctional facilities to be highly
predictive of recidivism. The prediction accuracy is 30% higher with embedding
vectors than with only pre-entry covariates. However, since the text embedding
vectors are high-dimensional, we perform Zero-Shot classification of these
texts to a low-dimensional vector of user-defined classes to aid interpretation
while retaining the predictive power. To shed light on the social dynamics
inside the correctional facilities, we estimate peer effects in these
LLM-generated numerical representations of language with a multivariate peer
effect model, adjusting for network endogeneity. We develop new methodology and
theory for peer effect estimation that accommodate sparse networks,
multivariate latent variables, and correlated multivariate outcomes. With these
new methods, we find significant peer effects in language usage for interaction
and feedback.
arXiv link: http://arxiv.org/abs/2509.20634v1
Identification and Semiparametric Estimation of Conditional Means from Aggregate Data
within groups when researchers only observe the average of the outcome and
group indicators across a set of aggregation units, such as geographical areas.
Existing methods for this problem, also known as ecological inference,
implicitly make strong assumptions about the aggregation process. We first
formalize weaker conditions for identification, which motivates estimators that
can efficiently control for many covariates. We propose a debiased machine
learning estimator that is based on nuisance functions restricted to a
partially linear form. Our estimator also admits a semiparametric sensitivity
analysis for violations of the key identifying assumption, as well as
asymptotically valid confidence intervals for local, unit-level estimates under
additional assumptions. Simulations and validation on real-world data where
ground truth is available demonstrate the advantages of our approach over
existing methods. Open-source software is available which implements the
proposed methods.
arXiv link: http://arxiv.org/abs/2509.20194v1
Identification and Estimation of Seller Risk Aversion in Ascending Auctions
optimal reserve price depends on the seller's risk attitude. Numerous studies
have found that observed reserve prices lie below the optimal level implied by
risk-neutral sellers, while the theoretical literature suggests that
risk-averse sellers can rationalize these empirical findings. In this paper, we
develop an econometric model of ascending auctions with a risk-averse seller
under independent private values. We provide primitive conditions for the
identification of the Arrow-Pratt measures of risk aversion and an estimator
for these measures that is consistent and converges in distribution to a normal
distribution at the parametric rate under standard regularity conditions. A
Monte Carlo study demonstrates good finite-sample performance of the estimator,
and we illustrate the approach using data from foreclosure real estate auctions
in São Paulo.
arXiv link: http://arxiv.org/abs/2509.19945v1
Decomposing Co-Movements in Matrix-Valued Time Series: A Pseudo-Structural Reduced-Rank Approach
co-movements in reduced-rank matrix autoregressive (RRMAR) models. Unlike
conventional vector-autoregressive (VAR) models that would discard the matrix
structure, our formulation preserves it, enabling a decomposition of
co-movements into three interpretable components: row-specific,
column-specific, and joint (row-column) interactions across the matrix-valued
time series. Our estimator admits standard asymptotic inference and we propose
a BIC-type criterion for the joint selection of the reduced ranks and the
autoregressive lag order. We validate the method's finite-sample performance in
terms of estimation accuracy, coverage and rank selection in simulation
experiments, including cases of rank misspecification. We illustrate the
method's practical usefulness in identifying co-movement structures in two
empirical applications: U.S. state-level coincident and leading indicators, and
cross-country macroeconomic indicators.
arXiv link: http://arxiv.org/abs/2509.19911v1
Driver Identification and PCA Augmented Selection Shrinkage Framework for Nordic System Price Forecasting
reference for financial hedge contracts such as Electricity Price Area
Differentials (EPADs) and other risk management instruments. Therefore, the
identification of drivers and the accurate forecasting of SP are essential for
market participants to design effective hedging strategies. This paper develops
a systematic framework that combines interpretable driver analysis with robust
forecasting methods. It proposes an interpretable feature engineering algorithm
to identify the main drivers of the Nordic SP based on a novel combination of
K-means clustering, Multiple Seasonal-Trend Decomposition (MSTD), and Seasonal
Autoregressive Integrated Moving Average (SARIMA) model. Then, it applies
principal component analysis (PCA) to the identified data matrix, which is
adapted to the downstream task of price forecasting to mitigate the issue of
imperfect multicollinearity in the data. Finally, we propose a multi-forecast
selection-shrinkage algorithm for Nordic SP forecasting, which selects a subset
of complementary forecast models based on their bias-variance tradeoff at the
ensemble level and then computes the optimal weights for the retained forecast
models to minimize the error variance of the combined forecast. Using
historical data from the Nordic electricity market, we demonstrate that the
proposed approach outperforms individual input models uniformly, robustly, and
significantly, while maintaining a comparable computational cost. Notably, our
systematic framework produces superior results using simple input models,
outperforming the state-of-the-art Temporal Fusion Transformer (TFT).
Furthermore, we show that our approach also exceeds the performance of several
well-established practical forecast combination methods.
arXiv link: http://arxiv.org/abs/2509.18887v1
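The final combination step can be illustrated with the standard minimum-error-variance weighting rule, w = S^{-1} 1 / (1' S^{-1} 1), applied to a pool of candidate forecasts; the subset selection and bias-variance screening proposed in the paper are not reproduced, and the data below are synthetic.

```python
# Forecast combination sketch: weights that minimize the variance of the combined
# error given the historical error covariance of candidate models.
import numpy as np

rng = np.random.default_rng(9)
T, k = 300, 4
true_price = np.cumsum(rng.normal(size=T))
# Candidate forecasts with correlated errors of different magnitudes.
errors = rng.normal(size=(T, k)) @ np.diag([0.5, 0.8, 1.0, 1.2])
errors[:, 1] += 0.5 * errors[:, 0]                    # make two models overlap
forecasts = true_price[:, None] + errors

S = np.cov(errors, rowvar=False)                      # error covariance estimate
ones = np.ones(k)
w = np.linalg.solve(S, ones)
w /= ones @ w                                         # minimum-variance combination weights

combined = forecasts @ w
rmse = lambda f: float(np.sqrt(np.mean((f - true_price) ** 2)))
print("individual RMSEs:", [round(rmse(forecasts[:, j]), 3) for j in range(k)])
print("combined RMSE:   ", round(rmse(combined), 3), "with weights", np.round(w, 3))
```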
Optimal estimation for regression discontinuity design with binary outcomes
designs when the outcomes are bounded, including binary outcomes as the leading
case. Our finite-sample optimal estimator achieves the exact minimax mean
squared error among linear shrinkage estimators with nonnegative weights when
the regression function of a bounded outcome lies in a Lipschitz class.
Although the original minimax problem involves an $(n+1)$-dimensional
non-convex optimization problem, where $n$ is the sample size, we show that our
estimator is obtained by solving a convex optimization problem. A key advantage
of our estimator is that the Lipschitz constant is the only tuning parameter.
We also propose a uniformly valid inference procedure without a large-sample
approximation. In a simulation exercise for small samples, our estimator
exhibits smaller mean squared errors and shorter confidence intervals than
conventional large-sample techniques which may be unreliable when the effective
sample size is small. We apply our method to an empirical multi-cutoff design
where the sample size for each cutoff is small. In the application, our method
yields informative confidence intervals, in contrast to the leading
large-sample approach.
arXiv link: http://arxiv.org/abs/2509.18857v1
Filtering amplitude dependence of correlation dynamics in complex systems: application to the cryptocurrency market
methodology for analyzing evolving correlation structures in complex systems
using the $q$-dependent detrended cross-correlation coefficient $\rho(q,s)$. By
extending traditional metrics, this approach captures correlations at varying
fluctuation amplitudes and time scales. The method employs $q$-dependent
minimum spanning trees ($q$MSTs) to visualize evolving network structures.
Using minute-by-minute exchange rate data for 140 cryptocurrencies on Binance
(Jan 2021-Oct 2024), a rolling window analysis reveals significant shifts in
$q$MSTs, notably around April 2022 during the Terra/Luna crash. Initially
centralized around Bitcoin (BTC), the network later decentralized, with
Ethereum (ETH) and others gaining prominence. Spectral analysis confirms BTC's
declining dominance and increased diversification among assets. A key finding
is that medium-scale fluctuations exhibit stronger correlations than
large-scale ones, with $q$MSTs based on the latter being more decentralized.
Properly exploiting such facts may offer the possibility of a more flexible
optimal portfolio construction. Distance metrics highlight that major
disruptions amplify correlation differences, leading to fully decentralized
structures during crashes. These results demonstrate $q$MSTs' effectiveness in
uncovering fluctuation-dependent correlations, with potential applications
beyond finance, including biology, social and other complex systems.
arXiv link: http://arxiv.org/abs/2509.18820v1
An Econometric Analysis of the Impact of Telecare on the Length of Stay in Hospital
telecare to the length of stay in hospital and formulate three models that can
be used to derive the treatment effect by making various assumptions about the
probability distribution of the outcome measure. We then fit the models to data
and estimate them using a strategy that controls for the effects of confounding
variables and unobservable factors, and compare the treatment effects with those
of the Propensity Score Matching (PSM) technique, which adopts a
quasi-experimental study design. To ensure comparability, the covariates are
kept identical in all cases. An important finding that emerges from our
analysis is that the treatment effects derived from our econometric models of
interest are better than that obtained from an experimental study design as the
latter does not account for all the relevant unobservable factors. In
particular, the results show that estimating the treatment effect of telecare
in the way that an experimental study design entails fails to account for the
systematic variations in individuals' health production functions within each
experimental arm.
arXiv link: http://arxiv.org/abs/2509.22706v1
Functional effects models: Accounting for preference heterogeneity in panel data with machine learning
Models, which use Machine Learning (ML) methodologies to learn
individual-specific preference parameters from socio-demographic
characteristics, therefore accounting for inter-individual heterogeneity in
panel choice data. We identify three specific advantages of the Functional
Effects Model over traditional fixed, and random/mixed effects models: (i) by
mapping individual-specific effects as a function of socio-demographic
variables, we can account for these effects when forecasting choices of
previously unobserved individuals; (ii) the (approximate) maximum-likelihood
estimation of functional effects avoids the incidental parameters problem of
the fixed effects model, even when the number of observed choices per
individual is small; and (iii) we do not rely on the strong distributional
assumptions of the random effects model, which may not match reality. We learn
functional intercept and functional slopes with powerful non-linear machine
learning regressors for tabular data, namely gradient boosting decision trees
and deep neural networks. We validate our proposed methodology on a synthetic
experiment and three real-world panel case studies, demonstrating that the
Functional Effects Model: (i) can identify the true values of
individual-specific effects when the data generation process is known; (ii)
outperforms both state-of-the-art ML choice modelling techniques that omit
individual heterogeneity in terms of predictive performance, and
traditional static panel choice models in terms of learning inter-individual
heterogeneity. The results indicate that the FI-RUMBoost model, which combines
the individual-specific constants of the Functional Effects Model with the
complex, non-linear utilities of RUMBoost, performs marginally best on
large-scale revealed preference panel data.
arXiv link: http://arxiv.org/abs/2509.18047v1
Local Projections Bootstrap Inference
the data generating process (DGP) is a finite order vector autoregression
(VAR), often taken to be that implied by the local projection at horizon 1.
Although convenient, it is well documented that a VAR can be a poor
approximation to impulse dynamics at horizons beyond its lag length. In this
paper we assume instead that the precise form of the parametric model
generating the data is not known. If one is willing to assume that the DGP is
perhaps an infinite order process, a larger class of models can be accommodated
and more tailored bootstrap procedures can be constructed. Using the moving
average representation of the data, we construct appropriate bootstrap
procedures.
arXiv link: http://arxiv.org/abs/2509.17949v1
Bayesian Semi-supervised Inference via a Debiased Modeling Approach
in recent years due to increased relevance in modern big-data problems. In a
typical SS setting, there is a much larger-sized unlabeled data, containing
only observations of predictors, and a moderately sized labeled data containing
observations for both an outcome and the set of predictors. Such data naturally
arises when the outcome, unlike the predictors, is costly or difficult to
obtain. One of the primary statistical objectives in SS settings is to explore
whether parameter estimation can be improved by exploiting the unlabeled data.
We propose a novel Bayesian method for estimating the population mean in SS
settings. The approach yields estimators that are both efficient and optimal
for estimation and inference. The method itself has several interesting
features. The central idea behind the method is to model certain summary
statistics of the data in a targeted manner, rather than the entire raw data
itself, along with a novel Bayesian notion of debiasing. Specifying appropriate
summary statistics crucially relies on a debiased representation of the
population mean that incorporates unlabeled data through a flexible nuisance
function while also learning its estimation bias. Combined with careful usage
of sample splitting, this debiasing approach mitigates the effect of bias due
to slow rates or misspecification of the nuisance parameter from the posterior
of the final parameter of interest, ensuring its robustness and efficiency.
Concrete theoretical results, via Bernstein--von Mises theorems, are
established, validating all claims, and are further supported through extensive
numerical studies. To our knowledge, this is possibly the first work on
Bayesian inference in SS settings, and its central ideas also apply more
broadly to other Bayesian semi-parametric inference problems.
arXiv link: http://arxiv.org/abs/2509.17385v1
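The debiased representation referred to above has a simple frequentist analogue: the population mean equals the average of a (possibly misspecified) prediction over the large unlabeled sample plus the average residual on held-out labeled data. The sketch below shows that analogue with sample splitting; the paper's Bayesian posterior over summary statistics is not reproduced.

```python
# Debiased semi-supervised mean: average predictions on the unlabeled sample and add
# the mean residual from a held-out labeled fold (sample splitting).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(10)
n_lab, n_unlab = 300, 20000
def draw(n):
    X = rng.normal(size=(n, 3))
    Y = 1 + X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=n)
    return X, Y

X_lab, Y_lab = draw(n_lab)
X_unlab, _ = draw(n_unlab)                 # outcomes unobserved in the unlabeled data

# Sample splitting: fit the nuisance on one half of the labeled data, debias on the other.
half = n_lab // 2
f = LinearRegression().fit(X_lab[:half], Y_lab[:half])   # deliberately misspecified nuisance
theta_ss = f.predict(X_unlab).mean() + (Y_lab[half:] - f.predict(X_lab[half:])).mean()
theta_labeled_only = Y_lab.mean()

print(f"labeled-only mean:    {theta_labeled_only:.3f}")
print(f"debiased SS estimate: {theta_ss:.3f}  (true mean = 1.5)")
```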
Improving S&P 500 Volatility Forecasting through Regime-Switching Methods
management, derivatives pricing, and investment strategy. In this study, we
propose a multitude of regime-switching methods to improve the prediction of
S&P 500 volatility by capturing structural changes in the market across time.
We use eleven years of SPX data, from May 1st, 2014 to May 27th, 2025, to
compute daily realized volatility (RV) from 5-minute intraday log returns,
adjusted for irregular trading days. To enhance forecast accuracy, we
engineered features to capture both historical dynamics and forward-looking
market sentiment across regimes. The regime-switching methods include a soft
Markov switching algorithm to estimate soft-regime probabilities, a
distributional spectral clustering method that uses XGBoost to assign clusters
at prediction time, and a coefficient-based soft regime algorithm that extracts
HAR coefficients from time segments segmented through the Mood test and
clusters through Bayesian GMM for soft regime weights, using XGBoost to predict
regime probabilities. Models were evaluated across three time periods--before,
during, and after the COVID-19 pandemic. The coefficient-based clustering
algorithm outperformed all other models, including the baseline autoregressive
model, during all time periods. Additionally, each model was evaluated on its
recursive forecasting performance for 5- and 10-day horizons during each time
period. The findings of this study demonstrate the value of regime-aware
modeling frameworks and soft clustering approaches in improving volatility
forecasting, especially during periods of heightened uncertainty and structural
change.
arXiv link: http://arxiv.org/abs/2510.03236v1
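The HAR benchmark referenced above regresses next-day realized volatility on its daily, weekly (5-day), and monthly (22-day) averages; a minimal version on a synthetic RV series is sketched below. The study's regime-switching extensions and feature engineering are not reproduced.

```python
# HAR-RV baseline: RV_t on lagged daily, weekly, and monthly RV averages.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(11)
T = 1500
log_rv = np.zeros(T)
for t in range(1, T):                        # persistent toy log-RV process
    log_rv[t] = 0.97 * log_rv[t - 1] + 0.2 * rng.normal()
rv = pd.Series(np.exp(log_rv))

X = pd.DataFrame({
    "rv_d": rv.shift(1),
    "rv_w": rv.shift(1).rolling(5).mean(),
    "rv_m": rv.shift(1).rolling(22).mean(),
})
data = pd.concat([X, rv.rename("rv_next")], axis=1).dropna()

model = LinearRegression().fit(data[["rv_d", "rv_w", "rv_m"]], data["rv_next"])
print("HAR coefficients (daily, weekly, monthly):", np.round(model.coef_, 3))
print("In-sample R^2:", round(model.score(data[["rv_d", "rv_w", "rv_m"]], data["rv_next"]), 3))
```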
Regularizing Extrapolation in Causal Inference
smoothers, where the prediction is a weighted average of the training outcomes.
Some estimators, such as ordinary least squares and kernel ridge regression,
allow for arbitrarily negative weights, which reduce feature imbalance but
often at the cost of increased dependence on parametric modeling assumptions
and higher variance. By contrast, estimators like importance weighting and
random forests (sometimes implicitly) restrict weights to be non-negative,
reducing dependence on parametric modeling and variance at the cost of worse
imbalance. In this paper, we propose a unified framework that directly
penalizes the level of extrapolation, replacing the current practice of a hard
non-negativity constraint with a soft constraint and corresponding
hyperparameter. We derive a worst-case extrapolation error bound and introduce
a novel "bias-bias-variance" tradeoff, encompassing biases due to feature
imbalance, model misspecification, and estimator variance; this tradeoff is
especially pronounced in high dimensions, particularly when positivity is poor.
We then develop an optimization procedure that regularizes this bound while
minimizing imbalance and outline how to use this approach as a sensitivity
analysis for dependence on parametric modeling assumptions. We demonstrate the
effectiveness of our approach through synthetic experiments and a real-world
application, involving the generalization of randomized controlled trial
estimates to a target population of interest.
arXiv link: http://arxiv.org/abs/2509.17180v1
KRED: Korea Research Economic Database for Macroeconomic Research
macroeconomic dataset for South Korea. KRED is constructed by aggregating 88
key monthly time series from multiple official sources (e.g., Bank of Korea
ECOS, Statistics Korea KOSIS) into a unified, publicly available database. The
dataset is aligned with the FRED-MD format, enabling standardized
transformations and direct comparability; an Appendix maps each Korean series
to its FRED-MD counterpart. Using a balanced panel of 80 series from 2009 to
2024, we extract four principal components via PCA that explain approximately
40% of the total variance. These four factors have intuitive economic
interpretations, capturing monetary conditions, labor market activity, real
output, and housing demand, analogous to diffusion indexes summarizing broad
economic movements. Notably, the factor-based diffusion indexes derived from
KRED clearly trace major macroeconomic fluctuations over the sample period, such
as the 2020 COVID-19 recession. Our results demonstrate that KRED's factor
structure can effectively condense complex economic information into a few
informative indexes, yielding new insights into South Korea's business cycles
and co-movements.
arXiv link: http://arxiv.org/abs/2509.16115v1
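The factor-extraction step can be sketched in a few lines: standardize a balanced monthly panel, compute the first four principal components, and report the share of variance they explain. A synthetic panel stands in for the KRED data below.

```python
# PCA diffusion-index sketch on a standardized balanced panel.
import numpy as np

rng = np.random.default_rng(12)
T, N, r = 192, 80, 4                                  # 16 years of monthly data, 80 series
F = np.cumsum(rng.normal(size=(T, r)), axis=0)        # latent factors with trends
Lambda = rng.normal(size=(N, r))
panel = F @ Lambda.T + 2.0 * rng.normal(size=(T, N))  # noisy observed series

Z = (panel - panel.mean(0)) / panel.std(0)            # standardize each series
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
explained = (s**2) / (s**2).sum()
factors = U[:, :r] * s[:r]                            # principal-component factors (diffusion indexes)

print("Variance share of the first four PCs:", np.round(explained[:r], 3),
      "total:", round(explained[:r].sum(), 3))
```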
Beyond the Average: Distributional Causal Inference under Imperfect Compliance
experiments with imperfect compliance. When participants do not adhere to their
assigned treatments, we leverage treatment assignment as an instrumental
variable to identify the local distributional treatment effect: the difference
in outcome distributions between treatment and control groups for the
subpopulation of compliers. We propose a regression-adjusted estimator based on
a distribution regression framework with Neyman-orthogonal moment conditions,
enabling robustness and flexibility with high-dimensional covariates. Our
approach accommodates continuous, discrete, and mixed discrete-continuous
outcomes, and applies under a broad class of covariate-adaptive randomization
schemes, including stratified block designs and simple random sampling. We
derive the estimator's asymptotic distribution and show that it achieves the
semiparametric efficiency bound. Simulation results demonstrate favorable
finite-sample performance, and we illustrate the method's practical relevance
in an application to the Oregon Health Insurance Experiment.
arXiv link: http://arxiv.org/abs/2509.15594v1
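A simple way to see the estimand is the Wald/IV ratio applied to outcome indicators: for each threshold y, the ratio of the reduced-form difference in 1{Y <= y} to the first-stage difference in take-up estimates the difference between complier outcome CDFs. The sketch below uses this plain Wald construction on toy data; the paper's regression-adjusted, Neyman-orthogonal estimator is not reproduced.

```python
# Wald estimate of the local distributional treatment effect F_{Y(1)}(y) - F_{Y(0)}(y)
# for compliers, over a grid of thresholds y.
import numpy as np

rng = np.random.default_rng(13)
n = 20000
Z = rng.binomial(1, 0.5, size=n)                     # random assignment
compl = rng.binomial(1, 0.6, size=n)                 # latent complier indicator
D = np.where(compl == 1, Z, 0)                       # one-sided noncompliance
Y = rng.normal(size=n) + 1.0 * D                     # treatment shifts the distribution

first_stage = D[Z == 1].mean() - D[Z == 0].mean()
grid = np.linspace(-2, 3, 6)
ldte = [((Y[Z == 1] <= y).mean() - (Y[Z == 0] <= y).mean()) / first_stage for y in grid]

for y, d in zip(grid, ldte):
    print(f"y = {y:5.2f}   F_1(y) - F_0(y) among compliers = {d:6.3f}")
```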
Inference on the Distribution of Individual Treatment Effects in Nonseparable Triangular Models
heterogeneous individual treatment effects (ITEs) in the nonseparable
triangular model with a binary endogenous treatment and a binary instrument of
Vuong and Xu (2017) and Feng, Vuong, and Xu (2019). We focus on the estimation
of the cumulative distribution function (CDF) of the ITE, which can be used to
address a wide range of practically important questions such as inference on
the proportion of individuals with positive ITEs, the quantiles of the
distribution of ITEs, and the interquartile range as a measure of the spread of
the ITEs, as well as comparison of the ITE distributions across
sub-populations. Moreover, our CDF-based approach can deliver more precise
results than the density-based approach previously considered in the literature. We
establish weak convergence to tight Gaussian processes for the empirical CDF
and quantile function computed from nonparametric ITE estimates of Feng, Vuong,
and Xu (2019). Using those results, we develop bootstrap-based nonparametric
inferential methods, including uniform confidence bands for the CDF and
quantile function of the ITE distribution.
arXiv link: http://arxiv.org/abs/2509.15401v1
Efficient and Accessible Discrete Choice Experiments: The DCEtool Package for R
products or services by analyzing choices among alternatives described by their
attributes. The quality of the insights obtained from a DCE heavily depends on
the properties of its experimental design. While early DCEs often relied on
linear criteria such as orthogonality, these approaches were later found to be
inappropriate for discrete choice models, which are inherently non-linear. As a
result, statistically efficient design methods, based on minimizing the D-error
to reduce parameter variance, have become the standard. Although such methods
are implemented in several commercial tools, researchers seeking free and
accessible solutions often face limitations. This paper presents DCEtool, an R
package with a Shiny-based graphical interface designed to support both novice
and experienced users in constructing, decoding, and analyzing statistically
efficient DCE designs. DCEtool facilitates the implementation of serial DCEs,
offers flexible design settings, and enables rapid estimation of discrete
choice models. By making advanced design techniques more accessible, DCEtool
contributes to the broader adoption of rigorous experimental practices in
choice modelling.
arXiv link: http://arxiv.org/abs/2509.15326v1
Monetary Policy and Exchange Rate Fluctuations
general stochastic process and incorporate monetary policy shock to examine how
bilateral exchange rate fluctuations affect the Revealed Comparative Advantage
(RCA) index. Numerical simulations indicate that as the mean of bilateral
exchange rate fluctuations increases, i.e., currency devaluation, the RCA index
rises. Moreover, smaller bilateral exchange rate fluctuations after the policy
shock cause the RCA index to gradually converge toward its mean level. For the
empirical analysis, we select the USD-CNY bilateral exchange rate and
provincial manufacturing industry export competitiveness data in China from
2008 to 2021. We find that in the short term, when exchange rate fluctuations
stabilize within a range of less than 0.2 RMB, depreciation will effectively boost
export competitiveness. Then, the 8.11 exchange rate policy reversed the
previous linear trend of the CNY, stabilizing it within a narrow fluctuation
range over the long term. This policy leads to a gradual convergence of
provincial RCA indices toward a relatively high level, consistent
with our numerical simulations, and indirectly enhances provincial export
competitiveness.
arXiv link: http://arxiv.org/abs/2509.15169v1
Forecasting in small open emerging economies: Evidence from Thailand
time series and strong external exposures create an imbalance between few
observations and many potential predictors. We study this challenge using
Thailand as a representative case, combining more than 450 domestic and
international indicators. We evaluate modern Bayesian shrinkage and factor
models, including Horseshoe regressions, factor-augmented autoregressions,
factor-augmented VARs, dynamic factor models, and Bayesian additive regression
trees.
Our results show that factor models dominate at short horizons, when global
shocks and exchange rate movements drive inflation, while shrinkage-based
regressions perform best at longer horizons. These models not only improve
point and density forecasts but also enhance tail-risk performance at the
one-year horizon.
Shrinkage diagnostics additionally reveal that Google
Trends variables, especially those related to food, essential goods, and housing
costs, progressively rotate into predictive importance as the horizon
lengthens. This underscores their role as forward-looking indicators of
household inflation expectations in small open economies.
arXiv link: http://arxiv.org/abs/2509.14805v1
Time-Varying Heterogeneous Treatment Effects in Event Studies
treatment effects in event studies, emphasizing the importance of both lagged
dependent variables and treatment effect heterogeneity. We show that omitting
lagged dependent variables can induce omitted variable bias in the estimated
time-varying treatment effects. We develop a novel semiparametric approach
based on a short-T dynamic linear panel model with correlated random
coefficients, where the time-varying heterogeneous treatment effects can be
modeled by a time-series process to reduce dimensionality. We construct a
two-step estimator employing quasi-maximum likelihood for common parameters and
empirical Bayes for the heterogeneous treatment effects. The procedure is
flexible, easy to implement, and achieves ratio optimality asymptotically. Our
results also provide insights into common assumptions in the event study
literature, such as no anticipation, homogeneous treatment effects across
treatment timing cohorts, and state dependence structure.
arXiv link: http://arxiv.org/abs/2509.13698v1
Generalized Covariance Estimator under Misspecification and Constraints
estimator under misspecification and constraints with application to processes
with local explosive patterns, such as causal-noncausal and double
autoregressive (DAR) processes. We show that GCov is consistent and has an
asymptotically Normal distribution under misspecification. Then, we construct
GCov-based Wald-type and score-type tests to test one specification against the
other, all of which follow a $\chi^2$ distribution. Furthermore, we propose the
constrained GCov (CGCov) estimator, which extends the use of the GCov estimator
to a broader range of models with constraints on their parameters. We
investigate the asymptotic distribution of the CGCov estimator when the true
parameters are far from the boundary and on the boundary of the parameter
space. We validate the finite sample performance of the proposed estimators and
tests in the context of causal-noncausal and DAR models. Finally, we provide
two empirical applications by applying the noncausal model to the final energy
demand commodity index and also the DAR model to the US 3-month treasury bill.
arXiv link: http://arxiv.org/abs/2509.13492v1
Dynamic Local Average Treatment Effects in Time Series
local average treatment effects (LATEs) in instrumental variables (IVs)
settings. First, we show that compliers--observations whose treatment status is
affected by the instrument--can be identified individually in time series data
using smoothness assumptions and local comparisons of treatment assignments.
Second, we show that this result enables not only better interpretability of IV
estimates but also direct testing of the exclusion restriction by comparing
outcomes among identified non-compliers across instrument values. Third, we
document pervasive weak identification in applied work using IVs with time
series data by surveying recent publications in leading economics journals.
However, we find that strong identification often holds in large subsamples for
which the instrument induces changes in the treatment. Motivated by this, we
introduce a method based on dynamic programming to detect the most
strongly-identified subsample and show how to use this subsample to improve
estimation and inference. We also develop new identification-robust inference
procedures that focus on the most strongly-identified subsample, offering
efficiency gains relative to existing full sample identification-robust
inference when identification fails over parts of the sample. Finally, we apply
our results to heteroskedasticity-based identification of monetary policy
effects. We find that about 75% of observations are compliers (i.e., cases
where the variance of the policy shifts up on FOMC announcement days), and we
fail to reject the exclusion restriction. Estimation using the most
strongly-identified subsample helps reconcile conflicting IV and GMM estimates
in the literature.
arXiv link: http://arxiv.org/abs/2509.12985v1
Policy-relevant causal effect estimation using instrumental variables with interference
individuals who interact with each other, potentially violating the standard IV
assumptions. This paper defines and partially identifies direct and spillover
effects with a clear policy-relevant interpretation under relatively mild
assumptions on interference. Our framework accommodates both spillovers from
the instrument to treatment and from treatment to outcomes and allows for
multiple peers. By generalizing monotone treatment response and selection
assumptions, we derive informative bounds on policy-relevant effects without
restricting the type or direction of interference. The results extend IV
estimation to more realistic social contexts, informing program evaluation and
treatment scaling when interference is present.
arXiv link: http://arxiv.org/abs/2509.12538v1
A Decision Theoretic Perspective on Artificial Superintelligence: Coping with Missing Data Problems in Prediction and Treatment Choice
artificial general intelligence and, even more ambitiously, artificial
superintelligence. We wonder about the implications for our methodological
research, which aims to help decision makers cope with what econometricians
call identification problems, inferential problems in empirical research that
do not diminish as sample size grows. Of particular concern are missing data
problems in prediction and treatment choice. Essentially all data collection
intended to inform decision making is subject to missing data, which gives rise
to identification problems. Thus far, we see no indication that the current
dominant architecture of machine learning (ML)-based artificial intelligence
(AI) systems will outperform humans in this context. In this paper, we explain
why we have reached this conclusion and why we see the missing data problem as
a cautionary case study in the quest for superintelligence more generally. We
first discuss the concept of intelligence, before presenting a
decision-theoretic perspective that formalizes the connection between
intelligence and identification problems. We next apply this perspective to two
leading cases of missing data problems. Then we explain why we are skeptical
that AI research is currently on a path toward machines doing better than
humans at solving these identification problems.
arXiv link: http://arxiv.org/abs/2509.12388v1
Fairness-Aware and Interpretable Policy Learning
decision-making algorithms across many application domains. These requirements
are intended to avoid undesirable group differences and to alleviate concerns
related to transparency. This paper proposes a framework that integrates
fairness and interpretability into algorithmic decision making by combining
data transformation with policy trees, a class of interpretable policy
functions. The approach is based on pre-processing the data to remove
dependencies between sensitive attributes and decision-relevant features,
followed by a tree-based optimization to obtain the policy. Since data
pre-processing compromises interpretability, an additional transformation maps
the parameters of the resulting tree back to the original feature space. This
procedure enhances fairness by yielding policy allocations that are pairwise
independent of sensitive attributes, without sacrificing interpretability.
Using administrative data from Switzerland to analyze the allocation of
unemployed individuals to active labor market programs (ALMP), the framework is
shown to perform well in a realistic policy setting. Effects of integrating
fairness and interpretability constraints are measured through the change in
expected employment outcomes. The results indicate that, for this particular
application, fairness can be substantially improved at relatively low cost.
arXiv link: http://arxiv.org/abs/2509.12119v1
The Honest Truth About Causal Trees: Accuracy Limits for Heterogeneous Treatment Effect Estimation
heterogeneous causal treatment effect estimation and inference in experimental
and observational settings. These procedures are fitted using the celebrated
CART (Classification And Regression Tree) algorithm [Breiman et al., 1984], or
custom variants thereof, and hence are believed to be "adaptive" to
high-dimensional data, sparsity, or other specific features of the underlying
data generating process. Athey and Imbens [2016] proposed several "honest"
causal decision tree estimators, which have become the standard in both
academia and industry. We study their estimators, and variants thereof, and
establish lower bounds on their estimation error. We demonstrate that these
popular heterogeneous treatment effect estimators cannot achieve a
polynomial-in-$n$ convergence rate under basic conditions, where $n$ denotes
the sample size. Contrary to common belief, honesty does not resolve these
limitations and at best delivers negligible logarithmic improvements in sample
size or dimension. As a result, these commonly used estimators can exhibit poor
performance in practice, and even be inconsistent in some settings. Our
theoretical insights are empirically validated through simulations.
arXiv link: http://arxiv.org/abs/2509.11381v1
What is in a Price? Estimating Willingness-to-Pay with Bayesian Hierarchical Models
but about understanding the perceived monetary value of the features that
justify a higher cost. This paper proposes a robust methodology to deconstruct
a product's price into the tangible value of its constituent parts. We employ
Bayesian Hierarchical Conjoint Analysis, a sophisticated statistical technique,
to solve this high-stakes business problem using the Apple iPhone as a
universally recognizable case study. We first simulate a realistic choice-based
conjoint survey where consumers choose between different hypothetical iPhone
configurations. We then develop a Bayesian Hierarchical Logit Model to infer
consumer preferences from this choice data. The core innovation of our model is
its ability to directly estimate the Willingness-to-Pay (WTP) in dollars for
specific feature upgrades, such as a "Pro" camera system or increased storage.
Our results demonstrate that the model successfully recovers the true,
underlying feature valuations from noisy data, providing not just a point
estimate but a full posterior probability distribution for the dollar value of
each feature. This work provides a powerful, practical framework for
data-driven product design and pricing strategy, enabling businesses to make
more intelligent decisions about which features to build and how to price them.
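For readers who want to see the WTP step in miniature, the sketch below (plain Python with NumPy, using simulated placeholder draws rather than the paper's fitted hierarchical model) converts posterior draws of a price coefficient and a feature part-worth into a posterior distribution of dollar WTP.

```python
import numpy as np

# Minimal sketch (not the paper's exact model): turning posterior draws of
# part-worth utilities into a posterior distribution of willingness-to-pay.
# The draws below are simulated placeholders; in practice they would come
# from the fitted Bayesian hierarchical logit.
rng = np.random.default_rng(0)
n_draws = 4000
beta_price = rng.normal(-0.004, 0.0005, n_draws)   # utility per extra dollar (negative)
beta_pro_camera = rng.normal(0.9, 0.15, n_draws)   # part-worth of the "Pro" camera

# WTP in dollars: how many dollars of price disutility offset the feature's utility.
wtp = beta_pro_camera / (-beta_price)

lo, hi = np.percentile(wtp, [2.5, 97.5])
print(f"Posterior mean WTP: ${wtp.mean():,.0f}")
print(f"95% credible interval: ${lo:,.0f} to ${hi:,.0f}")
```

Because the ratio is taken draw by draw, the output is a full posterior for the dollar value of the feature rather than a single point estimate.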
arXiv link: http://arxiv.org/abs/2509.11089v1
Large-Scale Curve Time Series with Common Stochastic Trends
trends. A dual functional factor model structure is adopted with a
high-dimensional factor model for the observed curve time series and a
low-dimensional factor model for the latent curves with common trends. A
functional PCA technique is applied to estimate the common stochastic trends
and functional factor loadings. Under some regularity conditions we derive the
mean square convergence and limit distribution theory for the developed
estimates, allowing the dimension and sample size to jointly diverge to
infinity. We propose an easy-to-implement criterion to consistently select the
number of common stochastic trends and further discuss model estimation when
the nonstationary factors are cointegrated. Extensive Monte-Carlo simulations
and two empirical applications to large-scale temperature curves in Australia
and log-price curves of S&P 500 stocks are conducted, showing finite-sample
performance and providing practical implementations of the new methodology.
arXiv link: http://arxiv.org/abs/2509.11060v1
Climate change: across time and frequencies
change across time and frequencies. This approach allows us to capture the
changing patterns in the relationship between global mean temperature anomalies
and climate forcings. Using historical data from 1850 to 2022, we find that
greenhouse gases, and CO$_2$ in particular, play a significant role in driving
the very low frequency trending behaviour in temperatures, even after
controlling for the effects of natural forcings. At shorter frequencies, the
effect of forcings on temperatures switches on and off, most likely because of
complex feedback mechanisms in Earth's climate system.
arXiv link: http://arxiv.org/abs/2509.21334v1
Taking the Highway or the Green Road? Conditional Temperature Forecasts Under Alternative SSP Scenarios
(2025), we produce conditional temperature forecasts up until 2050, by
exploiting both equality and inequality constraints on climate drivers like
carbon dioxide or methane emissions. Engaging in a counterfactual scenario
analysis by imposing a Shared Socioeconomic Pathways (SSPs) scenario of
"business as-usual", with no mitigation and high emissions, we observe that
conditional and unconditional forecasts would follow a similar path. Instead,
if a high mitigation with low emissions scenario were to be followed, the
conditional temperature paths would remain below the unconditional trajectory
after 2040, i.e., temperature increases can potentially slow down in a
meaningful way, but the lags before changes in emissions take effect are
quite substantial. These lags should be carefully taken into account when
designing response policies to climate change.
arXiv link: http://arxiv.org/abs/2509.09384v1
Functional Regression with Nonstationarity and Error Contamination: Application to the Economic Impact of Climate Change
explanatory variables, both of which exhibit nonstationary dynamics. The model
assumes that the nonstationary stochastic trends of the dependent variable are
explained by those of the explanatory variables, and hence that there exists a
stable long-run relationship between the two variables despite their
nonstationary behavior. We also assume that the functional observations may be
error-contaminated. We develop novel autocovariance-based estimation and
inference methods for this model. The methodology is broadly applicable to
economic and statistical functional time series with nonstationary dynamics. To
illustrate our methodology and its usefulness, we apply it to evaluating the
global economic impact of climate change, an issue of intrinsic importance.
arXiv link: http://arxiv.org/abs/2509.08591v3
On the Identification of Diagnostic Expectations: Econometric Insights from DSGE Models
expectations (DE) in DSGE models. Using the identification framework of Qu and
Tkachenko (2017), I show that DE generate dynamics unreplicable under rational
expectations (RE), with no RE parameterization capable of matching the
autocovariance implied by DE. Consequently, DE are not observationally
equivalent to RE and constitute an endogenous source of macroeconomic
fluctuations, distinct from both structural frictions and exogenous shocks.
From an econometric perspective, DE preserve overall model identification but
weaken the identification of shock variances. To ensure robust conclusions
across estimation methods and equilibrium conditions, I extend Bayesian
estimation with Sequential Monte Carlo sampling to the indeterminacy domain.
These findings advance the econometric study of expectations and highlight the
macroeconomic relevance of diagnostic beliefs.
arXiv link: http://arxiv.org/abs/2509.08472v2
Posterior inference of attitude-behaviour relationships using latent class choice models
modelling for two decades, with the widespread application of ever more complex
hybrid choice models. This paper proposes a flexible and transparent
alternative framework for empirically examining the relationship between
attitudes and behaviours using latent class choice models (LCCMs). Rather than
embedding attitudinal constructs within the structural model, as in hybrid
choice frameworks, we recover class-specific attitudinal profiles through
posterior inference. This approach enables analysts to explore
attitude-behaviour associations without the complexity and convergence issues
often associated with integrated estimation. Two case studies are used to
demonstrate the framework: one on employee preferences for working from home,
and another on public acceptance of COVID-19 vaccines. Across both studies, we
compare posterior profiling of indicator means, fractional multinomial logit
(FMNL) models, factor-based representations, and hybrid specifications. We find
that posterior inference methods provide behaviourally rich insights with
minimal additional complexity, while factor-based models risk discarding key
attitudinal information, and full-information hybrid models offer little gain in
explanatory power and incur substantially greater estimation burden. Our
findings suggest that when the goal is to explain preference heterogeneity,
posterior inference offers a practical alternative to hybrid models, one that
retains interpretability and robustness without sacrificing behavioural depth.
arXiv link: http://arxiv.org/abs/2509.08373v1
Chaotic Bayesian Inference: Strange Attractors as Risk Models for Black Swan Events
geometry of Bayesian inference. By combining heavy-tailed priors with Lorenz
and Rossler dynamics, the models naturally generate volatility clustering, fat
tails, and extreme events. We compare two complementary approaches: Model A,
which emphasizes geometric stability, and Model B, which highlights rare bursts
using Fibonacci diagnostics. Together, they provide a dual perspective for
systemic risk analysis, linking Black Swan theory to practical tools for stress
testing and volatility monitoring.
arXiv link: http://arxiv.org/abs/2509.08183v1
Estimating Peer Effects Using Partial Network Data
researchers do not observe the entire network structure. Special cases include
sampled networks, censored networks, and misclassified links. We assume that
researchers can obtain a consistent estimator of the distribution of the
network. We show that this assumption is sufficient for estimating peer effects
using a linear-in-means model. We provide an empirical application to the study
of peer effects on students' academic achievement using the widely used Add
Health database, and show that network data errors lead to a large downward bias
in estimated peer effects.
arXiv link: http://arxiv.org/abs/2509.08145v1
Epsilon-Minimax Solutions of Statistical Decision Problems
epsilon. We present an algorithm for provably obtaining epsilon-minimax
solutions of statistical decision problems. We are interested in problems where
the statistician chooses randomly among $I$ decision rules. The minimax solution
of these problems admits a convex programming representation over the
$(I-1)$-simplex. Our suggested algorithm is a well-known mirror subgradient
descent routine, designed to approximately solve the convex optimization
problem that defines the minimax decision rule. This iterative routine is known
in the computer science literature as the hedge algorithm and it is used in
algorithmic game theory as a practical tool to find approximate solutions of
two-person zero-sum games. We apply the suggested algorithm to different
minimax problems in the econometrics literature. An empirical application to
the problem of optimally selecting sites to maximize the external validity of
an experimental policy evaluation illustrates the usefulness of the suggested
procedure.
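A minimal sketch of the hedge routine the abstract refers to, with a hypothetical risk matrix R over I decision rules and S states of nature; the learning rate follows the standard multiplicative-weights choice and is not taken from the paper.

```python
import numpy as np

# Minimal sketch of the hedge (multiplicative-weights) routine for an
# epsilon-minimax problem: choose a mixture p over I decision rules to
# minimize the worst-case risk max_s sum_i p_i * R[i, s]. The risk matrix R
# below is a hypothetical placeholder.
rng = np.random.default_rng(1)
I, S = 5, 8                      # decision rules x states of nature
R = rng.uniform(0, 1, (I, S))    # R[i, s] = risk of rule i in state s

T = 5000
eta = np.sqrt(8 * np.log(I) / T)  # standard learning-rate choice
w = np.ones(I)
p_avg = np.zeros(I)

for _ in range(T):
    p = w / w.sum()
    s = np.argmax(p @ R)          # nature best-responds to the current mixture
    w *= np.exp(-eta * R[:, s])   # exponential weight update against that state
    p_avg += p / T                # average iterate approximates the minimax mixture

worst_case = np.max(p_avg @ R)
print("approximate minimax mixture:", np.round(p_avg, 3))
print("worst-case risk of the mixture:", round(worst_case, 4))
```

Averaging the iterates, rather than taking the last one, is what delivers the approximate minimax guarantee in this style of algorithm.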
arXiv link: http://arxiv.org/abs/2509.08107v1
Forecasting dementia incidence
over time. We proceed in two steps: first, we estimate a time trend for
dementia using a multi-state Cox model. The multi-state model addresses
problems of both interval censoring arising from infrequent measurement and
also measurement error in dementia. Second, we feed the estimated mean and
variance of the time trend into a Kalman filter to infer the population level
dementia process. Using data from the English Longitudinal Study of Ageing
(ELSA), we find that dementia incidence is no longer declining in England.
Furthermore, our forecast is that future incidence remains constant, although
there is considerable uncertainty in this forecast. Our two-step estimation
procedure has significant computational advantages by combining a multi-state
model with a time series method. To account for the short sample that is
available for dementia, we derive expressions for the Kalman filter's
convergence speed, size, and power to detect changes and conclude our estimator
performs well even in short samples.
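A minimal local-level Kalman filter sketch of the second step is given below; the incidence estimates and variances are made-up placeholders standing in for the multi-state Cox output.

```python
import numpy as np

# Minimal local-level Kalman filter sketch for step two: the inputs y (estimated
# time trend in incidence) and obs_var (its estimation variance) are hypothetical
# placeholders for the multi-state Cox output; q is the state innovation variance.
y = np.array([1.9, 1.8, 1.85, 1.7, 1.72, 1.68])        # incidence per 100 person-years
obs_var = np.array([0.04, 0.05, 0.04, 0.06, 0.05, 0.05])
q = 0.002                                               # trend innovation variance

level, P = y[0], obs_var[0]          # initialize at the first observation
filtered = []
for t in range(1, len(y)):
    # predict: random-walk level
    P = P + q
    # update with the period-t estimate and its variance
    K = P / (P + obs_var[t])         # Kalman gain
    level = level + K * (y[t] - level)
    P = (1 - K) * P
    filtered.append(level)

print("filtered incidence path:", np.round(filtered, 3))
```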
arXiv link: http://arxiv.org/abs/2509.07874v1
Estimating Social Network Models with Link Misclassification
binary network links are misclassified (some zeros reported as ones and vice
versa) due, e.g., to survey respondents' recall errors, or lapses in data
input. We show misclassification adds new sources of correlation between the
regressors and errors, which makes all covariates endogenous and invalidates
conventional estimators. We resolve these issues by constructing a novel
estimator of misclassification rates and using those estimates to both adjust
endogenous peer outcomes and construct new instruments for 2SLS estimation. A
distinctive feature of our method is that it does not require structural
modeling of link formation. Simulation results confirm our adjusted 2SLS
estimator corrects the bias from a naive, unadjusted 2SLS estimator which
ignores misclassification and uses conventional instruments. We apply our
method to study peer effects in household decisions to participate in a
microfinance program in Indian villages.
arXiv link: http://arxiv.org/abs/2509.07343v1
Optimal Policy Learning for Multi-Action Treatment with Risk Preference using Stata
the companion command "opl_ma_vf"), for implementing the first-best Optimal
Policy Learning (OPL) algorithm to estimate the best treatment assignment given
the observation of an outcome, a multi-action (or multi-arm) treatment, and a
set of observed covariates (features). It allows for different risk preferences
in decision-making (i.e., risk-neutral, linear risk-averse, and quadratic
risk-averse), and provides a graphical representation of the optimal policy,
along with an estimate of the maximal welfare (i.e., the value-function
estimated at optimal policy) using regression adjustment (RA),
inverse-probability weighting (IPW), and doubly robust (DR) formulas.
arXiv link: http://arxiv.org/abs/2509.06851v1
Neural ARFIMA model for forecasting BRIC exchange rates with long memory under oil shocks and policy uncertainties
particularly for emerging economies such as Brazil, Russia, India, and China
(BRIC). These series exhibit long memory, nonlinearity, and non-stationarity
properties that conventional time series models struggle to capture.
Additionally, there exist several key drivers of exchange rate dynamics,
including global economic policy uncertainty, US equity market volatility, US
monetary policy uncertainty, oil price growth rates, and country-specific
short-term interest rate differentials. These empirical complexities underscore
the need for a flexible modeling framework that can jointly accommodate long
memory, nonlinearity, and the influence of external drivers. To address these
challenges, we propose a Neural AutoRegressive Fractionally Integrated Moving
Average (NARFIMA) model that combines the long-memory representation of ARFIMA
with the nonlinear learning capacity of neural networks, while flexibly
incorporating exogenous causal variables. We establish theoretical properties
of the model, including asymptotic stationarity of the NARFIMA process using
Markov chains and nonlinear time series techniques. We quantify forecast
uncertainty using conformal prediction intervals within the NARFIMA framework.
Empirical results across six forecast horizons show that NARFIMA consistently
outperforms various state-of-the-art statistical and machine learning models in
forecasting BRIC exchange rates. These findings provide new insights for
policymakers and market participants navigating volatile financial conditions.
The narfima R package provides an implementation of our
approach.
arXiv link: http://arxiv.org/abs/2509.06697v1
Largevars: An R Package for Testing Large VARs for the Presence of Cointegration
whether its non-stationary, growing components have a stationary linear
combination. Largevars R package conducts a cointegration test for
high-dimensional vector autoregressions of order k based on the large N, T
asymptotics of Bykhovskaya and Gorin (2022, 2025). The implemented test is a
modification of the Johansen likelihood ratio test. In the absence of
cointegration the test converges to the partial sum of the Airy_1 point
process, an object arising in random matrix theory.
The package and this article contain simulated quantiles of the first ten
partial sums of the Airy_1 point process that are precise up to the first 3
digits. We also include two examples using Largevars: an empirical example on
S&P100 stocks and a simulated VAR(2) example.
arXiv link: http://arxiv.org/abs/2509.06295v1
Predicting Market Troughs: A Machine Learning Approach with Causal Interpretation
troughs. We demonstrate that conclusions about these triggers are critically
sensitive to model specification. Moving beyond restrictive linear models, we
use a flexible double/debiased machine learning (DML) framework for average
partial effects. Our
robust estimates identify the volatility of options-implied risk appetite and
market liquidity as key causal drivers, relationships misrepresented or
obscured by simpler models. These findings provide high-frequency empirical
support for intermediary asset pricing theories. This causal analysis is
enabled by a high-performance nowcasting model that accurately identifies
capitulation events in real-time.
arXiv link: http://arxiv.org/abs/2509.05922v1
Polynomial Log-Marginals and Tweedie's Formula : When Is Bayes Possible?
the theoretical foundations of empirical Bayes estimators that directly model
the marginal density $m(y)$. Our main result shows that polynomial
log-marginals of degree $k \ge 3 $ cannot arise from any valid prior
distribution in exponential family models, while quadratic forms correspond
exactly to Gaussian priors. This provides theoretical justification for why
certain empirical Bayes decision rules, while practically useful, do not
correspond to any formal Bayes procedures. We also strengthen the diagnostic by
showing that a marginal is a Gaussian convolution only if it extends to a
bounded solution of the heat equation in a neighborhood of the smoothing
parameter, beyond the convexity of $c(y)=\tfrac12 y^2+\log m(y)$.
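The quadratic case can be checked numerically with Tweedie's formula, E[theta | y] = y + sigma^2 (log m)'(y): for a Gaussian prior the log-marginal is quadratic and the formula reduces to linear shrinkage. A short sketch of this standard result (not code from the paper):

```python
import numpy as np

# Numerical check of Tweedie's formula in the Gaussian case, where the
# log-marginal is exactly quadratic: y = theta + N(0, sigma2), theta ~ N(0, tau2),
# so m is N(0, tau2 + sigma2) and E[theta | y] = y + sigma2 * d/dy log m(y)
# reduces to the familiar linear shrinkage tau2 / (tau2 + sigma2) * y.
sigma2, tau2 = 1.0, 2.0
y = np.linspace(-3, 3, 7)

def log_marginal(y):
    return -0.5 * y**2 / (tau2 + sigma2) - 0.5 * np.log(2 * np.pi * (tau2 + sigma2))

# score of the marginal via a central finite difference
h = 1e-5
score = (log_marginal(y + h) - log_marginal(y - h)) / (2 * h)

tweedie = y + sigma2 * score
exact = tau2 / (tau2 + sigma2) * y
print(np.allclose(tweedie, exact, atol=1e-6))   # True: quadratic log-marginal <-> Gaussian prior
```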
arXiv link: http://arxiv.org/abs/2509.05823v1
Utilitarian or Quantile-Welfare Evaluation of Health Policy?
alternative to utilitarian evaluation. Manski (1988) originally proposed and
studied maximization of quantile utility as a model of individual decision
making under uncertainty, juxtaposing it with maximization of expected utility.
That paper's primary motivation was to exploit the fact that maximization of
quantile utility requires only an ordinal formalization of utility, not a
cardinal one. This paper transfers these ideas from analysis of individual
decision making to analysis of social planning. We begin by summarizing basic
theoretical properties of quantile welfare in general terms rather than related
specifically to health policy. We then turn attention to health policy and
propose a procedure to nonparametrically bound the quantile welfare of health
states using data from binary-choice time-tradeoff (TTO) experiments of the
type regularly performed by health economists. After this we assess related
econometric considerations concerning measurement, using the EQ-5D framework to
structure our discussion.
arXiv link: http://arxiv.org/abs/2509.05529v2
Bayesian Inference for Confounding Variables and Limited Information
variables that may distort observed associations between treatment and outcome.
Conventional "causal" methods, grounded in assumptions such as ignorability,
exclude the possibility of unobserved confounders, leading to posterior
inferences that overstate certainty. We develop a Bayesian framework that
relaxes these assumptions by introducing entropy-favoring priors over
hypothesis spaces that explicitly allow for latent confounding variables and
partial information. Using the case of Simpson's paradox, we demonstrate how
this approach produces logically consistent posterior distributions that widen
credible intervals in the presence of potential confounding. Our method
provides a generalizable, information-theoretic foundation for more robust
predictive inference in observational sciences.
arXiv link: http://arxiv.org/abs/2509.05520v1
Causal mechanism and mediation analysis for macroeconomics dynamics: a bridge of Granger and Sims causality
disentangle the dynamic contributions of the mediator variables in the
transmission of structural shocks. We justify our decomposition by drawing on
causal mediation analysis and demonstrating its equivalence to the average
mediation effect. Our result establishes a formal link between Sims and Granger
causality. Sims causality captures the total effect, while Granger causality
corresponds to the mediation effect. We construct a dynamic mediation index
that quantifies the evolving role of mediator variables in shock propagation.
Applying our framework to studies of the transmission channels of US monetary
policy, we find that investor sentiment explains approximately 60% of the peak
aggregate output response in the three months following a policy shock, while
expected default risk contributes negligibly across all horizons.
arXiv link: http://arxiv.org/abs/2509.05284v1
Treatment Effects of Multi-Valued Treatments in Hyper-Rectangle Model
within multi-valued treatment models. Extending the hyper-rectangle model
introduced by Lee and Salanie (2018), this paper relaxes restrictive
assumptions, including the requirement of known treatment selection thresholds
and the dependence of treatments on all unobserved heterogeneity. By
incorporating an additional ranked treatment assumption, this study
demonstrates that the marginal treatment responses can be identified under a
broader set of conditions, either point or set identification. The framework
further enables the derivation of various treatment effects from the marginal
treatment responses. Additionally, this paper introduces a hypothesis testing
method to evaluate the effectiveness of policies on treatment effects,
enhancing its applicability to empirical policy analysis.
arXiv link: http://arxiv.org/abs/2509.05177v1
Optimal Estimation for General Gaussian Processes
for general Gaussian processes, where all parameters are estimated jointly. The
exact ML estimator (MLE) is consistent and asymptotically normally distributed.
We prove the local asymptotic normality (LAN) property of the sequence of
statistical experiments for general Gaussian processes in the sense of Le Cam,
thereby enabling optimal estimation and facilitating statistical inference. The
results rely solely on the asymptotic behavior of the spectral density near
zero, allowing them to be widely applied. The established optimality not only
addresses the gap left by Adenstedt (1974), who proposed an efficient but
infeasible estimator for the long-run mean $\mu$, but also enables us to
evaluate the finite-sample performance of the existing method -- the commonly
used plug-in MLE, in which the sample mean is substituted into the likelihood.
Our simulation results show that the plug-in MLE performs nearly as well as the
exact MLE, alleviating concerns that inefficient estimation of $\mu$ would
compromise the efficiency of the remaining parameter estimates.
arXiv link: http://arxiv.org/abs/2509.04987v1
A Bayesian Gaussian Process Dynamic Factor Model
to observed variables with unknown and potentially nonlinear functions. The key
novelty and source of flexibility of our approach is a nonparametric
observation equation, specified via Gaussian Process (GP) priors for each
series. Factor dynamics are modeled with a standard vector autoregression
(VAR), which facilitates computation and interpretation. We discuss a
computationally efficient estimation algorithm and consider two empirical
applications. First, we forecast key series from the FRED-QD dataset and show
that the model yields improvements in predictive accuracy relative to linear
benchmarks. Second, we extract driving factors of global inflation dynamics
with the GP-DFM, which allows for capturing international asymmetries.
arXiv link: http://arxiv.org/abs/2509.04928v1
The exact distribution of the conditional likelihood-ratio test in instrumental variables regression
likelihood-ratio test in instrumental variables regression under weak
instrument asymptotics and for multiple endogenous variables. The distribution
is conditional on all eigenvalues of the concentration matrix, rather than only
the smallest eigenvalue as in an existing asymptotic upper bound. This exact
characterization leads to a substantially more powerful test if there are
differently identified endogenous variables. We provide computational methods
implementing the test and demonstrate the power gains through numerical
analysis.
arXiv link: http://arxiv.org/abs/2509.04144v2
Selecting the Best Arm in One-Shot Multi-Arm RCTs: The Asymptotic Minimax-Regret Decision Framework for the Best-Population Selection Problem
arm in one-shot, multi-arm randomized controlled trials (RCTs). Our approach
characterizes the minimax-regret (MMR) optimal decision rule for any
location-family reward distribution with full support. We show that the MMR
rule is deterministic, unique, and computationally tractable, as it can be
derived by solving the dual problem with nature's least-favorable prior. We
then specialize to the case of multivariate normal (MVN) rewards with an
arbitrary covariance matrix, and establish the local asymptotic minimaxity of a
plug-in version of the rule when only estimated means and covariances are
available. This asymptotic MMR (AMMR) procedure maps a covariance-matrix
estimate directly into decision boundaries, allowing straightforward
implementation in practice. Our analysis highlights a sharp contrast between
two-arm and multi-arm designs. With two arms, the empirical success rule
("pick-the-winner") remains MMR-optimal, regardless of the arm-specific
variances. By contrast, with three or more arms and heterogeneous variances,
the empirical success rule is no longer optimal: the MMR decision boundaries
become nonlinear and systematically penalize high-variance arms, requiring
stronger evidence to select them. This result underscores that variance plays
no role in optimal two-arm comparisons, but it matters critically when more
than two options are on the table. Our multi-arm AMMR framework extends
classical decision theory to multi-arm RCTs, offering a rigorous foundation and
a practical tool for comparing multiple policies simultaneously.
arXiv link: http://arxiv.org/abs/2509.03796v1
Data driven modeling of multiple interest rates with generalized Vasicek-type models
many extensions and generalizations of it. However, most generalizations of the
model are either univariate or assume the noise process to be Gaussian, or
both. In this article, we study a generalized multivariate Vasicek model that
allows simultaneous modeling of multiple interest rates while making minimal
assumptions. In the model, we only assume that the noise process has stationary
increments with a suitably decaying autocovariance structure. We provide
estimators for the unknown parameters and prove their consistency. We also
derive limiting distributions for each estimator and provide theoretical
examples. Furthermore, the model is tested empirically with both simulated data
and real data.
arXiv link: http://arxiv.org/abs/2509.03208v1
Bias Correction in Factor-Augmented Regression Models with Weak Factors
regression estimator and its reduction, which is augmented by the $r$ factors
extracted from a large number of $N$ variables with $T$ observations. In
particular, we consider general weak latent factor models with $r$ signal
eigenvalues that may diverge at different rates, $N^{\alpha _{k}}$, $0<\alpha
_{k}\leq 1$, $k=1,\dots,r$. In the existing literature, the bias has been
derived using an approximation for the estimated factors with a specific
data-dependent rotation matrix $H$ for the model with $\alpha_{k}=1$ for
all $k$, whereas we derive the bias for weak factor models. In addition, we
derive the bias using the approximation with a different rotation matrix
$H_q$, which generally has a smaller bias than with $H$. We also
derive the bias using our preferred approximation with a purely
signal-dependent rotation, which is unique and can be regarded as the
population version of $H$ and $H_q$. Since this bias is
parametrically inestimable, we propose a split-panel jackknife bias correction,
and theory shows that it successfully reduces the bias. The extensive
finite-sample experiments suggest that the proposed bias correction works very
well, and the empirical application illustrates its usefulness in practice.
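The split-panel jackknife step can be sketched generically; the estimator below is a placeholder sample moment rather than the paper's factor-augmented regression, and the data are simulated.

```python
import numpy as np

# Generic half-panel jackknife sketch: the bias-corrected estimate is
# 2 * theta(full panel) - average of the estimates on the two half panels
# (split along the time dimension). The "estimator" here is a placeholder
# sample-moment function standing in for the factor-augmented regression step.
def estimator(panel):            # panel: T x N array
    return panel.mean()          # placeholder for the real estimation routine

rng = np.random.default_rng(2)
panel = rng.normal(size=(200, 50))

T = panel.shape[0]
theta_full = estimator(panel)
theta_half1 = estimator(panel[: T // 2])
theta_half2 = estimator(panel[T // 2 :])

theta_bc = 2 * theta_full - 0.5 * (theta_half1 + theta_half2)
print("full-panel estimate:", round(theta_full, 4))
print("jackknife bias-corrected:", round(theta_bc, 4))
```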
arXiv link: http://arxiv.org/abs/2509.02066v2
Interpretational errors with instrumental variables
observational settings and experiments subject to non-compliance. Under
canonical assumptions, IVs allow us to identify a so-called local average
treatment effect (LATE). The use of IVs is often accompanied by a pragmatic
decision to abandon the identification of the causal parameter that corresponds
to the original research question and target the LATE instead. This pragmatic
decision presents a potential source of error: an investigator mistakenly
interprets findings as if they had made inference on their original causal
parameter of interest. We conducted a systematic review and meta-analysis of
patterns of pragmatism and interpretational errors in the applied IV literature
published in leading journals of economics, political science, epidemiology,
and clinical medicine (n = 309 unique studies). We found that a large fraction
of studies targeted the LATE, although specific interest in this parameter was
rare. Of these studies, 61% contained claims that mistakenly suggested that
another parameter was targeted -- one whose value likely differs, and could
even have the opposite sign, from the parameter actually estimated. Our
findings suggest that the validity of conclusions drawn from IV applications is
often compromised by interpretational errors.
arXiv link: http://arxiv.org/abs/2509.02045v1
On the role of the design phase in a linear regression
researcher constructs a subsample that achieves a better balance in covariate
distributions between the treated and untreated units. In this paper, we study
the role of this preliminary phase in the context of linear regression,
offering a justification for its utility. To that end, we first formalize the
design phase as a process of estimand adjustment via selecting a subsample.
Then, we show that covariate balance of a subsample is indeed a justifiable
criterion for guiding the selection: it informs on the maximum degree of model
misspecification that can be allowed for a subsample, when a researcher wishes
to restrict the bias of the estimand for the parameter of interest within a
target level of precision. In this sense, the pursuit of a balanced subsample
in the design phase is interpreted as identifying an estimand that is less
susceptible to bias in the presence of model misspecification. Also, we
demonstrate that covariate imbalance can serve as a sensitivity measure in
regression analysis, and illustrate how it can structure a communication
between a researcher and the readers of her report.
arXiv link: http://arxiv.org/abs/2509.01861v1
Cohort-Anchored Robust Inference for Event-Study with Staggered Adoption
studies with staggered adoption, building on Rambachan and Roth (2023). Robust
inference based on event-study coefficients aggregated across cohorts can be
misleading due to the dynamic composition of treated cohorts, especially when
pre-trends differ across cohorts. My approach avoids this problem by operating
at the cohort-period level. To address the additional challenge posed by
time-varying control groups in modern DiD estimators, I introduce the concept
of block bias: the parallel-trends violation for a cohort relative to its fixed
initial control group. I show that the biases of these estimators can be
decomposed invertibly into block biases. Because block biases maintain a
consistent comparison across pre- and post-treatment periods, researchers can
impose transparent restrictions on them to conduct robust inference. In
simulations and a reanalysis of minimum-wage effects on teen employment, my
framework yields better-centered (and sometimes narrower) confidence sets than
the aggregated approach when pre-trends vary across cohorts. The framework is
most useful in settings with multiple cohorts, sufficient within-cohort
precision, and substantial cross-cohort heterogeneity.
arXiv link: http://arxiv.org/abs/2509.01829v2
Finite-Sample Non-Parametric Bounds with an Application to the Causal Effect of Workforce Gender Diversity on Firm Performance
assumptions but, in finite samples, assume that latent conditional expectations
are bounded by the sample's own extrema or that the population extrema are
known a priori -- often untrue in firm-level data with heavy tails. We develop
a finite-sample, concentration-driven band (concATE) that replaces that
assumption with a Dvoretzky--Kiefer--Wolfowitz tail bound, combines it with
delta-method variance, and allocates size via Bonferroni. The band extends to a
group-sequential design that controls the family-wise error when the first
“significant” diversity threshold is data-chosen. Applied to 945 listed firms
(2015 Q2--2022 Q1), concATE shows that senior-level gender diversity raises
Tobin's Q once representation exceeds approximately 30% in growth sectors and
approximately 65% in cyclical sectors.
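The DKW ingredient is easy to reproduce: at level alpha the band half-width is sqrt(log(2/alpha) / (2n)). A short sketch with simulated heavy-tailed data (the sample size mirrors the 945 firms; everything else is illustrative):

```python
import numpy as np

# Minimal sketch of the Dvoretzky-Kiefer-Wolfowitz ingredient: a finite-sample
# confidence band for the CDF at level alpha, eps = sqrt(log(2/alpha) / (2n)).
# The data below are simulated placeholders for a firm-level outcome.
rng = np.random.default_rng(3)
x = np.sort(rng.lognormal(mean=0.0, sigma=1.0, size=945))   # heavy-tailed sample
n = x.size
alpha = 0.05

eps = np.sqrt(np.log(2 / alpha) / (2 * n))
F_hat = np.arange(1, n + 1) / n
lower = np.clip(F_hat - eps, 0.0, 1.0)
upper = np.clip(F_hat + eps, 0.0, 1.0)

print(f"DKW half-width with n={n}, alpha={alpha}: {eps:.4f}")
print("band at the sample median:", round(lower[n // 2], 3), "to", round(upper[n // 2], 3))
```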
arXiv link: http://arxiv.org/abs/2509.01622v1
Constrained Recursive Logit for Route Choice Analysis
choice modeling, but it suffers from a key limitation: it assigns nonzero
probabilities to all paths in the network, including those that are
unrealistic, such as routes exceeding travel time deadlines or violating energy
constraints. To address this gap, we propose a novel Constrained Recursive
Logit (CRL) model that explicitly incorporates feasibility constraints into the
RL framework. CRL retains the main advantages of RL-no path sampling and ease
of prediction-but systematically excludes infeasible paths from the universal
choice set. The model is inherently non-Markovian; to address this, we develop
a tractable estimation approach based on extending the state space, which
restores the Markov property and enables estimation using standard value
iteration methods. We prove that our estimation method admits a unique solution
under positive discrete costs and establish its equivalence to a multinomial
logit model defined over restricted universal path choice sets. Empirical
experiments on synthetic and real networks demonstrate that CRL improves
behavioral realism and estimation stability, particularly in cyclic networks.
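The extended-state idea can be illustrated on a toy network: make the state (node, remaining budget), drop links that exceed the remaining budget, and run the usual recursive-logit logsum recursion. The network, costs, and budget below are hypothetical.

```python
import numpy as np

# Minimal sketch of the extended-state idea behind CRL: the state is
# (node, remaining time budget), links with travel time above the remaining
# budget are infeasible, and the value function is the usual recursive-logit
# logsum, V(s) = log sum_a exp(-cost(a) + V(next(s, a))), with V = 0 at the
# destination. The toy network and budget below are hypothetical.
links = {            # node -> list of (next_node, travel_time)
    "A": [("B", 1), ("C", 2)],
    "B": [("C", 1), ("D", 3)],
    "C": [("D", 1)],
    "D": [],         # destination
}
budget = 4           # total time budget; routes longer than this are excluded

states = [(n, b) for n in links for b in range(budget + 1)]
V = {s: (0.0 if s[0] == "D" else -np.inf) for s in states}

# Backward value iteration over the extended state space (finite, so it converges).
for _ in range(len(states)):
    for (node, b) in states:
        if node == "D":
            continue
        vals = [-t + V[(nxt, b - t)] for nxt, t in links[node] if t <= b]
        if vals:
            V[(node, b)] = np.logaddexp.reduce(vals)   # logsum over feasible links

# Choice probability of each feasible outgoing link at the origin state.
origin = ("A", budget)
probs = {nxt: np.exp(-t + V[(nxt, budget - t)] - V[origin])
         for nxt, t in links["A"] if t <= budget}
print(probs)
```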
arXiv link: http://arxiv.org/abs/2509.01595v1
On the Estimation of Multinomial Logit and Nested Logit Models: A Conic Optimization Approach
nested logit (NL), and tree-nested logit (TNL) models through the framework of
convex conic optimization. Traditional approaches typically solve the maximum
likelihood estimation (MLE) problem using gradient-based methods, which are
sensitive to step-size selection and initialization, and may therefore suffer
from slow or unstable convergence. In contrast, we propose a novel estimation
strategy that reformulates these models as conic optimization problems,
enabling more robust and reliable estimation procedures. Specifically, we show
that the MLE for MNL admits an equivalent exponential cone program (ECP). For
NL and TNL, we prove that when the dissimilarity (scale) parameters are fixed,
the estimation problem is convex and likewise reducible to an ECP. Leveraging
these results, we design a two-stage procedure: an outer loop that updates the
scale parameters and an inner loop that solves the ECP to update the utility
coefficients. The inner problems are handled by interior-point methods with
iteration counts that grow only logarithmically in the target accuracy, as
implemented in off-the-shelf solvers (e.g., MOSEK). Extensive experiments
across estimation instances of varying size show that our conic approach
attains better MLE solutions, greater robustness to initialization, and
substantial speedups compared to standard gradient-based MLE, particularly on
large-scale instances with high-dimensional specifications and large choice
sets. Our findings establish exponential cone programming as a practical and
scalable alternative for estimating a broad class of discrete choice models.
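A compact way to see the exponential-cone formulation in practice is to hand the MNL log-likelihood to a disciplined convex programming layer and let an exponential-cone solver handle it. The sketch below assumes cvxpy with an exp-cone-capable solver (e.g. ECOS, Clarabel, or MOSEK) and uses simulated data; it illustrates the ECP idea for MNL only, not the paper's two-stage NL/TNL procedure.

```python
import numpy as np
import cvxpy as cp

# Sketch of the convex MLE for a multinomial logit with alternative-specific
# attributes, written so that an exponential-cone solver can handle it through
# cvxpy. Data are simulated placeholders.
rng = np.random.default_rng(4)
n, J, d = 500, 4, 3                      # observations, alternatives, attributes
X = rng.normal(size=(n, J, d))           # attributes of each alternative
beta_true = np.array([1.0, -0.5, 0.8])
util = X @ beta_true
prob = np.exp(util) / np.exp(util).sum(axis=1, keepdims=True)
y = np.array([rng.choice(J, p=p) for p in prob])
Y = np.eye(J)[y]                         # one-hot chosen alternatives

beta = cp.Variable(d)
U = cp.vstack([X[:, j, :] @ beta for j in range(J)]).T     # n x J utilities
loglik = cp.sum(cp.multiply(Y, U)) - cp.sum(cp.log_sum_exp(U, axis=1))
prob_mle = cp.Problem(cp.Maximize(loglik))
prob_mle.solve()                          # conic reformulation handled internally

print("true  beta:", beta_true)
print("conic MLE :", np.round(beta.value, 3))
```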
arXiv link: http://arxiv.org/abs/2509.01562v1
Using Aggregate Relational Data to Infer Social Networks
structures using Aggregate Relational Data (ARD), addressing the challenge of
limited detailed network data availability. By integrating ARD with variational
approximation methods, we provide a computationally efficient and
cost-effective solution for network analysis. Our methodology demonstrates the
potential of ARD to offer insightful approximations of network dynamics, as
evidenced by Monte Carlo Simulations. This paper not only showcases the utility
of ARD in social network inference but also opens avenues for future research
in enhancing estimation precision and exploring diverse network datasets.
Through this work, we contribute to the field of network analysis by offering
an alternative strategy for understanding complex social networks with
constrained data.
arXiv link: http://arxiv.org/abs/2509.01503v2
Handling Sparse Non-negative Data in Finance
regression for modeling count and other non-negative variables in finance and
economics, can be far from optimal when heteroskedasticity and sparsity -- two
common features of such data -- are both present. We propose a general class of
moment estimators, encompassing Poisson regression, that balances the
bias-variance trade-off under these conditions. A simple cross-validation
procedure selects the optimal estimator. Numerical simulations and applications
to corporate finance data reveal that the best choice varies substantially
across settings and often departs from Poisson regression, underscoring the
need for a more flexible estimation framework.
arXiv link: http://arxiv.org/abs/2509.01478v1
Bootstrap Diagnostic Tests
often yields unreliable statistical inference. This paper shows that the
bootstrap can detect such violations by delivering simple and powerful
diagnostic tests that (a) induce no pre-testing bias, (b) use the same critical
values across applications, and (c) are consistent against deviations from
asymptotic normality. The tests compare the conditional distribution of a
bootstrap statistic with the Gaussian limit implied by valid specification and
assess whether the resulting discrepancy is large enough to indicate failure of
the asymptotic Gaussian approximation. The method is computationally
straightforward and only requires a sample of i.i.d. draws of the bootstrap
statistic. We derive sufficient conditions for the randomness in the data to
mix with the randomness in the bootstrap repetitions in a way such that (a),
(b) and (c) above hold. We demonstrate the practical relevance and broad
applicability of bootstrap diagnostics by considering several scenarios where
the asymptotic Gaussian approximation may fail, including weak instruments,
non-stationarity, parameters on the boundary of the parameter space, infinite
variance data and singular Jacobian in applications of the delta method. An
illustration drawn from the empirical macroeconomic literature concludes.
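One simple way to operationalize the comparison, not necessarily the authors' exact statistic, is to draw i.i.d. bootstrap replicates of a studentized statistic and measure their distance from N(0,1), for example with a Kolmogorov-Smirnov statistic:

```python
import numpy as np
from scipy import stats

# Illustrative sketch: draw i.i.d. bootstrap replicates of a studentized
# statistic and compare their distribution with the N(0,1) limit implied by a
# valid Gaussian approximation. The Kolmogorov-Smirnov distance below is a
# simple stand-in for the paper's diagnostic statistic.
rng = np.random.default_rng(5)
x = rng.standard_t(df=1.5, size=300)       # heavy-tailed sample: CLT is suspect
n, B = x.size, 999

t_boot = np.empty(B)
for b in range(B):
    xb = rng.choice(x, size=n, replace=True)
    t_boot[b] = np.sqrt(n) * (xb.mean() - x.mean()) / xb.std(ddof=1)

ks_stat, ks_pval = stats.kstest(t_boot, "norm")
print(f"KS distance to N(0,1): {ks_stat:.3f} (p-value {ks_pval:.3f})")
# A large distance signals that the asymptotic Gaussian approximation is unreliable.
```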
arXiv link: http://arxiv.org/abs/2509.01351v2
Treatment effects at the margin: Everyone is marginal
policy simultaneously alters both the incentive to participate and the outcome
of interest -- such as hiring decisions and wages in response to employment
subsidies; or working decisions and wages in response to job trainings. This
framework was inspired by my PhD project on a Belgian reform that subsidised
first-time hiring, inducing entry by marginal firms yet meanwhile changing the
wages they pay. Standard methods addressing selection-into-treatment concepts
(like Heckman selection equations and local average treatment effects), or
before-after comparisons (including simple DiD or RDD), cannot isolate effects
at this shifting margin where treatment defines who is observed. I introduce
marginality-weighted estimands that recover causal effects among policy-induced
entrants, offering a policy-relevant alternative in settings with endogenous
selection. This method can thus be applied widely to understanding the economic
impacts of public programmes, especially in fields largely relying on
reduced-form causal inference estimation (e.g. labour economics, development
economics, health economics).
arXiv link: http://arxiv.org/abs/2508.21583v1
Triply Robust Panel Estimators
introduce a new estimator, the Triply RObust Panel (TROP) estimator, that
combines (i) a flexible model for the potential outcomes based on a low-rank
factor structure on top of a two-way-fixed effect specification, with (ii) unit
weights intended to upweight units similar to the treated units and (iii) time
weights intended to upweight time periods close to the treated time periods. We
study the performance of the estimator in a set of simulations designed to
closely match several commonly studied real data sets. We find that there is
substantial variation in the performance of the estimators across the settings
considered. The proposed estimator outperforms
two-way-fixed-effect/difference-in-differences, synthetic control, matrix
completion and synthetic-difference-in-differences estimators. We investigate
what features of the data generating process lead to this performance, and
assess the relative importance of the three components of the proposed
estimator. We have two recommendations. Our preferred strategy is that
researchers use simulations closely matched to the data they are interested in,
along the lines discussed in this paper, to investigate which estimators work
well in their particular setting. A simpler approach is to use more robust
estimators such as synthetic difference-in-differences or the new triply robust
panel estimator which we find to substantially outperform two-way fixed effect
estimators in many empirically relevant settings.
arXiv link: http://arxiv.org/abs/2508.21536v2
Uniform Quasi ML based inference for the panel AR(1) model
initial conditions and heteroskedasticity and possibly additional regressors
that are robust to the strength of identification. Specifically, we consider
several Maximum Likelihood based methods of constructing tests and confidence
sets (CSs) and show that (Quasi) LM tests and CSs that use the expected Hessian
rather than the observed Hessian of the log-likelihood have correct asymptotic
size (in a uniform sense). We derive the power envelope of a Fixed Effects
version of such a LM test for hypotheses involving the autoregressive parameter
when the average information matrix is estimated by a centered OPG estimator
and the model is only second-order identified, and show that it coincides with
the maximal attainable power curve in the worst case setting. We also study the
empirical size and power properties of these (Quasi) LM tests and CSs.
arXiv link: http://arxiv.org/abs/2508.20855v1
Time Series Embedding and Combination of Forecasts: A Reinforcement Learning Approach
literature, stressing the challenge of outperforming the simple average when
aggregating forecasts from diverse methods. This study proposes a Reinforcement
Learning - based framework as a dynamic model selection approach to address
this puzzle. Our framework is evaluated through extensive forecasting exercises
using simulated and real data. Specifically, we analyze the M4 Competition
dataset and the Survey of Professional Forecasters (SPF). This research
introduces an adaptable methodology for selecting and combining forecasts under
uncertainty, offering a promising advancement in resolving the forecasting
combination puzzle.
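A minimal bandit-style sketch of dynamic model selection, much simpler than the paper's framework: an epsilon-greedy rule tracks each method's running accuracy and occasionally explores. All series and forecasters below are toy placeholders.

```python
import numpy as np

# Epsilon-greedy selection among competing forecasters based on a running
# average of negative squared errors (a toy stand-in for an RL policy).
rng = np.random.default_rng(6)
T, K, eps = 300, 3, 0.1

y = np.cumsum(rng.normal(size=T))                    # toy target series
forecasts = np.column_stack([
    np.r_[0.0, y[:-1]],                              # naive (random walk) forecast
    np.r_[0.0, y[:-1]] + rng.normal(0, 0.5, T),      # noisy variant
    np.zeros(T),                                     # poorly specified constant forecast
])

value = np.zeros(K)          # running average of negative squared errors
counts = np.zeros(K)
chosen = np.empty(T, dtype=int)
for t in range(T):
    k = rng.integers(K) if rng.random() < eps else int(np.argmax(value))
    chosen[t] = k
    reward = -(y[t] - forecasts[t, k]) ** 2
    counts[k] += 1
    value[k] += (reward - value[k]) / counts[k]      # incremental mean update

print("selection frequencies:", np.bincount(chosen, minlength=K) / T)
```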
arXiv link: http://arxiv.org/abs/2508.20795v1
A further look at Modified ML estimation of the panel AR(1) model with fixed effects and arbitrary initial conditions
of Economic Studies, 2002) Modified ML estimator (MMLE) for the panel AR(1)
model with fixed effects and arbitrary initial conditions and possibly
covariates when the time dimension, $T$, is fixed. When the autoregressive
parameter $\rho=1$, the limiting modified profile log-likelihood function for this
model has a stationary point of inflection, and $\rho$ is first-order
underidentified but second-order identified. We show that the generalized MMLEs
exist w.p.a.1. and are uniquely defined w.p.1. and consistent for any value of
$|\rho| \leq 1$. When $\rho=1$, the rate of convergence of the MMLEs is $N^{1/4}$, where
$N$ is the cross-sectional dimension of the panel. We then develop an asymptotic
theory for GMM estimators when one of the parameters is only second-order
identified and use this to derive the limiting distributions of the MMLEs. They
are generally asymmetric when $\rho=1$. We also show that Quasi LM tests that are
based on the modified profile log-likelihood and use its expected rather than
observed Hessian, with an additional modification for $\rho=1$, and confidence
regions based on inverting these tests have correct asymptotic size in a
uniform sense when $|\rho| \leq 1$. Finally, we investigate the finite sample
properties of the MMLEs and the QLM test in a Monte Carlo study.
arXiv link: http://arxiv.org/abs/2508.20753v1
Inference on Partially Identified Parameters with Separable Nuisance Parameters: a Two-Stage Method
parameters in moment inequality models with separable nuisance parameters. In
the first stage, the nuisance parameters are estimated separately, and in the
second stage, the identified set for the parameters of interest is constructed
using a refined chi-squared test with variance correction that accounts for the
first-stage estimation error. We establish the asymptotic validity of the
proposed method under mild conditions and characterize its finite-sample
properties. The method is broadly applicable to models where direct elimination
of nuisance parameters is difficult or introduces conservativeness. Its
practical performance is illustrated through an application: structural
estimation of entry and exit costs in the U.S. vehicle market based on Wollmann
(2018).
arXiv link: http://arxiv.org/abs/2508.19853v1
Is attention truly all we need? An empirical study of asset pricing in pretrained RNN sparse and global attention models
mainstream attention mechanisms such as additive attention, Luong's three
attentions, global self-attention (Self-att) and sliding window sparse
attention (Sparse-att) for empirical asset pricing research on the top 420
large-cap US stocks. This is the first paper applying these large-scale
state-of-the-art (SOTA) attention mechanisms in the asset pricing context. They
overcome limitations of traditional machine learning (ML) based asset pricing,
such as failing to capture temporal dependence and having only short memory.
Moreover, the enforced causal masks in the attention mechanisms address the
future-data leakage issue ignored by more advanced attention-based
models, such as the classic Transformer. The proposed attention models also
consider the temporal sparsity characteristic of asset pricing data and
mitigate potential overfitting issues by deploying the simplified model
structures. This provides some insights for future empirical economic research.
All models are examined in three periods, which cover pre-COVID-19 (mild
uptrend), COVID-19 (steep uptrend with a large drawdown) and one year
post-COVID-19 (sideways movement with high fluctuations), for testing the
stability of these models under extreme market conditions. The study finds that
in value-weighted portfolio backtesting, Model Self-att and Model Sparse-att
show strong ability to generate absolute returns and hedge downside risks,
achieving annualized Sortino ratios of 2.0 and 1.80, respectively, during the
COVID-19 period. Model Sparse-att also performs more stably than Model Self-att
in terms of absolute portfolio returns across stocks of different market
capitalizations.
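The two masking ingredients highlighted above are easy to write down; the sketch below builds a combined causal and sliding-window attention mask (shapes and window size are illustrative, and this is not the authors' model code).

```python
import numpy as np

# Sketch of the two masking ingredients: a causal mask that blocks attention to
# future time steps, combined with a sliding-window mask that keeps only the
# most recent w lags (the "sparse" pattern).
def causal_sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    i = np.arange(seq_len)[:, None]      # query positions
    j = np.arange(seq_len)[None, :]      # key positions
    causal = j <= i                      # no peeking at future returns
    local = (i - j) < window             # attend only to the last `window` steps
    return causal & local                # True = position may be attended to

mask = causal_sliding_window_mask(seq_len=6, window=3)
print(mask.astype(int))
# Row t shows which past observations a query at time t is allowed to use.
```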
arXiv link: http://arxiv.org/abs/2508.19006v1
A bias test for heteroscedastic linear least-squares regression
variable, a mismeasured regressor, or simultaneity. A simple test to detect the
bias is proposed and explored in simulation and in real data sets.
arXiv link: http://arxiv.org/abs/2508.15969v1
Multivariate quantile regression
based on the multivariate distribution function, termed multivariate quantile
regression (MQR). In contrast to existing approaches--such as directional
quantiles, vector quantile regression, or copula-based methods--MQR defines
quantiles through the conditional probability structure of the joint
conditional distribution function. The method constructs multivariate quantile
curves using sequential univariate quantile regressions derived from
conditioning mechanisms, allowing for an intuitive interpretation and flexible
estimation of marginal effects. The paper develops theoretical foundations of
MQR, including asymptotic properties of the estimators. Through simulation
exercises, the estimator demonstrates robust finite sample performance across
different dependence structures. As an empirical application, the MQR framework
is applied to the analysis of exchange rate pass-through in Argentina from 2004
to 2024.
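One way to see the sequential-conditioning construction for a bivariate outcome is to chain two univariate quantile regressions, the second conditioning on the first outcome. The sketch below uses statsmodels' QuantReg on simulated data and illustrates the idea rather than the paper's exact MQR estimator.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

# Sequential-conditioning sketch for a bivariate outcome (y1, y2): first a
# univariate quantile regression of y1 on x, then one of y2 on (x, y1).
# Data are simulated placeholders.
rng = np.random.default_rng(7)
n = 1000
x = rng.normal(size=n)
y1 = 1.0 + 0.5 * x + rng.normal(size=n)
y2 = -0.5 + 0.8 * x + 0.6 * y1 + rng.normal(size=n)

tau = 0.5
X1 = sm.add_constant(x)
fit1 = QuantReg(y1, X1).fit(q=tau)                 # tau-quantile of y1 | x

X2 = sm.add_constant(np.column_stack([x, y1]))
fit2 = QuantReg(y2, X2).fit(q=tau)                 # tau-quantile of y2 | x, y1

# Multivariate quantile "curve" at a grid of x values: chain the two fits.
x_grid = np.array([-1.0, 0.0, 1.0])
q1 = fit1.predict(sm.add_constant(x_grid))
q2 = fit2.predict(sm.add_constant(np.column_stack([x_grid, q1])))
print(np.column_stack([x_grid, q1, q2]))
```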
arXiv link: http://arxiv.org/abs/2508.15749v1
Effect Identification and Unit Categorization in the Multi-Score Regression Discontinuity Design with Application to LED Manufacturing
identifying and estimating causal effects at the cutoff of a single running
variable. In practice, however, decision-making often involves multiple
thresholds and criteria, especially in production systems. Standard MRD
(multi-score RDD) methods address this complexity by reducing the problem to a
one-dimensional design. This simplification allows existing approaches to be
used to identify and estimate causal effects, but it can introduce
non-compliance by misclassifying units relative to the original cutoff rules.
We develop theoretical tools to detect and reduce "fuzziness" when estimating
the cutoff effect for units that comply with individual subrules of a
multi-rule system. In particular, we propose a formal definition and
categorization of unit behavior types under multi-dimensional cutoff rules,
extending standard classifications of compliers, always-takers, and never-takers,
and incorporating defiers and indecisive units. We further identify conditions
under which cutoff effects for compliers can be estimated in multiple
dimensions, and establish when identification remains valid after excluding
never-takers and always-takers. In addition, we examine how decomposing complex
Boolean cutoff rules (such as AND- and OR-type rules) into simpler components
affects the classification of units into behavioral types and improves
estimation by making it possible to identify and remove non-compliant units
more accurately. We validate our framework using both semi-synthetic
simulations calibrated to production data and real-world data from
opto-electronic semiconductor manufacturing. The empirical results demonstrate
that our approach has practical value in refining production policies and
reduces estimation variance. This underscores the usefulness of the MRD
framework in manufacturing contexts.
arXiv link: http://arxiv.org/abs/2508.15692v2
Large-dimensional Factor Analysis with Weighted PCA
for large-dimensional factor analysis. While it is effective when the factors
are sufficiently strong, it can be inconsistent when the factors are weak
and/or the noise has complex dependence structure. We argue that the
inconsistency often stems from bias and introduce a general approach to restore
consistency. Specifically, we propose a general weighting scheme for PCA and
show that with a suitable choice of weighting matrices, it is possible to
deduce consistent and asymptotically normal estimators under much weaker conditions
than the usual PCA. While the optimal weight matrix may require knowledge about
the factors and covariance of the idiosyncratic noise that are not known a
priori, we develop an agnostic approach to adaptively choose from a large class
of weighting matrices that can be viewed as PCA for weighted linear
combinations of auto-covariances among the observations. Theoretical and
numerical results demonstrate the merits of our methodology over the usual PCA
and other recently developed techniques for large-dimensional approximate
factor models.
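The mechanics of the weighting scheme can be sketched with a diagonal inverse-standard-deviation weight, which is only an illustration and not the paper's adaptive choice:

```python
import numpy as np

# Mechanics of a weighted-PCA factor estimator: apply a weighting matrix to the
# cross-section before extracting principal components. The inverse-variance
# weights used here merely illustrate the general scheme.
rng = np.random.default_rng(8)
T, N, r = 200, 100, 2
F = rng.normal(size=(T, r))                         # latent factors
L = rng.normal(size=(N, r))                         # loadings
noise_sd = rng.uniform(0.5, 3.0, size=N)            # heteroskedastic noise
X = F @ L.T + rng.normal(size=(T, N)) * noise_sd

def weighted_pca(X, W, r):
    Xw = X @ W                                      # weight the cross-section
    # factors = left singular vectors of the weighted panel (scaled by sqrt(T))
    U, _, _ = np.linalg.svd(Xw, full_matrices=False)
    return np.sqrt(X.shape[0]) * U[:, :r]

F_pca = weighted_pca(X, np.eye(N), r)                 # ordinary PCA
F_wpca = weighted_pca(X, np.diag(1.0 / noise_sd), r)  # down-weight noisy series

def fit(F_hat):                                     # R^2 of true factors on estimates
    coef, *_ = np.linalg.lstsq(F_hat, F, rcond=None)
    resid = F - F_hat @ coef
    return 1 - resid.var() / F.var()

print("PCA fit:         ", round(fit(F_pca), 3))
print("weighted-PCA fit:", round(fit(F_wpca), 3))
```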
arXiv link: http://arxiv.org/abs/2508.15675v1
K-Means Panel Data Clustering in the Presence of Small Groups
behavior of least-squares estimators and information criterion for the number
of groups, allowing for the presence of small groups that have an
asymptotically negligible relative size. Our contributions are threefold.
First, we derive sufficient conditions under which the least-squares estimators
are consistent and asymptotically normal. One of the conditions implies that a
longer sample period is required as there are smaller groups. Second, we show
that information criteria for the number of groups proposed in earlier works
can be inconsistent or perform poorly in the presence of small groups. Third,
we propose modified information criteria (MIC) designed to perform well in the
presence of small groups. A Monte Carlo simulation confirms their good
performance in finite samples. An empirical application illustrates that
K-means clustering paired with the proposed MIC allows one to discover small
groups without producing too many groups. This enables characterizing small
groups and differentiating them from the other large groups in a parsimonious
group structure.
arXiv link: http://arxiv.org/abs/2508.15408v1
A Nonparametric Approach to Augmenting a Bayesian VAR with Nonlinear Factors
that are modeled nonparametrically using regression trees. There are four main
advantages of our model. First, modeling potential nonlinearities
nonparametrically lessens the risk of mis-specification. Second, the use of
factor methods ensures that departures from linearity are modeled
parsimoniously. In particular, they exhibit functional pooling where a small
number of nonlinear factors are used to model common nonlinearities across
variables. Third, Bayesian computation using MCMC is straightforward even in
very high dimensional models, allowing for efficient, equation by equation
estimation, thus avoiding computational bottlenecks that arise in popular
alternatives such as the time varying parameter VAR. Fourth, existing methods
for identifying structural economic shocks in linear factor models can be
adapted for the nonlinear case in a straightforward fashion using our model.
Exercises involving artificial and macroeconomic data illustrate the properties
of our model and its usefulness for forecasting and structural economic
analysis.
arXiv link: http://arxiv.org/abs/2508.13972v1
Partial Identification of Causal Effects for Endogenous Continuous Treatments
counterfactual outcomes, but such an assumption may not be plausible in
observational studies. Sensitivity analysis is often employed to assess the
robustness of causal conclusions to unmeasured confounding, but existing
methods are predominantly designed for binary treatments. In this paper, we
provide natural extensions of two extensively used sensitivity frameworks --
the Rosenbaum and Marginal sensitivity models -- to the setting of continuous
exposures. Our generalization replaces scalar sensitivity parameters with
sensitivity functions that vary with exposure level, enabling richer modeling
and sharper identification bounds. We develop a unified pseudo-outcome
regression formulation for bounding the counterfactual dose-response curve
under both models, and propose corresponding nonparametric estimators that
have second-order bias. These estimators accommodate modern machine learning
methods for obtaining nuisance parameter estimates, which are shown to achieve
$L^2$-consistency and minimax rates of convergence under suitable conditions. Our
resulting estimators of bounds for the counterfactual dose-response curve are
shown to be consistent and asymptotically normal, allowing for a user-specified
bound on the degree of uncontrolled exposure endogeneity. We also offer a
geometric interpretation that relates the Rosenbaum and Marginal sensitivity
models and guides their practical usage in global versus targeted sensitivity
analysis. The methods are validated through simulations and a real-data
application on the effect of second-hand smoke exposure on blood lead levels in
children.
arXiv link: http://arxiv.org/abs/2508.13946v1
Reasonable uncertainty: Confidence intervals in empirical Bayes discrimination detection
arising from both partial identification and sampling variability. While prior
work has mostly focused on partial identification, we find that some empirical
findings are not robust to sampling uncertainty. To better connect statistical
evidence to the magnitude of real-world discriminatory behavior, we propose a
counterfactual odds-ratio estimand with attractive properties and
interpretation. Our analysis reveals the importance of careful attention to
uncertainty quantification and downstream goals in empirical Bayes analyses.
arXiv link: http://arxiv.org/abs/2508.13110v1
The purpose of an estimator is what it does: Misspecification, estimands, and over-identification
-- fundamentally changes what estimators estimate. Different estimators imply
different estimands rather than different efficiency for the same target. A
review of recent applications of generalized method of moments in the American
Economic Review suggests widespread acceptance of this fact: There is little
formal specification testing and widespread use of estimators that would be
inefficient were the model correct, including the use of "hand-selected"
moments and weighting matrices. Motivated by these observations, we review and
synthesize recent results on estimation under model misspecification, providing
guidelines for transparent and robust empirical research. We also provide a new
theoretical result, showing that Hansen's J-statistic measures, asymptotically,
the range of estimates achievable at a given standard error. Given the
widespread use of inefficient estimators and the resulting researcher degrees
of freedom, we thus particularly recommend the broader reporting of
J-statistics.
arXiv link: http://arxiv.org/abs/2508.13076v3
Estimation in linear models with clustered data
controls, and a complicated structure of exclusion restrictions. We propose a
correctly centered internal IV estimator that accommodates a variety of
exclusion restrictions and permits within-cluster dependence. The estimator has
a simple leave-out interpretation and remains computationally tractable. We
derive a central limit theorem for its quadratic form and propose a robust
variance estimator. We also develop inference methods that remain valid under
weak identification. Our framework extends classical dynamic panel methods to
more general clustered settings. An empirical application of a large-scale
fiscal intervention in rural Kenya with spatial interference illustrates the
approach.
arXiv link: http://arxiv.org/abs/2508.12860v1
Bivariate Distribution Regression: Theory, Estimation and an Application to Intergenerational Mobility
two outcome variables conditional on chosen covariates. While Bivariate
Distribution Regression (BDR) is useful in a variety of settings, it is
particularly valuable when some dependence between the outcomes persists after
accounting for the impact of the covariates. Our analysis relies on a result
from Chernozhukov et al. (2018) which shows that any conditional joint
distribution has a local Gaussian representation. We describe how BDR can be
implemented and present some associated functionals of interest. As modeling
the unexplained dependence is a key feature of BDR, we focus on functionals
related to this dependence. We decompose the difference between the joint
distributions for different groups into composition, marginal and sorting
effects. We provide a similar decomposition for the transition matrices which
describe how location in the distribution in one of the outcomes is associated
with location in the other. Our theoretical contributions are the derivation of
the properties of these estimated functionals and appropriate procedures for
inference. Our empirical illustration focuses on intergenerational mobility.
Using the Panel Study of Income Dynamics data, we model the joint distribution
of parents' and children's earnings. By comparing the observed distribution
with constructed counterfactuals, we isolate the impact of observable and
unobservable factors on the observed joint distribution. We also evaluate the
forces responsible for the difference between the transition matrices of sons
and daughters.
arXiv link: http://arxiv.org/abs/2508.12716v1
Bayesian Double Machine Learning for Causal Inference
inference in partially linear models with high-dimensional control variables.
Off-the-shelf machine learning methods can introduce biases in the causal
parameter known as regularization-induced confounding. To address this, we
propose a Bayesian Double Machine Learning (BDML) method, which modifies a
standard Bayesian multivariate regression model and recovers the causal effect
of interest from the reduced-form covariance matrix. Our BDML is related to the
burgeoning frequentist literature on DML while addressing its limitations in
finite-sample inference. Moreover, the BDML is based on a fully generative
probability model in the DML context, adhering to the likelihood principle. We
show that in high dimensional setups the naive estimator implicitly assumes no
selection on observables--unlike our BDML. The BDML exhibits lower asymptotic
bias and achieves asymptotic normality and semiparametric efficiency as
established by a Bernstein-von Mises theorem, thereby ensuring robustness to
misspecification. In simulations, our BDML achieves lower RMSE, better
frequentist coverage, and shorter confidence interval width than alternatives
from the literature, both Bayesian and frequentist.
arXiv link: http://arxiv.org/abs/2508.12688v1
Reconstructing Subnational Labor Indicators in Colombia: An Integrated Machine and Deep Learning Approach
monthly and annual labor indicators for all 33 Colombian departments from 1993
to 2025. The approach integrates temporal disaggregation, time-series splicing
and interpolation, statistical learning, and institutional covariates to
estimate seven key variables: employment, unemployment, labor force
participation (PEA), inactivity, working-age population (PET), total
population, and informality rate, including in regions without direct survey
coverage. The framework enforces labor accounting identities, scales results to
demographic projections, and aligns all estimates with national benchmarks to
ensure internal coherence. Validation against official departmental GEIH
aggregates and city-level informality data for the 23 metropolitan areas yields
in-sample Mean Absolute Percentage Errors (MAPEs) below 2.3% across indicators,
confirming strong predictive performance. To our knowledge, this is the first
dataset to provide spatially exhaustive and temporally consistent monthly labor
measures for Colombia. By incorporating both quantitative and qualitative
dimensions of employment, the panel enhances the empirical foundation for
analysing long-term labor market dynamics, identifying regional disparities,
and designing targeted policy interventions.
arXiv link: http://arxiv.org/abs/2508.12514v2
A statistician's guide to weak-instrument-robust inference in instrumental variables regression with illustrations in Python
weak-instrument-robust inference in instrumental variables regression. Methods
are implemented in the ivmodels software package for Python, which we use to
illustrate results.
arXiv link: http://arxiv.org/abs/2508.12474v1
The Identification Power of Combining Experimental and Observational Data for Distributional Treatment Effect Parameters
experimental data, in which treatment is randomized, with observational data,
in which treatment is self-selected, for distributional treatment effect (DTE)
parameters. While experimental data identify average treatment effects, many
DTE parameters, such as the distribution of individual treatment effects, are
only partially identified. We examine whether and how combining these two data
sources tightens the identified set for such parameters. For broad classes of
DTE parameters, we derive nonparametric sharp bounds under the combined data
and clarify the mechanism through which data combination improves
identification relative to using experimental data alone. Our analysis
highlights that self-selection in observational data is a key source of
identification power. We establish necessary and sufficient conditions under
which the combined data shrink the identified set, showing that such shrinkage
generally occurs unless selection-on-observables holds in the observational
data. We also propose a linear programming approach to compute sharp bounds
that can incorporate additional structural restrictions, such as positive
dependence between potential outcomes and the generalized Roy model. An
empirical application using data on negative campaign advertisements in the
2008 U.S. presidential election illustrates the practical relevance of the
proposed approach.
arXiv link: http://arxiv.org/abs/2508.12206v3
A note on simulation methods for the Dirichlet-Laplace prior
110(512): 1479-1490) introduce a novel prior, the Dirichlet-Laplace (DL) prior,
and propose a Markov chain Monte Carlo (MCMC) method to simulate posterior
draws under this prior in a conditionally Gaussian setting. The original
algorithm samples from conditional distributions in the wrong order, i.e., it
does not correctly sample from the joint posterior distribution of all latent
variables. This note details the issue and provides two simple solutions: a
correction to the original algorithm and a new algorithm based on an
alternative, yet equivalent, formulation of the prior. This corrigendum does
not affect the theoretical results in Bhattacharya et al. (2015).
arXiv link: http://arxiv.org/abs/2508.11982v1
Approximate Factor Model with S-vine Copula Structure
S-vine copula structure to capture complex dependencies among common factors.
Our estimation procedure proceeds in two steps: first, we apply principal
component analysis (PCA) to extract the factors; second, we employ maximum
likelihood estimation that combines kernel density estimation for the margins
with an S-vine copula to model the dependence structure. Jointly fitting the
S-vine copula with the margins yields an oblique factor rotation without
resorting to ad hoc restrictions or traditional projection pursuit methods. Our
theoretical contributions include establishing the consistency of the rotation
and copula parameter estimators, developing asymptotic theory for the
factor-projected empirical process under dependent data, and proving the
uniform consistency of the projected entropy estimators. Simulation studies
demonstrate convergence with respect to both the dimensionality and the sample
size. We further assess model performance through Value-at-Risk (VaR)
estimation via Monte Carlo methods and apply our methodology to the daily
returns of S&P 500 Index constituents to forecast the VaR of the S&P 500 index.
arXiv link: http://arxiv.org/abs/2508.11619v1
Binary choice logit models with general fixed effects for panel and network data
binary choice logit models with fixed effects in panel and network data
settings. We examine both static and dynamic models with general fixed-effect
structures, including individual effects, time trends, and two-way or dyadic
effects. A key challenge is the incidental parameter problem, which arises from
the increasing number of fixed effects as the sample size grows. We explore two
main strategies for eliminating nuisance parameters: conditional likelihood
methods, which remove fixed effects by conditioning on sufficient statistics,
and moment-based methods, which derive fixed-effect-free moment conditions. We
demonstrate how these approaches apply to a variety of models, summarizing key
findings from the literature while also presenting new examples and new
results.
arXiv link: http://arxiv.org/abs/2508.11556v1
Stealing Accuracy: Predicting Day-ahead Electricity Prices with Temporal Hierarchy Forecasting (THieF)
predicting day-ahead electricity prices and show that reconciling forecasts for
hourly products, 2- to 12-hour blocks, and baseload contracts significantly (up
to 13%) improves accuracy at all levels. These results remain consistent
throughout a challenging 4-year test period (2021-2024) in the German power
market and across model architectures, including linear regression, a shallow
neural network, gradient boosting, and a state-of-the-art transformer. Given
that (i) trading of block products is becoming more common and (ii) the
computational cost of reconciliation is comparable to that of predicting hourly
prices alone, we recommend using it in daily forecasting practice.
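To make the reconciliation step concrete, here is a minimal sketch of projection
(OLS) reconciliation for a simplified hierarchy of 24 hourly prices, six 4-hour
blocks, and one baseload product; the paper works with a richer set of 2- to
12-hour blocks and possibly a weighted reconciliation, so the aggregation matrix
and method below are illustrative assumptions only.

    import numpy as np

    n_h = 24
    S_hourly = np.eye(n_h)
    S_blocks = np.kron(np.eye(6), np.ones((1, 4)) / 4.0)  # block price = mean of its 4 hours
    S_base = np.ones((1, n_h)) / n_h                       # baseload = mean of all 24 hours
    S = np.vstack([S_hourly, S_blocks, S_base])            # (31, 24) aggregation matrix

    def reconcile_ols(y_hat):
        # Project the stacked base forecasts (hourly, block, baseload) onto the
        # column space of S so all levels are coherent with one hourly profile.
        beta = np.linalg.solve(S.T @ S, S.T @ y_hat)       # implied coherent hourly prices
        return S @ beta                                     # reconciled forecasts, all levels

    rng = np.random.default_rng(0)
    y_hat = rng.normal(80.0, 10.0, size=S.shape[0])        # toy base forecasts (EUR/MWh)
    y_tilde = reconcile_ols(y_hat)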
arXiv link: http://arxiv.org/abs/2508.11372v1
Factor Models of Matrix-Valued Time Series: Nonstationarity and Cointegration
common stochastic trends. Unlike the traditional factor analysis which flattens
matrix observations into vectors, we adopt a matrix factor model in order to
fully explore the intrinsic matrix structure in the data, allowing interaction
between the row and column stochastic trends, and subsequently improving the
estimation convergence. It also reduces the computational complexity of
estimation. The main estimation methodology is built on the eigenanalysis of
sample row and column covariance matrices when the nonstationary matrix factors
are of full rank and the idiosyncratic components are temporally stationary,
and is further extended to tackle a more flexible setting when the matrix
factors are cointegrated and the idiosyncratic components may be nonstationary.
Under some mild conditions which allow the existence of weak factors, we derive
the convergence theory for the estimated factor loading matrices and
nonstationary factor matrices. In particular, the developed methodology and
theory are applicable to the general case of heterogeneous strengths over weak
factors. An easy-to-implement ratio criterion is adopted to consistently
estimate the size of the latent factor matrix. Both simulation and empirical
studies are conducted to examine the numerical performance of the developed
model and methodology in finite samples.
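The eigenanalysis step can be sketched as follows for a stationary-style matrix
factor model; the paper applies the same row and column eigen-decompositions to
nonstationary data and adds refinements for cointegrated factors, so this is
only an illustration of the core computation, not the proposed estimator.

    import numpy as np

    def matrix_factor_eigen(X, k1, k2):
        # X: (T, p, q) array of matrix-valued observations.
        T, p, q = X.shape
        M_row = sum(X[t] @ X[t].T for t in range(T)) / (T * q)   # sample row covariance
        M_col = sum(X[t].T @ X[t] for t in range(T)) / (T * p)   # sample column covariance
        eval_r, evec_r = np.linalg.eigh(M_row)
        eval_c, evec_c = np.linalg.eigh(M_col)
        R = evec_r[:, ::-1][:, :k1]                              # row loading space
        C = evec_c[:, ::-1][:, :k2]                              # column loading space
        F = np.stack([R.T @ X[t] @ C for t in range(T)])         # estimated factor matrices
        # Ratio criterion: inspect the first few ratios of adjacent (descending)
        # eigenvalues and pick the location of the largest one as the factor number.
        ratios = eval_r[::-1][:-1] / np.maximum(eval_r[::-1][1:], 1e-12)
        return R, C, F, ratios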
arXiv link: http://arxiv.org/abs/2508.11358v2
Higher-order Gini indices: An axiomatic approach
deviation, defined as the expected range over n independent draws from a
distribution, to quantify joint dispersion across multiple observations. This
family extends the classical Gini deviation, which relies solely on pairwise
comparisons. The normalized version is called a higher-order Gini coefficient.
The generalized indices grow increasingly sensitive to tail inequality as n
increases, offering a more nuanced view of distributional extremes. The
higher-order Gini deviations admit a Choquet integral representation,
inheriting the desirable properties of coherent deviation measures.
Furthermore, we show that both the n-th order Gini deviation and the n-th order
Gini coefficient are statistically n-observation elicitable, allowing for
direct computation through empirical risk minimization. Data analysis using
World Inequality Database data reveals that higher-order Gini coefficients
capture disparities that the classical Gini coefficient may fail to reflect,
particularly in cases of extreme income or wealth concentration.
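Since the n-th order Gini deviation is defined as the expected range of n
independent draws, it can be evaluated exactly under the empirical distribution
(draws with replacement), as in the sketch below; the normalization used to
print a coefficient here is one possible convention and may differ from the
paper's.

    import numpy as np

    def gini_deviation_order_n(x, n):
        # Expected range of n i.i.d. draws from the empirical distribution of x.
        # n = 2 recovers the classical Gini mean difference.
        x = np.sort(np.asarray(x, dtype=float))
        N = len(x)
        i = np.arange(1, N + 1)
        p_max = (i / N) ** n - ((i - 1) / N) ** n             # P(max of n draws = x_(i))
        p_min = ((N - i + 1) / N) ** n - ((N - i) / N) ** n   # P(min of n draws = x_(i))
        return np.sum(x * p_max) - np.sum(x * p_min)

    x = np.random.default_rng(1).lognormal(0.0, 1.0, size=5000)
    for n in (2, 5, 10):
        print(n, gini_deviation_order_n(x, n) / x.mean())     # grows with n: more tail weight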
arXiv link: http://arxiv.org/abs/2508.10663v2
On the implications of proportional hazards assumptions for competing risks modelling
made in empirical research and extensive research has been done to develop
tests of its validity. This paper does not contribute to that line of work. Instead, it
gives new insights on the implications of proportional hazards (PH) modelling
in competing risks models. It is shown that the use of a PH model for the
cause-specific hazards or subdistribution hazards can strongly restrict the
class of copulas and marginal hazards that are compatible with a competing
risks model. The empirical researcher should be aware that working with these
models can be so restrictive that only degenerate or independent risk models
are compatible. Numerical results confirm that estimates of cause-specific
hazards models are not informative about patterns in the data generating
process.
arXiv link: http://arxiv.org/abs/2508.10577v1
Heterogeneity in Women's Nighttime Ride-Hailing Intention: Evidence from an LC-ICLV Model Analysis
convenience, persistent nighttime safety concerns significantly reduce women's
willingness to use them. Existing research often treats women as a homogeneous
group, neglecting the heterogeneity in their decision-making processes. To
address this gap, this study develops the Latent Class Integrated Choice and
Latent Variable (LC-ICLV) model with a mixed Logit kernel, combined with an
ordered Probit model for attitudinal indicators, to capture unobserved
heterogeneity in women's nighttime ride-hailing decisions. Based on panel data
from 543 respondents across 29 provinces in China, the analysis identifies two
distinct female subgroups. The first, labeled the "Attribute-Sensitive Group",
consists mainly of young women and students from first- and second-tier cities.
Their choices are primarily influenced by observable service attributes such as
price and waiting time, but they exhibit reduced usage intention when matched
with female drivers, possibly reflecting deeper safety heuristics. The second,
the "Perception-Sensitive Group", includes older working women and residents of
less urbanized areas. Their decisions are shaped by perceived risk and safety
concerns; notably, high-frequency use or essential nighttime commuting needs
may reinforce rather than alleviate avoidance behaviors. The findings
underscore the need for differentiated strategies: platforms should tailor
safety features and user interfaces by subgroup, policymakers must develop
targeted interventions, and female users can benefit from more personalized
risk mitigation strategies. This study offers empirical evidence to advance
gender-responsive mobility policy and improve the inclusivity of ride-hailing
services in urban nighttime contexts.
arXiv link: http://arxiv.org/abs/2508.10951v1
Two-Way Mean Group Estimators for Heterogeneous Panel Models with Fixed T
fixed effects and interactive fixed effects in a fixed T framework. We propose
a two-way mean group (TW-MG) estimator for the expected value of the slope
coefficient and propose a leave-one-out jackknife method for valid inference.
We also consider a pooled estimator and provide a Hausman-type test for
poolability. Simulations demonstrate the excellent performance of our
estimators and inference methods in finite samples. We apply our new methods to
two datasets to examine the relationship between health-care expenditure and
income, and estimate a production function.
arXiv link: http://arxiv.org/abs/2508.10302v1
Machine Learning for Detecting Collusion and Capacity Withholding in Wholesale Electricity Markets
important mechanisms of market manipulation. This study applies a refined
machine learning-based cartel detection algorithm to two cartel cases in the
Italian electricity market and evaluates its out-of-sample performance.
Specifically, we consider an ensemble machine learning method that uses
statistical screens constructed from the offer price distribution as predictors
for the incidence of collusion among electricity providers in specific regions.
We propose novel screens related to the capacity-withholding behavior of
electricity providers and find that including such screens derived from the
day-ahead spot market as predictors can improve cartel detection. We find that,
under complete cartels - where collusion in a tender presumably involves all
suppliers - the method correctly classifies up to roughly 95% of tenders in our
data as collusive or competitive, improving classification accuracy compared to
using only previously available screens. However, when trained on larger
datasets including non-cartel members and applying algorithms tailored to
detect incomplete cartels, the previously existing screens are sufficient to
achieve 98% accuracy, and the addition of our newly proposed
capacity-withholding screens does not further improve performance. Overall,
this study highlights the promising potential of supervised machine learning
techniques for detecting and dismantling cartels in electricity markets.
arXiv link: http://arxiv.org/abs/2508.09885v1
Approximate Sparsity Class and Minimax Estimation
this project we consider a new class of functions that we call the approximate
sparsity class. This new class is characterized by the rate of decay of the
individual Fourier coefficients for a given orthonormal basis. We establish the
$L^2([0,1],\mu)$ metric entropy of this class, from which we derive the minimax
rate of convergence. For the density subset in this class, we propose an
adaptive density estimator based on a hard-thresholding procedure that achieves
this minimax rate up to a $\log$ term.
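A generic version of such a hard-thresholding series density estimator on [0,1],
using a cosine basis and an illustrative threshold constant, might look as
follows; the paper's estimator and tuning are specific to the approximate
sparsity class, so this is only a sketch of the mechanism.

    import numpy as np

    def thresholded_series_density(x, J=200, c=1.0):
        # Estimate cosine-basis coefficients, keep only those above a threshold
        # of order sqrt(log n / n), and rebuild the density from the survivors.
        x = np.asarray(x, dtype=float)
        n = len(x)
        j = np.arange(1, J + 1)
        theta = (np.sqrt(2) * np.cos(np.pi * np.outer(j, x))).mean(axis=1)
        lam = c * np.sqrt(np.log(n) / n)
        theta = np.where(np.abs(theta) > lam, theta, 0.0)      # hard thresholding
        def f_hat(t):
            basis = np.sqrt(2) * np.cos(np.pi * np.outer(j, np.atleast_1d(t)))
            return 1.0 + theta @ basis                          # constant phi_0 term included
        return f_hat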
arXiv link: http://arxiv.org/abs/2508.09278v1
Bias correction for Chatterjee's graph-based correlation coefficient
(NN) graph-based correlation coefficient that consistently detects both
independence and functional dependence. Specifically, it approximates a measure
of dependence that equals 0 if and only if the variables are independent, and 1
if and only if they are functionally dependent. However, this NN estimator
includes a bias term that may vanish at a rate slower than root-$n$, preventing
root-$n$ consistency in general. In this article, we propose a bias correction
approach that overcomes this limitation, yielding an NN-based estimator that is
both root-$n$ consistent and asymptotically normal.
arXiv link: http://arxiv.org/abs/2508.09040v1
Amazon Ads Multi-Touch Attribution
measure how each touchpoint across the marketing funnel contributes to a
conversion. This gives advertisers a more comprehensive view of their Amazon
Ads performance across objectives when multiple ads influence shopping
decisions. Amazon MTA uses a combination of randomized controlled trials (RCTs)
and machine learning (ML) models to allocate credit for Amazon conversions
across Amazon Ads touchpoints in proportion to their value, i.e., their likely
contribution to shopping decisions. ML models trained purely on observational
data are easy to scale and can yield precise predictions, but the models might
produce biased estimates of ad effects. RCTs yield unbiased ad effects but can
be noisy. Our MTA methodology combines experiments, ML models, and Amazon's
shopping signals in a thoughtful manner to inform attribution credit
allocation.
arXiv link: http://arxiv.org/abs/2508.08209v1
Treatment-Effect Estimation in Complex Designs under a Parallel-trends Assumption
panel data, in complex designs where the treatment may not be binary and may
not be absorbing. We first show that under no-anticipation and parallel-trends
assumptions, we can identify event-study effects comparing outcomes under the
actual treatment path and under the status-quo path where all units would have
kept their period-one treatment throughout the panel. Those effects can be
helpful to evaluate ex-post the policies that effectively took place, and once
properly normalized they estimate weighted averages of marginal effects of the
current and lagged treatments on the outcome. Yet, they may still be hard to
interpret, and they cannot be used to evaluate the effects of policies other
than the ones that were conducted. To make progress, we impose another
restriction, namely a random coefficients distributed-lag linear model, where
effects remain constant over time. Under this model, the usual distributed-lag
two-way-fixed-effects regression may be misleading. Instead, we show that this
random coefficients model can be estimated simply. We illustrate our findings
by revisiting Gentzkow, Shapiro and Sinkinson (2011).
arXiv link: http://arxiv.org/abs/2508.07808v2
Conceptual winsorizing: An application to the social cost of carbon
clear outliers, the result of poorly constrained models. Percentile winsorizing
is an option, but I here propose conceptual winsorizing: The social cost of
carbon is either a willingness to pay, which cannot exceed the ability to pay,
or a proposed carbon tax, which cannot raise more revenue than all other taxes
combined. Conceptual winsorizing successfully removes high outliers. It
slackens as economies decarbonize, slowly without climate policy and faster with it.
arXiv link: http://arxiv.org/abs/2508.07384v2
Returns and Order Flow Imbalances: Intraday Dynamics and Macroeconomic News Effects
500 E-mini futures market using a structural VAR model identified through
heteroskedasticity. The model is estimated at one-second frequency for each
15-minute interval, capturing both intraday variation and endogeneity due to
time aggregation. We find that macroeconomic news announcements sharply reshape
price-flow dynamics: price impact rises, flow impact declines, return
volatility spikes, and flow volatility falls. Pooling across days, both price
and flow impacts are significant at the one-second horizon, with estimates
broadly consistent with stylized limit-order-book predictions. Impulse
responses indicate that shocks dissipate almost entirely within a second.
Structural parameters and volatilities also exhibit pronounced intraday
variation tied to liquidity, trading intensity, and spreads. These results
provide new evidence on high-frequency price formation and liquidity,
highlighting the role of public information and order submission in shaping
market quality.
arXiv link: http://arxiv.org/abs/2508.06788v4
Causal Mediation in Natural Experiments
settings for estimating causal effects with a compelling argument for treatment
randomisation, but give little indication of the mechanisms behind causal
effects. Causal Mediation (CM) is a framework for sufficiently identifying a
mechanism behind the treatment effect, decomposing it into an indirect effect
channel through a mediator mechanism and a remaining direct effect. By
contrast, a suggestive analysis of mechanisms gives necessary but not
sufficient evidence. Conventional CM methods require that the relevant mediator
mechanism is as-good-as-randomly assigned; when people choose the mediator
based on costs and benefits (whether to visit a doctor, to attend university,
etc.), this assumption fails and conventional CM analyses are at risk of bias.
I propose an alternative strategy that delivers unbiased estimates of CM
effects despite unobserved selection, using instrumental variation in mediator
take-up costs. The method identifies CM effects via the marginal effect of the
mediator, with parametric or semi-parametric estimation that is simple to
implement in two stages. Applying these methods to the Oregon Health Insurance
Experiment reveals a substantial portion of the Medicaid lottery's effect on
subjective health and well-being flows through increased healthcare usage -- an
effect that a conventional CM analysis would mis-estimate. This approach gives
applied researchers an alternative method to estimate CM effects when an
initial treatment is quasi-randomly assigned, but a mediator mechanism is not,
as is common in natural experiments.
arXiv link: http://arxiv.org/abs/2508.05449v2
Weak Identification in Peer Effects Estimation
individuals' smoking habits often correlate with those of their peers. Such
correlations can have a variety of explanations, such as direct contagion or
shared socioeconomic circumstances. The network linear-in-means model is a
workhorse statistical model which incorporates these peer effects by including
average neighborhood characteristics as regressors. Although the model's
parameters are identifiable under mild structural conditions on the network, it
remains unclear whether identification ensures reliable estimation in the
"infill" asymptotic setting, where a single network grows in size. We show that
when covariates are i.i.d. and the average network degree of nodes increases
with the population size, standard estimators suffer from bias or slow
convergence rates due to asymptotic collinearity induced by network averaging.
As an alternative, we demonstrate that linear-in-sums models, which are based
on aggregate rather than average neighborhood characteristics, do not exhibit
such issues as long as the network degrees have some nontrivial variation, a
condition satisfied by most network models.
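The mechanism can be seen in a short simulation: as the network becomes denser,
the neighborhood average of an i.i.d. covariate collapses toward a constant
(near-collinearity with the intercept), while the neighborhood sum retains
variation through degree heterogeneity. The sketch below uses an Erdos-Renyi
graph purely for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000
    x = rng.normal(size=n)

    for p in (0.01, 0.05, 0.20):                     # denser graph -> larger average degree
        A = np.triu((rng.random((n, n)) < p).astype(float), 1)
        A = A + A.T                                  # undirected adjacency, no self-links
        deg = A.sum(axis=1)
        keep = deg > 0
        nbr_avg = (A @ x)[keep] / deg[keep]          # linear-in-means regressor
        nbr_sum = (A @ x)[keep]                      # linear-in-sums regressor
        print(f"p={p:.2f}  var(avg)={nbr_avg.var():.4f}  var(sum)={nbr_sum.var():.1f}")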
arXiv link: http://arxiv.org/abs/2508.04897v1
Assessing Dynamic Connectedness in Global Supply Chain Infrastructure Portfolios: The Impact of Risk Factors and Extreme Events
infrastructure: the energy market, investor sentiment, and global shipping
costs. It presents portfolio strategies associated with dynamic risks. A
time-varying parameter vector autoregression (TVP-VAR) model is used to study
the spillover and interconnectedness of the risk factors for global supply
chain infrastructure portfolios from January 5th, 2010, to June 29th, 2023,
which are associated with a set of environmental, social, and governance (ESG)
indexes. The effects of extreme events on risk spillovers and investment
strategy are calculated and compared before and after the COVID-19 outbreak.
The results of this study demonstrate that risk shocks influence the dynamic
connectedness between global supply chain infrastructure portfolios and three
risk factors and show the effects of extreme events on risk spillovers and
investment outcomes. Portfolios with higher ESG scores exhibit stronger dynamic
connectedness with other portfolios and factors. Net total directional
connectedness indicates that West Texas Intermediate (WTI), Baltic Exchange Dry
Index (BDI), and the investor sentiment volatility index (VIX) are consistently
net receivers of spillover shocks. The portfolio with ticker GLFOX appears to be
time-varying net receiver and giver. The pairwise connectedness shows that WTI
and VIX are mostly net receivers. Portfolios with tickers CSUAX, GII, and FGIAX
are mostly net givers of spillover shocks. The COVID-19 outbreak changed the
structure of dynamic connectedness on portfolios. The mean value of HR and HE
indicates that the weights of long/short positions in investment strategy after
the COVID-19 outbreak have undergone structural changes compared to the period
before. The hedging ability of global supply chain infrastructure investment
portfolios with higher ESG scores is superior.
arXiv link: http://arxiv.org/abs/2508.04858v1
High-Dimensional Matrix-Variate Diffusion Index Models for Time Series Forecasting
predictors are high-dimensional matrix-valued time series. We apply an
$\alpha$-PCA method to extract low-dimensional matrix factors and build a
bilinear regression linking future outcomes to these factors, estimated via
iterative least squares. To handle weak factor structures, we introduce a
supervised screening step to select informative rows and columns. Theoretical
properties, including consistency and asymptotic normality, are established.
Simulations and real data show that our method significantly improves forecast
accuracy, with the screening procedure providing additional gains over standard
benchmarks in out-of-sample mean squared forecast error.
arXiv link: http://arxiv.org/abs/2508.04259v1
The Regression Discontinuity Design in Medical Science
design, and its application to empirical research in the medical sciences.
While the main focus of this article is on causal interpretation, key concepts
of estimation and inference are also briefly mentioned. A running medical
empirical example is provided.
arXiv link: http://arxiv.org/abs/2508.03878v1
Structural Extrapolation in Regression Discontinuity Designs with an Application to School Expenditure Referenda
from the cutoff in regression discontinuity designs (RDDs). Our focus is on
applications that exploit closely contested school district referenda to
estimate the effects of changes in education spending on local economic
outcomes. We embed these outcomes in a spatial equilibrium model of local
jurisdictions in which fiscal policy is determined by majority rule voting.
This integration provides a microfoundation for the running variable, the share
of voters who approve a ballot initiative, and enables identification of
structural parameters using RDD coefficients. We then leverage the model to
simulate the effects of counterfactual referenda over a broad range of proposed
spending changes. These scenarios imply realizations of the running variable
away from the threshold, allowing extrapolation of RDD estimates to nonmarginal
referenda. Applying the method to school expenditure ballot measures in
Wisconsin, we document substantial heterogeneity in housing price
capitalization across the approval margin.
arXiv link: http://arxiv.org/abs/2508.02658v1
Estimating Causal Effects with Observational Data: Guidelines for Agricultural and Applied Economists
nature, i.e., how one or more variables (e.g., policies, prices, the weather)
affect one or more other variables (e.g., income, crop yields, pollution). Only
some of these research questions can be studied experimentally. Most empirical
studies in agricultural and applied economics thus rely on observational data.
However, estimating causal effects with observational data requires appropriate
research designs and a transparent discussion of all identifying assumptions,
together with empirical evidence to assess the probability that they hold. This
paper provides an overview of various approaches that are frequently used in
agricultural and applied economics to estimate causal effects with
observational data. It then provides advice and guidelines for agricultural and
applied economists who are intending to estimate causal effects with
observational data, e.g., how to assess and discuss the chosen identification
strategies in their publications.
arXiv link: http://arxiv.org/abs/2508.02310v1
A difference-in-differences estimator by covariate balancing propensity score
treatment effects on the treated (ATT) in a difference-in-differences (DID)
research design when panel data are available. We show that the proposed
covariate balancing propensity score (CBPS) DID estimator possesses several
desirable properties: (i) local efficiency, (ii) double robustness in terms of
consistency, (iii) double robustness in terms of inference, and (iv) faster
convergence to the ATT compared to the augmented inverse probability weighting
(AIPW) DID estimators when both working models are locally misspecified. These
latter two characteristics set the CBPS DID estimator apart from the AIPW DID
estimator theoretically. Simulation studies and an empirical study demonstrate
the desirable finite sample performance of the proposed estimator.
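A minimal sketch of the weighting step, assuming a logistic propensity score and
the just-identified ATT balance conditions, is given below; dY denotes the
before-after outcome change, D the treatment indicator, and X pre-treatment
covariates, all hypothetical inputs. The paper's CBPS DID estimator has further
components behind its local efficiency and double robustness, which this sketch
does not reproduce.

    import numpy as np
    from scipy.optimize import root

    def cbps_did_att(dY, D, X):
        # Solve the ATT covariate-balance conditions
        #   sum_i [D_i - (1 - D_i) * exp(X_i' beta)] X_i = 0
        # (just-identified covariate balancing with logistic odds), then reweight
        # the control units' outcome changes as in an IPW DID estimator.
        X = np.column_stack([np.ones(len(D)), X])
        def balance(beta):
            odds = np.exp(X @ beta)
            return X.T @ (D - (1 - D) * odds)
        beta = root(balance, np.zeros(X.shape[1])).x
        w = np.exp(X @ beta) * (1 - D)
        return dY[D == 1].mean() - np.sum(w * dY) / np.sum(w)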
arXiv link: http://arxiv.org/abs/2508.02097v1
A Relaxation Approach to Synthetic Control
counterfactual of a treated unit based on data from control units in a donor
pool. Allowing the donor pool to contain more control units than time periods, we
propose a novel machine learning algorithm, named SCM-relaxation, for
counterfactual prediction. Our relaxation approach minimizes an
information-theoretic measure of the weights subject to a set of relaxed linear
inequality constraints in addition to the simplex constraint. When the donor
pool exhibits a group structure, SCM-relaxation approximates equal weights
within each group to diversify the prediction risk. Asymptotically, the
proposed estimator achieves oracle performance in terms of out-of-sample
prediction accuracy. We demonstrate our method by Monte Carlo simulations and
by an empirical application that assesses the economic impact of Brexit on the
United Kingdom's real GDP.
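A stylized version of this idea, assuming the information-theoretic criterion is
the (negative) entropy of the weights and the relaxation is an elementwise
tolerance delta on the pre-treatment fit, can be written as a small convex
program; the paper's exact objective and constraint set may differ.

    import numpy as np
    import cvxpy as cp

    def scm_relaxation(X0, x1, delta):
        # X0: (K, J) pre-treatment predictors of the J donor units
        # x1: (K,)   pre-treatment predictors of the treated unit
        # Maximize weight entropy subject to the simplex and |X0 w - x1| <= delta.
        J = X0.shape[1]
        w = cp.Variable(J, nonneg=True)
        constraints = [cp.sum(w) == 1, cp.abs(X0 @ w - x1) <= delta]
        cp.Problem(cp.Maximize(cp.sum(cp.entr(w))), constraints).solve()
        return w.value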
arXiv link: http://arxiv.org/abs/2508.01793v1
Bayesian Smoothed Quantile Regression
distribution (ALD) has two fundamental limitations: its posterior mean yields
biased quantile estimates, and the non-differentiable check loss precludes
gradient-based MCMC methods. We propose Bayesian smoothed quantile regression
(BSQR), a principled reformulation that constructs a novel, continuously
differentiable likelihood from a kernel-smoothed check loss, simultaneously
ensuring a consistent posterior by aligning the inferential target with the
smoothed objective and enabling efficient Hamiltonian Monte Carlo (HMC)
sampling. Our theoretical analysis establishes posterior propriety for various
priors and examines the impact of kernel choice. Simulations show BSQR reduces
predictive check loss by up to 50% at extreme quantiles over ALD-based methods
and improves MCMC efficiency by 20-40% in effective sample size. An application
to financial risk during the COVID-19 era demonstrates superior tail risk
modeling. The BSQR framework offers a theoretically grounded, computationally
efficient solution to longstanding challenges in BQR, with uniform and
triangular kernels emerging as highly effective.
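As one concrete example of the smoothing, convolving the check loss with a
uniform kernel on [-h, h] gives the closed-form, continuously differentiable
loss below (the paper studies several kernels, with uniform and triangular
performing well); the smoothed negative log-likelihood built from such a loss is
what makes gradient-based HMC feasible.

    import numpy as np

    def check_loss(u, tau):
        u = np.asarray(u, dtype=float)
        return u * (tau - (u < 0))

    def smoothed_check_loss(u, tau, h):
        # Convolution of the check loss with a Uniform[-h, h] kernel; it agrees
        # with the check loss for |u| >= h and is quadratic (with derivatives
        # matching tau and tau - 1 at the edges) for |u| < h.
        u = np.asarray(u, dtype=float)
        inner = (tau * (u + h) ** 2 + (1.0 - tau) * (u - h) ** 2) / (4.0 * h)
        return np.where(np.abs(u) < h, inner, check_loss(u, tau))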
arXiv link: http://arxiv.org/abs/2508.01738v3
Multi-Band Variable-Lag Granger Causality: A Unified Framework for Causal Time Series Inference across Frequencies
domains, including neuroscience, economics, and behavioral science. Granger
causality is one of the well-known techniques for inferring causality in time
series. Typically, Granger causality frameworks impose a strong fixed-lag
assumption between cause and effect, which is often unrealistic in complex
systems. While recent work on variable-lag Granger causality (VLGC) addresses
this limitation by allowing a cause to influence an effect with different time
lags at each time point, it fails to account for the fact that causal
interactions may vary not only in time delay but also across frequency bands.
For example, in brain signals, alpha-band activity may influence another region
with a shorter delay than slower delta-band oscillations. In this work, we
formalize Multi-Band Variable-Lag Granger Causality (MB-VLGC) and propose a
novel framework that generalizes traditional VLGC by explicitly modeling
frequency-dependent causal delays. We provide a formal definition of MB-VLGC,
demonstrate its theoretical soundness, and propose an efficient inference
pipeline. Extensive experiments across multiple domains demonstrate that our
framework significantly outperforms existing methods on both synthetic and
real-world datasets, confirming its broad applicability to any type of time
series data. Code and datasets are publicly available.
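A simplified illustration of the multi-band idea, using standard fixed-lag
Granger tests applied band by band (the paper's MB-VLGC additionally lets the
lag vary over time within each band), might look like this; the band edges and
sampling rate are placeholders.

    import numpy as np
    from scipy.signal import butter, filtfilt
    from statsmodels.tsa.stattools import grangercausalitytests

    def bandpass(x, low, high, fs, order=4):
        b, a = butter(order, [low, high], btype="band", fs=fs)
        return filtfilt(b, a, x)

    def granger_by_band(cause, effect, bands, fs, maxlag=10):
        # Filter both series into each frequency band, then test whether the
        # band-limited 'cause' Granger-causes the band-limited 'effect'.
        out = {}
        for name, (low, high) in bands.items():
            pair = np.column_stack([bandpass(effect, low, high, fs),
                                    bandpass(cause, low, high, fs)])
            res = grangercausalitytests(pair, maxlag=maxlag, verbose=False)
            out[name] = min(r[0]["ssr_ftest"][1] for r in res.values())  # min p across lags
        return out

    # e.g. granger_by_band(x, y, bands={"delta": (1, 4), "alpha": (8, 12)}, fs=128)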
arXiv link: http://arxiv.org/abs/2508.00658v1
Robust Econometrics for Growth-at-Risk
econometric literature, yet current approaches implicitly assume a constant
Pareto exponent. We introduce novel and robust econometrics to estimate the
tails of GaR based on a rigorous theoretical framework and establish validity
and effectiveness. Simulations demonstrate consistent outperformance relative
to existing alternatives in terms of predictive accuracy. We perform a
long-term GaR analysis that provides accurate and insightful predictions,
effectively capturing financial anomalies better than current methods.
arXiv link: http://arxiv.org/abs/2508.00263v1
Relative Bias Under Imperfect Identification in Observational Causal Inference
on certain identifying assumptions. In practice, these assumptions are unlikely
to hold exactly. This paper considers the bias of selection-on-observables,
instrumental variables, and proximal inference estimates under violations of
their identifying assumptions. We develop bias expressions for IV and proximal
inference that show how violations of their respective assumptions are
amplified by any unmeasured confounding in the outcome variable. We propose a
set of sensitivity tools that quantify the sensitivity of different
identification strategies, together with an augmented bias contour plot that visualizes the
relationship between these strategies. We argue that the act of choosing an
identification strategy implicitly expresses a belief about the degree of
violations that must be present in alternative identification strategies. Even
when researchers intend to conduct an IV or proximal analysis, a sensitivity
analysis comparing different identification strategies can help to better
understand the implications of each set of assumptions. Throughout, we compare
the different approaches on a re-analysis of the impact of state surveillance
on the incidence of protest in Communist Poland.
arXiv link: http://arxiv.org/abs/2507.23743v1
Inference on Common Trends in a Cointegrated Nonlinear SVAR
stochastic trends when data is generated by a cointegrated CKSVAR (a
two-regime, piecewise-linear SVAR; Mavroeidis, 2021), using a modified version
of the Breitung (2002) multivariate variance ratio test that is robust to the
presence of nonlinear cointegration (of a known form). To derive the
asymptotics of our test statistic, we prove a fundamental LLN-type result for a
class of stable but nonstationary autoregressive processes, using a novel dual
linear process approximation. We show that our modified test yields correct
inferences regarding the number of common trends in such a system, whereas the
unmodified test tends to infer a higher number of common trends than are
actually present, when cointegrating relations are nonlinear.
arXiv link: http://arxiv.org/abs/2507.22869v1
Generalized Optimal Transport
estimated by computing the value of an optimization program over all
distributions consistent with the model and the data. Existing tools apply when
the data is discrete, or when only disjoint marginals of the distribution are
identified, which is restrictive in many applications. We develop a general
framework that yields sharp bounds on a linear functional of the unknown true
distribution under i) an arbitrary collection of identified joint
subdistributions and ii) structural conditions, such as (conditional)
independence. We encode the identification restrictions as a continuous
collection of moments of characteristic kernels, and use duality and
approximation theory to rewrite the infinite-dimensional program over Borel
measures as a finite-dimensional program that is simple to compute. Our
approach yields a consistent estimator that is $\sqrt{n}$-uniformly valid for
the sharp bounds. In the special case of empirical optimal transport with
Lipschitz cost, where the minimax rate is $n^{2/d}$, our method yields a
uniformly consistent estimator with an asymmetric rate, converging at
$\sqrt{n}$ uniformly from one side.
arXiv link: http://arxiv.org/abs/2507.22422v1
Dimension Reduction for Conditional Density Estimation with Applications to High-Dimensional Causal Inference
conditional density estimation in high-dimensional settings that achieves
dimension reduction without imposing restrictive distributional or functional
form assumptions. To uncover the underlying sparsity structure of the data, we
develop an innovative conditional dependence measure and a modified
cross-validation procedure that enables data-driven variable selection, thereby
circumventing the need for subjective threshold selection. We demonstrate the
practical utility of our dimension-reduced conditional density estimation by
applying it to doubly robust estimators for average treatment effects. Notably,
our proposed procedure is able to select relevant variables for nonparametric
propensity score estimation and also inherently reduce the dimensionality of
outcome regressions through a refined ignorability condition. We evaluate the
finite-sample properties of our approach through comprehensive simulation
studies and an empirical study on the effects of 401(k) eligibility on savings
using SIPP data.
arXiv link: http://arxiv.org/abs/2507.22312v2
Testing for multiple change-points in macroeconometrics: an empirical guide and recent developments
change-points in time series models with exogenous and endogenous regressors,
panel data models, and factor models. This review differs from others in
multiple ways: (1) it focuses on inference about the change-points in slope
parameters, rather than in the mean of the dependent variable - the latter
being common in the statistical literature; (2) it focuses on detecting - via
sequential testing and other methods - multiple change-points, and only
discusses one change-point when methods for multiple change-points are not
available; (3) it is meant as a practitioner's guide for empirical
macroeconomists first, and as a result, it focuses only on the methods derived
under the most general assumptions relevant to macroeconomic applications.
arXiv link: http://arxiv.org/abs/2507.22204v1
Low-Rank Structured Nonparametric Prediction of Instantaneous Volatility
for forecasting intraday volatility using high-frequency financial data. These
approaches typically rely on restrictive parametric assumptions and are often
vulnerable to model misspecification. To address this issue, we introduce a
novel nonparametric prediction method for the future intraday instantaneous
volatility process during trading hours, which leverages both previous days'
data and the current day's observed intraday data. Our approach imposes an
interday-by-intraday matrix representation of the instantaneous volatility,
which is decomposed into a low-rank conditional expectation component and a
noise matrix. To predict the future conditional expected volatility vector, we
exploit this low-rank structure and propose the Structural Intraday-volatility
Prediction (SIP) procedure. We establish the asymptotic properties of the SIP
estimator and demonstrate its effectiveness through an out-of-sample prediction
study using real high-frequency trading data.
arXiv link: http://arxiv.org/abs/2507.22173v1
Regional Price Dynamics and Market Integration in the U.S. Beef Industry: An Econometric Analysis
to face substantial challenges in achieving price harmonization across its
regional markets. This paper evaluates the validity of the Law of One Price
(LOP) in the U.S. beef industry and investigates causal relationships among
regional price dynamics. Through a series of econometric tests, we establish
that regional price series are integrated of order one, displaying
non-stationarity in levels and stationarity in first differences. The analysis
reveals partial LOP compliance in the Northeast and West, while full
convergence remains elusive at the national level. Although no region
demonstrates persistent price leadership, Southern prices appear particularly
sensitive to exogenous shocks. These findings reflect asymmetrical integration
across U.S. beef markets and suggest the presence of structural frictions that
hinder complete market unification.
arXiv link: http://arxiv.org/abs/2507.21950v1
Nonlinear Treatment Effects in Shift-Share Designs
with exogenous shares. We employ a triangular model and correct for treatment
endogeneity using a control function. Our tools identify four target
parameters. Two of them capture the observable heterogeneity of treatment
effects, while one summarizes this heterogeneity in a single measure. The last
parameter analyzes counterfactual, policy-relevant treatment assignment
mechanisms. We propose flexible parametric estimators for these parameters and
apply them to reevaluate the impact of Chinese imports on U.S. manufacturing
employment. Our results highlight substantial treatment effect heterogeneity,
which is not captured by commonly used shift-share tools.
arXiv link: http://arxiv.org/abs/2507.21915v1
Can large language models assist choice modelling? Insights into prompting strategies and current models capabilities
across different disciplines, yet their potential in choice modelling remains
relatively unexplored. This work examines the potential of LLMs as assistive
agents in the specification and, where technically feasible, estimation of
Multinomial Logit models. We implement a systematic experimental framework
involving thirteen versions of six leading LLMs (ChatGPT, Claude, DeepSeek,
Gemini, Gemma, and Llama) evaluated under five experimental configurations.
These configurations vary along three dimensions: modelling goal (suggesting
vs. suggesting and estimating MNLs); prompting strategy (Zero-Shot vs.
Chain-of-Thoughts); and information availability (full dataset vs. data
dictionary only). Each LLM-suggested specification is implemented, estimated,
and evaluated based on goodness-of-fit metrics, behavioural plausibility, and
model complexity. Findings reveal that proprietary LLMs can generate valid and
behaviourally sound utility specifications, particularly when guided by
structured prompts. Open-weight models such as Llama and Gemma struggled to
produce meaningful specifications. Claude 4 Sonnet consistently produced the
best-fitting and most complex models, while GPT models suggested models with
robust and stable modelling outcomes. Some LLMs performed better when provided
with just the data dictionary, suggesting that limiting raw data access may enhance
internal reasoning capabilities. Among all LLMs, GPT o3 was uniquely capable of
correctly estimating its own specifications by executing self-generated code.
Overall, the results demonstrate both the promise and current limitations of
LLMs as assistive agents in choice modelling, not only for model specification
but also for supporting modelling decisions and estimation, and provide
practical guidance for integrating these tools into choice modellers'
workflows.
arXiv link: http://arxiv.org/abs/2507.21790v1
A Bayesian Ensemble Projection of Climate Change and Technological Impacts on Future Crop Yields
fully probabilistic setting for crop yield estimation, model selection, and
uncertainty forecasting under multiple future greenhouse gas emission
scenarios. By informing on regional agricultural impacts, this approach
addresses broader risks to global food security. Extending an established
multivariate econometric crop-yield model to incorporate country-specific error
variances, the framework systematically relaxes restrictive homogeneity
assumptions and enables transparent decomposition of predictive uncertainty
into contributions from climate models, emission scenarios, and crop model
parameters. In both in-sample and out-of-sample analyses focused on global
wheat production, the results demonstrate significant improvements in
calibration and probabilistic accuracy of yield projections. These advances
provide policymakers and stakeholders with detailed, risk-sensitive information
to support the development of more resilient and adaptive agricultural and
climate strategies in response to escalating climate-related risks.
arXiv link: http://arxiv.org/abs/2507.21559v1
Policy Learning under Unobserved Confounding: A Robust and Efficient Approach
observational data in the presence of unobserved confounding, complementing
existing instrumental variable (IV) based approaches. We employ the marginal
sensitivity model (MSM) to relax the commonly used yet restrictive
unconfoundedness assumption by introducing a sensitivity parameter that
captures the extent of selection bias induced by unobserved confounders.
Building on this framework, we consider two distributionally robust welfare
criteria, defined as the worst-case welfare and policy improvement functions,
evaluated over an uncertainty set of counterfactual distributions characterized
by the MSM. Closed-form expressions for both welfare criteria are derived.
Leveraging these identification results, we construct doubly robust scores and
estimate the robust policies by maximizing the proposed criteria. Our approach
accommodates flexible machine learning methods for estimating nuisance
components, even when these converge at moderately slow rates. We establish
asymptotic regret bounds for the resulting policies, providing a robust
guarantee against the most adversarial confounding scenario. The proposed
method is evaluated through extensive simulation studies and empirical
applications to the JTPA study and Head Start program.
arXiv link: http://arxiv.org/abs/2507.20550v1
Staggered Adoption DiD Designs with Misclassification and Anticipation
staggered adoption designs -- a common extension of the canonical
Difference-in-Differences (DiD) model to multiple groups and time-periods -- in
the presence of (time varying) misclassification of the treatment status as
well as of anticipation. We demonstrate that standard estimators are biased
with respect to commonly used causal parameters of interest under such forms of
misspecification. To address this issue, we provide modified estimators that
recover the Average Treatment Effect of observed and true switching units,
respectively. Additionally, we suggest a testing procedure aimed at detecting
the timing and extent of misclassification and anticipation effects. We
illustrate the proposed methods with an application to the effects of an
anti-cheating policy on school mean test scores in high stakes national exams
in Indonesia.
arXiv link: http://arxiv.org/abs/2507.20415v1
Dependency Network-Based Portfolio Design with Forecasting and VaR Constraints
statistical social network analysis with time series forecasting and risk
management. Using daily stock data from the S&P 500 (2020-2024), we construct
dependency networks via Vector Autoregression (VAR) and Forecast Error Variance
Decomposition (FEVD), transforming influence relationships into a cost-based
network. Specifically, FEVD breaks down the VAR's forecast error variance to
quantify how much each stock's shocks contribute to another's uncertainty,
information that we invert to form influence-based edge weights in our network. By
applying the Minimum Spanning Tree (MST) algorithm, we extract the core
inter-stock structure and identify central stocks through degree centrality. A
dynamic portfolio is constructed using the top-ranked stocks, with capital
allocated based on Value at Risk (VaR). To refine stock selection, we
incorporate forecasts from ARIMA and Neural Network Autoregressive (NNAR)
models. Trading simulations over a one-year period demonstrate that the
MST-based strategies outperform a buy-and-hold benchmark, with the tuned
NNAR-enhanced strategy achieving a 63.74% return versus 18.00% for the
benchmark. Our results highlight the potential of combining network structures,
predictive modeling, and risk metrics to improve adaptive financial
decision-making.
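The network-construction step described above can be sketched roughly as
follows; the FEVD variant, the influence-to-cost transform, and the downstream
ARIMA/NNAR and VaR steps are assumptions of this illustration rather than the
paper's precise choices.

    import numpy as np
    import networkx as nx
    from statsmodels.tsa.api import VAR

    def mst_central_stocks(returns, tickers, horizon=10, top_k=10):
        # Fit a VAR on the return panel and compute the forecast error variance
        # decomposition: decomp[i, h, j] is the share of stock i's h-step forecast
        # error variance attributed to shocks originating in stock j.
        res = VAR(returns).fit(maxlags=1)
        influence = res.fevd(horizon).decomp[:, -1, :]
        G = nx.Graph()
        n = len(tickers)
        for i in range(n):
            for j in range(i + 1, n):
                strength = influence[i, j] + influence[j, i]
                if strength > 0:
                    G.add_edge(tickers[i], tickers[j], weight=1.0 / strength)  # cost = 1/influence
        mst = nx.minimum_spanning_tree(G, weight="weight")
        centrality = nx.degree_centrality(mst)
        return sorted(centrality, key=centrality.get, reverse=True)[:top_k]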
arXiv link: http://arxiv.org/abs/2507.20039v1
Semiparametric Identification of the Discount Factor and Payoff Function in Dynamic Discrete Choice Models
identified in stationary infinite-horizon dynamic discrete choice models. In
single-agent models, we show that common nonparametric assumptions on
per-period payoffs -- such as homogeneity of degree one, monotonicity,
concavity, zero cross-differences, and complementarity -- provide identifying
restrictions on the discount factor. These restrictions take the form of
polynomial equalities and inequalities with degrees bounded by the cardinality
of the state space. These restrictions also identify payoff functions under
standard normalization at one action. In dynamic game models, we show that
firm-specific discount factors can be identified using assumptions such as
irrelevance of other firms' lagged actions, exchangeability, and the
independence of adjustment costs from other firms' actions. Our results
demonstrate that widely used nonparametric assumptions in economic analysis can
provide substantial identifying power in dynamic structural models.
arXiv link: http://arxiv.org/abs/2507.19814v1
Binary Classification with the Maximum Score Model and Linear Programming
classification using Manski's (1975,1985) maximum score model when covariates
are discretely distributed and parameters are partially but not point
identified. We establish conditions under which it is minimax optimal to allow
for either non-classification or random classification and derive finite-sample
and asymptotic lower bounds on the probability of correct classification. We
also describe an extension of our method to continuous covariates. Our approach
avoids the computational difficulty of maximum score estimation by
reformulating the problem as two linear programs. Compared to parametric and
nonparametric methods, our method balances extrapolation ability with minimal
distributional assumptions. Monte Carlo simulations and empirical applications
demonstrate its effectiveness and practical relevance.
arXiv link: http://arxiv.org/abs/2507.19654v1
Beyond Bonferroni: Hierarchical Multiple Testing in Empirical Research
testing multiple hypotheses simultaneously, increasing the risk of false
positives due to chance. Classical multiple testing procedures, such as the
Bonferroni correction, control the family-wise error rate (FWER) but tend to be
overly conservative, reducing statistical power. Stepwise alternatives like the
Holm and Hochberg procedures offer improved power while maintaining error
control under certain dependence structures. However, these standard approaches
typically ignore hierarchical relationships among hypotheses -- structures that
are common in settings such as clinical trials and program evaluations, where
outcomes are often logically or causally linked. Hierarchical multiple testing
procedures -- including fixed sequence, fallback, and gatekeeping methods --
explicitly incorporate these relationships, providing more powerful and
interpretable frameworks for inference. This paper reviews key hierarchical
methods, compares their statistical properties and practical trade-offs, and
discusses implications for applied empirical research.
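
For concreteness, here is a minimal sketch of two procedures mentioned above: Holm's step-down correction and a fixed-sequence (hierarchical) test that exploits a pre-specified ordering of hypotheses. The p-values are hypothetical.

import numpy as np

def holm(pvals, alpha=0.05):
    # Step-down Holm procedure: compare the k-th smallest p-value to alpha/(m-k+1).
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                               # stop at the first non-rejection
    return reject

def fixed_sequence(pvals, alpha=0.05):
    # Test hypotheses in their pre-specified order at full level alpha, stopping
    # at the first failure; no multiplicity penalty while rejections continue.
    reject = []
    for p in pvals:
        if p <= alpha:
            reject.append(True)
        else:
            reject.append(False)
            break
    return np.array(reject + [False] * (len(pvals) - len(reject)))

pvals = [0.001, 0.012, 0.030, 0.200]            # hypothetical, primary hypothesis first
print(holm(pvals), fixed_sequence(pvals))
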
arXiv link: http://arxiv.org/abs/2507.19610v1
Uniform Critical Values for Likelihood Ratio Tests in Boundary Problems
discontinuous in the presence of nuisance parameters at the boundary of the
parameter space, which lead to size distortions when standard critical values
are used for testing. In this paper, we propose a new and simple way of
constructing critical values that yields uniformly correct asymptotic size,
regardless of whether nuisance parameters are at, near or far from the boundary
of the parameter space. Importantly, the proposed critical values are trivial
to compute and at the same time provide powerful tests in most settings. In
comparison to existing size-correction methods, the new approach exploits the
monotonicity of the two components of the limiting distribution of the
likelihood ratio statistic, in conjunction with rectangular confidence sets for
the nuisance parameters, to gain computational tractability. Uniform validity
is established for likelihood ratio tests based on the new critical values, and
we provide illustrations of their construction in two key examples: (i) testing
a coefficient of interest in the classical linear regression model with
non-negativity constraints on control coefficients, and, (ii) testing for the
presence of exogenous variables in autoregressive conditional heteroskedastic
models (ARCH) with exogenous regressors. Simulations confirm that the tests
have desirable size and power properties. A brief empirical illustration
demonstrates the usefulness of our proposed test in relation to testing for
spill-overs and ARCH effects.
arXiv link: http://arxiv.org/abs/2507.19603v1
Sequential Decision Problems with Missing Feedback
under missing data. State-of-the-art algorithms implicitly assume that rewards
are always observable. I show that when rewards are missing at random, the
Upper Confidence Bound (UCB) algorithm maintains optimal regret bounds;
however, it selects suboptimal policies with high probability as soon as this
assumption is relaxed. To overcome this limitation, I introduce a fully
nonparametric algorithm -- Doubly-Robust Upper Confidence Bound (DR-UCB) -- which
explicitly models the form of missingness through observable covariates and
achieves a nearly-optimal worst-case regret rate of $\tilde{O}(\sqrt{T})$.
To prove this result, I derive high-probability bounds for a class of
doubly-robust estimators that hold under broad dependence structures.
Simulation results closely match the theoretical predictions, validating the
proposed framework.
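
A stylized sketch of the doubly-robust ingredient (not the paper's exact DR-UCB algorithm): each arm's mean is estimated from AIPW-style pseudo-rewards combining a crude outcome model with inverse weighting by the missingness probability, which is assumed known here; all quantities are simulated.

import numpy as np

rng = np.random.default_rng(1)
K, T = 3, 5000
true_means = np.array([0.3, 0.5, 0.6])

def obs_prob(x):                    # P(reward observed | covariate x), assumed known
    return 0.4 + 0.5 * x

counts = np.zeros(K)                # number of pulls per arm
dr_sums = np.zeros(K)               # sums of doubly-robust pseudo-rewards
m_hat = np.zeros(K)                 # crude outcome model: running mean of observed rewards
n_obs = np.zeros(K)

for t in range(T):
    bonus = np.sqrt(2 * np.log(t + 1) / np.maximum(counts, 1))
    ucb = np.where(counts > 0, dr_sums / np.maximum(counts, 1) + bonus, np.inf)
    a = int(np.argmax(ucb))
    x = rng.uniform()                               # observable covariate
    r = rng.binomial(1, true_means[a])              # latent reward
    observed = rng.uniform() < obs_prob(x)          # reward missing at random given x
    if observed:
        n_obs[a] += 1
        m_hat[a] += (r - m_hat[a]) / n_obs[a]
    # doubly-robust pseudo-reward: m_hat + (O / e(x)) * (r - m_hat);
    # the correction term is zero whenever the reward is unobserved.
    pseudo = m_hat[a] + (observed / obs_prob(x)) * (r - m_hat[a])
    counts[a] += 1
    dr_sums[a] += pseudo

print("DR mean estimates:", np.round(dr_sums / counts, 3))
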
arXiv link: http://arxiv.org/abs/2507.19596v1
Interactive, Grouped and Non-separable Fixed Effects: A Practitioner's Guide to the New Panel Data Econometrics
heterogeneity in panel data. Interactive Fixed Effects (IFE) proved to be a
foundational framework, generalizing the standard one-way and two-way fixed
effects models by allowing the unit-specific unobserved heterogeneity to be
interacted with unobserved time-varying common factors, allowing for more
general forms of omitted variables. The IFE framework laid the theoretical
foundations for other forms of heterogeneity, such as grouped fixed effects
(GFE) and non-separable two-way fixed effects (NSTW). The existence of IFE, GFE
or NSTW has significant implications for identification, estimation, and
inference, leading to the development of many new estimators for panel data
models. This paper provides an accessible review of the new estimation methods
and their associated diagnostic tests, and offers a guide to empirical
practice. In two separate empirical investigations we demonstrate that there is
empirical support for the new forms of fixed effects and that the results can
differ significantly from those obtained using traditional fixed effects
estimators.
arXiv link: http://arxiv.org/abs/2507.19099v2
Flexible estimation of skill formation models
component in understanding human capital development and its effects on
individual outcomes. Existing estimators are either based on moment conditions
and only applicable in specific settings or rely on distributional
approximations that often do not align with the model. Our method employs an
iterative likelihood-based procedure, which flexibly estimates latent variable
distributions and recursively incorporates model restrictions across time
periods. This approach reduces computational complexity while accommodating
nonlinear production functions and measurement systems. Inference can be based
on a bootstrap procedure that does not require re-estimating the model for
bootstrap samples. Monte Carlo simulations and an empirical application
demonstrate that our estimator outperforms existing methods, whose estimators
can be substantially biased or noisy.
arXiv link: http://arxiv.org/abs/2507.18995v1
Batched Adaptive Network Formation
including workplace team formation, social platform recommendations, and
classroom friendship development. In these settings, networks are modeled as
graphs, with agents as nodes, agent pairs as edges, and edge weights capturing
pairwise production or interaction outcomes. This paper develops an adaptive,
or online, policy that learns to form increasingly effective networks
as data accumulates over time, progressively improving total network output
measured by the sum of edge weights.
Our approach builds on the weighted stochastic block model (WSBM), which
captures agents' unobservable heterogeneity through discrete latent types and
models their complementarities in a flexible, nonparametric manner. We frame
the online network formation problem as a non-standard batched
multi-armed bandit, where each type pair corresponds to an arm, and pairwise
reward depends on type complementarity. This strikes a balance between
exploration -- learning latent types and complementarities -- and exploitation
-- forming high-weighted networks. We establish two key results: a
batched local asymptotic normality result for the WSBM and an
asymptotic equivalence between maximum likelihood and variational estimates of
the intractable likelihood. Together, they provide a theoretical foundation for
treating variational estimates as normal signals, enabling principled Bayesian
updating across batches. The resulting posteriors are then incorporated into a
tailored maximum-weight matching problem to determine the policy for the next
batch. Simulations show that our algorithm substantially improves outcomes
within a few batches, yields increasingly accurate parameter estimates, and
remains effective even in nonstationary settings with evolving agent pools.
arXiv link: http://arxiv.org/abs/2507.18961v1
How weak are weak factors? Uniform inference for signal strength in signal plus noise models
spiked sample covariance matrices, the sum of a Wigner matrix and a low-rank
perturbation, and canonical correlation analysis with low-rank dependencies.
The objective is to construct confidence intervals for the signal strength that
are uniformly valid across all regimes -- strong, weak, and critical signals. We
demonstrate that traditional Gaussian approximations fail in the critical
regime. Instead, we introduce a universal transitional distribution that
enables valid inference across the entire spectrum of signal strengths. The
approach is illustrated through applications in macroeconomics and finance.
arXiv link: http://arxiv.org/abs/2507.18554v1
Partitioned Wild Bootstrap for Panel Data Quantile Regression
have been a pervasive concern in empirical work, and can be especially
challenging when the panel is observed over many time periods and temporal
dependence needs to be taken into account. In this paper, we propose a new
bootstrap method that applies random weighting to a partition of the data --
partition-invariant weights are used in the bootstrap data generating process
-- to conduct statistical inference for conditional quantiles in panel data
that have significant time-series dependence. We demonstrate that the procedure
is asymptotically valid for approximating the distribution of the fixed effects
quantile regression estimator. The bootstrap procedure offers a viable
alternative to existing resampling methods. Simulation studies show numerical
evidence that the novel approach has accurate small sample behavior, and an
empirical application illustrates its use.
arXiv link: http://arxiv.org/abs/2507.18494v1
A general randomized test for Alpha
pricing errors of a panel of asset returns are jointly equal to zero in a
linear factor asset pricing model -- that is, the null of "zero alpha". We
consider, as a leading example, a model with observable, tradable factors, but
we also develop extensions to accommodate non-tradable and latent factors.
The test is based on equation-by-equation estimation, using a randomized
version of the estimated alphas, which only requires rates of convergence. The
distinct features of the proposed methodology are that it does not require the
estimation of any covariance matrix, and that it allows for both N and T to
pass to infinity, with the former possibly faster than the latter. Further,
unlike extant approaches, the procedure can accommodate conditional
heteroskedasticity, non-Gaussianity, and even strong cross-sectional dependence
in the error terms. We also propose a de-randomized decision rule to decide in
favor of or against the correct specification of a linear factor pricing model.
Monte Carlo simulations show that the test has satisfactory properties and it
compares favorably to several existing tests. The usefulness of the testing
procedure is illustrated through an application of linear factor pricing models
to price the constituents of the S&P 500.
arXiv link: http://arxiv.org/abs/2507.17599v1
Decoding Consumer Preferences Using Attention-Based Language Models
language models. An encoder-only language model is trained in a two-stage
process to analyze the natural language descriptions of used cars from a large
US-based online auction marketplace. The approach enables semi-nonparametric
estimation of the demand primitives of a structural
model representing the private valuations and market size for each vehicle
listing. In the first stage, the language model is fine-tuned to encode the
target auction outcomes using the natural language vehicle descriptions. In the
second stage, the trained language model's encodings are projected into the
parameter space of the structural model. The model's capability to conduct
counterfactual analyses within the trained market space is validated using a
subsample of withheld auction data, which includes a set of unique "zero shot"
instances.
arXiv link: http://arxiv.org/abs/2507.17564v1
Adaptive Market Intelligence: A Mixture of Experts Framework for Volatility-Sensitive Stock Forecasting
framework for stock price prediction across heterogeneous volatility regimes
using real market data. The proposed model combines a Recurrent Neural Network
(RNN) optimized for high-volatility stocks with a linear regression model
tailored to stable equities. A volatility-aware gating mechanism dynamically
weights the contributions of each expert based on asset classification. Using a
dataset of 30 publicly traded U.S. stocks spanning diverse sectors, the MoE
approach consistently outperforms both standalone models.
Specifically, it achieves up to a 33% reduction in MSE for volatile assets
and 28% for stable assets relative to their respective baselines. Stratified
evaluation across volatility classes demonstrates the model's ability to adapt
complexity to underlying market dynamics. These results confirm that no single
model suffices across market regimes and highlight the advantage of adaptive
architectures in financial prediction. Future work should explore real-time
gate learning, dynamic volatility segmentation, and applications to portfolio
optimization.
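
A toy sketch of the gating idea, with an MLP standing in for the RNN expert and a linear autoregression as the stable expert, blended by a volatility-based gate; the price path, lag order, and gate form are arbitrary illustrations, not the paper's specification.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
prices = 100 + np.cumsum(rng.normal(0, 0.5, 600))       # hypothetical price path
rets = np.diff(np.log(prices))

p = 5                                                   # lag order
X = np.array([rets[t - p:t] for t in range(p, len(rets))])
y = rets[p:]

linear = LinearRegression().fit(X, y)                   # expert for stable regimes
mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)

recent_vol = np.std(rets[-60:])                         # realized volatility, last 60 days
gate = 1.0 / (1.0 + np.exp(-50 * (recent_vol - np.std(rets))))   # weight on nonlinear expert
x_new = rets[-p:].reshape(1, -1)
forecast = gate * mlp.predict(x_new)[0] + (1 - gate) * linear.predict(x_new)[0]
print("gate weight:", round(float(gate), 3), "one-step forecast:", forecast)
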
arXiv link: http://arxiv.org/abs/2508.02686v1
Can we have it all? Non-asymptotically valid and asymptotically exact confidence intervals for expectations and linear regressions
by studying confidence sets (CSs) that are both non-asymptotically valid and
asymptotically exact uniformly (NAVAE) over semi-parametric statistical models.
NAVAE CSs are not easily obtained; for instance, we show they do not exist over
the set of Bernoulli distributions. We first derive a generic sufficient
condition: NAVAE CSs are available as soon as uniform asymptotically exact CSs
are. Second, building on that connection, we construct closed-form NAVAE
confidence intervals (CIs) in two standard settings -- scalar expectations and
linear combinations of OLS coefficients -- under moment conditions only. For
expectations, our sole requirement is a bounded kurtosis. In the OLS case, our
moment constraints accommodate heteroskedasticity and weak exogeneity of the
regressors. Under those conditions, we enlarge the Central Limit Theorem-based
CIs, which are asymptotically exact, to ensure non-asymptotic guarantees. Those
modifications vanish asymptotically so that our CIs coincide with the classical
ones in the limit. We illustrate the potential and limitations of our approach
through a simulation study.
arXiv link: http://arxiv.org/abs/2507.16776v2
Dyadic data with ordered outcome variables
flexible sender and receiver fixed effects that can vary arbitrarily across
outcome categories. This structure poses a significant incidental parameter
problem, particularly challenging under network sparsity or when some outcome
categories are rare. We develop the first estimation method for this setting by
extending tetrad-differencing conditional maximum likelihood (CML) techniques
from binary choice network models. This approach yields conditional
probabilities free of the fixed effects, enabling consistent estimation even
under sparsity. Applying the CML principle to ordered data yields multiple
likelihood contributions corresponding to different outcome thresholds. We
propose and analyze two distinct estimators based on aggregating these
contributions: an Equally-Weighted Tetrad Logit Estimator (ETLE) and a Pooled
Tetrad Logit Estimator (PTLE). We prove PTLE is consistent under weaker
identification conditions, requiring only sufficient information when pooling
across categories, rather than sufficient information in each category. Monte
Carlo simulations confirm the theoretical preference for PTLE, and an empirical
application to friendship networks among Dutch university students demonstrates
the method's value. Our approach reveals significant positive homophily effects
for gender, smoking behavior, and academic program similarities, while standard
methods without fixed effects produce counterintuitive results.
arXiv link: http://arxiv.org/abs/2507.16689v1
Binary Response Forecasting under a Factor-Augmented Framework
model with a binary response variable. We develop a maximum likelihood
estimation method for the regression parameters and establish the asymptotic
properties of the resulting estimators. Monte Carlo simulation results show
that the proposed estimation method performs very well in finite samples.
Finally, we demonstrate the usefulness of the proposed model through an
application to U.S. recession forecasting. The proposed model consistently
outperforms conventional Probit regression across both in-sample and
out-of-sample exercises, by effectively utilizing high-dimensional information
through latent factors.
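
A minimal two-step sketch in the spirit of the model above: principal-component factors are extracted from a simulated high-dimensional predictor panel and fed into a Probit fitted with statsmodels. This is an illustration of the idea, not the paper's estimator or inference procedure.

import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
T, N, r = 300, 80, 2
F = rng.normal(size=(T, r))                              # latent factors
X = F @ rng.normal(size=(r, N)) + 0.5 * rng.normal(size=(T, N))
y = (0.8 * F[:, 0] - 0.5 * F[:, 1] + rng.normal(size=T) > 0).astype(int)

factors = PCA(n_components=r).fit_transform((X - X.mean(0)) / X.std(0))
probit = sm.Probit(y, sm.add_constant(factors)).fit(disp=False)
print(probit.params)                                     # constant and factor coefficients
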
arXiv link: http://arxiv.org/abs/2507.16462v1
Volatility Spillovers and Interconnectedness in OPEC Oil Markets: A Network-Based log-ARCH Approach
capturing spillovers among OPEC oil-exporting countries by embedding novel
network structures into ARCH-type models. We apply a network-based log-ARCH
framework that incorporates weight matrices derived from time-series clustering
and model-implied distances into the conditional variance equation. These
weight matrices are constructed from return data and standard multivariate
GARCH model outputs (CCC, DCC, and GO-GARCH), enabling a comparative analysis
of volatility transmission across specifications. Through a rolling-window
forecast evaluation, the network-based models demonstrate competitive
forecasting performance relative to traditional specifications and uncover
intricate spillover effects. These results provide a deeper understanding of
the interconnectedness within the OPEC network, with important implications for
financial risk assessment, market integration, and coordinated policy among
oil-producing economies.
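
A stylized sketch of a network log-ARCH(1) specification: each series' log conditional variance loads on its own lagged log squared return and on a network-weighted average of the other series' lagged log squared returns. The pooled OLS on log squared returns and the equal-weight network below are rough illustrations, not the paper's estimation procedure or weight matrices.

import numpy as np

rng = np.random.default_rng(4)
T, N = 500, 5
y = rng.normal(size=(T, N))                        # placeholder return panel
W = np.ones((N, N)) - np.eye(N)
W = W / W.sum(axis=1, keepdims=True)               # row-normalized network weights

logsq = np.log(y**2 + 1e-12)
own_lag = logsq[:-1]                               # log eps_{i,t-1}^2
net_lag = logsq[:-1] @ W.T                         # sum_j w_ij * log eps_{j,t-1}^2
target = logsq[1:]

# pooled OLS across units: target_it = omega + alpha * own + rho * net + error
Xmat = np.column_stack([np.ones(own_lag.size), own_lag.ravel(), net_lag.ravel()])
coef, *_ = np.linalg.lstsq(Xmat, target.ravel(), rcond=None)
print("omega, alpha, rho:", np.round(coef, 3))
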
arXiv link: http://arxiv.org/abs/2507.15046v1
Testing Clustered Equal Predictive Ability with Unknown Clusters
predictive ability in panel data settings with unknown heterogeneity. The
framework allows predictive performance to vary across unobserved clusters and
accounts for the data-driven selection of these clusters using the Panel Kmeans
Algorithm. A post-selection Wald-type statistic is constructed, and valid
$p$-values are derived under general forms of autocorrelation and
cross-sectional dependence in forecast loss differentials. The method
accommodates conditioning on covariates or common factors and permits both
strong and weak dependence across units. Simulations demonstrate the
finite-sample validity of the procedure and show that it has very high power.
An empirical application to exchange rate forecasting using machine learning
methods illustrates the practical relevance of accounting for unknown clusters
in forecast evaluation.
arXiv link: http://arxiv.org/abs/2507.14621v2
A New Perspective of the Meese-Rogoff Puzzle: Application of Sparse Dynamic Shrinkage
the Dynamic Shrinkage Process (DSP) of Kowal et al. (2019). We revisit the
Meese-Rogoff puzzle (Meese and Rogoff, 1983a,b, 1988) by applying the MSDSP to
the economic models deemed inferior to the random walk model for exchange rate
predictions. The flexibility of the MSDSP model captures the possibility of
zero coefficients (sparsity), constant coefficients (dynamic shrinkage), as well
as sudden and gradual parameter movements (structural change) in the
time-varying parameter model setting. We also apply MSDSP in the context of
Bayesian predictive synthesis (BPS) (McAlinn and West, 2019), where dynamic
combination schemes exploit the information from the alternative economic
models. Our analysis provides a new perspective on the Meese-Rogoff puzzle,
illustrating that the economic models, enhanced with the parameter flexibility
of the MSDSP, produce predictive distributions that are superior to the random
walk model, even when stochastic volatility is considered.
arXiv link: http://arxiv.org/abs/2507.14408v1
Policy relevance of causal quantities in networks
has been a proliferation of ways to quantify effects of treatments on outcomes.
Here we describe how many proposed estimands can be represented as involving
one of two ways of averaging over units and treatment assignments. The more
common representation often results in quantities that are irrelevant, or at
least insufficient, for optimal choice of policies governing treatment
assignment. The other representation often yields quantities that lack an
interpretation as summaries of unit-level effects, but that we argue may still
be relevant to policy choice. Among various estimands, the expected average
outcome -- or its contrast between two different policies -- can be represented
both ways and, we argue, merits further attention.
arXiv link: http://arxiv.org/abs/2507.14391v1
Regional compositional trajectories and structural change: A spatiotemporal multivariate autoregressive framework
transactions, are central to understanding structural change in economic
systems across space and time. This paper introduces a spatiotemporal
multivariate autoregressive model tailored for panel data with
composition-valued responses at each areal unit and time point. The proposed
framework enables the joint modelling of temporal dynamics and spatial
dependence under compositional constraints and is estimated via a quasi maximum
likelihood approach. We build on recent theoretical advances to establish
identifiability and asymptotic properties of the estimator when both the number
of regions and time points grow. The utility and flexibility of the model are
demonstrated through two applications: analysing property transaction
compositions in an intra-city housing market (Berlin), and regional sectoral
compositions in Spain's economy. These case studies highlight how the proposed
framework captures key features of spatiotemporal economic processes that are
often missed by conventional methods.
arXiv link: http://arxiv.org/abs/2507.14389v1
Leveraging Covariates in Regression Discontinuity Designs
economics. In the context of Regression Discontinuity (RD) designs, covariate
adjustment plays multiple roles, making it essential to understand its impact
on analysis and conclusions. Typically implemented via local least squares
regressions, covariate adjustment can serve three distinct purposes: (i)
improving the efficiency of RD average causal effect estimators, (ii) learning
about heterogeneous RD policy effects, and (iii) changing the RD parameter of
interest. This article discusses and illustrates empirically how to leverage
covariates effectively in RD designs.
arXiv link: http://arxiv.org/abs/2507.14311v1
Debiased Machine Learning for Unobserved Heterogeneity: High-Dimensional Panels and Measurement Error Models
Heterogeneity (UH) is both important and challenging. We propose novel Debiased
Machine Learning (DML) procedures for valid inference on functionals of UH,
allowing for partial identification of multivariate target and high-dimensional
nuisance parameters. Our main contribution is a full characterization of all
relevant Neyman-orthogonal moments in models with nonparametric UH, where
relevance means informativeness about the parameter of interest. Under
additional support conditions, orthogonal moments are globally robust to the
distribution of the UH. They may still involve other high-dimensional nuisance
parameters, but their local robustness reduces regularization bias and enables
valid DML inference. We apply these results to: (i) common parameters, average
marginal effects, and variances of UH in panel data models with
high-dimensional controls; (ii) moments of the common factor in the Kotlarski
model with a factor loading; and (iii) smooth functionals of teacher
value-added. Monte Carlo simulations show substantial efficiency gains from
using efficient orthogonal moments relative to ad-hoc choices. We illustrate
the practical value of our approach by showing that existing estimates of the
average and variance effects of maternal smoking on child birth weight are
robust.
arXiv link: http://arxiv.org/abs/2507.13788v1
Who With Whom? Learning Optimal Matching Policies
performance of institutions and policies depend on who matches with whom.
Examples include caseworkers and job seekers in job search assistance programs,
medical doctors and patients, teachers and students, attorneys and defendants,
and tax auditors and taxpayers, among others. Although reallocating individuals
through a change in matching policy can be less costly than training personnel
or introducing a new program, methods for learning optimal matching policies
and their statistical performance are less studied than methods for other
policy interventions. This paper develops a method to learn welfare optimal
matching policies for two-sided matching problems in which a planner matches
individuals based on the rich set of observable characteristics of the two
sides. We formulate the learning problem as an empirical optimal transport
problem with a match cost function estimated from training data, and propose
estimating an optimal matching policy by maximizing the entropy regularized
empirical welfare criterion. We derive a welfare regret bound for the estimated
policy and characterize its convergence. We apply our proposal to the problem
of matching caseworkers and job seekers in a job search assistance program, and
assess its welfare performance in a simulation study calibrated with French
administrative data.
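
A minimal sketch of the entropy-regularized matching step via Sinkhorn iterations; the match cost here is a hypothetical quadratic distance in covariates rather than a cost function estimated from training data, and the marginals are uniform.

import numpy as np

def sinkhorn(cost, a, b, reg=0.05, n_iter=500):
    # Entropy-regularized optimal transport via Sinkhorn scaling.
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]              # matching (transport) plan

rng = np.random.default_rng(5)
caseworkers = rng.normal(size=(20, 3))              # hypothetical caseworker covariates
jobseekers = rng.normal(size=(30, 3))               # hypothetical job-seeker covariates
cost = ((caseworkers[:, None, :] - jobseekers[None, :, :]) ** 2).sum(-1)
cost = cost / cost.max()                            # scale costs for numerical stability

plan = sinkhorn(cost, np.full(20, 1 / 20), np.full(30, 1 / 30))
print(plan.shape, plan.sum())                       # rows and columns sum to the marginals
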
arXiv link: http://arxiv.org/abs/2507.13567v1
Combining stated and revealed preferences
research proposes a novel approach to researchers who have access to both
stated choices in hypothetical scenarios and actual choices, matched or
unmatched. The key idea is to use stated choices to identify the distribution
of individual unobserved heterogeneity. If this unobserved heterogeneity is the
source of endogeneity, the researcher can correct for its influence in a demand
function estimation using actual choices and recover causal effects. Bounds on
causal effects are derived for the case where stated choices and actual choices
are observed in unmatched data sets. These data combination bounds are of
independent interest. We derive a valid bootstrap inference for the bounds and
show its good performance in a simulation experiment.
arXiv link: http://arxiv.org/abs/2507.13552v1
Refining the Notion of No Anticipation in Difference-in-Differences Studies
difference-in-differences, which are widely applied in empirical research,
particularly in economics. The assumption commonly referred to as the
"no-anticipation assumption" states that treatment has no effect on outcomes
before its implementation. However, because standard causal models rely on a
temporal structure in which causes precede effects, such an assumption seems to
be inherently satisfied. This raises the question of whether the assumption is
repeatedly stated out of redundancy or because the formal statements fail to
capture the intended subject-matter interpretation. We argue that confusion
surrounding the no-anticipation assumption arises from ambiguity in the
intervention considered and that current formulations of the assumption are
ambiguous. Therefore, new definitions and identification results are proposed.
arXiv link: http://arxiv.org/abs/2507.12891v2
Placebo Discontinuity Design
of expected potential outcomes at the cutoff. The standard continuity
assumption can be violated by strategic manipulation of the running variable,
which is realistic when the cutoff is widely known and when the treatment of
interest is a social program or government benefit. In this work, we identify
the treatment effect despite such a violation, by leveraging a placebo
treatment and a placebo outcome. We introduce a local instrumental variable
estimator. Our estimator decomposes into two terms: the standard RDD estimator
of the target outcome's discontinuity, and a new adjustment term based on the
placebo outcome's discontinuity. We show that our estimator is consistent, and
we justify a robust bias-corrected inference procedure. Our method expands the
applicability of RDD to settings with strategic behavior around the cutoff,
which commonly arise in social science.
arXiv link: http://arxiv.org/abs/2507.12693v1
NA-DiD: Extending Difference-in-Differences with Capabilities
framework, which extends classical DiD by incorporating non-additive measures and
the Choquet integral for effect aggregation. It serves as a novel econometric
tool for impact evaluation, particularly in settings with non-additive
treatment effects. First, we introduce the integral representation of the
classical DiD model, and then extend it to non-additive measures, thereby
deriving the formulae for NA-DiD estimation. Then, we give its theoretical
properties. Applying NA-DiD to a simulated hospital hygiene intervention, we
find that classical DiD can overestimate treatment effects, for example by failing to
account for compliance erosion. In contrast, NA-DiD provides a more accurate
estimate by incorporating non-linear aggregation. The Julia implementation of
the techniques used and introduced in this article is provided in the
appendices.
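
For reference, a minimal Python sketch of the discrete Choquet integral underlying the non-additive aggregation (the paper itself ships a Julia implementation); the capacity below is a hypothetical example, not one calibrated to data.

import numpy as np

def choquet_integral(values, capacity):
    # Choquet integral of `values` (one per criterion) with respect to a set
    # function `capacity` mapping frozensets of criterion indices to [0, 1].
    idx = np.argsort(values)                        # sort criteria by value, ascending
    total, prev = 0.0, 0.0
    for k, i in enumerate(idx):
        upper = frozenset(int(j) for j in idx[k:])  # criteria with value >= values[i]
        total += (values[i] - prev) * capacity[upper]
        prev = values[i]
    return total

values = np.array([0.2, 0.5, 0.9])
capacity = {frozenset({0}): 0.3, frozenset({1}): 0.4, frozenset({2}): 0.5,
            frozenset({0, 1}): 0.6, frozenset({0, 2}): 0.7, frozenset({1, 2}): 0.8,
            frozenset({0, 1, 2}): 1.0}
print(choquet_integral(values, capacity))           # 0.2*1.0 + 0.3*0.8 + 0.4*0.5 = 0.64
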
arXiv link: http://arxiv.org/abs/2507.12690v1
Semiparametric Learning of Integral Functionals on Submanifolds
functionals on submanifolds, which arise naturally in a variety of econometric
settings. For linear integral functionals on a regular submanifold, we show
that the semiparametric plug-in estimator attains the minimax-optimal
convergence rate $n^{-\frac{s}{2s+d-m}}$, where $s$ is the H\"{o}lder
smoothness order of the underlying nonparametric function, $d$ is the dimension
of the first-stage nonparametric estimation, and $m$ is the dimension of the
submanifold over which the integral is taken. This rate coincides with the
standard minimax-optimal rate for a $(d-m)$-dimensional nonparametric
estimation problem, illustrating that integration over the $m$-dimensional
manifold effectively reduces the problem's dimensionality. We then provide a
general asymptotic normality theorem for linear/nonlinear submanifold
integrals, along with a consistent variance estimator. We provide simulation
evidence in support of our theoretical results.
arXiv link: http://arxiv.org/abs/2507.12673v1
Catching Bid-rigging Cartels with Graph Attention Neural Networks
graph neural network enhanced with attention mechanisms, to develop a deep
learning algorithm for detecting collusive behavior, leveraging predictive
features suggested in prior research. We test our approach on a large dataset
covering 13 markets across seven countries. Our results show that predictive
models based on GATs, trained on a subset of the markets, can be effectively
transferred to other markets, achieving accuracy rates between 80% and 90%,
depending on the hyperparameter settings. The best-performing configuration,
applied to eight markets from Switzerland and the Japanese region of Okinawa,
yields an average accuracy of 91% for cross-market prediction. When extended to
12 markets, the method maintains a strong performance with an average accuracy
of 84%, surpassing traditional ensemble approaches in machine learning. These
results suggest that GAT-based detection methods offer a promising tool for
competition authorities to screen markets for potential cartel activity.
arXiv link: http://arxiv.org/abs/2507.12369v2
Forecasting Climate Policy Uncertainty: Evidence from the United States
strive to balance economic growth with environmental goals. High levels of CPU
can slow down investments in green technologies, make regulatory planning more
difficult, and increase public resistance to climate reforms, especially during
times of economic stress. This study addresses the challenge of forecasting the
US CPU index by building the Bayesian Structural Time Series (BSTS) model with
a large set of covariates, including economic indicators, financial cycle data,
and public sentiments captured through Google Trends. The key strength of the
BSTS model lies in its ability to efficiently manage a large number of
covariates through its dynamic feature selection mechanism based on the
spike-and-slab prior. To validate the effectiveness of the selected features of
the BSTS model, an impulse response analysis is performed. The results show
that macro-financial shocks impact CPU in different ways over time. Numerical
experiments are performed to evaluate the performance of the BSTS model with
exogenous variables on the US CPU dataset over different forecasting horizons.
The empirical results confirm that BSTS consistently outperforms classical and
deep learning frameworks, particularly for semi-long-term and long-term
forecasts.
arXiv link: http://arxiv.org/abs/2507.12276v1
Data Synchronization at High Frequencies
significant biases into econometric analysis, distorting risk estimates and
leading to suboptimal portfolio decisions. Existing synchronization methods,
such as the previous-tick approach, suffer from information loss and create
artificial price staleness. We introduce a novel framework that recasts the
data synchronization challenge as a constrained matrix completion problem. Our
approach recovers the potential matrix of high-frequency price increments by
minimizing its nuclear norm -- capturing the underlying low-rank factor
structure -- subject to a large-scale linear system derived from observed,
asynchronous price changes. Theoretically, we prove the existence and
uniqueness of our estimator and establish its convergence rate. A key
theoretical insight is that our method accurately and robustly leverages
information from both frequently and infrequently traded assets, overcoming a
critical difficulty of efficiency loss in traditional methods. Empirically,
using extensive simulations and a large panel of S&P 500 stocks, we demonstrate
that our method substantially outperforms established benchmarks. It not only
achieves significantly lower synchronization errors, but also corrects the bias
in systematic risk estimates (i.e., eigenvalues) and the estimate of betas
caused by stale prices. Crucially, portfolios constructed using our
synchronized data yield consistently and economically significant higher
out-of-sample Sharpe ratios. Our framework provides a powerful tool for
uncovering the true dynamics of asset prices, with direct implications for
high-frequency risk management, algorithmic trading, and econometric inference.
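
A conceptual sketch of the optimization at the core of the framework, assuming cvxpy is available: recover a low-rank matrix of price increments by minimizing its nuclear norm subject to linear constraints. For simplicity the constraints below are direct entry observations rather than the paper's aggregation over asynchronous trade times.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(6)
T, N, r = 40, 8, 2
true = rng.normal(size=(T, r)) @ rng.normal(size=(r, N))   # low-rank increment matrix
mask = (rng.uniform(size=(T, N)) < 0.5).astype(float)      # which increments are observed

X = cp.Variable((T, N))
constraints = [cp.multiply(mask, X) == mask * true]        # match observed increments
prob = cp.Problem(cp.Minimize(cp.normNuc(X)), constraints)
prob.solve()
print("relative recovery error:",
      np.linalg.norm(X.value - true) / np.linalg.norm(true))
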
arXiv link: http://arxiv.org/abs/2507.12220v1
Inference on Optimal Policy Values and Other Irregular Functionals via Smoothing
policy is an important problem in causal inference. Insight into the optimal
policy value can guide the development of reward-maximizing, individualized
treatment regimes. However, because the functional that defines the optimal
value is non-differentiable, standard semi-parametric approaches for performing
inference fail to be directly applicable. Existing approaches for handling this
non-differentiability fall roughly into two camps. In one camp are estimators
based on constructing smooth approximations of the optimal value. These
approaches are computationally lightweight, but typically place unrealistic
parametric assumptions on outcome regressions. In another camp are approaches
that directly de-bias the non-smooth objective. These approaches don't place
parametric assumptions on nuisance functions, but they either require the
computation of intractably-many nuisance estimates, assume unrealistic
$L^\infty$ nuisance convergence rates, or make strong margin assumptions that
prohibit non-response to a treatment. In this paper, we revisit the problem of
constructing smooth approximations of non-differentiable functionals. By
carefully controlling first-order bias and second-order remainders, we show
that a softmax smoothing-based estimator can be used to estimate parameters
that are specified as a maximum of scores involving nuisance components. In
particular, this includes the value of the optimal treatment policy as a
special case. Our estimator obtains $\sqrt{n}$ convergence rates, avoids
parametric restrictions/unrealistic margin assumptions, and is often
statistically efficient.
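
A toy illustration of the smoothing device: the non-smooth plug-in estimate of E[max_a mu_a(X)] is replaced by a temperature-smoothed logsumexp version. The outcome regressions are hypothetical and the paper's bias corrections are omitted.

import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(7)
n = 2000
x = rng.uniform(-1, 1, size=n)
mu = np.column_stack([0.2 * x, 0.5 - 0.3 * x])      # hypothetical outcome regressions

hard_value = np.mean(mu.max(axis=1))                # non-smooth plug-in for E[max_a mu_a(X)]
tau = 0.05
smooth_value = np.mean(tau * logsumexp(mu / tau, axis=1))   # softmax-smoothed version
print(hard_value, smooth_value)                     # the smoothed value converges as tau -> 0
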
arXiv link: http://arxiv.org/abs/2507.11780v1
FARS: Factor Augmented Regression Scenarios in R
provides a comprehensive framework in R for the construction of conditional
densities of the variable of interest based on the factor-augmented quantile
regressions (FA-QRs) methodology, with the factors extracted from multi-level
dynamic factor models (ML-DFMs) with potential overlapping group-specific
factors. Furthermore, the package also allows the construction of measures of
risk as well as modeling and designing economic scenarios based on the
conditional densities. In particular, the package enables users to: (i) extract
global and group-specific factors using a flexible multi-level factor
structure; (ii) compute asymptotically valid confidence regions for the
estimated factors, accounting for uncertainty in the factor loadings; (iii)
obtain estimates of the parameters of the FA-QRs together with their standard
deviations; (iv) recover full predictive conditional densities from estimated
quantiles; (v) obtain risk measures based on extreme quantiles of the
conditional densities; and (vi) estimate the conditional density and the
corresponding extreme quantiles when the factors are stressed.
arXiv link: http://arxiv.org/abs/2507.10679v3
Breakdown Analysis for Instrumental Variables with Binary Outcomes
Instrumental Variables (IV) settings with binary outcomes under violations of
independence. I derive the identified sets for the treatment parameters of
interest in the setting, as well as breakdown values for conclusions regarding
the true treatment effects. I derive $\sqrt{N}$-consistent nonparametric
estimators for the bounds of treatment effects and for breakdown values. These
results can be used to assess the robustness of empirical conclusions obtained
under the assumption that the instrument is independent from potential
quantities, which is a pervasive concern in studies that use IV methods with
observational data. In the empirical application, I show that the conclusions
regarding the effects of family size on female unemployment using same-sex
siblings as the instrument are highly sensitive to violations of independence.
arXiv link: http://arxiv.org/abs/2507.10242v4
An Algorithm for Identifying Interpretable Subgroups With Elevated Treatment Effects
elevated treatment effects, given an estimate of individual or conditional
average treatment effects (CATE). Subgroups are characterized by “rule sets”
-- easy-to-understand statements of the form (Condition A AND Condition B) OR
(Condition C) -- which can capture high-order interactions while retaining
interpretability. Our method complements existing approaches for estimating the
CATE, which often produce high dimensional and uninterpretable results, by
summarizing and extracting critical information from fitted models to aid
decision making, policy implementation, and scientific understanding. We
propose an objective function that trades off subgroup size and effect size,
and varying the hyperparameter that controls this trade-off results in a
“frontier” of Pareto optimal rule sets, none of which dominates the others
across all criteria. Valid inference is achievable through sample splitting. We
demonstrate the utility and limitations of our method using simulated and
empirical examples.
arXiv link: http://arxiv.org/abs/2507.09494v1
Propensity score with factor loadings: the effect of the Paris Agreement
with respect to a low-dimensional set of latent factor loadings, have become
increasingly popular for causal inference. Most existing approaches, however,
rely on a causal finite-sample approach or computationally intensive methods,
limiting their applicability and external validity. In this paper, we propose a
novel causal inference method for panel data based on inverse propensity score
weighting where the propensity score is a function of latent factor loadings
within a framework of causal inference from super-population. The approach
relaxes the traditional restrictive assumptions of causal panel methods, while
offering advantages in terms of causal interpretability, policy relevance, and
computational efficiency. Under standard assumptions, we outline a three-step
estimation procedure for the ATT and derive its large-sample properties using
M-estimation theory. We apply the method to assess the causal effect of the
Paris Agreement, a policy aimed at fostering the transition to a low-carbon
economy, on European stock returns. Our empirical results suggest a
statistically significant and negative short-run effect on the stock returns of
firms that issued green bonds.
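
A schematic of the weighting step, with simulated stand-ins for the estimated factor loadings: a logistic propensity score is fitted on the loadings and plugged into the standard inverse-probability-weighted ATT formula.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
n = 1000
loadings = rng.normal(size=(n, 2))                  # stand-in for estimated factor loadings
p_true = 1 / (1 + np.exp(-(loadings[:, 0] - 0.5 * loadings[:, 1])))
D = rng.binomial(1, p_true)                         # treatment indicator
Y = 1.0 * D + loadings @ np.array([0.8, -0.4]) + rng.normal(size=n)

e_hat = LogisticRegression().fit(loadings, D).predict_proba(loadings)[:, 1]
w = e_hat / (1 - e_hat)                             # odds weights for control units
att = Y[D == 1].mean() - np.average(Y[D == 0], weights=w[D == 0])
print("IPW-ATT estimate:", round(att, 3))           # true effect is 1.0 in this simulation
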
arXiv link: http://arxiv.org/abs/2507.08764v1
Correlated Synthetic Controls
applications with only one treated unit. Their popularity is partly based on
the key insight that we can predict good synthetic counterfactuals for our
treated unit. However, this insight of predicting counterfactuals is
generalisable to microeconometric settings where we often observe many treated
units. We propose the Correlated Synthetic Controls (CSC) estimator for such
situations: intuitively, it creates synthetic controls that are correlated
across individuals with similar observables. When treatment assignment is
correlated with unobservables, we show that the CSC estimator has more
desirable theoretical properties than the difference-in-differences estimator.
We also utilise CSC in practice to obtain heterogeneous treatment effects in
the well-known Mariel Boatlift study, leveraging additional information from
the PSID.
arXiv link: http://arxiv.org/abs/2507.08918v1
Efficient and Scalable Estimation of Distributional Treatment Effects with Multi-Task Neural Networks
distributional treatment effects (DTE) in randomized experiments. While DTE
provides more granular insights into experiment outcomes than conventional
methods focusing on the Average Treatment Effect (ATE), estimating it with
regression adjustment methods presents significant challenges. Specifically,
precision in the distribution tails suffers due to data imbalance, and
computational inefficiencies arise from the need to solve numerous regression
problems, particularly in large-scale datasets commonly encountered in
industry. To address these limitations, our method leverages multi-task neural
networks to estimate conditional outcome distributions while incorporating
monotonic shape constraints and multi-threshold label learning to enhance
accuracy. To demonstrate the practical effectiveness of our proposed method, we
apply it to both simulated and real-world datasets, including a
randomized field experiment aimed at reducing water consumption in the US and a
large-scale A/B test from a leading streaming platform in Japan. The
experimental results consistently demonstrate superior performance across
various datasets, establishing our method as a robust and practical solution
for modern causal inference applications requiring a detailed understanding of
treatment effect heterogeneity.
arXiv link: http://arxiv.org/abs/2507.07738v1
Galerkin-ARIMA: A Two-Stage Polynomial Regression Framework for Fast Rolling One-Step-Ahead Forecasting
integrates Galerkin projection techniques with the classical ARIMA model to
capture potentially nonlinear dependencies in lagged observations. By replacing
the fixed linear autoregressive component with a spline-based basis expansion,
Galerkin-ARIMA flexibly approximates the underlying relationship among past
values via ordinary least squares, while retaining the moving-average structure
and Gaussian innovation assumptions of ARIMA. We derive closed-form solutions
for both the AR and MA components using two-stage Galerkin projections,
establish conditions for asymptotic unbiasedness and consistency, and analyze
the bias-variance trade-off under basis-size growth. Complexity analysis
reveals that, for moderate basis dimensions, our approach can substantially
reduce computational cost compared to maximum-likelihood ARIMA estimation.
Through extensive simulations on four synthetic processes -- including noisy ARMA,
seasonal, trend-AR, and nonlinear recursion series -- we demonstrate that
Galerkin-ARIMA matches or closely approximates ARIMA's forecasting accuracy
while achieving orders-of-magnitude speedups in rolling forecasting tasks.
These results suggest that Galerkin-ARIMA offers a powerful, efficient
alternative for modeling complex time series dynamics in high-volume or
real-time applications.
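
A rough sketch of the two-stage idea, with a simple polynomial basis standing in for the spline expansion: the AR component is projected onto the basis by OLS, and a second OLS on lagged residuals stands in for the MA stage; orders, basis, and data are arbitrary.

import numpy as np

rng = np.random.default_rng(9)
y = np.zeros(400)
for t in range(1, 400):                             # a mildly nonlinear AR(1) process
    y[t] = 0.6 * np.tanh(2 * y[t - 1]) + rng.normal(scale=0.3)

ylag, ycur = y[:-1], y[1:]
B = np.column_stack([ylag**k for k in range(4)])    # basis functions: 1, x, x^2, x^3

# stage 1: project the autoregressive component onto the basis via OLS
beta, *_ = np.linalg.lstsq(B, ycur, rcond=None)
resid = ycur - B @ beta

# stage 2: crude MA(1) step -- project residuals onto their own lag
theta, *_ = np.linalg.lstsq(resid[:-1, None], resid[1:], rcond=None)
print("basis coefficients:", np.round(beta, 3), "MA-stage coefficient:", np.round(theta, 3))
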
arXiv link: http://arxiv.org/abs/2507.07469v2
Tracking the economy at high frequency
Dynamic Factor Model estimated with mixed-frequency data. The model
incorporates weekly, monthly, and quarterly official indicators, and allows for
dynamic heterogeneity and stochastic volatility. To ensure temporal consistency
and avoid irregular aggregation artifacts, we introduce a pseudo-week structure
that harmonizes the timing of observations. Our framework integrates dispersed
and asynchronous official statistics into a unified High-Frequency Economic
Index (HFEI), enabling real-time economic monitoring even in environments
characterized by severe data limitations. We apply this framework to construct
a high-frequency indicator for Ecuador, a country where official data are
sparse and highly asynchronous, and compute pseudo-weekly recession
probabilities using a time-varying mean regime-switching model fitted to the
resulting index.
arXiv link: http://arxiv.org/abs/2507.07450v1
Identifying Present-Biased Discount Functions in Dynamic Discrete Choice Models
sophisticated, quasi-hyperbolic time preferences under exclusion restrictions.
We consider both standard finite horizon problems and empirically useful
infinite horizon ones, which we prove to always have solutions. We reduce
identification to finding the present-bias and standard discount factors that
solve a system of polynomial equations with coefficients determined by the data
and use this to bound the cardinality of the identified set. The discount
factors are usually identified, but hard to precisely estimate, because
exclusion restrictions poorly capture preference reversals, the defining feature
of present bias.
arXiv link: http://arxiv.org/abs/2507.07286v1
On a Debiased and Semiparametric Efficient Changes-in-Changes Estimator
framework of Athey and Imbens (2006) for estimating the average treatment
effect on the treated (ATT) and distributional causal effects in panel data
with unmeasured confounding. While CiC relaxes the parallel trends assumption
in difference-in-differences (DiD), existing methods typically assume a scalar
unobserved confounder and monotonic outcome relationships, and lack inference
tools that accommodate continuous covariates flexibly. Motivated by empirical
settings with complex confounding and rich covariate information, we make two
main contributions. First, we establish nonparametric identification under
relaxed assumptions that allow high-dimensional, non-monotonic unmeasured
confounding. Second, we derive semiparametrically efficient estimators that are
Neyman orthogonal to infinite-dimensional nuisance parameters, enabling valid
inference even with machine learning-based estimation of nuisance components.
We illustrate the utility of our approach in an empirical analysis of mass
shootings and U.S. electoral outcomes, where key confounders, such as political
mobilization or local gun culture, are typically unobserved and challenging to
quantify.
arXiv link: http://arxiv.org/abs/2507.07228v2
Equity Markets Volatility, Regime Dependence and Economic Uncertainty: The Case of Pacific Basin
iShares Asia 50 ETF (AIA) and economic and market sentiment indicators from the
United States, China, and globally during periods of economic uncertainty.
Specifically, it examines the association between AIA volatility and key
indicators such as the US Economic Uncertainty Index (ECU), the US Economic
Policy Uncertainty Index (EPU), China's Economic Policy Uncertainty Index
(EPUCH), the Global Economic Policy Uncertainty Index (GEPU), and the Chicago
Board Options Exchange's Volatility Index (VIX), spanning the years 2007 to
2023. Employing methodologies such as the two-covariate GARCH-MIDAS model,
regime-switching Markov Chain (MSR), and quantile regressions (QR), the study
explores the regime-dependent dynamics between AIA volatility and
economic/market sentiment, taking into account investors' sensitivity to market
uncertainties across different regimes. The findings reveal that the
relationship between realized volatility and sentiment varies significantly
between high- and low-volatility regimes, reflecting differences in investors'
responses to market uncertainties under these conditions. Additionally, a weak
association is observed between short-term volatility and economic/market
sentiment indicators, suggesting that these indicators may have limited
predictive power, especially during high-volatility regimes. The QR results
further demonstrate the robustness of MSR estimates across most quantiles.
Overall, the study provides valuable insights into the complex interplay
between market volatility and economic/market sentiment, offering practical
implications for investors and policymakers.
arXiv link: http://arxiv.org/abs/2507.05552v1
Identification of Causal Effects with a Bunching Design
distribution of a continuous treatment variable, without imposing any
parametric assumptions. This yields a new nonparametric method for overcoming
selection bias in the absence of instrumental variables, panel data, or other
popular research designs for causal inference. The method leverages the change
of variables theorem from integration theory, relating the selection bias to
the ratio of the density of the treatment and the density of the part of the
outcome that varies with confounders. At the bunching point, the treatment
level is constant, so the variation in the outcomes is due entirely to
unobservables, allowing us to identify the denominator. Our main result
identifies the average causal response to the treatment among individuals who
marginally select into the bunching point. We further show that under
additional smoothness assumptions on the selection bias, treatment effects away
from the bunching point may also be identified. We propose estimators based on
standard software packages and apply the method to estimate the effect of
maternal smoking during pregnancy on birth weight.
arXiv link: http://arxiv.org/abs/2507.05210v1
Blind Targeting: Personalization under Third-Party Privacy Constraints
limiting advertisers' access to individual-level data. Instead of providing
access to granular raw data, the platforms only allow a limited number of
aggregate queries to a dataset, which is further protected by adding
differentially private noise. This paper studies whether and how advertisers
can design effective targeting policies within these restrictive
privacy-preserving data environments. To achieve this, I develop a probabilistic
machine learning method based on Bayesian optimization, which facilitates
dynamic data exploration. Since Bayesian optimization was designed to sample
points from a function to find its maximum, it is not applicable to aggregate
queries and to targeting. Therefore, I introduce two innovations: (i) integral
updating of posteriors, which allows selecting the best regions of the data to
query rather than individual points and (ii) a targeting-aware acquisition
function that dynamically selects the most informative regions for the
targeting task. I identify the conditions of the dataset and privacy
environment that necessitate the use of such a "smart" querying strategy. I
apply the strategic querying method to the Criteo AI Labs dataset for uplift
modeling (Diemert et al., 2018) that contains visit and conversion data from
14M users. I show that an intuitive benchmark strategy only achieves 33% of the
non-privacy-preserving targeting potential in some cases, while my strategic
querying method achieves 97-101% of that potential, and is statistically
indistinguishable from Causal Forest (Athey et al., 2019): a state-of-the-art
non-privacy-preserving machine learning targeting method.
arXiv link: http://arxiv.org/abs/2507.05175v1
Forward Variable Selection in Ultra-High Dimensional Linear Regression Using Gram-Schmidt Orthogonalization
regression using a Gram-Schmidt orthogonalization procedure. Unlike the
commonly used Forward Regression (FR) method, which computes regression
residuals using an increasing number of selected features, or the Orthogonal
Greedy Algorithm (OGA), which selects variables based on their marginal
correlations with the residuals, our proposed Gram-Schmidt Forward Regression
(GSFR) simplifies the selection process by evaluating marginal correlations
between the residuals and the orthogonalized new variables. Moreover, we
introduce a new model size selection criterion that determines the number of
selected variables by detecting the most significant change in their unique
contributions, effectively filtering out redundant predictors along the
selection path. While GSFR is theoretically equivalent to FR except for the
stopping rule, our refinement and the newly proposed stopping rule
significantly improve computational efficiency. In ultra-high dimensional
settings, where the dimensionality far exceeds the sample size and predictors
exhibit strong correlations, we establish that GSFR achieves a convergence rate
comparable to OGA and ensures variable selection consistency under mild
conditions. We demonstrate the proposed method using simulations and real
data examples. Extensive numerical studies show that GSFR outperforms commonly
used methods in ultra-high dimensional variable selection.
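
A minimal sketch of the Gram-Schmidt forward selection loop: at each step the remaining candidates are orthogonalized against the already selected columns and the one most correlated with the current residual is added. The stopping rule below is a fixed number of steps, not the model-size criterion proposed in the paper.

import numpy as np

def gsfr(X, y, n_steps=3):
    n, p = X.shape
    selected, Q = [], []                            # Q: orthonormal basis of selected columns
    resid = y - y.mean()
    for _ in range(n_steps):
        scores = np.full(p, -np.inf)
        orth = {}
        for j in range(p):
            if j in selected:
                continue
            v = X[:, j] - X[:, j].mean()
            for q in Q:                             # Gram-Schmidt against selected directions
                v = v - (v @ q) * q
            norm = np.linalg.norm(v)
            if norm < 1e-10:                        # candidate is spanned by selected columns
                continue
            v = v / norm
            orth[j] = v
            scores[j] = abs(v @ resid)              # marginal correlation with the residual
        j_star = int(np.argmax(scores))
        resid = resid - (orth[j_star] @ resid) * orth[j_star]
        Q.append(orth[j_star])
        selected.append(j_star)
    return selected

rng = np.random.default_rng(10)
n, p = 200, 2000                                    # toy ultra-high dimensional design
X = rng.normal(size=(n, p))
y = 2 * X[:, 3] - 1.5 * X[:, 7] + X[:, 42] + rng.normal(size=n)
print("selected variables:", gsfr(X, y))
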
arXiv link: http://arxiv.org/abs/2507.04668v1
A General Class of Model-Free Dense Precision Matrix Estimators
estimators that have broad application in economics. Using quadratic form
concentration inequalities and novel algebraic characterizations of confounding
dimension reductions, we are able to: (i) obtain non-asymptotic bounds for
precision matrix estimation errors and also (ii) consistency in high
dimensions; (iii) uncover the existence of an intrinsic signal-to-noise --
underlying dimensions tradeoff; and (iv) avoid exact population sparsity
assumptions. In addition to its desirable theoretical properties, a thorough
empirical study of the S&P 500 index shows that a tuning parameter-free special
case of our general estimator exhibits a doubly ascending Sharpe Ratio pattern,
thereby establishing a link with the famous double descent phenomenon
dominantly present in recent statistical and machine learning literature.
arXiv link: http://arxiv.org/abs/2507.04663v1
A Test for Jumps in Metric-Space Conditional Means
applicable to outcomes that are complex, non-Euclidean objects like
distributions, networks, or covariance matrices. This article develops a
nonparametric test for jumps in conditional means when outcomes lie in a
non-Euclidean metric space. Using local Fr\'echet regression, the method
estimates a mean path on either side of a candidate cutoff. This extends
existing $k$-sample tests to a non-parametric regression setting with
metric-space valued outcomes. I establish the asymptotic distribution of the
test and its consistency against contiguous alternatives. For this, I derive a
central limit theorem for the local estimator of the conditional Fr\'echet
variance and a consistent estimator of its asymptotic variance. Simulations
confirm nominal size control and robust power in finite samples. Two empirical
illustrations demonstrate the method's ability to reveal discontinuities missed
by scalar-based tests. I find sharp changes in (i) work-from-home compositions
at an income threshold for non-compete enforceability and (ii) national
input-output networks following the loss of preferential U.S. trade access.
These findings show the value of analyzing regression outcomes in their native
metric spaces.
arXiv link: http://arxiv.org/abs/2507.04560v2
A New and Efficient Debiased Estimation of General Treatment Models by Balanced Neural Networks Weighting
assignments often suffer from bias and the `curse of dimensionality' due to the
nonparametric estimation of nuisance parameters for high-dimensional
confounders. Although debiased state-of-the-art methods have been proposed for
binary treatments under particular treatment models, they can be unstable for
small sample sizes. Moreover, directly extending them to general treatment
models can lead to computational complexity. We propose a balanced neural
networks weighting method for general treatment models, which leverages deep
neural networks to alleviate the curse of dimensionality while retaining
optimal covariate balance through calibration, thereby achieving debiased and
robust estimation. Our method accommodates a wide range of treatment models,
including average, quantile, distributional, and asymmetric least squares
treatment effects, for discrete, continuous, and mixed treatments. Under
regularity conditions, we show that our estimator achieves rate double
robustness and root-$N$ asymptotic normality, and its asymptotic variance
achieves the semiparametric efficiency bound. We further develop a statistical
inference procedure based on weighted bootstrap, which avoids estimating the
efficient influence/score functions. Simulation results reveal that the
proposed method consistently outperforms existing alternatives, especially when
the sample size is small. Applications to the 401(k) dataset and the Mother's
Significant Features dataset further illustrate the practical value of the
method for estimating both average and quantile treatment effects under binary
and continuous treatments, respectively.
arXiv link: http://arxiv.org/abs/2507.04044v1
Increasing Systemic Resilience to Socioeconomic Challenges: Modeling the Dynamics of Liquidity Flows and Systemic Risks Using Navier-Stokes Equations
systemic resilience and effective liquidity flow management essential.
Traditional models such as CAPM, VaR, and GARCH often fail to reflect real
market fluctuations and extreme events. This study develops and validates an
innovative mathematical model based on the Navier-Stokes equations, aimed at
the quantitative assessment, forecasting, and simulation of liquidity flows and
systemic risks. The model incorporates 13 macroeconomic and financial
parameters, including liquidity velocity, market pressure, internal stress,
stochastic fluctuations, and risk premiums, all based on real data and formally
included in the modified equation. The methodology employs econometric testing,
Fourier analysis, stochastic simulation, and AI-based calibration to enable
dynamic testing and forecasting. Simulation-based sensitivity analysis
evaluates the impact of parameter changes on financial balance. The model is
empirically tested using Georgian macroeconomic and financial data from
2010-2024, including GDP, inflation, the Gini index, CDS spreads, and LCR
metrics. Results show that the model effectively describes liquidity dynamics,
systemic risk, and extreme scenarios, while also offering a robust framework
for multifactorial analysis, crisis prediction, and countercyclical policy
planning.
arXiv link: http://arxiv.org/abs/2507.05287v1
Nonparametric regression for cost-effectiveness analyses with observational data -- a tutorial
under budget constraints, particularly when one option is more effective but
also more costly. Cost-effectiveness analysis (CEA) provides a framework for
evaluating whether the health benefits of a treatment justify its additional
costs. A key component of CEA is the estimation of treatment effects on both
health outcomes and costs, which becomes challenging when using observational
data, due to potential confounding. While advanced causal inference methods
exist for use in such circumstances, their adoption in CEAs remains limited,
with many studies relying on overly simplistic methods such as linear
regression or propensity score matching. We believe that this is mainly due to
health economists being generally unfamiliar with superior methodology. In this
paper, we address this gap by introducing cost-effectiveness researchers to
modern nonparametric regression models, with a particular focus on Bayesian
Additive Regression Trees (BART). We provide practical guidance on how to
implement BART in CEAs, including code examples, and discuss its advantages in
producing more robust and credible estimates from observational data.
arXiv link: http://arxiv.org/abs/2507.03511v2
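As a rough illustration of the regression-adjustment workflow the tutorial above advocates, the sketch below estimates incremental costs and QALYs by g-computation, using scikit-learn's GradientBoostingRegressor as a stand-in for BART (a dedicated BART implementation would slot into the same place). The simulated data and column names are hypothetical, not taken from the paper.
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({"age": rng.normal(60, 10, n), "sex": rng.integers(0, 2, n)})
p_treat = 1 / (1 + np.exp(-(X["age"] - 60) / 10))          # confounded assignment
treat = rng.binomial(1, p_treat.to_numpy())
qaly = 0.7 + 0.05 * treat - 0.002 * (X["age"] - 60) + rng.normal(0, 0.05, n)
cost = 5000 + 2000 * treat + 30 * (X["age"] - 60) + rng.normal(0, 500, n)

def incremental(outcome):
    """Flexible outcome regression, then average the predicted potential outcomes."""
    Z = pd.concat([X, pd.Series(treat, name="treat")], axis=1)
    model = GradientBoostingRegressor().fit(Z, outcome)
    Z1, Z0 = Z.copy(), Z.copy()
    Z1["treat"], Z0["treat"] = 1, 0
    return model.predict(Z1).mean() - model.predict(Z0).mean()

d_cost, d_qaly = incremental(cost), incremental(qaly)
print(f"incremental cost: {d_cost:.0f}, incremental QALYs: {d_qaly:.3f}")
print(f"ICER: {d_cost / d_qaly:.0f} per QALY")
```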
Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions
improve dynamic pricing strategies in supply chains, particularly in contexts
where traditional ERP systems rely on static, rule-based approaches that
overlook strategic interactions among market actors. While recent research has
applied reinforcement learning to pricing, most implementations remain
single-agent and fail to model the interdependent nature of real-world supply
chains. This study addresses that gap by evaluating the performance of three
MARL algorithms: MADDPG, MADQN, and QMIX against static rule-based baselines,
within a simulated environment informed by real e-commerce transaction data and
a LightGBM demand prediction model. Results show that rule-based agents achieve
near-perfect fairness (Jain's Index: 0.9896) and the highest price stability
(volatility: 0.024), but entirely lack competitive dynamics. Among MARL
agents, MADQN exhibits the most aggressive pricing behaviour, with the highest
volatility and the lowest fairness (0.5844). MADDPG provides a more balanced
approach, supporting market competition (share volatility: 9.5 pp) while
maintaining relatively high fairness (0.8819) and stable pricing. These
findings suggest that MARL introduces emergent strategic behaviour not captured
by static pricing rules and may inform future developments in dynamic pricing.
arXiv link: http://arxiv.org/abs/2507.02698v1
Large-Scale Estimation under Unknown Heteroskedasticity
parameters framework that features unknown means and variances. We provide
extended Tweedie's formulae that express the (infeasible) optimal estimators of
heterogeneous parameters, such as unit-specific means or quantiles, in terms of
the density of certain sufficient statistics. These are used to propose
feasible versions with nearly parametric regret bounds of the order of $(\log
n)^\kappa / n$. The estimators are employed in a study of teachers'
value-added, where we find that allowing for heterogeneous variances across
teachers is crucial for delivering optimal estimates of teacher quality and
detecting low-performing teachers.
arXiv link: http://arxiv.org/abs/2507.02293v1
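As a rough illustration of the idea behind the extended Tweedie formulae described above, the sketch below applies the classical Tweedie formula with a known, common noise variance on simulated data; the paper's setting additionally handles unknown, unit-specific variances.
```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
n, sigma = 5000, 1.0
theta = rng.normal(0, 2, n)             # heterogeneous unit-level parameters
x = theta + rng.normal(0, sigma, n)     # one noisy estimate per unit

# Tweedie: E[theta | X = x] = x + sigma^2 * d/dx log f(x), with f the marginal density of X.
kde = gaussian_kde(x)
eps = 1e-3
score = (np.log(kde(x + eps)) - np.log(kde(x - eps))) / (2 * eps)
theta_hat = x + sigma**2 * score

print("MSE of raw estimates:    ", round(np.mean((x - theta) ** 2), 3))
print("MSE of Tweedie estimates:", round(np.mean((theta_hat - theta) ** 2), 3))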
It's Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation
treatment effect given black-box machine learning estimates of nuisance
functions (like the impact of confounders on treatment and outcomes). Here, we
find that the answer depends in a surprising way on the distribution of the
treatment noise. Focusing on the partially linear model of
Robinson (1988), we first show that the widely adopted double machine
learning (DML) estimator is minimax rate-optimal for Gaussian treatment noise,
resolving an open problem of Mackey et al. (2018). Meanwhile, for
independent non-Gaussian treatment noise, we show that DML is always suboptimal
by constructing new practical procedures with higher-order robustness to
nuisance errors. These ACE procedures use structure-agnostic cumulant
estimators to achieve $r$-th order insensitivity to nuisance errors whenever
the $(r+1)$-st treatment cumulant is non-zero. We complement these core results
with novel minimax guarantees for binary treatments in the partially linear
model. Finally, using synthetic demand estimation experiments, we demonstrate
the practical benefits of our higher-order robust estimators.
arXiv link: http://arxiv.org/abs/2507.02275v2
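For context on the estimator being analyzed, here is a minimal cross-fitted DML sketch for the partially linear model, with random-forest nuisance estimates and a residual-on-residual regression for the treatment coefficient; the data-generating process is illustrative only and the nuisance learners are an arbitrary choice.
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
n, p, theta = 2000, 5, 1.0
X = rng.normal(size=(n, p))
D = np.sin(X[:, 0]) + rng.normal(size=n)                 # treatment with nonlinear m(X)
Y = theta * D + np.cos(X[:, 1]) + rng.normal(size=n)     # outcome with nonlinear g(X)

res_y, res_d = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    my = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], Y[train])
    md = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], D[train])
    res_y[test] = Y[test] - my.predict(X[test])           # out-of-fold residuals
    res_d[test] = D[test] - md.predict(X[test])

theta_hat = np.sum(res_d * res_y) / np.sum(res_d ** 2)    # residual-on-residual regression
eps_hat = res_y - theta_hat * res_d
se = np.sqrt(np.sum(res_d ** 2 * eps_hat ** 2)) / np.sum(res_d ** 2)
print(f"theta_hat = {theta_hat:.3f} (true 1.0), se = {se:.3f}")
```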
Meta-emulation: An application to the social cost of carbon
distribution of the social cost of carbon as a function of the underlying
assumptions. The literature on the social cost of carbon deviates in its
assumptions from the literatures on the impacts of climate change, discounting,
and risk aversion. The proposed meta-emulator corrects this. The social cost of
carbon is higher than reported in the literature.
arXiv link: http://arxiv.org/abs/2507.01804v1
Covariance Matrix Estimation for Positively Correlated Assets
with positively correlated asset returns. This paper addresses covariance
matrix estimation under such conditions, motivated by observations of
significant positive correlations in factor-sorted portfolio monthly returns.
We demonstrate that fine-tuning eigenvectors linked to weak factors within
rotation-equivariant frameworks produces well-conditioned covariance matrix
estimates. Our Eigenvector Rotation Shrinkage Estimator (ERSE) pairwise rotates
eigenvectors while preserving orthogonality, equivalent to performing multiple
linear shrinkage on two distinct eigenvalues. Empirical results on
factor-sorted portfolios from the Ken French data library demonstrate that ERSE
outperforms existing rotation-equivariant estimators in reducing out-of-sample
portfolio variance, achieving average risk reductions of 10.52% versus linear
shrinkage methods and 12.46% versus nonlinear shrinkage methods. Further
checks indicate that ERSE yields covariance matrices with lower condition
numbers, produces more concentrated and stable portfolio weights, and provides
consistent improvements across different subperiods and estimation windows.
arXiv link: http://arxiv.org/abs/2507.01545v1
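The ERSE itself is not reproduced here; the sketch below only illustrates the rotation-equivariant linear-shrinkage baseline class referenced above, shrinking sample eigenvalues toward their mean while keeping sample eigenvectors, with a fixed (not estimated) shrinkage intensity and simulated positively correlated returns.
```python
import numpy as np

rng = np.random.default_rng(3)
n_obs, n_assets, rho = 120, 50, 0.5
common = rng.normal(size=(n_obs, 1))                       # one strong common factor
returns = 0.8 * common + 0.2 * rng.normal(size=(n_obs, n_assets))

S = np.cov(returns, rowvar=False)
eigval, eigvec = np.linalg.eigh(S)
shrunk = (1 - rho) * eigval + rho * eigval.mean()          # shrink eigenvalues, keep eigenvectors
S_shrunk = eigvec @ np.diag(shrunk) @ eigvec.T

print("condition number, sample covariance:", round(np.linalg.cond(S), 1))
print("condition number, shrunk covariance:", round(np.linalg.cond(S_shrunk), 1))
```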
Heterogeneity Analysis with Heterogeneous Treatments
empirical treatment evaluation research. However, treatments analyzed are often
aggregates of multiple underlying treatments which are themselves
heterogeneous, e.g. different modules of a training program or varying
exposures. In these settings, conventional approaches such as comparing
(adjusted) differences-in-means across groups can produce misleading
conclusions when underlying treatment propensities differ systematically
between groups. This paper develops a novel decomposition framework that
disentangles contributions of effect heterogeneity and qualitatively distinct
components of treatment heterogeneity to observed group-level differences. We
propose semiparametric debiased machine learning estimators that are robust to
complex treatments and limited overlap. We revisit a widely documented gender
gap in training returns of an active labor market policy. The decomposition
reveals that it is almost entirely driven by women being treated differently
than men and not by heterogeneous returns from identical treatments. In
particular, women are disproportionately targeted towards vocational training
tracks with lower unconditional returns.
arXiv link: http://arxiv.org/abs/2507.01517v1
Shrinkage-Based Regressions with Many Related Treatments
disentangle the effects of many related, partially-overlapping treatments.
Examples include estimating treatment effects of different marketing
touchpoints, ordering different types of products, or signing up for different
services. Common approaches that estimate separate treatment coefficients are
too noisy for practical decision-making. We propose a computationally light
model that uses a customized ridge regression to move between a heterogeneous
and a homogeneous model: it substantially reduces MSE for the effects of each
individual sub-treatment while allowing us to easily reconstruct the effects of
an aggregated treatment. We demonstrate the properties of this estimator in
theory and simulation, and illustrate how it has unlocked targeted
decision-making at Wayfair.
arXiv link: http://arxiv.org/abs/2507.01202v1
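A minimal sketch of the shrinkage idea described above, under assumed notation: each sub-treatment effect is written as a common effect plus a deviation, and only the deviations are ridge-penalized, so a small penalty keeps the effects nearly separate while a large penalty collapses them toward a single aggregate effect. The penalty value here is illustrative, not the paper's tuning rule.
```python
import numpy as np

rng = np.random.default_rng(4)
n, K, lam = 1000, 8, 50.0
T = rng.binomial(1, 0.3, size=(n, K))                      # K related binary sub-treatments
beta_true = 1.0 + rng.normal(0, 0.2, K)                    # effects clustered around a common value
y = T @ beta_true + rng.normal(0, 1, n)

# Design: intercept, aggregated treatment, and the K sub-treatment columns.
# Only the sub-treatment (deviation) block is penalized.
X = np.column_stack([np.ones(n), T.sum(axis=1), T])
P = np.diag([0.0, 0.0] + [1.0] * K)
coef = np.linalg.solve(X.T @ X + lam * P, X.T @ y)
mu, delta = coef[1], coef[2:]
beta_hat = mu + delta                                      # shrunken sub-treatment effects

print("common effect mu:", round(mu, 3))
print("max |beta_hat - beta_true|:", round(float(np.max(np.abs(beta_hat - beta_true))), 3))
```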
Uniform Validity of the Subset Anderson-Rubin Test under Heteroskedasticity and Nonlinearity
moment restrictions. The statistic is based on the criterion function of the
continuous updating estimator (CUE) for a subset of parameters not constrained
under the Null. We treat the data distribution nonparametrically with
parametric moment restrictions imposed under the Null. We show that subset
tests and confidence intervals based on the AR statistic are uniformly valid
over a wide range of distributions that include moment restrictions with
general forms of heteroskedasticity. We show that the AR based tests have
correct asymptotic size when parameters are unidentified, partially identified,
weakly or strongly identified. We obtain these results by constructing an upper
bound using a novel perturbation and regularization approach applied to
the first order conditions of the CUE. Our theory applies to both
cross-sections and time series data and does not assume stationarity in time
series settings or homogeneity in cross-sectional settings.
arXiv link: http://arxiv.org/abs/2507.01167v1
rdhte: Conditional Average Treatment Effects in RD Designs
covariates is a crucial aspect of empirical work. Building on Calonico,
Cattaneo, Farrell, Palomba, and Titiunik (2025), this article discusses the
software package rdhte for estimation and inference of heterogeneous treatment
effects in sharp regression discontinuity (RD) designs. The package includes
three main commands: rdhte conducts estimation and robust bias-corrected
inference for heterogeneous RD treatment effects, for a given choice of the
bandwidth parameter; rdbwhte implements automatic bandwidth selection methods;
and rdhte lincom computes point estimates and robust bias-corrected confidence
intervals for linear combinations, a post-estimation command specifically
tailored to rdhte. We also provide an overview of heterogeneous effects for
sharp RD designs, give basic details on the methodology, and illustrate using
an empirical application. Finally, we discuss how the package rdhte
complements, and in specific cases recovers, the canonical RD package rdrobust
(Calonico, Cattaneo, Farrell, and Titiunik 2017).
arXiv link: http://arxiv.org/abs/2507.01128v1
Randomization Inference with Sample Attrition
from severe size distortion due to sample attrition. We propose new,
computationally efficient methods for randomization inference that remain valid
under a range of potentially informative missingness mechanisms. We begin by
constructing valid p-values for testing sharp null hypotheses, using the
worst-case p-value from the Fisher randomization test over all possible
imputations of missing outcomes. Leveraging distribution-free test statistics,
this worst-case p-value admits a closed-form solution, connecting naturally to
bounds in the partial identification literature. Our test statistics
incorporate both potential outcomes and missingness indicators, allowing us to
exploit structural assumptions, such as monotone missingness, for increased
power. We further extend our framework to test non-sharp null hypotheses
concerning quantiles of individual treatment effects. The methods are
illustrated through simulations and an empirical application.
arXiv link: http://arxiv.org/abs/2507.00795v1
Comparing Misspecified Models with Big Data: A Variational Bayesian Perspective
systems often requires prohibitively high computational complexity. A variety
of detection algorithms have been proposed in the literature, offering
different trade-offs between complexity and detection performance. In recent
years, Variational Bayes (VB) has emerged as a widely used method for
addressing statistical inference in the context of massive data. This study
focuses on misspecified models and examines the risk functions associated with
predictive distributions derived from variational posterior distributions.
These risk functions, defined as the expectation of the Kullback-Leibler (KL)
divergence between the true data-generating density and the variational
predictive distributions, provide a framework for assessing predictive
performance. We propose two novel information criteria for predictive model
comparison based on these risk functions. Under certain regularity conditions,
we demonstrate that the proposed information criteria are asymptotically
unbiased estimators of their respective risk functions. Through comprehensive
numerical simulations and empirical applications in economics and finance, we
demonstrate the effectiveness of these information criteria in comparing
misspecified models in the context of massive data.
arXiv link: http://arxiv.org/abs/2507.00763v1
Plausible GMM: A Quasi-Bayesian Approach
terms of moment conditions. While these moment conditions are generally
well-motivated, it is often unknown whether the moment restrictions hold
exactly. We consider a framework where researchers model their belief about the
potential degree of misspecification via a prior distribution and adopt a
quasi-Bayesian approach for performing inference on structural parameters. We
provide quasi-posterior concentration results, verify that quasi-posteriors can
be used to obtain approximately optimal Bayesian decision rules under the
maintained prior structure over misspecification, and provide a form of
frequentist coverage results. We illustrate the approach through empirical
examples where we obtain informative inference for structural objects allowing
for substantial relaxations of the requirement that moment conditions hold
exactly.
arXiv link: http://arxiv.org/abs/2507.00555v1
Robust Inference when Nuisance Parameters may be Partially Identified with Applications to Synthetic Controls
with a Synthetic Control Estimator, the vector of control weights is a nuisance
parameter which is often constrained, high-dimensional, and may be only
partially identified even when the average treatment effect on the treated is
point-identified. All three of these features of a nuisance parameter can lead
to failure of asymptotic normality for the estimate of the parameter of
interest when using standard methods. I provide a new method yielding
asymptotic normality for an estimate of the parameter of interest, even when
all three of these complications are present. This is accomplished by first
estimating the nuisance parameter using a regularization penalty to achieve a
form of identification, and then estimating the parameter of interest using
moment conditions that have been orthogonalized with respect to the nuisance
parameter. I present high-level sufficient conditions for the estimator and
verify these conditions in an example involving Synthetic Controls.
arXiv link: http://arxiv.org/abs/2507.00307v2
Extrapolation in Regression Discontinuity Design Using Comonotonicity
margin between treatment and non-treatment in sharp regression discontinuity
designs with multiple covariates. Our methods apply both to settings in which
treatment is a function of multiple observables and settings in which treatment
is determined based on a single running variable. Our key identifying
assumption is that conditional average treated and untreated potential outcomes
are comonotonic: covariate values associated with higher average untreated
potential outcomes are also associated with higher average treated potential
outcomes. We provide an estimation method based on local linear regression. Our
estimands are weighted average causal effects, even if comonotonicity fails. We
apply our methods to evaluate counterfactual mandatory summer school policies.
arXiv link: http://arxiv.org/abs/2507.00289v1
Minimax and Bayes Optimal Best-Arm Identification
best-arm identification. We consider an adaptive procedure consisting of a
sampling phase followed by a recommendation phase, and we design an adaptive
experiment within this framework to efficiently identify the best arm, defined
as the one with the highest expected outcome. In our proposed strategy, the
sampling phase consists of two stages. The first stage is a pilot phase, in
which we allocate each arm uniformly in equal proportions to eliminate clearly
suboptimal arms and estimate outcome variances. In the second stage, arms are
allocated in proportion to the variances estimated during the first stage.
After the sampling phase, the procedure enters the recommendation phase, where
we select the arm with the highest sample mean as our estimate of the best arm.
We prove that this single strategy is simultaneously asymptotically minimax and
Bayes optimal for the simple regret, with upper bounds that coincide exactly
with our lower bounds, including the constant terms.
arXiv link: http://arxiv.org/abs/2506.24007v3
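A simulation sketch of the two-stage sampling scheme described above: a uniform pilot phase, elimination of clearly suboptimal arms (with a heuristic margin chosen for illustration rather than the paper's rule), variance-proportional allocation of the remaining budget, and recommendation of the arm with the highest sample mean.
```python
import numpy as np

rng = np.random.default_rng(5)
means, sds = np.array([0.0, 0.3, 0.5, 0.55]), np.array([1.0, 1.0, 2.0, 0.5])
K, budget, pilot_per_arm = len(means), 4000, 200

# Stage 1: uniform pilot allocation.
samples = [list(rng.normal(means[k], sds[k], pilot_per_arm)) for k in range(K)]
pilot_means = np.array([np.mean(s) for s in samples])
pilot_vars = np.array([np.var(s, ddof=1) for s in samples])

# Drop clearly suboptimal arms (heuristic margin, for illustration only).
margin = 2 * np.sqrt(pilot_vars.max() / pilot_per_arm)
alive = np.where(pilot_means >= pilot_means.max() - margin)[0]

# Stage 2: allocate the remaining budget proportionally to estimated variances.
remaining = budget - K * pilot_per_arm
alloc = (pilot_vars[alive] / pilot_vars[alive].sum() * remaining).astype(int)
for k, n_k in zip(alive, alloc):
    samples[k].extend(rng.normal(means[k], sds[k], n_k))

# Recommendation phase: highest sample mean among surviving arms.
best = int(alive[np.argmax([np.mean(samples[k]) for k in alive])])
print("recommended arm:", best, "(true best: 3)")
```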
Robust Inference with High-Dimensional Instruments
(IV) regressions with high-dimensional instruments, whose number is allowed to
exceed the sample size. In addition, our test is robust to general error
dependence, such as network dependence and spatial dependence. The test
statistic takes a self-normalized form and the asymptotic validity of the test
is established by using random matrix theory. Simulation studies are conducted
to assess the numerical performance of the test, confirming good size control
and satisfactory testing power across a range of error dependence
structures.
arXiv link: http://arxiv.org/abs/2506.23834v1
Testing parametric additive time-varying GARCH models
(ATV-)GARCH models. In the model, the volatility equation of the GARCH model is
augmented by a deterministic time-varying intercept modeled as a linear
combination of logistic transition functions. The intercept is specified by a
sequence of tests, moving from specific to general. The first test is the test
of the standard stationary GARCH model against an ATV-GARCH model with one
transition. The alternative model is unidentified under the null hypothesis,
which makes the usual LM test invalid. To overcome this problem, we use the
standard method of approximating the transition function by a Taylor expansion
around the null hypothesis. Testing proceeds until the first non-rejection. We
investigate the small-sample properties of the tests in a comprehensive
simulation study. An application to the VIX index indicates that the volatility
of the index is not constant over time but begins a slow increase around the
2007-2008 financial crisis.
arXiv link: http://arxiv.org/abs/2506.23821v1
An Improved Inference for IV Regressions
IVs, such as the shift-share IV, together with many IVs. Could we combine these
results in an efficient way and take advantage of the information from both
sides? In this paper, we propose a combination inference procedure to solve the
problem. Specifically, we consider a linear combination of three test
statistics: a standard cluster-robust Wald statistic based on the
low-dimensional IVs, a leave-one-cluster-out Lagrangian Multiplier (LM)
statistic, and a leave-one-cluster-out Anderson-Rubin (AR) statistic. We first
establish the joint asymptotic normality of the Wald, LM, and AR statistics and
derive the corresponding limit experiment under local alternatives. Then, under
the assumption that at least the low-dimensional IVs can strongly identify the
parameter of interest, we derive the optimal combination test based on the
three statistics and establish that our procedure leads to the uniformly most
powerful (UMP) unbiased test among the class of tests considered. In
particular, the efficiency gain from the combined test is a “free lunch” in
the sense that it is always at least as powerful as the test that is only based
on the low-dimensional IVs or many IVs.
arXiv link: http://arxiv.org/abs/2506.23816v1
Overparametrized models with posterior drift
forecasting accuracy in overparametrized machine learning models. We document
the loss in performance when the loadings of the data generating process change
between the training and testing samples. This matters crucially in settings in
which regime changes are likely to occur, for instance, in financial markets.
Applied to equity premium forecasting, our results underline the sensitivity of
a market timing strategy to sub-periods and to the bandwidth parameters that
control the complexity of the model. For the average investor, we find that
focusing on holding periods of 15 years can generate very heterogeneous
returns, especially for small bandwidths. Large bandwidths yield much more
consistent outcomes, but are far less appealing from a risk-adjusted return
standpoint. All in all, our findings tend to recommend cautiousness when
resorting to large linear models for stock market predictions.
arXiv link: http://arxiv.org/abs/2506.23619v1
P-CRE-DML: A Novel Approach for Causal Inference in Non-Linear Panel Data
Machine Learning (P-CRE-DML) framework to estimate causal effects in panel data
with non-linearities and unobserved heterogeneity. Combining Double Machine
Learning (DML, Chernozhukov et al., 2018), Correlated Random Effects (CRE,
Mundlak, 1978), and lagged variables (Arellano & Bond, 1991) and innovating
within the CRE-DML framework (Chernozhukov et al., 2022; Clarke & Polselli,
2025; Fuhr & Papies, 2024), we apply P-CRE-DML to investigate the effect of
social trust on GDP growth across 89 countries (2010-2020). We find a positive
and statistically significant relationship between social trust and economic
growth, in line with prior findings on the trust-growth relationship (e.g.,
Knack & Keefer, 1997). Furthermore, a Monte Carlo simulation demonstrates
P-CRE-DML's advantage in terms of lower bias over CRE-DML and System GMM.
P-CRE-DML offers a robust and flexible alternative for panel data causal
inference, with applications beyond economic growth.
arXiv link: http://arxiv.org/abs/2506.23297v1
Modeling European Electricity Market Integration during turbulent times
model applied to a panel of nine European electricity markets. Our model
analyzes the impact of daily fossil fuel prices and hourly renewable energy
generation on hourly electricity prices, employing a hierarchical structure to
capture cross-country interdependencies and idiosyncratic factors. The
inclusion of random effects demonstrates that electricity market integration
both mitigates and amplifies shocks. Our results highlight that while renewable
energy sources consistently reduce electricity prices across all countries, gas
prices remain a dominant driver of cross-country electricity price disparities
and instability. This finding underscores the critical importance of energy
diversification, above all on renewable energy sources, and coordinated fossil
fuel supply strategies for bolstering European energy security.
arXiv link: http://arxiv.org/abs/2506.23289v1
Design-Based and Network Sampling-Based Uncertainties in Network Experiments
experiments to estimate spillover effects. We study the causal interpretation
of, and inference for, the OLS estimator under both design-based uncertainty
from random treatment assignment and sampling-based uncertainty in network
links. We show that correlations among regressors that capture the exposure to
neighbors' treatments can induce contamination bias, preventing OLS from
aggregating heterogeneous spillover effects for a clear causal interpretation.
We derive the OLS estimator's asymptotic distribution and propose a
network-robust variance estimator. Simulations and an empirical application
demonstrate that contamination bias can be substantial, leading to inflated
spillover estimates.
arXiv link: http://arxiv.org/abs/2506.22989v3
Causal Inference for Aggregated Treatment
aggregation of multiple sub-treatment variables. Researchers often report
marginal causal effects for the aggregated treatment, implicitly assuming that
the target parameter corresponds to a well-defined average of sub-treatment
effects. We show that, even in an ideal scenario for causal inference such as
random assignment, the weights underlying this average have some key
undesirable properties: they are not unique, they can be negative, and, holding
all else constant, these issues become exponentially more likely to occur as
the number of sub-treatments increases and the support of each sub-treatment
grows. We propose approaches to avoid these problems, depending on whether or
not the sub-treatment variables are observed.
arXiv link: http://arxiv.org/abs/2506.22885v1
Doubly robust estimation of causal effects for random object outcomes with continuous treatments
researchers to identify cause-and-effect relationships beyond associations.
While traditionally studied within Euclidean spaces, contemporary applications
increasingly involve complex, non-Euclidean data structures that reside in
abstract metric spaces, known as random objects, such as images, shapes,
networks, and distributions. This paper introduces a novel framework for causal
inference with continuous treatments applied to non-Euclidean data. To address
the challenges posed by the lack of linear structures, we leverage Hilbert
space embeddings of the metric spaces to facilitate Fr\'echet mean estimation
and causal effect mapping. Motivated by a study on the impact of exposure to
fine particulate matter on age-at-death distributions across U.S. counties, we
propose a nonparametric, doubly-debiased causal inference approach for outcomes
as random objects with continuous treatments. Our framework can accommodate
moderately high-dimensional vector-valued confounders and derive efficient
influence functions for estimation to ensure both robustness and
interpretability. We establish rigorous asymptotic properties of the
cross-fitted estimators and employ conformal inference techniques for
counterfactual outcome prediction. Validated through numerical experiments and
applied to real-world environmental data, our framework extends causal
inference methodologies to complex data structures, broadening its
applicability across scientific disciplines.
arXiv link: http://arxiv.org/abs/2506.22754v1
Optimal Estimation of Two-Way Effects under Limited Mobility
sets based on a novel prior that leverages patterns of assortative matching
observed in the data. To capture limited mobility we model the bipartite graph
associated with the matched data in an asymptotic framework where its Laplacian
matrix has small eigenvalues that converge to zero. The prior hyperparameters
that control the shrinkage are determined by minimizing an unbiased risk
estimate. We show the proposed empirical Bayes estimator is asymptotically
optimal in compound loss, despite the weak connectivity of the bipartite graph
and the potential misspecification of the prior. We estimate teacher
value-added from a linked North Carolina Education Research Data Center
student-teacher data set.
arXiv link: http://arxiv.org/abs/2506.21987v1
Multilevel Decomposition of Generalized Entropy Measures Using Constrained Bayes Estimation: An Application to Japanese Regional Data
measures that explicitly accounts for nested population structures such as
national, regional, and subregional levels. Standard approaches that estimate
GE separately at each level do not guarantee compatibility with multilevel
decomposition. Our method constrains lower-level GE estimates to match
higher-level benchmarks while preserving hierarchical relationships across
layers. We apply the method to Japanese income data to estimate GE at the
national, prefectural, and municipal levels, decomposing national inequality
into between-prefecture and within-prefecture inequality, and further
decomposing prefectural GE into between-municipality and within-municipality
inequality.
arXiv link: http://arxiv.org/abs/2506.21213v1
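For reference, the sketch below shows the additive between/within decomposition that the constrained estimates above are designed to respect, for the Theil index (GE with alpha = 1) on simulated grouped incomes; the group labels are hypothetical.
```python
import numpy as np

rng = np.random.default_rng(6)
groups = {g: rng.lognormal(mean=mu, sigma=0.5, size=500)
          for g, mu in zip("ABC", [10.0, 10.3, 10.6])}     # hypothetical regional incomes

def theil(y):
    s = y / y.mean()
    return np.mean(s * np.log(s))

y_all = np.concatenate(list(groups.values()))
n, mu = len(y_all), y_all.mean()

between = sum(len(y) / n * (y.mean() / mu) * np.log(y.mean() / mu) for y in groups.values())
within = sum(len(y) / n * (y.mean() / mu) * theil(y) for y in groups.values())

print(f"total Theil     : {theil(y_all):.4f}")
print(f"between + within: {between + within:.4f} (between {between:.4f}, within {within:.4f})")
```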
Orthogonality conditions for convex regression
which usually state that the random error term is uncorrelated with the
explanatory variables. In convex regression, the orthogonality conditions for
identification are unknown. Applying Lagrangian duality theory, we establish
the sample orthogonality conditions for convex regression, including additive
and multiplicative formulations of the regression model, with and without
monotonicity and homogeneity constraints. We then propose a hybrid instrumental
variable control function approach to mitigate the impact of potential
endogeneity in convex regression. The superiority of the proposed approach is
shown in a Monte Carlo study and examined in an empirical application to
Chilean manufacturing data.
arXiv link: http://arxiv.org/abs/2506.21110v1
Heterogeneous Exposures to Systematic and Idiosyncratic Risk across Crypto Assets: A Divide-and-Conquer Approach
assets by estimating heterogeneous exposures to idiosyncratic and systematic
risk. A key challenge arises from the latent nature of broader economy-wide
risk sources: macro-financial proxies are unavailable at high-frequencies,
while the abundance of low-frequency candidates offers limited guidance on
empirical relevance. To address this, we develop a two-stage
“divide-and-conquer” approach. The first stage estimates exposures to
high-frequency idiosyncratic and market risk only, using asset-level IV
regressions. The second stage identifies latent economy-wide factors by
extracting the leading principal component from the model residuals and mapping
it to lower-frequency macro-financial uncertainty and sentiment-based
indicators via high-dimensional variable selection. Structured patterns of
heterogeneity in exposures are uncovered using Mean Group estimators across
asset categories. The method is applied to a broad sample of crypto assets,
covering more than 80% of total market capitalization. We document short-term
mean reversion and significant average exposures to idiosyncratic volatility
and illiquidity. Green and DeFi assets are, on average, more exposed to
market-level and economy-wide risk than their non-Green and non-DeFi
counterparts. By contrast, stablecoins are less exposed to idiosyncratic,
market-level, and economy-wide risk factors relative to non-stablecoins. At a
conceptual level, our study develops a coherent framework for isolating
distinct layers of risk in crypto markets. Empirically, it sheds light on how
return sensitivities vary across digital asset categories -- insights that are
important for both portfolio design and regulatory oversight.
arXiv link: http://arxiv.org/abs/2506.21100v1
Wild Bootstrap Inference for Linear Regressions with Many Covariates
establish its asymptotic validity for linear regression models with many
covariates and heteroskedastic errors. Monte Carlo simulations show that the
modified wild bootstrap has excellent finite sample performance compared with
alternative methods that are based on standard normal critical values,
especially when the sample size is small and/or the number of controls is of
the same order of magnitude as the sample size.
arXiv link: http://arxiv.org/abs/2506.20972v1
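For context, here is a sketch of the generic wild bootstrap t-test with Rademacher multipliers and null-imposed residuals; the paper's modified variant tailored to many covariates is not reproduced, and the data-generating process is illustrative.
```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 200, 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = rng.normal(size=n) * (1 + np.abs(X[:, 1]))             # H0 true, heteroskedastic errors

def tstat(y, X, j=1):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ b
    XtX_inv = np.linalg.inv(X.T @ X)
    V = XtX_inv @ (X.T * u**2) @ X @ XtX_inv               # HC0 sandwich variance
    return b[j] / np.sqrt(V[j, j])

t_obs = tstat(y, X)
X0 = np.delete(X, 1, axis=1)                               # restricted model under H0: beta_1 = 0
b0 = np.linalg.lstsq(X0, y, rcond=None)[0]
u0 = y - X0 @ b0
t_boot = []
for _ in range(999):
    w = rng.choice([-1.0, 1.0], size=n)                    # Rademacher multipliers
    t_boot.append(tstat(X0 @ b0 + w * u0, X))
print("wild bootstrap p-value:", round(float(np.mean(np.abs(t_boot) >= abs(t_obs))), 3))
```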
Analytic inference with two-way clustering
such setups, the commonly used approach has two drawbacks. First, the
corresponding variance estimator is not necessarily positive. Second, inference
is invalid in non-Gaussian regimes, namely when the estimator of the parameter
of interest is not asymptotically Gaussian. We consider a simple fix that
addresses both issues. In Gaussian regimes, the corresponding tests are
asymptotically exact and equivalent to usual ones. Otherwise, the new tests are
asymptotically conservative. We also establish their uniform validity over a
certain class of data generating processes. Independently of our tests, we
highlight potential issues with multiple testing and nonlinear estimators under
two-way clustering. Finally, we compare our approach with existing ones through
simulations.
arXiv link: http://arxiv.org/abs/2506.20749v1
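For context on the commonly used approach and its first drawback, the sketch below computes the standard two-way cluster-robust variance as V_firm + V_time - V_intersection for an OLS slope on simulated data; this combination is not guaranteed to be positive semidefinite, which is one of the issues the proposed fix addresses.
```python
import numpy as np

rng = np.random.default_rng(8)
G, H = 20, 15                                              # e.g. firm and time clusters
firm, time = np.repeat(np.arange(G), H), np.tile(np.arange(H), G)
n = G * H
x = rng.normal(size=n) + rng.normal(size=G)[firm]
y = 0.5 * x + rng.normal(size=G)[firm] + rng.normal(size=H)[time] + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ b
bread = np.linalg.inv(X.T @ X)

def meat(ids):
    """Sum over clusters of (X_g' u_g)(X_g' u_g)'."""
    M = np.zeros((2, 2))
    for g in np.unique(ids):
        s = X[ids == g].T @ u[ids == g]
        M += np.outer(s, s)
    return M

inter = firm * H + time                                    # intersection of the two dimensions
V = bread @ (meat(firm) + meat(time) - meat(inter)) @ bread
print("two-way clustered SE for the slope:", round(float(np.sqrt(V[1, 1])), 4))
```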
Anytime-Valid Inference in Adaptive Experiments: Covariate Adjustment and Balanced Power
traditional randomized experiments but pose two major challenges: invalid
inference on the Average Treatment Effect (ATE) due to adaptive sampling and
low statistical power for sub-optimal treatments. We address both issues by
extending the Mixture Adaptive Design framework (arXiv:2311.05794). First, we
propose MADCovar, a covariate-adjusted ATE estimator that is unbiased and
preserves anytime-valid inference guarantees while substantially improving ATE
precision. Second, we introduce MADMod, which dynamically reallocates samples
to underpowered arms, enabling more balanced statistical power across
treatments without sacrificing valid inference. Both methods retain MAD's core
advantage of constructing asymptotic confidence sequences (CSs) that allow
researchers to continuously monitor ATE estimates and stop data collection once
a desired precision or significance criterion is met. Empirically, we validate
both methods using simulations and real-world data. In simulations, MADCovar
reduces CS width by up to 60% relative to MAD. In a large-scale political
RCT with $\approx32,000$ participants, MADCovar achieves similar precision
gains. MADMod improves statistical power and inferential precision across all
treatment arms, particularly for suboptimal treatments. Simulations show that
MADMod sharply reduces Type II error while preserving the efficiency benefits
of adaptive allocation. Together, MADCovar and MADMod make adaptive experiments
more practical, reliable, and efficient for applied researchers across many
domains. Our proposed methods are implemented through an open-source software
package.
arXiv link: http://arxiv.org/abs/2506.20523v3
Daily Fluctuations in Weather and Economic Growth at the Subnational Level: Evidence from Thailand
subnational economic growth in Thailand. Using annual gross provincial product
(GPP) per capita data from 1982 to 2022 and high-resolution reanalysis weather
data, I estimate fixed-effects panel regressions that isolate plausibly
exogenous within-province year-to-year variation in temperature. The results
indicate a statistically significant inverted-U relationship between
temperature and annual growth in GPP per capita, with adverse effects
concentrated in the agricultural sector. Industrial and service outputs appear
insensitive to short-term weather variation. Distributed lag models suggest
that temperature shocks have persistent effects on growth trajectories,
particularly in lower-income provinces with higher average temperatures. I
combine these estimates with climate projections under RCP4.5 and RCP8.5
emission scenarios to evaluate province-level economic impacts through 2090.
Without adjustments for biases in climate projections or lagged temperature
effects, climate change is projected to reduce per capita output for 63-86% of
the Thai population, with median GDP per capita impacts ranging from -4% to +56%
for RCP4.5 and from -52% to -15% for RCP8.5. When correcting for projected
warming biases - but omitting lagged dynamics - median losses increase to
57-63% (RCP4.5) and 80-86% (RCP8.5). Accounting for delayed temperature effects
further raises the upper-bound estimates to near-total loss. These results
highlight the importance of accounting for model uncertainty and temperature
dynamics in subnational climate impact assessments. All projections should be
interpreted with appropriate caution.
arXiv link: http://arxiv.org/abs/2506.20105v2
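A minimal sketch of the type of fixed-effects specification described above, regressing simulated province-year growth on temperature and its square with province and year fixed effects and province-clustered standard errors (an inverted-U relationship implies a negative coefficient on the squared term); the data here are simulated, not the Thai GPP and reanalysis series.
```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
provinces, years = 60, 40
df = pd.DataFrame([(p, t) for p in range(provinces) for t in range(years)],
                  columns=["prov", "year"])
df["temp"] = 26 + 0.02 * df["prov"] + rng.normal(0, 1.5, len(df))
df["growth"] = 0.8 * df["temp"] - 0.015 * df["temp"] ** 2 + rng.normal(0, 1, len(df))

fe = smf.ols("growth ~ temp + I(temp**2) + C(prov) + C(year)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["prov"]})
print(fe.params.filter(like="temp"))                       # negative squared term => inverted U
```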
A Sharp and Robust Test for Selective Reporting
of selective reporting and remains interpretable even when the t-scores are not
exactly normal. The test statistic is the distance between the smoothed
empirical t-curve and the set of all t-curves that would be possible in the
absence of any selective reporting. This novel projection test can only be
evaded in large meta-samples by selective reporting that also evades all other
valid tests of restrictions on the t-curve. A second benefit of the projection
test is that under the null we can interpret the projection residual as noise
plus bias incurred from approximating the t-score's exact distribution with the
normal. Applying the test to the Brodeur et al. (2020) meta-data, we find that
the t-curves for RCTs, IVs, and DIDs are more distorted than could arise by
chance. But an Edgeworth expansion reveals that these distortions are small
enough to be plausibly explained by the only approximate normality of the
individual t-scores. The detection of selective reporting in this meta-sample
is therefore more fragile than previously known.
arXiv link: http://arxiv.org/abs/2506.20035v1
Single-Index Quantile Factor Model with Observed Characteristics
unknown factor loading functions are linked to a large set of observed
individual-level (e.g., bond- or stock-specific) covariates via a single-index
projection. The single-index specification offers a parsimonious,
interpretable, and statistically efficient way to nonparametrically
characterize the time-varying loadings, while avoiding the curse of
dimensionality in flexible nonparametric models. Using a three-step sieve
estimation procedure, the QCF model demonstrates high in-sample and
out-of-sample accuracy in simulations. We establish asymptotic properties for
estimators of the latent factor, loading functions, and index parameters. In an
empirical study, we analyze the dynamic distributional structure of U.S.
corporate bond returns from 2003 to 2020. Our method outperforms the benchmark
quantile Fama-French five-factor model and quantile latent factor model,
particularly in the tails ($\tau=0.05, 0.95$). The model reveals
state-dependent risk exposures driven by characteristics such as bond and
equity volatility, coupon, and spread. Finally, we provide economic
interpretations of the latent factors.
arXiv link: http://arxiv.org/abs/2506.19586v1
100-Day Analysis of USD/IDR Exchange Rate Dynamics Around the 2025 U.S. Presidential Inauguration
inauguration, non-parametric statistical methods with bootstrap resampling
(10,000 iterations) analyze distributional properties and anomalies. Results
indicate a statistically significant 3.61% Indonesian rupiah depreciation
post-inauguration, with a large effect size (Cliff's Delta $= -0.9224$, CI:
$[-0.9727, -0.8571]$). Central tendency shifted markedly, yet volatility
remained stable (variance ratio $= 0.9061$, $p = 0.504$). Four significant
anomalies exhibiting temporal clustering are detected. These findings provide
quantitative evidence of political transition effects on emerging market
currencies, highlighting implications for monetary policy and currency risk
management.
arXiv link: http://arxiv.org/abs/2506.18738v1
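A sketch of the reported effect-size calculation: Cliff's delta between pre- and post-event samples with a percentile bootstrap confidence interval, computed here on hypothetical simulated levels rather than the USD/IDR series.
```python
import numpy as np

rng = np.random.default_rng(10)
pre = rng.normal(15600, 80, 50)                            # hypothetical pre-event levels
post = rng.normal(16200, 90, 50)                           # hypothetical post-event levels

def cliffs_delta(a, b):
    diff = a[:, None] - b[None, :]
    return (np.sum(diff > 0) - np.sum(diff < 0)) / (len(a) * len(b))

obs = cliffs_delta(pre, post)
boot = [cliffs_delta(rng.choice(pre, len(pre)), rng.choice(post, len(post)))
        for _ in range(10_000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"Cliff's delta = {obs:.3f}, 95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")
```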
The Persistent Effects of Peru's Mining MITA: Double Machine Learning Approach
in Peru, building on Melissa Dell's foundational work on the enduring effects
of forced labor institutions. The Mita, imposed by the Spanish colonial
authorities from 1573 to 1812, required indigenous communities within a
designated boundary to supply labor to mines, primarily near Potosi. Dell's
original regression discontinuity design (RDD) analysis, leveraging the Mita
boundary to estimate the Mita's legacy on modern economic outcomes, indicates
that regions subjected to the Mita exhibit lower household consumption levels
and higher rates of child stunting. In this paper, I replicate Dell's results
and extend this analysis. I apply Double Machine Learning (DML) methods--the
Partially Linear Regression (PLR) model and the Interactive Regression Model
(IRM)--to further investigate the Mita's effects. DML allows for the inclusion
of high-dimensional covariates and enables more flexible, non-linear modeling
of treatment effects, potentially capturing complex relationships that a
polynomial-based approach may overlook. While the PLR model provides some
additional flexibility, the IRM model allows for fully heterogeneous treatment
effects, offering a nuanced perspective on the Mita's impact across regions and
district characteristics. My findings suggest that the Mita's economic legacy
is more substantial and spatially heterogeneous than originally estimated. The
IRM results reveal that proximity to Potosi and other district-specific factors
intensify the Mita's adverse impact, suggesting a deeper persistence of
regional economic inequality. These findings underscore that machine learning
can accommodate the non-linearities present in complex, real-world systems.
By modeling hypothetical counterfactuals more accurately, DML enhances my
ability to estimate the true causal impact of historical interventions.
arXiv link: http://arxiv.org/abs/2506.18947v1
Poverty Targeting with Imperfect Information
that policymakers must rely on estimated rather than observed income, which
leads to substantial targeting errors. I propose a statistical decision
framework in which a benevolent planner, subject to a budget constraint and
equipped only with noisy income estimates, allocates cash transfers to the
poorest individuals. In this setting, the commonly used plug-in rule, which
allocates transfers based on point estimates, is inadmissible and uniformly
dominated by a shrinkage-based alternative. Building on this result, I propose
an empirical Bayes (EB) targeting rule. I show that the regret of the empirical
Bayes rule converges at the same rate as that of the posterior mean estimator,
despite applying a nonsmooth transformation to it. Simulations show that the EB
rule delivers large improvements over the plug-in approach in an idealized
setting and modest but consistent gains in a more realistic application.
arXiv link: http://arxiv.org/abs/2506.18188v1
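A simulation sketch of the targeting comparison described above: with heteroskedastic noise in income estimates, ranking normal-normal posterior means (a simple empirical Bayes rule with method-of-moments hyperparameters) reaches more of the truly poor than ranking the raw estimates; the paper's decision-theoretic analysis is considerably more general.
```python
import numpy as np

rng = np.random.default_rng(11)
n, budget = 10_000, 1_000
true_income = rng.normal(100, 20, n)
noise_sd = rng.uniform(5, 40, n)                           # heteroskedastic measurement error
estimate = true_income + rng.normal(0, noise_sd, n)

# Normal-normal shrinkage with method-of-moments hyperparameters.
mu = estimate.mean()
tau2 = max(estimate.var() - np.mean(noise_sd**2), 1e-6)
posterior_mean = mu + tau2 / (tau2 + noise_sd**2) * (estimate - mu)

truly_poor = set(np.argsort(true_income)[:budget])
for name, score in [("plug-in rule", estimate), ("empirical Bayes rule", posterior_mean)]:
    chosen = set(np.argsort(score)[:budget])
    share = len(chosen & truly_poor) / budget
    print(f"{name:20s}: share of transfers reaching the poorest {budget}: {share:.2f}")
```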
Beyond utility: incorporating eye-tracking, skin conductance and heart rate data into cognitive and econometric travel behaviour models
economic theories (e.g. utility maximisation) that establish relationships
between the choices of individuals, their characteristics, and the attributes
of the alternatives. In a parallel stream, choice models in cognitive
psychology have focused on modelling the decision-making process, but typically
in controlled scenarios. Recent research developments have attempted to bridge
the modelling paradigms, with choice models that are based on psychological
foundations, such as decision field theory (DFT), outperforming traditional
econometric choice models for travel mode and route choice behaviour. The use
of physiological data, which can provide indications about the choice-making
process and mental states, opens up the opportunity to further advance the
models. In particular, the use of such data to enrich 'process' parameters
within a cognitive theory-driven choice model has not yet been explored. This
research gap is addressed by incorporating physiological data into both
econometric and DFT models for understanding decision-making in two different
contexts: stated-preference responses (static) of accommodation choice and
gap-acceptance decisions within a driving simulator experiment (dynamic).
Results from models for the static scenarios demonstrate that both models can
improve substantially through the incorporation of eye-tracking information.
Results from models for the dynamic scenarios suggest that stress measurement
and eye-tracking data can be linked with process parameters in DFT, resulting
in larger improvements in comparison to simpler methods for incorporating this
data in either DFT or econometric models. The findings provide insights into
the value added by physiological data as well as the performance of different
candidate modelling frameworks for integrating such data.
arXiv link: http://arxiv.org/abs/2506.18068v1
An Empirical Comparison of Weak-IV-Robust Procedures in Just-Identified Models
methods for causal inference, as identified by Angrist and Pischke (2008). This
paper compares two leading approaches to inference under weak identification
for just-identified IV models: the classical Anderson-Rubin (AR) procedure and
the recently popular tF method proposed by Lee et al. (2022). Using replication
data from the American Economic Review (AER) and Monte Carlo simulation
experiments, we evaluate the two procedures in terms of statistical
significance testing and confidence interval (CI) length. Empirically, we find
that the AR procedure typically offers higher power and yields shorter CIs than
the tF method. Nonetheless, as noted by Lee et al. (2022), tF has a theoretical
advantage in terms of expected CI length. Our findings suggest that the two
procedures may be viewed as complementary tools in empirical applications
involving potentially weak instruments.
arXiv link: http://arxiv.org/abs/2506.18001v1
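A minimal sketch of the Anderson-Rubin procedure for a just-identified model: for each candidate beta0, regress y - beta0*d on the instrument and test whether the instrument's coefficient is zero with a robust Wald test; the confidence set collects the values that are not rejected. Data are simulated, and in general the set may be unbounded or disconnected rather than an interval.
```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
n, beta = 500, 1.0
z = rng.normal(size=n)
u = rng.normal(size=n)                                     # unobserved confounder
d = 0.3 * z + 0.8 * u + rng.normal(size=n)
y = beta * d + u + rng.normal(size=n)

def ar_pvalue(beta0):
    resid = y - beta0 * d
    fit = sm.OLS(resid, sm.add_constant(z)).fit(cov_type="HC1")
    return fit.pvalues[1]                                  # test: instrument coefficient = 0

grid = np.linspace(-1, 3, 401)
accepted = [b0 for b0 in grid if ar_pvalue(b0) > 0.05]
print(f"AR 95% confidence set spans [{min(accepted):.2f}, {max(accepted):.2f}] (true beta = 1.0)")
```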
Efficient Difference-in-Differences and Event Study Estimators
Study (ES) estimation using short panel data sets within the heterogeneous
treatment effect framework, free from parametric functional form assumptions
and allowing for variation in treatment timing. We provide an equivalent
characterization of the DiD potential outcome model using sequential
conditional moment restrictions on observables, which shows that the DiD
identification assumptions typically imply nonparametric overidentification
restrictions. We derive the semiparametric efficient influence function (EIF)
in closed form for DiD and ES causal parameters under commonly imposed parallel
trends assumptions. The EIF is automatically Neyman orthogonal and yields the
smallest variance among all asymptotically normal, regular estimators of the
DiD and ES parameters. Leveraging the EIF, we propose simple-to-compute
efficient estimators. Our results highlight how to optimally explore different
pre-treatment periods and comparison groups to obtain the tightest (asymptotic)
confidence intervals, offering practical tools for improving inference in
modern DiD and ES applications even in small samples. Calibrated simulations
and an empirical application demonstrate substantial precision gains of our
efficient estimators in finite samples.
arXiv link: http://arxiv.org/abs/2506.17729v1
Leave No One Undermined: Policy Targeting with Regret Aversion
personalized implementation remains rare in practice. We study the problem of
policy targeting for a regret-averse planner when training data gives a rich
set of observable characteristics while the assignment rules can only depend on
its subset. Grounded in decision theory, our regret-averse criterion reflects a
planner's concern about regret inequality across the population, which
generally leads to a fractional optimal rule due to treatment effect
heterogeneity beyond the average treatment effects conditional on the subset
characteristics. We propose a debiased empirical risk minimization approach to
learn the optimal rule from data. Viewing our debiased criterion as a weighted
least squares problem, we establish new upper and lower bounds for the excess
risk, indicating a convergence rate of 1/n and asymptotic efficiency in certain
cases. We apply our approach to the National JTPA Study and the International
Stroke Trial.
arXiv link: http://arxiv.org/abs/2506.16430v1
Fast Learning of Optimal Policy Trees
and Wager, 2021) using discrete optimisation techniques. We test the
performance of our algorithm in finite samples and find an improvement in the
runtime of optimal policy tree learning by a factor of nearly 50 compared to
the original version. We provide an R package, "fastpolicytree", for public
use.
arXiv link: http://arxiv.org/abs/2506.15435v1
On the relationship between prediction intervals, tests of sharp nulls and inference on realized treatment effects in settings with few treated units
on treatment effect homogeneity extend to alternative inferential targets when
treatment effects are heterogeneous -- namely, tests of sharp null hypotheses,
inference on realized treatment effects, and prediction intervals. We show that
inference methods for these alternative targets are deeply interconnected: they
are either equivalent or become equivalent under additional assumptions. Our
results show that methods designed under treatment effect homogeneity can
remain valid for these alternative targets when treatment effects are
stochastic, offering new theoretical justifications and insights on their
applicability.
arXiv link: http://arxiv.org/abs/2506.14998v1
Heterogeneous economic growth vulnerability across Euro Area countries under stressed scenarios
countries under stressed macroeconomic and financial conditions. Vulnerability,
measured as a lower quantile of the growth distribution conditional on EA-wide
and country-specific underlying factors, is found to be higher in Germany,
which is more exposed to EA-wide economic conditions, and in Spain, which has
large country-specific sectoral dynamics. We show that, under stress, financial
factors amplify adverse macroeconomic conditions. Furthermore, even severe
sectoral (financial or macro) shocks, whether common or country-specific, fail
to fully explain the vulnerability observed under overall stress. Our results
underscore the importance of monitoring both local and EA-wide macro-financial
conditions to design effective policies for mitigating growth vulnerability.
arXiv link: http://arxiv.org/abs/2506.14321v1
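A minimal sketch of the vulnerability measure used above: a lower conditional quantile of growth given a stress indicator, estimated with quantile regression on simulated data; the paper's factor-based conditioning set is richer than this single regressor.
```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 400
stress = rng.normal(size=n)                                # macro-financial stress indicator
growth = 2.0 - 0.5 * stress - (0.5 + 0.8 * np.maximum(stress, 0)) * np.abs(rng.normal(size=n))

X = sm.add_constant(stress)
q05 = sm.QuantReg(growth, X).fit(q=0.05)
q50 = sm.QuantReg(growth, X).fit(q=0.50)
print("median slope      :", round(float(q50.params[1]), 3))
print("5th-quantile slope:", round(float(q05.params[1]), 3), "(more negative => higher vulnerability)")
```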
A break from the norm? Parametric representations of preference heterogeneity for discrete choice models in health
preferences for choices that they make. Discrete choice models try to capture
these distributions. Mixed logits are by far the most commonly used choice
model in health. A raft of parametric model specifications for these models are
available. We test a range of alternative assumptions, and model averaging, to
assess whether and how model outputs are affected. Design: Scoping review of current
modelling practices. Seven alternative distributions, and model averaging over
all distributional assumptions, were compared on four datasets: two were stated
preference, one was revealed preference, and one was simulated. Analyses
examined model fit, preference distributions, willingness-to-pay, and
forecasting. Results: Almost universally, using normal distributions is the
standard practice in health. Alternative distributional assumptions
outperformed standard practice. Preference distributions and the mean
willingness-to-pay varied significantly across specifications, and were seldom
comparable to those derived from normal distributions. Model averaging over
distributions allowed for greater flexibility and further gains in fit, reproduced
the underlying distributions in simulations, and mitigated analyst bias
arising from distribution selection. There was no evidence that distributional
assumptions impacted predictions from models. Limitations: Our focus was on
mixed logit models since these models are the most common in health, though
latent class models are also used. Conclusions: The standard practice of using
all normal distributions appears to be an inferior approach for capturing
random preference heterogeneity. Implications: Researchers should test
alternative assumptions to normal distributions in their models.
arXiv link: http://arxiv.org/abs/2506.14099v1
Machine Learning-Based Estimation of Monthly GDP
machine learning methods. We apply Multi-Layer Perceptron (MLP), Long
Short-Term Memory networks (LSTM), Extreme Gradient Boosting (XGBoost), and
Elastic Net regression to map monthly indicators to quarterly GDP growth, and
reconcile the outputs with actual aggregates. Using data from China, Germany,
the UK, and the US, our method delivers robust performance across varied data
environments. Benchmark comparisons with prior US studies and UK official
statistics validate its accuracy. We also explore nighttime light as a proxy,
finding its usefulness varies by economic structure. The approach offers a
flexible and data-driven tool for high-frequency macroeconomic monitoring and
policy analysis.
arXiv link: http://arxiv.org/abs/2506.14078v2
Causal Mediation Analysis with Multiple Mediators: A Simulation Approach
relatedly, multiple mediators. In such applications, researchers aim to
estimate a variety of different quantities, including interventional direct and
indirect effects, multivariate natural direct and indirect effects, and/or
path-specific effects. This study introduces a general approach to estimating
all these quantities by simulating potential outcomes from a series of
distribution models for each mediator and the outcome. Building on similar
methods developed for analyses with only a single mediator (Imai et al. 2010),
we first outline how to implement this approach with parametric models. The
parametric implementation can accommodate linear and nonlinear relationships,
both continuous and discrete mediators, and many different types of outcomes.
However, it depends on correct specification of each model used to simulate the
potential outcomes. To address the risk of misspecification, we also introduce
an alternative implementation using a novel class of nonparametric models,
which leverage deep neural networks to approximate the relevant distributions
without relying on strict assumptions about functional form. We illustrate both
methods by reanalyzing the effects of media framing on attitudes toward
immigration (Brader et al. 2008) and the effects of prenatal care on preterm
birth (VanderWeele et al. 2014).
arXiv link: http://arxiv.org/abs/2506.14019v1
High-Dimensional Spatial-Plus-Vertical Price Relationships and Price Transmission: A Machine Learning Approach
through the lens of spatial and vertical price relationships. Classical time
series econometric techniques suffer from the "curse of dimensionality" and are
applied almost exclusively to small sets of price series, either prices of one
commodity in a few regions or prices of a few commodities in one region.
However, an agrifood supply chain usually contains several commodities (e.g.,
cattle and beef) and spans numerous regions. Failing to jointly examine
multi-region, multi-commodity price relationships limits researchers' ability
to derive insights from increasingly high-dimensional price datasets of
agrifood supply chains. We apply a machine-learning method - specifically,
regularized regression - to augment the classical vector error correction model
(VECM) and study large spatial-plus-vertical price systems. Leveraging weekly
provincial-level data on the piglet-hog-pork supply chain in China, we uncover
economically interesting changes in price relationships in the system before
and after the outbreak of a major hog disease. To quantify price transmission
in the large system, we rely on the spatial-plus-vertical price relationships
identified by the regularized VECM to visualize comprehensive spatial and
vertical price transmission of hypothetical shocks through joint impulse
response functions. As the VECM outcomes imply, price transmission shows
considerable heterogeneity across regions and commodities and displays
different dynamics over time.
arXiv link: http://arxiv.org/abs/2506.13967v1
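A minimal sketch of a lasso-regularized VECM estimated equation by equation, with simulated prices standing in for the provincial piglet-hog-pork series; the penalization scheme here is a simplification of the paper's method.

```python
# Equation-by-equation lasso-penalized VECM:
#   Δp_t = Π p_{t-1} + Γ Δp_{t-1} + ε_t,
# where both the long-run matrix Π and the short-run matrix Γ are shrunk by an L1 penalty.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
T, K = 400, 12                                  # 12 hypothetical regional price series
prices = np.cumsum(rng.normal(size=(T, K)), axis=0)

dp = np.diff(prices, axis=0)                    # Δp_t
levels_lag = prices[1:-1]                       # p_{t-1} aligned with Δp_t
dp_lag = dp[:-1]                                # Δp_{t-1}
dp_now = dp[1:]                                 # Δp_t
X = np.hstack([levels_lag, dp_lag])             # regressors: lagged levels and lagged differences

Pi = np.zeros((K, K))
Gamma = np.zeros((K, K))
for k in range(K):                              # one penalized regression per price series
    fit = LassoCV(cv=5).fit(X, dp_now[:, k])
    Pi[k] = fit.coef_[:K]                       # long-run (error-correction) coefficients
    Gamma[k] = fit.coef_[K:]                    # short-run dynamics

print("nonzero long-run links:", int((Pi != 0).sum()),
      "| nonzero short-run links:", int((Gamma != 0).sum()))
```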
Gradient Boosting for Spatial Regression Models with Autoregressive Disturbances
that reflects geographic location and spatial relationships. As a framework for
dealing with the unique nature of spatial data, various spatial regression
models have been introduced. In this article, a novel model-based gradient
boosting algorithm for spatial regression models with autoregressive
disturbances is proposed. Due to the modular nature, the approach provides an
alternative estimation procedure which is feasible even in high-dimensional
settings where established quasi-maximum likelihood or generalized method of
moments estimators do not yield unique solutions. The approach additionally
enables data-driven variable and model selection in low- as well as
high-dimensional settings. Since the bias-variance trade-off is also controlled
in the algorithm, implicit regularization is imposed which improves prediction
accuracy on out-of-sample spatial data. Detailed simulation studies regarding
the performance of estimation, prediction and variable selection in low- and
high-dimensional settings confirm proper functionality of the proposed
methodology. To illustrate the functionality of the model-based gradient
boosting algorithm, a case study is presented where the life expectancy in
German districts is modeled incorporating a potential spatial dependence
structure.
arXiv link: http://arxiv.org/abs/2506.13682v1
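The sketch below illustrates the componentwise L2 gradient boosting mechanism that underlies model-based boosting; the spatial autoregressive disturbance component of the proposed algorithm is omitted, and the data, learning rate, and stopping iteration are illustrative.

```python
# Componentwise L2 gradient boosting: at each iteration, fit every univariate base learner
# to the current residuals, update only the best one by a small step, and thereby perform
# implicit variable selection and regularization.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)   # only two informative covariates

nu, n_iter = 0.1, 200                 # learning rate and number of boosting iterations
coef = np.zeros(p)
intercept = y.mean()
resid = y - intercept                 # negative gradient of the L2 loss

for _ in range(n_iter):
    betas = X.T @ resid / (X ** 2).sum(axis=0)           # univariate least-squares fits
    rss = ((resid[:, None] - X * betas) ** 2).sum(axis=0)
    j = int(np.argmin(rss))                              # best-fitting component
    coef[j] += nu * betas[j]                             # weak update of that component
    resid -= nu * betas[j] * X[:, j]

selected = np.flatnonzero(np.abs(coef) > 1e-8)
print("selected covariates:", selected, "coefficients:", np.round(coef[selected], 2))
```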
Identification of Impulse Response Functions for Nonlinear Dynamic Models
Functions in nonlinear dynamic models and discuss the settings in which the
problem can be mitigated. In particular, we introduce the nonlinear
autoregressive representation with Gaussian innovations and characterize the
identified set. This set arises from the multiplicity of nonlinear innovations
and transformations which leave invariant the standard normal density. We then
discuss possible identifying restrictions, such as non-Gaussianity of
independent sources, or identifiable parameters by means of learning
algorithms, and the possibility of identification in nonlinear dynamic factor
models when the underlying latent factors have different dynamics. We also
explain how these identification results depend ultimately on the set of series
under consideration.
arXiv link: http://arxiv.org/abs/2506.13531v2
Production Function Estimation without Invertibility: Imperfectly Competitive Environments and Demand Shocks
show that the invertibility assumption at its heart is testable. We
characterize what goes wrong if invertibility fails and what can still be done.
We show that rethinking how the estimation procedure is implemented either
eliminates or mitigates the bias that arises if invertibility fails. In
particular, a simple change to the first step of the estimation procedure
provides a first-order bias correction for the GMM estimator in the second
step. Furthermore, a modification of the moment condition in the second step
ensures Neyman orthogonality and enhances efficiency and robustness by
rendering the asymptotic distribution of the GMM estimator invariant to
estimation noise from the first step.
arXiv link: http://arxiv.org/abs/2506.13520v2
Joint Quantile Shrinkage: A State-Space Approach toward Non-Crossing Bayesian Quantile Models
regression models. We propose a new Bayesian modelling framework that penalises
multiple quantile regression functions toward the desired non-crossing space.
We achieve this by estimating multiple quantiles jointly with a prior on
variation across quantiles, a fused shrinkage prior with quantile adaptivity.
The posterior is derived from a decision-theoretic general Bayes perspective,
whose form yields a natural state-space interpretation aligned with
Time-Varying Parameter (TVP) models. Taken together, our approach leads to a
Quantile-Varying Parameter (QVP) model, for which we develop efficient sampling
algorithms. We demonstrate that our proposed modelling framework provides
superior parameter recovery and predictive performance compared to competing
Bayesian and frequentist quantile regression estimators in simulated
experiments and a real-data application to multivariate quantile estimation in
macroeconomics.
arXiv link: http://arxiv.org/abs/2506.13257v2
Quantile Peer Effect Models
quantiles of the peer outcome distribution. The model allows peers with low,
intermediate, and high outcomes to exert distinct influences, thereby capturing
more nuanced patterns of peer effects than standard approaches that are based
on aggregate measures. I establish the existence and uniqueness of the Nash
equilibrium and demonstrate that the model parameters can be estimated using a
straightforward instrumental variable strategy. Applying the model to a range
of outcomes that are commonly studied in the literature, I uncover diverse and
rich patterns of peer influences that challenge assumptions inherent in
standard models. These findings carry important policy implications: key player
status in a network depends not only on network structure, but also on the
distribution of outcomes within the population.
arXiv link: http://arxiv.org/abs/2506.12920v1
Rethinking Distributional IVs: KAN-Powered D-IV-LATE & Model Choice
of modern causal inference, allowing researchers to utilise flexible machine
learning models for the estimation of nuisance functions without introducing
first-order bias into the final parameter estimate. However, the choice of
machine learning model for the nuisance functions is often treated as a minor
implementation detail. In this paper, we argue that this choice can have a
profound impact on the substantive conclusions of the analysis. We demonstrate
this by presenting and comparing two distinct Distributional Instrumental
Variable Local Average Treatment Effect (D-IV-LATE) estimators. The first
estimator leverages standard machine learning models like Random Forests for
nuisance function estimation, while the second is a novel estimator employing
Kolmogorov-Arnold Networks (KANs). We establish the asymptotic properties of
these estimators and evaluate their performance through Monte Carlo
simulations. An empirical application analysing the distributional effects of
401(k) participation on net financial assets reveals that the choice of machine
learning model for nuisance functions can significantly alter substantive
conclusions, with the KAN-based estimator suggesting more complex treatment
effect heterogeneity. These findings underscore a critical "caveat emptor". The
selection of nuisance function estimators is not a mere implementation detail.
Instead, it is a pivotal choice that can profoundly impact research outcomes in
causal inference.
arXiv link: http://arxiv.org/abs/2506.12765v2
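A hedged sketch of a cross-fitted D-IV-LATE-type estimator: the standard doubly robust LATE score applied to the indicators 1{Y <= y} over a grid of thresholds, with random-forest nuisance estimates. Data, forest settings, and the threshold grid are illustrative, and details may differ from the paper's estimator.

```python
# Cross-fitted doubly robust LATE scores applied to 1{Y <= y} across thresholds.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 5))
Z = rng.binomial(1, 0.5, size=n)                     # binary instrument
D = rng.binomial(1, 0.2 + 0.5 * Z)                   # endogenous treatment take-up
Y = D * 1.0 + X[:, 0] + rng.normal(size=n)           # outcome

y_grid = np.quantile(Y, [0.25, 0.5, 0.75])           # thresholds for the distributional effects
num = np.zeros((n, len(y_grid)))                     # numerator scores, one column per threshold
den = np.zeros(n)                                    # denominator scores

for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    m_hat = np.clip(RandomForestClassifier(n_estimators=100, random_state=0)
                    .fit(X[train], Z[train]).predict_proba(X[test])[:, 1], 0.01, 0.99)
    for z in (0, 1):
        idx = train[Z[train] == z]
        p_hat = (RandomForestClassifier(n_estimators=100, random_state=0)
                 .fit(X[idx], D[idx]).predict_proba(X[test])[:, 1])
        w = (Z[test] == z) / (m_hat if z == 1 else 1 - m_hat)   # instrument propensity weights
        sign = 1 if z == 1 else -1
        den[test] += sign * (p_hat + w * (D[test] - p_hat))
        for j, y0 in enumerate(y_grid):
            mu_hat = (RandomForestRegressor(n_estimators=100, random_state=0)
                      .fit(X[idx], (Y[idx] <= y0).astype(float)).predict(X[test]))
            num[test, j] += sign * (mu_hat + w * ((Y[test] <= y0) - mu_hat))

d_iv_late = num.mean(axis=0) / den.mean()            # distributional LATE at each threshold
for y0, est in zip(y_grid, d_iv_late):
    print(f"effect on P(Y <= {y0:.2f}) for compliers: {est:.3f}")
```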
Dynamic allocation: extremes, tail dependence, and regime Shifts
asset return distribution, we build a sophisticated model to predict the
downside risk of the global financial market. We further develop a dynamic
regime switching model that can forecast real-time risk regime of the market.
Our GARCH-DCC-Copula risk model can significantly improve both risk- and
alpha-based global tactical asset allocation strategies. Our risk regime has
strong predictive power of quantitative equity factor performance, which can
help equity investors to build better factor models and asset allocation
managers to construct more efficient risk premia portfolios.
arXiv link: http://arxiv.org/abs/2506.12587v1
Moment Restrictions for Nonlinear Panel Data Models with Feedback
covariates and time-invariant agent-specific heterogeneity, place strong a
priori restrictions on feedback: how past outcomes, covariates, and
heterogeneity map into future covariate levels. Ruling out feedback entirely,
as often occurs in practice, is unattractive in many dynamic economic settings.
We provide a general characterization of all feedback and heterogeneity robust
(FHR) moment conditions for nonlinear panel data models and present
constructive methods to derive feasible moment-based estimators for specific
models. We also use our moment characterization to compute semiparametric
efficiency bounds, allowing for a quantification of the information loss
associated with accommodating feedback, as well as providing insight into how
to construct estimators with good efficiency properties in practice. Our
results apply both to the finite dimensional parameter indexing the parametric
part of the model as well as to estimands that involve averages over the
distribution of unobserved heterogeneity. We illustrate our methods by
providing a complete characterization of all FHR moment functions in the
multi-spell mixed proportional hazards model. We compute efficient moment
functions for both model parameters and average effects in this setting.
arXiv link: http://arxiv.org/abs/2506.12569v2
Optimal treatment assignment rules under capacity constraints
planner aims to maximize social welfare by assigning treatments based on
observable covariates. Such constraints, common when treatments are costly or
limited in supply, introduce nontrivial challenges for deriving optimal
statistical assignment rules because the planner needs to coordinate treatment
assignment probabilities across the entire covariate distribution. To address
these challenges, we reformulate the planner's constrained maximization problem
as an optimal transport problem, which makes the problem effectively
unconstrained. We then establish local asymptotic optimality results of
assignment rules using a limits of experiments framework. Finally, we
illustrate our method with a voucher assignment problem for private secondary
school attendance using data from Angrist et al. (2006).
arXiv link: http://arxiv.org/abs/2506.12225v2
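A stylized sketch of the planner's capacity-constrained problem as a small linear program over treatment probabilities, solved with scipy; it illustrates the constraint structure only, not the paper's optimal-transport reformulation or its asymptotic analysis. The estimated CATEs and the capacity are hypothetical.

```python
# Capacity-constrained assignment as a linear program: maximize the sum of estimated
# conditional treatment effects subject to a budget on the share treated.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n = 200
tau_hat = rng.normal(loc=0.1, scale=0.5, size=n)     # hypothetical estimated CATEs
capacity = 0.3                                       # at most 30% of units can be treated

# Decision variables: treatment probabilities p_i in [0, 1] with sum(p) <= n * capacity.
res = linprog(
    c=-tau_hat,                                      # linprog minimizes, so negate the welfare gain
    A_ub=np.ones((1, n)),
    b_ub=[n * capacity],
    bounds=[(0.0, 1.0)] * n,
)
p_opt = res.x
print("treated share:", round(float(p_opt.mean()), 3),
      "-- the rule treats the units with the largest estimated effects")
```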
Partial identification via conditional linear programs: estimation and policy learning
observable data: the data can limit them to a set of plausible values, but not
uniquely determine them. This paper develops a unified framework for
covariate-assisted estimation, inference, and decision making in partial
identification problems where the parameter of interest satisfies a series of
linear constraints, conditional on covariates. In such settings, bounds on the
parameter can be written as expectations of solutions to conditional linear
programs that optimize a linear function subject to linear constraints, where
both the objective function and the constraints may depend on covariates and
need to be estimated from data. Examples include estimands involving the joint
distributions of potential outcomes, policy learning with inequality-aware
value functions, and instrumental variable settings. We propose two de-biased
estimators for bounds defined by conditional linear programs. The first
directly solves the conditional linear programs with plugin estimates and uses
output from standard LP solvers to de-bias the plugin estimate, avoiding the
need for computationally demanding vertex enumeration of all possible solutions
for symbolic bounds. The second uses entropic regularization to create smooth
approximations to the conditional linear programs, trading a small amount of
approximation error for improved estimation and computational efficiency. We
establish conditions for asymptotic normality of both estimators, show that
both estimators are robust to first-order errors in estimating the conditional
constraints and objectives, and construct Wald-type confidence intervals for
the partially identified parameters. These results also extend to policy
learning problems where the value of a decision policy is only partially
identified. We apply our methods to a study on the effects of Medicaid
enrollment.
arXiv link: http://arxiv.org/abs/2506.12215v2
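A minimal sketch of the plugin version of a bound defined by conditional linear programs: for each observation, a small LP over the joint distribution of potential outcomes is solved subject to estimated conditional marginals, and the LP values are averaged. The de-biasing and entropic-regularization steps of the paper are omitted; the data and nuisance models below are hypothetical.

```python
# Plugin bounds on P(Y(1)=1, Y(0)=0) via one small linear program per observation.
import numpy as np
from scipy.optimize import linprog
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1_000
X = rng.normal(size=(n, 2))
T = rng.binomial(1, 0.5, size=n)                     # randomized treatment
Y = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * T + X[:, 0]))))

# Estimate conditional outcome probabilities under treatment and control.
p1 = LogisticRegression().fit(X[T == 1], Y[T == 1]).predict_proba(X)[:, 1]
p0 = LogisticRegression().fit(X[T == 0], Y[T == 0]).predict_proba(X)[:, 1]

def lp_bound(p1_x, p0_x, lower=True):
    """LP over the joint pmf q(y1, y0): optimize P(Y1=1, Y0=0 | x) given the two marginals."""
    c = np.array([0.0, 1.0, 0.0, 0.0])               # order: q(0,0), q(1,0), q(0,1), q(1,1)
    A_eq = np.array([[1.0, 1.0, 1.0, 1.0],           # probabilities sum to one
                     [0.0, 1.0, 0.0, 1.0],           # P(Y1=1 | x) = p1_x
                     [0.0, 0.0, 1.0, 1.0]])          # P(Y0=1 | x) = p0_x
    res = linprog(c if lower else -c, A_eq=A_eq, b_eq=[1.0, p1_x, p0_x], bounds=[(0, 1)] * 4)
    return res.fun if lower else -res.fun

lower = np.mean([lp_bound(a, b, lower=True) for a, b in zip(p1, p0)])
upper = np.mean([lp_bound(a, b, lower=False) for a, b in zip(p1, p0)])
print(f"plugin bounds on P(Y(1)=1, Y(0)=0): [{lower:.3f}, {upper:.3f}]")
```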
Evaluating Program Sequences with Double Machine Learning: An Application to Labor Market Policies
structure, where individuals may be assigned to various programs over time.
While this complexity is often simplified by analyzing programs at single
points in time, this paper reviews, explains, and applies methods for program
evaluation within a sequential framework. It outlines the assumptions required
for identification under dynamic confounding and demonstrates how extending
sequential estimands to dynamic policies enables the construction of more
realistic counterfactuals. Furthermore, the paper explores recently developed
methods for estimating effects across multiple treatments and time periods,
utilizing Double Machine Learning (DML), a flexible estimator that avoids
parametric assumptions while preserving desirable statistical properties. Using
Swiss administrative data, the methods are demonstrated through an empirical
application assessing the participation of unemployed individuals in active
labor market policies, where assignment decisions by caseworkers can be
reconsidered between two periods. The analysis identifies a temporary wage
subsidy as the most effective intervention, on average, even after adjusting
for its extended duration compared to other programs. Overall, DML-based
analysis of dynamic policies proves to be a useful approach within the program
evaluation toolkit.
arXiv link: http://arxiv.org/abs/2506.11960v1
Structural Representations and Identification of Marginal Policy Effects
effect (MPE) within nonseparable models. We demonstrate that, for a smooth
functional of the outcome distribution, the MPE equals its functional
derivative evaluated at the outcome-conditioned weighted average structural
derivative. This equivalence is definitional rather than identification-based.
Building on this theoretical result, we propose an alternative identification
strategy for the MPE that complements existing methods.
arXiv link: http://arxiv.org/abs/2506.11694v2
Identification and Inference of Partial Effects in Sharp Regression Kink Designs
the distribution of an outcome variable Y . This study examines the
identification and inference of a wide range of partial effects at the
threshold in the sharp regression kink (RK) design under general policy
interventions. We establish a unifying framework for conducting inference on
the effect of an infinitesimal change in D on smooth functionals of the
distribution of Y, particularly when D is endogenous and instrumental variables
are unavailable. This framework yields a general formula that clarifies the
causal interpretation of numerous existing sharp RK estimands in the
literature.
We develop the relevant asymptotic theory, introduce a multiplier bootstrap
procedure for inference, and provide practical implementation guidelines.
Applying our method to the effect of unemployment insurance (UI) benefits on
unemployment duration, we find that while higher benefits lead to longer
durations, they also tend to reduce their dispersion. Furthermore, our results
show that the magnitude of the partial effect can change substantially
depending on the specific form of the policy intervention.
arXiv link: http://arxiv.org/abs/2506.11663v1
Let the Tree Decide: FABART A Non-Parametric Factor Model
Regression Trees (BART) into a Factor-Augmented Vector Autoregressive (FAVAR)
model to forecast macro-financial variables and examine asymmetries in the
transmission of oil price shocks. By employing nonparametric techniques for
dimension reduction, the model captures complex, nonlinear relationships
between observables and latent factors that are often missed by linear
approaches. A simulation experiment comparing FABART to linear alternatives and
a Monte Carlo experiment demonstrate that the framework accurately recovers the
relationship between latent factors and observables in the presence of
nonlinearities, while remaining consistent under linear data-generating
processes. The empirical application shows that FABART substantially improves
forecast accuracy for industrial production relative to linear benchmarks,
particularly during periods of heightened volatility and economic stress. In
addition, the model reveals pronounced sign asymmetries in the transmission of
oil supply news shocks to the U.S. economy, with positive shocks generating
stronger and more persistent contractions in real activity and inflation than
the expansions triggered by negative shocks. A similar pattern emerges at the
U.S. state level, where negative shocks lead to modest declines in
employment compared to the substantially larger contractions observed after
positive shocks.
arXiv link: http://arxiv.org/abs/2506.11551v1
Inference on panel data models with a generalized factor structure
models when both factors and factor loadings are accounted for by a
nonparametric function. This general specification encompasses rather popular
models such as the two-way fixed effects and the interactive fixed effects
ones. By applying a conditional mean independence assumption between unobserved
heterogeneity and the covariates, we obtain consistent estimators of the
parameters of interest at the optimal rate of convergence, for fixed and large
$T$. We also provide a specification test for the modeling assumption based on
the methodology of conditional moment tests and nonparametric estimation
techniques. Using degenerate and nondegenerate theories of U-statistics we show
its convergence and asymptotic distribution under the null, and that it
diverges under the alternative at a rate arbitrarily close to $NT$.
Finite-sample inference is based on the bootstrap. Simulations reveal excellent
performance of our methods, and an empirical application is conducted.
arXiv link: http://arxiv.org/abs/2506.10690v1
Nowcasting the euro area with social media data
context-sensitive signals related to inflation and unemployment in the euro
area from millions of Reddit submissions and comments. We develop daily
indicators that incorporate, in addition to posts, the social interaction among
users. Our empirical results show consistent gains in out-of-sample nowcasting
accuracy relative to daily newspaper sentiment and financial variables,
especially in unusual times such as the (post-)COVID-19 period. We conclude
that the application of AI tools to the analysis of social media, specifically
Reddit, provides useful signals about inflation and unemployment in Europe at
daily frequency and constitutes a useful addition to the toolkit available to
economic forecasters and nowcasters.
arXiv link: http://arxiv.org/abs/2506.10546v1
How much is too much? Measuring divergence from Benford's Law with the Equivalent Contamination Proportion (ECP)
numerical datasets, particularly in accounting, finance, and economics.
However, the statistical tools commonly used for this purpose (such as
Chi-squared, MAD, or KS) suffer from three key limitations: sensitivity to
sample size, lack of interpretability of their scale, and the absence of a
common metric that allows for comparison across different statistics. This
paper introduces the Equivalent Contamination Proportion (ECP) to address these
issues. Defined as the proportion of contamination in a hypothetical
Benford-conforming sample such that the expected value of the divergence
statistic matches the one observed in the actual data, the ECP provides a
continuous and interpretable measure of deviation (ranging from 0 to 1), is
robust to sample size, and offers consistent results across different
divergence statistics under mild conditions. Closed-form and simulation-based
methods are developed for estimating the ECP, and, through a retrospective
analysis of three influential studies, it is shown how the ECP can complement
the information provided by traditional divergence statistics and enhance the
interpretation of results.
arXiv link: http://arxiv.org/abs/2506.09915v2
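A simulation-based sketch of the ECP idea: search for the contamination proportion at which the expected divergence of a partially contaminated Benford sample matches the divergence observed in the data. The contamination model (uniform first digits) and the divergence statistic (MAD) are illustrative assumptions, not the paper's definitions.

```python
# Simulation-based Equivalent Contamination Proportion (ECP) under illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
digits = np.arange(1, 10)
benford_p = np.log10(1 + 1 / digits)

def mad(first_digits):
    """Mean absolute deviation between observed and Benford first-digit frequencies."""
    freq = np.bincount(first_digits, minlength=10)[1:] / len(first_digits)
    return np.abs(freq - benford_p).mean()

def expected_mad(c, n, reps=100):
    """Expected MAD of an n-observation sample with contamination proportion c."""
    vals = []
    for _ in range(reps):
        k = rng.binomial(n, c)
        sample = np.concatenate([rng.choice(digits, size=n - k, p=benford_p),
                                 rng.choice(digits, size=k)])   # contaminant: uniform digits
        vals.append(mad(sample))
    return np.mean(vals)

# Hypothetical observed data: 2,000 first digits, 15% drawn uniformly instead of from Benford.
observed = np.concatenate([rng.choice(digits, size=1700, p=benford_p),
                           rng.choice(digits, size=300)])
observed_mad = mad(observed)

# Grid search for the ECP: the contamination share whose expected MAD is closest to the observed MAD.
grid = np.linspace(0, 1, 51)
ecp = grid[np.argmin([abs(expected_mad(c, len(observed)) - observed_mad) for c in grid])]
print(f"observed MAD = {observed_mad:.4f}, ECP estimate = {ecp:.2f} (data were built with c = 0.15)")
```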
Estimating the Number of Components in Panel Data Finite Mixture Regression Models with an Application to Production Function Heterogeneity
components in panel data finite mixture regression models with regression
errors independently distributed as normal or more flexible normal mixtures. We
analyze the asymptotic properties of the likelihood ratio test (LRT) and
information criteria (AIC and BIC) for model selection in both conditionally
independent and dynamic panel settings. Unlike cross-sectional normal mixture
models, we show that panel data structures eliminate higher-order degeneracy
problems while retaining issues of unbounded likelihood and infinite Fisher
information. Addressing these challenges, we derive the asymptotic null
distribution of the LRT statistic as the maximum of random variables and
develop a sequential testing procedure for consistent selection of the number
of components. Our theoretical analysis also establishes the consistency of BIC
and the inconsistency of AIC. Empirical application to Chilean manufacturing
data reveals significant heterogeneity in production technology, with
substantial variation in output elasticities of material inputs and
factor-augmented technological processes within narrowly defined industries,
indicating plant-specific variation in production functions beyond
Hicks-neutral technological differences. These findings contrast sharply with
the standard practice of assuming a homogeneous production function and
highlight the necessity of accounting for unobserved plant heterogeneity in
empirical production analysis.
arXiv link: http://arxiv.org/abs/2506.09666v1
Diffusion index forecasts under weaker loadings: PCA, ridge regression, and random projections
possibly weak loadings. The default option to construct forecasts is to
estimate the factors through principal component analysis (PCA) on the
available predictor matrix, and use the estimated factors to forecast the
outcome variable. Alternatively, we can directly relate the outcome variable to
the predictors through either ridge regression or random projections. We
establish that forecasts based on PCA, ridge regression and random projections
are consistent for the conditional mean under the same assumptions on the
strength of the loadings. However, under weaker loadings the convergence rate
is lower for ridge and random projections if the time dimension is small
relative to the cross-section dimension. We assess the relevance of these
findings in an empirical setting by comparing relative forecast accuracy for
monthly macroeconomic and financial variables using different window sizes. The
findings support the theoretical results, and at the same time show that
regularization-based procedures may be more robust in settings not covered by
the developed theory.
arXiv link: http://arxiv.org/abs/2506.09575v1
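A quick sketch contrasting the two forecasting routes on simulated factor data: PCA-estimated factors followed by OLS versus ridge regression directly on the predictors. Dimensions, loading strength, and the penalty grid are illustrative.

```python
# Diffusion index forecast (PCA + OLS) versus direct ridge regression on the predictors.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, RidgeCV

rng = np.random.default_rng(0)
T, N, r = 200, 150, 3                                  # time periods, predictors, factors
F = rng.normal(size=(T, r))                            # latent factors
Lam = rng.normal(scale=0.5, size=(N, r))               # (possibly weak) loadings
Xmat = F @ Lam.T + rng.normal(size=(T, N))             # predictor panel
y = F @ np.array([1.0, -0.5, 0.5]) + rng.normal(scale=0.5, size=T)

train, test = np.arange(T - 40), np.arange(T - 40, T)

# Route 1: diffusion index -- PCA factors, then OLS of the outcome on the estimated factors.
pca = PCA(n_components=r).fit(Xmat[train])
f_train, f_test = pca.transform(Xmat[train]), pca.transform(Xmat[test])
yhat_pca = LinearRegression().fit(f_train, y[train]).predict(f_test)

# Route 2: ridge regression of the outcome directly on all predictors.
ridge = RidgeCV(alphas=np.logspace(-2, 4, 20)).fit(Xmat[train], y[train])
yhat_ridge = ridge.predict(Xmat[test])

for name, pred in [("PCA + OLS", yhat_pca), ("ridge", yhat_ridge)]:
    print(name, "out-of-sample MSE:", round(float(np.mean((y[test] - pred) ** 2)), 3))
```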
Fragility in Average Treatment Effect on the Treated under Limited Covariate Support
treated (ATT) under unconfoundedness when covariate overlap is partial. A
formal diagnostic is proposed to characterize empirical support -- the subset
of the covariate space where ATT is point-identified due to the presence of
comparable untreated units. Where support is absent, standard estimators remain
computable but cease to identify meaningful causal parameters. A general
sensitivity framework is developed, indexing identified sets by curvature
constraints on the selection mechanism. This yields a structural selection
frontier tracing the trade-off between assumption strength and inferential
precision. Two diagnostic statistics are introduced: the minimum assumption
strength for sign identification (MAS-SI), and a fragility index that
quantifies the minimal deviation from ignorability required to overturn
qualitative conclusions. Applied to the LaLonde (1986) dataset, the framework
reveals that nearly half the treated strata lack empirical support, rendering
the ATT undefined in those regions. Simulations confirm that ATT estimates may
be stable in magnitude yet fragile in epistemic content. These findings reframe
overlap not as a regularity condition but as a prerequisite for identification,
and recast sensitivity analysis as integral to empirical credibility rather
than auxiliary robustness.
arXiv link: http://arxiv.org/abs/2506.08950v1
Testing Shape Restrictions with Continuous Treatment: A Transformation Model Approach
the dependent variable in a semiparametric transformation model. These tests
can be used to verify monotonicity of the treatment effect, or, equivalently,
concavity/convexity of the outcome with respect to the treatment, in
(quasi-)experimental settings. Our procedure does not require estimation of the
transformation or the distribution of the error terms, thus it is easy to
implement. The statistic takes the form of a U statistic or a localised U
statistic, and we show that critical values can be obtained by bootstrapping.
In our application we test the convexity of loan demand with respect to the
interest rate using experimental data from South Africa.
arXiv link: http://arxiv.org/abs/2506.08914v1
Enterprise value, economic and policy uncertainties: the case of US air carriers
encompasses not only equity but also assets and liabilities, offering a
comprehensive measure of total value, especially for companies with diverse
capital structures. The relationship between economic uncertainty and firm
value is rooted in economic theory, with early studies dating back to Sandmo's
work in 1971 and further elaborated upon by John Kenneth Galbraith in 1977.
Subsequent significant events have underscored the pivotal role of uncertainty
in the financial and economic realm. Using a VAR-MIDAS methodology, analysis of
accumulated impulse responses reveals that the EV of air carrier firms responds
heterogeneously to financial and economic uncertainties, suggesting unique
coping strategies. Most firms exhibit negative reactions to recessionary risks
and economic policy uncertainties. Financial shocks also elicit varied
responses, with positive impacts observed on EV in response to increases in the
current ratio and operating income after depreciation. However, high debt
levels are unfavorably received by the market, leading to negative EV responses
to debt-to-asset ratio shocks. Other financial shocks show mixed or
indeterminate impacts on EV.
arXiv link: http://arxiv.org/abs/2506.07766v1
Economic and Policy Uncertainties and Firm Value: The Case of Consumer Durable Goods
represented by Tobin's Q (Q) for a group of twelve U.S. durable goods
producers to uncertainties in the U.S. economy. The results, based on estimated
panel quantile regressions (PQR) and a panel vector autoregressive MIDAS model
(PVM), show that Q for these firms reacts negatively to positive shocks to the
current ratio and the debt-to-asset ratio, and positively to operating income
after depreciation and the quick ratio in most quantiles.
Q of the firms under study reacts negatively to the economic policy
uncertainty, risk of recession, and inflationary expectation, but positively to
consumer confidence in most quantiles of its distribution. Finally, Granger
causality tests confirm that the uncertainty indicators considered in the study
are significant predictors of changes in the value of these companies as
reflected by Q.
arXiv link: http://arxiv.org/abs/2506.07476v1
Individual Treatment Effect: Prediction Intervals and Sharp Bounds
inference in causal analyses and has been the focus of several recent studies.
In this paper, we describe the intrinsic limits regarding what can be learned
concerning ITEs given data from large randomized experiments. We consider when
a valid prediction interval for the ITE is informative and when it can be
bounded away from zero. The joint distribution over potential outcomes is only
partially identified from a randomized trial. Consequently, to be valid, an ITE
prediction interval must be valid for all joint distributions consistent with
the observed data and hence will in general be wider than that resulting from
knowledge of this joint distribution. We characterize prediction intervals in
the binary treatment and outcome setting, and extend these insights to models
with continuous and ordinal outcomes. We derive sharp bounds on the probability
mass function (pmf) of the individual treatment effect (ITE). Finally, we
contrast prediction intervals for the ITE and confidence intervals for the
average treatment effect (ATE). This also leads to the consideration of Fisher
versus Neyman null hypotheses. While confidence intervals for the ATE shrink
with increasing sample size due to its status as a population parameter,
prediction intervals for the ITE generally do not vanish, leading to scenarios
where one may reject the Neyman null yet still find evidence consistent with
the Fisher null, highlighting the challenges of individualized decision-making
under partial identification.
arXiv link: http://arxiv.org/abs/2506.07469v1
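A small sketch of the partial-identification point in the binary treatment/outcome case, using the classical Frechet-Hoeffding bounds on P(ITE = 1); the paper's own bounds and prediction intervals may be richer than this illustration.

```python
# Sharp bounds on P(ITE = +1) = P(Y(1)=1, Y(0)=0) from identified marginals only.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
T = rng.binomial(1, 0.5, size=n)                      # randomized treatment
Y = rng.binomial(1, 0.3 + 0.25 * T)                   # binary outcome

p1 = Y[T == 1].mean()                                 # identified marginal P(Y(1)=1)
p0 = Y[T == 0].mean()                                 # identified marginal P(Y(0)=1)

# The joint of (Y(1), Y(0)) is not identified; the marginals only pin down an interval.
lower = max(0.0, p1 - p0)                             # Frechet-Hoeffding lower bound
upper = min(p1, 1.0 - p0)                             # Frechet-Hoeffding upper bound
print(f"P(ITE = +1) is only bounded: [{lower:.3f}, {upper:.3f}]; "
      f"the ATE = {p1 - p0:.3f} is a point, but individual effects are not.")
```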
Does Residuals-on-Residuals Regression Produce Representative Estimates of Causal Effects?
observational datasets. The "residuals-on-residuals" regression estimator
(RORR) is especially popular for its simplicity and computational tractability.
However, when treatment effects are heterogeneous, the proper interpretation of
RORR may not be well understood. We show that, for many-valued treatments with
continuous dose-response functions, RORR converges to a conditional
variance-weighted average of derivatives evaluated at points not in the
observed dataset, which generally differs from the Average Causal Derivative
(ACD). Hence, even if all units share the same dose-response function, RORR
does not in general converge to an average treatment effect in the population
represented by the sample. We propose an alternative estimator suitable for
large datasets. We demonstrate the pitfalls of RORR and the favorable
properties of the proposed estimator in both an illustrative numerical example
and an application to real-world data from Netflix.
arXiv link: http://arxiv.org/abs/2506.07462v1
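A minimal sketch of the residuals-on-residuals regression (RORR) estimator discussed above, with out-of-fold nuisance predictions; the data-generating process and learners are illustrative.

```python
# Residuals-on-residuals regression: residualize outcome and treatment on covariates with
# flexible learners, then regress the outcome residuals on the treatment residuals.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 3_000
X = rng.normal(size=(n, 4))
T = 0.5 * X[:, 0] + rng.normal(size=n)                          # continuous treatment dose
Y = np.sin(T) + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)    # nonlinear dose-response

# Out-of-fold predictions of E[Y|X] and E[T|X] avoid overfitting bias in the residuals.
y_hat = cross_val_predict(GradientBoostingRegressor(random_state=0), X, Y, cv=5)
t_hat = cross_val_predict(GradientBoostingRegressor(random_state=0), X, T, cv=5)

rorr = LinearRegression().fit((T - t_hat).reshape(-1, 1), Y - y_hat)
print("RORR coefficient:", round(float(rorr.coef_[0]), 3),
      "-- a variance-weighted average of local derivatives, not the ACD in general")
```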
Quantile-Optimal Policy Learning under Unmeasured Confounding
whose reward distribution has the largest $\alpha$-quantile for some $\alpha
\in (0, 1)$. We focus on the offline setting whose generating process involves
unobserved confounders. Such a problem suffers from three main challenges: (i)
nonlinearity of the quantile objective as a functional of the reward
distribution, (ii) unobserved confounding issue, and (iii) insufficient
coverage of the offline dataset. To address these challenges, we propose a
suite of causal-assisted policy learning methods that provably enjoy strong
theoretical guarantees under mild conditions. In particular, to address (i) and
(ii), using causal inference tools such as instrumental variables and negative
controls, we propose to estimate the quantile objectives by solving nonlinear
functional integral equations. Then we adopt a minimax estimation approach with
nonparametric models to solve these integral equations, and propose to
construct conservative policy estimates that address (iii). The final policy is
the one that maximizes these pessimistic estimates. In addition, we propose a
novel regularized policy learning method that is more amenable to computation.
Finally, we prove that the policies learned by these methods are
$\mathscr{O}(n^{-1/2})$ quantile-optimal under a mild coverage
assumption on the offline dataset. Here, $\mathscr{O}(\cdot)$ omits
poly-logarithmic factors. To the best of our knowledge, we propose the first
sample-efficient policy learning algorithms for estimating the quantile-optimal
policy when unmeasured confounding is present.
arXiv link: http://arxiv.org/abs/2506.07140v1
Inference on the value of a linear program
the objective function and constraints are possibly unknown and must be
estimated from data. We show that many inference problems in partially
identified models can be reformulated in this way. Building on Shapiro (1991)
and Fang and Santos (2019), we develop a pointwise valid inference procedure
for the value of an LP. We modify this pointwise inference procedure to
construct one-sided inference procedures that are uniformly valid over large
classes of data-generating processes. Our results provide alternative testing
procedures for problems considered in Andrews et al. (2023), Cox and Shi
(2023), and Fang et al. (2023) (in the low-dimensional case), and remain valid
when key components--such as the coefficient matrix--are unknown and must be
estimated. Moreover, our framework also accommodates inference on the
identified set of a subvector, in models defined by linear moment inequalities,
and does so under weaker constraint qualifications than those in Gafarov
(2025).
arXiv link: http://arxiv.org/abs/2506.06776v2
Practically significant differences between conditional distribution functions
problem of comparing the conditional distribution functions corresponding to
two samples. In contrast to testing for exact equality, we are interested in
the (null) hypothesis that the $L^2$ distance between the conditional
distribution functions does not exceed a certain threshold in absolute value.
The consideration of these hypotheses is motivated by the observation that in
applications, it is rare, and perhaps impossible, that a null hypothesis of
exact equality is satisfied and that the real question of interest is to detect
a practically significant deviation between the two conditional distribution
functions.
The consideration of a composite null hypothesis makes the testing problem
challenging, and in this paper we develop a pivotal test for such hypotheses.
Our approach is based on self-normalization and therefore requires neither the
estimation of (complicated) variances nor bootstrap approximations. We derive
the asymptotic limit distribution of the (appropriately normalized) test
statistic and show consistency under local alternatives. A simulation study and
an application to German SOEP data reveal the usefulness of the method.
arXiv link: http://arxiv.org/abs/2506.06545v1
Statistical significance in choice modelling: computation, usage and reporting
significance in choice modelling. We argue that, as in many other areas of
science, there is an over-reliance on 95% confidence levels, and
misunderstandings of the meaning of significance. We also observe a lack of
precision in the reporting of measures of uncertainty in many studies,
especially when using p-values and even more so with star measures. The paper
provides a precise discussion on the computation of measures of uncertainty and
confidence intervals, discusses the use of statistical tests, and also stresses
the importance of considering behavioural or policy significance in addition to
statistical significance.
arXiv link: http://arxiv.org/abs/2506.05996v2
On Efficient Estimation of Distributional Treatment Effects under Covariate-Adaptive Randomization
randomized experiments that use covariate-adaptive randomization (CAR). These
include designs such as Efron's biased-coin design and stratified block
randomization, where participants are first grouped into strata based on
baseline covariates and assigned treatments within each stratum to ensure
balance across groups. In practice, datasets often contain additional
covariates beyond the strata indicators. We propose a flexible distribution
regression framework that leverages off-the-shelf machine learning methods to
incorporate these additional covariates, enhancing the precision of
distributional treatment effect estimates. We establish the asymptotic
distribution of the proposed estimator and introduce a valid inference
procedure. Furthermore, we derive the semiparametric efficiency bound for
distributional treatment effects under CAR and demonstrate that our
regression-adjusted estimator attains this bound. Simulation studies and
empirical analyses of microcredit programs highlight the practical advantages
of our method.
arXiv link: http://arxiv.org/abs/2506.05945v1
Admissibility of Completely Randomized Trials: A Large-Deviation Approach
admissible to ignore this option and run a non-adaptive trial instead? We
provide a negative answer to this question in the best-arm identification
problem, where the experimenter aims to allocate measurement efforts
judiciously to confidently deploy the most effective treatment arm. We find
that, whenever there are at least three treatment arms, there exist simple
adaptive designs that universally and strictly dominate non-adaptive completely
randomized trials. This dominance is characterized by a notion called
efficiency exponent, which quantifies a design's statistical efficiency when
the experimental sample is large. Our analysis focuses on the class of batched
arm elimination designs, which progressively eliminate underperforming arms at
pre-specified batch intervals. We characterize simple sufficient conditions
under which these designs universally and strictly dominate completely
randomized trials. These results resolve the second open problem posed in Qin
[2022].
arXiv link: http://arxiv.org/abs/2506.05329v1
Enhancing the Merger Simulation Toolkit with ML/AI
horizontal mergers using ML/AI methods. While standard merger simulation
techniques rely on restrictive assumptions about firm conduct, we propose a
data-driven framework that relaxes these constraints when rich market data are
available. We develop and identify a flexible nonparametric model of supply
that nests a broad range of conduct models and cost functions. To overcome the
curse of dimensionality, we adapt the Variational Method of Moments (VMM)
(Bennett and Kallus, 2023) to estimate the model, allowing for various forms of
strategic interaction. Monte Carlo simulations show that our method
significantly outperforms an array of misspecified models and rivals the
performance of the true model, both in predictive performance and
counterfactual merger simulations. As a way to interpret the economics of the
estimated function, we simulate pass-through and reveal that the model learns
markup and cost functions that imply approximately correct pass-through
behavior. Applied to the American Airlines-US Airways merger, our method
produces more accurate post-merger price predictions than traditional
approaches. The results demonstrate the potential for machine learning
techniques to enhance merger analysis while maintaining economic structure.
arXiv link: http://arxiv.org/abs/2506.05225v1
The Spurious Factor Dilemma: Robust Inference in Heavy-Tailed Elliptical Factor Models
particularly in economics and finance. However, standard methods for
determining the number of factors often overestimate the true number when data
exhibit heavy-tailed randomness, misinterpreting noise-induced outliers as
genuine factors. This paper addresses this challenge within the framework of
Elliptical Factor Models (EFM), which accommodate both heavy tails and
potential non-linear dependencies common in real-world data. We demonstrate
theoretically and empirically that heavy-tailed noise generates spurious
eigenvalues that mimic true factor signals. To distinguish these, we propose a
novel methodology based on a fluctuation magnification algorithm. We show that
under magnifying perturbations, the eigenvalues associated with real factors
exhibit significantly less fluctuation (stabilizing asymptotically) compared to
spurious eigenvalues arising from heavy-tailed effects. This differential
behavior allows the identification and detection of the true and spurious
factors. We develop a formal testing procedure based on this principle and
apply it to the problem of accurately selecting the number of common factors in
heavy-tailed EFMs. Simulation studies and real data analysis confirm the
effectiveness of our approach compared to existing methods, particularly in
scenarios with pronounced heavy-tailedness.
arXiv link: http://arxiv.org/abs/2506.05116v1
Finite-Sample Distortion in Kernel Specification Tests: A Perturbation Analysis of Empirical Directional Components
finite-sample performance of kernel-based specification tests, such as the
Kernel Conditional Moment (KCM) test. Rather than introducing a fundamentally
new test, we isolate and rigorously analyze the finite-sample distortion
arising from the discrepancy between the empirical and population eigenspaces
of the kernel operator. Using perturbation theory for compact operators, we
demonstrate that the estimation error in directional components is governed by
local eigengaps: components associated with small eigenvalues are highly
unstable and contribute primarily noise rather than signal under fixed
alternatives. Although this error vanishes asymptotically under the null, it
can substantially degrade power in finite samples. This insight explains why
the effective power of omnibus kernel tests is often concentrated in a
low-dimensional subspace. We illustrate how truncating unstable high-frequency
components--a natural consequence of our analysis--can improve finite-sample
performance, but emphasize that the core contribution is the diagnostic
understanding of why and when such instability occurs. The
analysis is largely non-asymptotic and applies broadly to reproducing kernel
Hilbert space-based inference.
arXiv link: http://arxiv.org/abs/2506.04900v2
Latent Variable Autoregression with Exogenous Inputs
(C)LARX: a (constrained) latent variable autoregressive model with exogenous
inputs. Two additional contributions are made as a side effect: First, a new
matrix operator is introduced for matrices and vectors with blocks along one
dimension; Second, a new latent variable regression (LVR) framework is proposed
for economics and finance. The empirical section examines how well the stock
market predicts real economic activity in the United States. (C)LARX models
outperform the baseline OLS specification in out-of-sample forecasts and offer
novel analytical insights about the underlying functional relationship.
arXiv link: http://arxiv.org/abs/2506.04488v2
What Makes Treatment Effects Identifiable? Characterizations and Estimators Beyond Unconfoundedness
causal inference rely on the assumptions of unconfoundedness and overlap.
Unconfoundedness requires that the observed covariates account for all
correlations between the outcome and treatment. Overlap requires the existence
of randomness in treatment decisions for all individuals. Nevertheless, many
types of studies frequently violate unconfoundedness or overlap, for instance,
observational studies with deterministic treatment decisions - popularly known
as Regression Discontinuity designs - violate overlap.
In this paper, we initiate the study of general conditions that enable the
identification of the average treatment effect, extending beyond
unconfoundedness and overlap. In particular, following the paradigm of
statistical learning theory, we provide an interpretable condition that is
sufficient and necessary for the identification of ATE. Moreover, this
condition also characterizes the identification of the average treatment effect
on the treated (ATT) and can be used to characterize other treatment effects as
well. To illustrate the utility of our condition, we present several
well-studied scenarios where our condition is satisfied and, hence, we prove
that ATE can be identified in regimes that prior works could not capture. For
example, under mild assumptions on the data distributions, this holds for the
models proposed by Tan (2006) and Rosenbaum (2002), and the Regression
Discontinuity design model introduced by Thistlethwaite and Campbell (1960).
For each of these scenarios, we also show that, under natural additional
assumptions, ATE can be estimated from finite samples.
We believe these findings open new avenues for bridging learning-theoretic
insights and causal inference methodologies, particularly in observational
studies with complex treatment mechanisms.
arXiv link: http://arxiv.org/abs/2506.04194v2
Evaluating Large Language Model Capabilities in Assessing Spatial Econometrics Research
economic soundness and theoretical consistency of empirical findings in spatial
econometrics. We created original and deliberately altered "counterfactual"
summaries from 28 published papers (2005-2024), which were evaluated by a
diverse set of LLMs. The LLMs provided qualitative assessments and structured
binary classifications on variable choice, coefficient plausibility, and
publication suitability. The results indicate that while LLMs can expertly
assess the coherence of variable choices (with top models like GPT-4o achieving
an overall F1 score of 0.87), their performance varies significantly when
evaluating deeper aspects such as coefficient plausibility and overall
publication suitability. The results further revealed that the choice of LLM,
the specific characteristics of the paper and the interaction between these two
factors significantly influence the accuracy of the assessment, particularly
for nuanced judgments. These findings highlight LLMs' current strengths in
assisting with initial, more surface-level checks and their limitations in
performing comprehensive, deep economic reasoning, suggesting a potential
assistive role in peer review that still necessitates robust human oversight.
arXiv link: http://arxiv.org/abs/2506.06377v1
High-Dimensional Learning in Finance
financial prediction using large, over-parameterized models. This paper
provides theoretical foundations and empirical validation for understanding
when and how these methods achieve predictive success. I examine two key
aspects of high-dimensional learning in finance. First, I prove that
within-sample standardization in Random Fourier Features implementations
fundamentally alters the underlying Gaussian kernel approximation, replacing
shift-invariant kernels with training-set dependent alternatives. Second, I
establish information-theoretic lower bounds that identify when reliable
learning is impossible no matter how sophisticated the estimator. A detailed
quantitative calibration of the polynomial lower bound shows that with typical
parameter choices, e.g., 12,000 features, 12 monthly observations, and an
R-squared of 2-3%, the required sample size to escape the bound exceeds 25-30
years of data--well beyond any rolling window actually used. Thus, observed
out-of-sample success must originate from lower-complexity artefacts rather
than from the intended high-dimensional mechanism.
arXiv link: http://arxiv.org/abs/2506.03780v3
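A short sketch of the Random Fourier Features construction referenced above, comparing the plain feature map with a within-sample standardized version; dimensions and bandwidth are illustrative, and the point is only that training-set standardization changes the implied kernel.

```python
# Random Fourier Features for the Gaussian kernel, with and without within-sample standardization.
import numpy as np

rng = np.random.default_rng(0)
n, d, D = 200, 5, 2_000                       # observations, input dimension, random features
sigma = np.sqrt(d)                            # bandwidth of the target Gaussian kernel
X = rng.normal(size=(n, d))

W = rng.normal(scale=1.0 / sigma, size=(d, D))        # spectral draws for exp(-||x-x'||^2 / (2 sigma^2))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
Phi = np.sqrt(2.0 / D) * np.cos(X @ W + b)            # plain Random Fourier Feature map

# Within-sample standardization: each feature centered and scaled with training-set statistics.
Phi_std = (Phi - Phi.mean(axis=0)) / Phi.std(axis=0)

K_true = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))
K_rff = Phi @ Phi.T                                   # approximates the Gaussian kernel
K_std = (Phi_std @ Phi_std.T) / D                     # implied training-set dependent kernel

print("max |K_rff - K_true| :", round(float(np.abs(K_rff - K_true).max()), 3))
print("max |K_std - K_true| :", round(float(np.abs(K_std - K_true).max()), 3),
      "-- standardization replaces the shift-invariant kernel with a data-dependent one")
```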
Conventional and Fuzzy Data Envelopment Analysis with deaR
that implements a large number of conventional and fuzzy models, along with
super-efficiency models, cross-efficiency analysis, Malmquist index,
bootstrapping, and metafrontier analysis. Notably, deaR is the
only package to date that incorporates Kao-Liu, Guo-Tanaka and possibilistic
fuzzy models. The versatility of the package allows the user to work with
different returns to scale and orientations, as well as to consider special
features, namely non-controllable, non-discretionary or undesirable variables.
Moreover, it includes novel graphical representations that can help the user to
display the results. This paper is a comprehensive description of deaR,
reviewing all implemented models and giving examples of use.
arXiv link: http://arxiv.org/abs/2506.03766v1
Combine and conquer: model averaging for out-of-distribution forecasting
their disposal, ranging from traditional econometric structures to models from
mathematical psychology and data-driven approaches from machine learning. A key
question arises as to how well these different models perform in prediction,
especially when considering trips of different characteristics from those used
in estimation, i.e. out-of-distribution prediction, and whether better
predictions can be obtained by combining insights from the different models.
Across two case studies, we show that while data-driven approaches excel in
predicting mode choice for trips within the distance bands used in estimation,
beyond that range, the picture is fuzzy. To leverage the relative advantages of
the different model families and capitalise on the notion that multiple `weak'
models can result in more robust models, we put forward the use of a model
averaging approach that allocates weights to different model families as a
function of the distance between the characteristics of the trip for
which predictions are made, and those used in model estimation. Overall, we see
that the model averaging approach gives larger weight to models with stronger
behavioural or econometric underpinnings the more we move outside the interval
of trip distances covered in estimation. Across both case studies, we show that
our model averaging approach obtains improved performance both on the
estimation and validation data, and crucially also when predicting mode choices
for trips of distances outside the range used in estimation.
arXiv link: http://arxiv.org/abs/2506.03693v1
Deep Learning Enhanced Multivariate GARCH
named Long Short-Term Memory enhanced BEKK (LSTM-BEKK), that integrates deep
learning into multivariate GARCH processes. By combining the flexibility of
recurrent neural networks with the econometric structure of BEKK models, our
approach is designed to better capture nonlinear, dynamic, and high-dimensional
dependence structures in financial return data. The proposed model addresses
key limitations of traditional multivariate GARCH-based methods, particularly
in capturing persistent volatility clustering and asymmetric co-movement across
assets. Leveraging the data-driven nature of LSTMs, the framework adapts
effectively to time-varying market conditions, offering improved robustness and
forecasting performance. Empirical results across multiple equity markets
confirm that the LSTM-BEKK model achieves superior performance in terms of
out-of-sample portfolio risk forecast, while maintaining the interpretability
from the BEKK models. These findings highlight the potential of hybrid
econometric-deep learning models in advancing financial risk management and
multivariate volatility forecasting.
arXiv link: http://arxiv.org/abs/2506.02796v1
Orthogonality-Constrained Deep Instrumental Variable Model for Causal Effect Estimation
It characterizes heterogeneity by adding interaction features and reduces
redundancy through orthogonal constraints. The model includes two feature
extractors, one for the instrumental variable Z and the other for the covariate
X*. The training process is divided into two stages: the first stage uses the
mean squared error (MSE) loss function, and the second stage incorporates
orthogonal regularization. Experimental results show that this model
outperforms DeepIV and DML in terms of accuracy and stability. Future research
directions include applying the model to real-world problems and handling
scenarios with multiple treatment variables.
arXiv link: http://arxiv.org/abs/2506.02790v1
Get me out of this hole: a profile likelihood approach to identifying and avoiding inferior local optima in choice models
local optima when using structures other than a simple linear-in-parameters
logit model. At the same time, there is no consensus on appropriate mechanisms
for addressing this issue. Most analysts seem to ignore the problem, while
others try a set of different starting values, or put their faith in what they
believe to be more robust estimation approaches. This paper puts forward the
use of a profile likelihood approach that systematically analyses the parameter
space around an initial maximum likelihood estimate and tests for the existence
of better local optima in that space. We extend this to an iterative algorithm
which then progressively searches for the best local optimum under given
settings for the algorithm. Using a well-known stated choice dataset, we show
how the approach identifies better local optima for both latent class and mixed
logit, with the potential for substantially different policy implications. In
the case studies we conduct, an added benefit of the approach is that the new
solutions exhibit properties that more closely adhere to the property of
asymptotic normality, also highlighting the benefits of the approach in
analysing the statistical properties of a solution.
arXiv link: http://arxiv.org/abs/2506.02722v1
Analysis of Multiple Long-Run Relations in Panel Data Models
sets where the cross section dimension, $n$, is larger than the time series
dimension $T$. This paper proposes a novel methodology that filters out the
short run dynamics using sub-sample time averages as deviations from their
full-sample counterpart, and estimates the number of long-run relations and
their coefficients using eigenvalues and eigenvectors of the pooled covariance
matrix of these sub-sample deviations. We refer to this procedure as pooled
minimum eigenvalue (PME). We show that the PME estimator is consistent and
asymptotically normal as $n$ and $T \rightarrow \infty$ jointly, such that
$T\approx n^{d}$, with $d>0$ for consistency and $d>1/2$ for asymptotic
normality. Extensive Monte Carlo studies show that the number of long-run
relations can be estimated with high precision, and the PME estimators have
good size and power properties. The utility of our approach is illustrated by
micro and macro applications using Compustat and Penn World Tables.
arXiv link: http://arxiv.org/abs/2506.02135v3
Stock Market Telepathy: Graph Neural Networks Predicting the Secret Conversations between MINT and G7 Countries
Nigeria, and Türkiye), are gaining influence in global stock markets,
although they remain susceptible to the economic conditions of developed
countries like the G7 (Canada, France, Germany, Italy, Japan, the United
Kingdom, and the United States). This interconnectedness and sensitivity of
financial markets make understanding these relationships crucial for investors
and policymakers to predict stock price movements accurately. To this end, we
examined the main stock market indices of G7 and MINT countries from 2012 to
2024, using a recent graph neural network (GNN) algorithm called multivariate
time series forecasting with graph neural network (MTGNN). This method allows
for considering complex spatio-temporal connections in multivariate time
series. In the implementations, MTGNN revealed that the US and Canada are the
most influential G7 countries regarding stock indices in the forecasting
process, and Indonesia and Türkiye are the most influential MINT countries.
Additionally, our results showed that MTGNN outperformed traditional methods in
forecasting the prices of stock market indices for MINT and G7 countries.
Consequently, the study offers valuable insights into economic blocks' markets
and presents a compelling empirical approach to analyzing global stock market
dynamics using MTGNN.
arXiv link: http://arxiv.org/abs/2506.01945v1
Life Sequence Transformer: Generative Modelling for Counterfactual Simulation
administrative data, generally depending on strong assumptions or the existence
of suitable control groups, to evaluate policy interventions and estimate
causal effects. We propose a novel approach that leverages the Transformer
architecture to simulate counterfactual life trajectories from large-scale
administrative records. Our contributions are: the design of a novel encoding
method that transforms longitudinal administrative data to sequences and the
proposal of a generative model tailored to life sequences with overlapping
events across life domains. We test our method using data from the Istituto
Nazionale di Previdenza Sociale (INPS), showing that it enables the realistic
and coherent generation of life trajectories. This framework offers a scalable
alternative to classical counterfactual identification strategies, such as
difference-in-differences and synthetic controls, particularly in contexts
where these methods are infeasible or their assumptions unverifiable. We
validate the model's utility by comparing generated life trajectories against
established findings from causal studies, demonstrating its potential to enrich
labour market research and policy evaluation through individual-level
simulations.
arXiv link: http://arxiv.org/abs/2506.01874v1
Spillovers and Effect Attenuation in Firearm Policy Research in the United States
health issue. Because of limited federal action, state policies are
particularly important, and their evaluation informs the actions of other
policymakers. The movement of firearms across state and local borders, however,
can undermine the effectiveness of these policies and have statistical
consequences for their empirical evaluation. This movement causes spillover and
bypass effects of policies, wherein interventions affect nearby control states
and the lack of intervention in nearby states reduces the effectiveness in the
intervention states. While some causal inference methods exist to account for
spillover effects and reduce bias, these do not necessarily align well with the
data available for firearm research or with the most policy-relevant estimands.
Integrated data infrastructure and new methods are necessary for a better
understanding of the effects these policies would have if widely adopted. In
the meantime, appropriately understanding and interpreting effect estimates
from quasi-experimental analyses is crucial for ensuring that effective
policies are not dismissed due to these statistical challenges.
arXiv link: http://arxiv.org/abs/2506.01695v1
Large Bayesian VARs for Binary and Censored Variables
and continuous variables, and develop an efficient estimation approach that
scales well to high-dimensional settings. In an out-of-sample forecasting
exercise, we show that the proposed VARs forecast recessions and short-term
interest rates well. We demonstrate the utility of the proposed framework using
a wide range of empirical applications, including conditional forecasting and a
structural analysis that examines the dynamic effects of a financial shock on
recession probabilities.
arXiv link: http://arxiv.org/abs/2506.01422v1
Can AI Master Econometrics? Evidence from Econometrics AI Agent on Expert-Level Tasks
requiring human expertise? This paper evaluates AI agents' capability to master
econometrics, focusing on empirical analysis performance. We develop an
“Econometrics AI Agent” built on the open-source MetaGPT framework. This
agent exhibits outstanding performance in: (1) planning econometric tasks
strategically, (2) generating and executing code, (3) employing error-based
reflection for improved robustness, and (4) allowing iterative refinement
through multi-round conversations. We construct two datasets from academic
coursework materials and published research papers to evaluate performance
against real-world challenges. Comparative testing shows our domain-specialized
AI agent significantly outperforms both benchmark large language models (LLMs)
and general-purpose AI agents. This work establishes a testbed for exploring
AI's impact on social science research and enables cost-effective integration
of domain expertise, making advanced econometric methods accessible to users
with minimal coding skills. Furthermore, our AI agent enhances research
reproducibility and offers promising pedagogical applications for econometrics
teaching.
arXiv link: http://arxiv.org/abs/2506.00856v2
Learning from Double Positive and Unlabeled Data for Potential-Customer Identification
targeted marketing by applying learning from positive and unlabeled data (PU
learning). We consider a scenario in which a company sells a product and can
observe only the customers who purchased it. Decision-makers seek to market
products effectively based on whether people have loyalty to the company.
Individuals with loyalty are those who are likely to remain interested in the
company even without additional advertising. Consequently, those loyal
customers would likely purchase from the company if they are interested in the
product. In contrast, people with lower loyalty may overlook the product or buy
similar products from other companies unless they receive marketing attention.
Therefore, by focusing marketing efforts on individuals who are interested in
the product but do not have strong loyalty, we can achieve more efficient
marketing. To achieve this goal, we consider how to learn, from limited data, a
classifier that identifies potential customers who (i) have interest in the
product and (ii) do not have loyalty to the company. Although our algorithm
comprises a single-stage optimization, its objective function implicitly
contains two losses derived from standard PU learning settings. For this
reason, we refer to our approach as double PU learning. We verify the validity
of the proposed algorithm through numerical experiments, confirming that it
functions appropriately for the problem at hand.
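As background for the PU-learning machinery the abstract builds on, the following is a minimal sketch of a standard non-negative PU risk estimator (in the spirit of du Plessis et al. and Kiryo et al.), not the paper's double PU objective; the simulated data, the class prior pi, and the linear scorer are illustrative assumptions.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)

    # Synthetic data: positives ~ N(+1, I), negatives ~ N(-1, I); the unlabeled
    # pool mixes both classes with (assumed known) prior pi.
    d, n_pos, n_unl, pi = 2, 200, 1000, 0.4
    X_pos = rng.normal(+1.0, 1.0, size=(n_pos, d))            # labeled positives
    X_unl = np.vstack([rng.normal(+1.0, 1.0, size=(int(pi * n_unl), d)),
                       rng.normal(-1.0, 1.0, size=(n_unl - int(pi * n_unl), d))])

    def logistic_loss(margin):
        # numerically stable log(1 + exp(-margin))
        return np.logaddexp(0.0, -margin)

    def pu_risk(params):
        """Non-negative PU risk: pi * R_P^+ + max(0, R_U^- - pi * R_P^-)."""
        w, b = params[:-1], params[-1]
        s_pos = X_pos @ w + b
        s_unl = X_unl @ w + b
        r_pos_plus = logistic_loss(+s_pos).mean()    # positives labeled +1
        r_pos_minus = logistic_loss(-s_pos).mean()   # positives treated as -1
        r_unl_minus = logistic_loss(-s_unl).mean()   # unlabeled treated as -1
        return pi * r_pos_plus + max(0.0, r_unl_minus - pi * r_pos_minus)

    w_hat = minimize(pu_risk, x0=np.zeros(d + 1), method="Nelder-Mead").x
    print("fitted linear scorer:", w_hat.round(3))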
arXiv link: http://arxiv.org/abs/2506.00436v2
Residual Income Valuation and Stock Returns: Evidence from a Value-to-Price Investment Strategy
returns and consist of companies that are undervalued for prolonged periods.
Results for the US market show that high V/P portfolios outperform low V/P
portfolios across horizons extending from one to three years. The V/P ratio is
positively correlated to future stock returns after controlling for firm
characteristics, which are well-known risk proxies. Findings also indicate that profitability and investment add explanatory power to the Fama and French three-factor model and for stocks with a V/P ratio close to 1. However, these factors cannot explain all variation in excess returns, especially in years two and three and for stocks with a high V/P ratio. Finally, portfolios with the highest
V/P stocks select companies that are significantly mispriced relative to their
equity (investment) and profitability growth persistence in the future.
arXiv link: http://arxiv.org/abs/2506.00206v1
Aligning Language Models with Observational Data: Opportunities and Risks from a Causal Perspective
content that contributes directly to key performance metrics, such as
conversion rates. Pretrained models, however, often fall short when it comes to
aligning with human preferences or optimizing for business objectives. As a
result, fine-tuning with good-quality labeled data is essential to guide models
to generate content that achieves better results. Controlled experiments, like
A/B tests, can provide such data, but they are often expensive and come with
significant engineering and logistical challenges. Meanwhile, companies have
access to a vast amount of historical (observational) data that remains
underutilized. In this work, we study the challenges and opportunities of
fine-tuning LLMs using observational data. We show that while observational
outcomes can provide valuable supervision, directly fine-tuning models on such
data can lead them to learn spurious correlations. We present empirical
evidence of this issue using various real-world datasets and propose
DeconfoundLM, a method that explicitly removes the effect of known confounders
from reward signals. Using simulation experiments, we demonstrate that
DeconfoundLM improves the recovery of causal relationships and mitigates
failure modes found in fine-tuning methods that ignore or naively incorporate
confounding variables. Our findings highlight that while observational data
presents risks, with the right causal corrections, it can be a powerful source
of signal for LLM alignment. Please refer to the project page for code and
related resources.
arXiv link: http://arxiv.org/abs/2506.00152v1
Data Fusion for Partial Identification of Causal Effects
to improve learning, generalization, and decision making across data sciences.
In causal inference, these methods leverage rich observational data to improve
causal effect estimation, while maintaining the trustworthiness of randomized
controlled trials. Existing approaches often relax the strong no unobserved
confounding assumption by instead assuming exchangeability of counterfactual
outcomes across data sources. However, when both assumptions simultaneously
fail - a common scenario in practice - current methods cannot identify or
estimate causal effects. We address this limitation by proposing a novel
partial identification framework that enables researchers to answer key
questions such as: Is the causal effect positive or negative? and How severe
must assumption violations be to overturn this conclusion? Our approach
introduces interpretable sensitivity parameters that quantify assumption
violations and derives corresponding causal effect bounds. We develop doubly
robust estimators for these bounds and operationalize breakdown frontier
analysis to understand how causal conclusions change as assumption violations
increase. We apply our framework to the Project STAR study, which investigates
the effect of classroom size on students' third-grade standardized test
performance. Our analysis reveals that the Project STAR results are robust to
simultaneous violations of key assumptions, both on average and across various
subgroups of interest. This strengthens confidence in the study's conclusions
despite potential unmeasured biases in the data.
arXiv link: http://arxiv.org/abs/2505.24296v1
A Gibbs Sampler for Efficient Bayesian Inference in Sign-Identified SVARs
autoregressions (SVARs) identified with sign restrictions. The key insight of
our algorithm is to break away from the accept-reject tradition associated
with sign-identified SVARs. We show that embedding an elliptical slice sampling
within a Gibbs sampler approach can deliver dramatic gains in speed and turn
previously infeasible applications into feasible ones. We provide a tractable
example to illustrate the power of elliptical slice sampling applied to
sign-identified SVARs. We demonstrate the usefulness of our algorithm by
applying it to a well-known small-SVAR model of the oil market featuring a
tight identified set, as well as to a large SVAR model with more than 100 sign
restrictions.
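For readers unfamiliar with the building block, a minimal generic elliptical slice sampling step (Murray, Adams and MacKay, 2010) is sketched below on a Gaussian toy target; this is the kind of update that can be embedded in a Gibbs sweep, not a reproduction of the paper's SVAR-specific sampler.

    import numpy as np

    rng = np.random.default_rng(1)

    def ess_step(f, chol_prior, log_lik):
        """One elliptical slice sampling update for a target proportional to
        N(0, Sigma) * exp(log_lik(f)), where chol_prior is a Cholesky factor of Sigma."""
        nu = chol_prior @ rng.standard_normal(f.shape)    # auxiliary draw from the prior
        log_u = log_lik(f) + np.log(rng.uniform())        # slice height
        theta = rng.uniform(0.0, 2.0 * np.pi)             # initial angle and bracket
        lo, hi = theta - 2.0 * np.pi, theta
        while True:
            f_prop = f * np.cos(theta) + nu * np.sin(theta)
            if log_lik(f_prop) > log_u:
                return f_prop
            if theta < 0.0:                               # shrink the bracket and retry
                lo = theta
            else:
                hi = theta
            theta = rng.uniform(lo, hi)

    # Toy target: N(0, I_2) prior times a Gaussian "likelihood" centred at (1, -1);
    # the posterior mean is (0.5, -0.5).
    def log_lik(f):
        return -0.5 * np.sum((f - np.array([1.0, -1.0])) ** 2)

    f, draws = np.zeros(2), []
    for _ in range(5000):
        f = ess_step(f, np.eye(2), log_lik)
        draws.append(f)
    print("posterior mean estimate:", np.mean(draws, axis=0).round(2))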
arXiv link: http://arxiv.org/abs/2505.23542v2
Evaluating financial tail risk forecasts: Testing Equal Predictive Ability
properties of the Diebold-Mariano (DM) test by Diebold and Mariano (1995) and
the model confidence set (MCS) testing procedure by Hansen et al. (2011)
applied to the asymmetric loss functions specific to financial tail risk
forecasts, such as Value-at-Risk (VaR) and Expected Shortfall (ES). We focus on
statistical loss functions that are strictly consistent in the sense of
Gneiting (2011a). We find that the tests show little power against models that
underestimate the tail risk at the most extreme quantile levels, while the
finite sample properties generally improve with the quantile level and the
out-of-sample size. For the small quantile levels and out-of-sample sizes of up
to two years, we observe heavily skewed test statistics and non-negligible type
III errors, which implies that researchers should be cautious about using
standard normal or bootstrapped critical values. We demonstrate both
empirically and theoretically how these unfavorable finite sample results
relate to the asymmetric loss functions and the time-varying volatility
inherent in financial return data.
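As a minimal sketch of the kind of comparison studied above, the snippet below computes a Diebold-Mariano statistic on loss differentials from a strictly consistent quantile (pinball) loss for VaR forecasts; the simulated returns, the two competing forecasts, and the Bartlett-type long-run variance are illustrative assumptions, not the paper's experimental design.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)

    def pinball_loss(y, var_forecast, alpha):
        """Quantile ("pinball") loss, strictly consistent for the alpha-quantile (VaR)."""
        u = y - var_forecast
        return (alpha - (u < 0).astype(float)) * u

    def dm_test(loss_a, loss_b, h=1):
        """Diebold-Mariano statistic on the loss differential, with a Bartlett
        long-run variance using h-1 lags (h = forecast horizon)."""
        diff = loss_a - loss_b
        n, d_bar = diff.size, diff.mean()
        lrv = np.mean((diff - d_bar) ** 2)
        for k in range(1, h):
            gamma_k = np.mean((diff[k:] - d_bar) * (diff[:-k] - d_bar))
            lrv += 2.0 * (1.0 - k / h) * gamma_k
        stat = d_bar / np.sqrt(lrv / n)
        return stat, 2.0 * (1.0 - norm.cdf(abs(stat)))    # standard normal critical values

    # Two competing 1%-VaR forecasts for i.i.d. standard normal returns: the correct
    # quantile versus one that underestimates tail risk. A negative statistic favors
    # the first forecast.
    alpha, n = 0.01, 500
    y = rng.standard_normal(n)
    loss_correct = pinball_loss(y, np.full(n, norm.ppf(alpha)), alpha)
    loss_too_mild = pinball_loss(y, np.full(n, norm.ppf(0.05)), alpha)
    stat, pval = dm_test(loss_correct, loss_too_mild)
    print(f"DM statistic = {stat:.2f}, p-value = {pval:.3f}")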
arXiv link: http://arxiv.org/abs/2505.23333v1
A Synthetic Business Cycle Approach to Counterfactual Analysis with Nonstationary Macroeconomic Data
inference in macroeconomic settings when dealing with possibly nonstationary
data. While the synthetic control approach has gained popularity for estimating
counterfactual outcomes, we caution researchers against assuming a common
nonstationary trend factor across units for macroeconomic outcomes, as doing so
may result in misleading causal estimates, a pitfall we refer to as the
spurious synthetic control problem. To address this issue, we propose a
synthetic business cycle framework that explicitly separates trend and cyclical
components. By leveraging the treated unit's historical data to forecast its
trend and using control units only for cyclical fluctuations, our
divide-and-conquer strategy eliminates spurious correlations and improves the
robustness of counterfactual prediction in macroeconomic applications. As
empirical illustrations, we examine the cases of German reunification and the
handover of Hong Kong, demonstrating the advantages of the proposed approach.
arXiv link: http://arxiv.org/abs/2505.22388v1
Causal Inference for Experiments with Latent Outcomes: Key Results and Their Implications for Design and Analysis
outcome is measured in multiple ways but each measure contains some degree of
error? We describe modeling approaches that enable researchers to identify
causal parameters of interest, suggest ways that experimental designs can be
augmented so as to make linear latent variable models more credible, and
discuss empirical tests of key modeling assumptions. We show that when
experimental researchers invest appropriately in multiple outcome measures, an
optimally weighted index of the outcome measures enables researchers to obtain
efficient and interpretable estimates of causal parameters by applying standard
regression methods, and that weights may be obtained using instrumental
variables regression. Maximum likelihood and generalized method of moments
estimators can be used to obtain estimates and standard errors in a single
step. An empirical application illustrates the gains in precision and
robustness that multiple outcome measures can provide.
arXiv link: http://arxiv.org/abs/2505.21909v2
Learning Individual Behavior in Agent-Based Models with Graph Diffusion Networks
in complex systems. In ABMs, agent behaviors are governed by local interactions
and stochastic rules. However, these rules are, in general, non-differentiable,
limiting the use of gradient-based methods for optimization, and thus
integration with real-world data. We propose a novel framework to learn a
differentiable surrogate of any ABM by observing its generated data. Our method
combines diffusion models to capture behavioral stochasticity and graph neural
networks to model agent interactions. Distinct from prior surrogate approaches,
our method introduces a fundamental shift: rather than approximating
system-level outputs, it models individual agent behavior directly, preserving
the decentralized, bottom-up dynamics that define ABMs. We validate our
approach on two ABMs (Schelling's segregation model and a Predator-Prey
ecosystem) showing that it replicates individual-level patterns and accurately
forecasts emergent dynamics beyond training. Our results demonstrate the
potential of combining diffusion models and graph learning for data-driven ABM
simulation.
arXiv link: http://arxiv.org/abs/2505.21426v1
Conditional Method Confidence Set
to select the best subset of forecasting methods with equal predictive ability
conditional on a specific economic regime. The test resembles the Model
Confidence Set by Hansen et al. (2011) and is adapted for conditional forecast
evaluation. We show the asymptotic validity of the proposed test and illustrate
its properties in a simulation study. The proposed testing procedure is
particularly suitable for stress-testing of financial risk models required by
the regulators. We showcase the empirical relevance of the CMCS using the
stress-testing scenario of Expected Shortfall. The empirical evidence suggests
that the proposed CMCS procedure can be used as a robust tool for forecast
evaluation of market risk models for different economic regimes.
arXiv link: http://arxiv.org/abs/2505.21278v1
Nonparametric "rich covariates" without saturation
variables estimators satisfy the rich-covariates condition emphasized by
Blandhol et al. (2025), even when the instrument is not unconditionally
randomly assigned and the model is not saturated. Both approaches start with a
nonparametric estimate of the expectation of the instrument conditional on the
covariates, and ensure that the rich-covariates condition is satisfied either
by using as the instrument the difference between the original instrument and
its estimated conditional expectation, or by adding the estimated conditional
expectation to the set of regressors. We derive asymptotic properties when the
first step uses kernel regression, and assess finite-sample performance in
simulations where we also use neural networks in the first step. Finally, we
present an empirical illustration that highlights some significant advantages
of the proposed methods.
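A minimal sketch of the first of the two approaches described above, under simplifying assumptions: a Nadaraya-Watson estimate of E[Z|X] in the first step, followed by manual 2SLS that uses the residualized instrument. The simulated design and the bandwidth are illustrative choices rather than the paper's.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 2000

    # Simulated design: the instrument Z depends on the covariate X (so it is not
    # unconditionally randomly assigned), and the treatment D is endogenous via U.
    X = rng.uniform(-1, 1, n)
    Z = (rng.uniform(size=n) < 0.3 + 0.4 * (X > 0)).astype(float)
    U = rng.standard_normal(n)
    D = 0.8 * Z + 0.5 * X + 0.5 * U + rng.standard_normal(n)
    Y = 1.0 * D + 1.0 * X + U + rng.standard_normal(n)      # true effect of D is 1

    def nadaraya_watson(x_eval, x, z, bandwidth):
        """Kernel (Nadaraya-Watson) estimate of E[Z | X = x_eval], Gaussian kernel."""
        w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / bandwidth) ** 2)
        return (w @ z) / w.sum(axis=1)

    def tsls(y, d, instrument, controls):
        """Just-identified 2SLS of y on (d, controls, 1) with (instrument, controls, 1)."""
        W = np.column_stack([d, controls, np.ones_like(y)])
        Zm = np.column_stack([instrument, controls, np.ones_like(y)])
        return np.linalg.solve(Zm.T @ W, Zm.T @ y)

    z_tilde = Z - nadaraya_watson(X, X, Z, bandwidth=0.2)    # residualized instrument
    beta = tsls(Y, D, z_tilde, X)
    print("2SLS coefficient on D:", round(beta[0], 2))       # close to 1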
arXiv link: http://arxiv.org/abs/2505.21213v2
Debiased Ill-Posed Regression
restricted by the statistical model only through a conditional moment
restriction. Prominent examples include the nonparametric instrumental variable
framework for estimating the structural function of the outcome variable, and
the proximal causal inference framework for estimating the bridge functions. A
common strategy in the literature is to find the minimizer of the projected
mean squared error. However, this approach can be sensitive to misspecification
or slow convergence rate of the estimators of the involved nuisance components.
In this work, we propose a debiased estimation strategy based on the influence
function of a modification of the projected error and demonstrate its
finite-sample convergence rate. Our proposed estimator possesses a second-order
bias with respect to the involved nuisance functions and a desirable robustness
property with respect to the misspecification of one of the nuisance functions.
The proposed estimator involves a hyper-parameter, for which the optimal value
depends on potentially unknown features of the underlying data-generating
process. Hence, we further propose a hyper-parameter selection approach based
on cross-validation and derive an error bound for the resulting estimator. This
analysis highlights the potential rate loss due to hyper-parameter selection
and underscores the importance and advantages of incorporating debiasing in this
setting. We also study the application of our approach to the estimation of
regular parameters in a specific parameter class, which are linear functionals
of the solutions to the conditional moment restrictions and provide sufficient
conditions for achieving root-n consistency using our debiased estimator.
arXiv link: http://arxiv.org/abs/2505.20787v1
Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models
causal panel data models, in the presence of covariate effects. We propose a
novel Covariate-Adjusted Deep Causal Learning (CoDEAL) for panel data models,
that employs flexible model structures and powerful neural network
architectures to cohesively deal with the underlying heterogeneity and
nonlinearity of both panel units and covariate effects. The proposed CoDEAL
integrates nonlinear covariate effect components (parameterized by a
feed-forward neural network) with nonlinear factor structures (modeled by a
multi-output autoencoder) to form a heterogeneous causal panel model. The
nonlinear covariate component offers a flexible framework for capturing the
complex influences of covariates on outcomes. The nonlinear factor analysis
enables CoDEAL to effectively capture both cross-sectional and temporal
dependencies inherent in the data panel. This latent structural information is
subsequently integrated into a customized matrix completion algorithm, thereby
facilitating more accurate imputation of missing counterfactual outcomes.
Moreover, the use of a multi-output autoencoder explicitly accounts for
heterogeneity across units and enhances the model interpretability of the
latent factors. We establish theoretical guarantees on the convergence of the
estimated counterfactuals, and demonstrate the compelling performance of the
proposed method using extensive simulation studies and a real data application.
arXiv link: http://arxiv.org/abs/2505.20536v1
Intraday Functional PCA Forecasting of Cryptocurrency Returns
functions of intraday returns on Bitcoin. We show that improved interval
forecasts of future return functions are obtained when the conditional
heteroscedasticity of return functions is taken into account. The
Karhunen-Loeve (KL) dynamic factor model is introduced to bridge the functional
and discrete time dynamic models. It offers a convenient framework for
functional time series analysis. For intraday forecasting, we introduce a new
algorithm based on the FPCA applied by rolling, which can be used for any data
observed continuously 24/7. The proposed FPCA forecasting methods are applied
to return functions computed from data sampled hourly and at 15-minute
intervals. Next, the functional forecasts evaluated at discrete points in time
are compared with the forecasts based on other methods, including machine
learning and a traditional ARMA model. The proposed FPCA-based methods perform
well in terms of forecast accuracy and outperform competitors in predicting the direction (sign) of returns at fixed points in time.
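A minimal sketch of a rolling functional-PCA forecast in the spirit of the abstract, assuming simulated daily curves of hourly returns, PCA via an SVD of the centred curves, and a least-squares AR(1) for each score; the paper's KL dynamic factor model, heteroscedasticity adjustment, and interval forecasts are not reproduced.

    import numpy as np

    rng = np.random.default_rng(4)

    # Simulated panel: 250 "days" of 24 hourly returns driven by one intraday factor
    # whose score follows an AR(1).
    n_days, n_hours = 250, 24
    grid = np.linspace(0, 1, n_hours)
    factor = np.sin(2 * np.pi * grid)
    scores_true = np.zeros(n_days)
    for t in range(1, n_days):
        scores_true[t] = 0.6 * scores_true[t - 1] + rng.standard_normal()
    curves = scores_true[:, None] * factor[None, :] + 0.3 * rng.standard_normal((n_days, n_hours))

    def fpca_forecast(window, n_comp=2):
        """Fit PCA on a rolling window of daily curves, forecast each score with a
        least-squares AR(1), and reconstruct the next day's curve."""
        mean_curve = window.mean(axis=0)
        centered = window - mean_curve
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        comps = vt[:n_comp]                        # discretized eigenfunctions
        scores = centered @ comps.T                # FPC scores, one row per day
        next_scores = np.empty(n_comp)
        for j in range(n_comp):
            s = scores[:, j]
            phi = (s[:-1] @ s[1:]) / (s[:-1] @ s[:-1])
            next_scores[j] = phi * s[-1]
        return mean_curve + next_scores @ comps

    window_len, errors = 100, []
    for t in range(window_len, n_days):
        forecast = fpca_forecast(curves[t - window_len:t])
        errors.append(np.mean((forecast - curves[t]) ** 2))
    print("mean squared forecast error:", round(float(np.mean(errors)), 3))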
arXiv link: http://arxiv.org/abs/2505.20508v1
Large structural VARs with multiple linear shock and impact inequality restrictions
features a factor structure in the error terms and accommodates a large number
of linear inequality restrictions on impact impulse responses, structural
shocks, and their element-wise products. In particular, we demonstrate that
narrative restrictions can be imposed via constraints on the structural shocks,
which can be used to sharpen inference and disentangle structurally
interpretable shocks. To estimate the model, we develop a highly efficient
sampling algorithm that scales well with both the model dimension and the
number of inequality restrictions on impact responses and structural shocks. It
remains computationally feasible even in settings where existing algorithms may
break down. To illustrate the practical utility of our approach, we identify
five structural shocks and examine the dynamic responses of thirty
macroeconomic variables, highlighting the model's flexibility and feasibility
in complex empirical applications. We provide empirical evidence that financial
shocks are the most important driver of business cycle dynamics.
arXiv link: http://arxiv.org/abs/2505.19244v2
Comparative analysis of financial data differentiation techniques using LSTM neural network
fractional differencing method and its tempered extension as methods of data
preparation before their usage in advanced machine learning models.
Differencing parameters are estimated using multiple techniques. The empirical
investigation is conducted on data from four major stock indices covering the
most recent 10-year period. The set of explanatory variables is additionally
extended with technical indicators. The effectiveness of the differencing
methods is evaluated using both forecast error metrics and risk-adjusted return
trading performance metrics. The findings suggest that fractional
differentiation methods provide a suitable data transformation technique,
improving the forecasting performance of the predictive models. Furthermore, the
generated predictions appeared to be effective in constructing profitable
trading strategies for both individual assets and a portfolio of stock indices.
These results underline the importance of appropriate data transformation
techniques in financial time series forecasting, supporting the application of
memory-preserving techniques.
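For concreteness, a minimal sketch of plain fractional differencing with the standard binomial-expansion weights and a fixed window; the tempered extension and the parameter-estimation techniques used in the paper are not shown, and the simulated log-price series is an illustrative assumption.

    import numpy as np

    def frac_diff_weights(d, n_weights):
        """Binomial-expansion weights of (1 - L)^d: w_0 = 1, w_k = -w_{k-1}*(d - k + 1)/k."""
        w = np.empty(n_weights)
        w[0] = 1.0
        for k in range(1, n_weights):
            w[k] = -w[k - 1] * (d - k + 1) / k
        return w

    def frac_diff(x, d, window=100):
        """Fixed-window fractional differencing; the first window-1 points are lost."""
        w = frac_diff_weights(d, window)
        return np.array([w @ x[t - window + 1:t + 1][::-1]
                         for t in range(window - 1, len(x))])

    # Example: fractionally difference a simulated log-price random walk with d = 0.4,
    # removing most of the nonstationarity while preserving some memory.
    rng = np.random.default_rng(5)
    log_price = np.cumsum(0.01 * rng.standard_normal(1000))
    x_fd = frac_diff(log_price, d=0.4)
    print(x_fd.shape, round(float(x_fd.std()), 4))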
arXiv link: http://arxiv.org/abs/2505.19243v1
Potential Outcome Modeling and Estimation in DiD Designs with Staggered Treatments
designs with multiple time periods and variation in treatment timing.
Importantly, the modeling respects the two key identifying assumptions:
parallel trends and no anticipation. We then introduce a straightforward
Bayesian approach for estimation and inference of the time-varying group-specific Average Treatment Effects on the Treated (ATT). To improve parsimony
and guide prior elicitation, we reparametrize the model in a way that reduces
the effective number of parameters. Prior information about the ATTs is incorporated through black-box training sample priors and, in small-sample settings, by thick-tailed t-priors that shrink ATTs of small magnitude toward
zero. We provide a computationally efficient Bayesian estimation procedure and
establish a Bernstein-von Mises-type result that justifies posterior inference
for the treatment effects. Simulation studies confirm that our method performs
well in both large and small samples, offering credible uncertainty
quantification even in settings that challenge standard estimators. We
illustrate the practical value of the method through an empirical application
that examines the effect of minimum wage increases on teen employment in the
United States.
arXiv link: http://arxiv.org/abs/2505.18391v2
Bayesian Deep Learning for Discrete Choice
in contexts such as transportation choices, political elections, and consumer
preferences. DCMs play a central role in applied econometrics by enabling
inference on key economic variables, such as marginal rates of substitution,
rather than focusing solely on predicting choices on new unlabeled data.
However, while traditional DCMs offer high interpretability and support for
point and interval estimation of economic quantities, these models often
underperform in predictive tasks compared to deep learning (DL) models. Despite
their predictive advantages, DL models remain largely underutilized in discrete
choice due to concerns about their lack of interpretability, unstable parameter
estimates, and the absence of established methods for uncertainty
quantification. Here, we introduce a deep learning model architecture
specifically designed to integrate with approximate Bayesian inference methods,
such as Stochastic Gradient Langevin Dynamics (SGLD). Our proposed model
collapses to behaviorally informed hypotheses when data is limited, mitigating
overfitting and instability in underspecified settings while retaining the
flexibility to capture complex nonlinear relationships when sufficient data is
available. We demonstrate our approach using SGLD through a Monte Carlo
simulation study, evaluating both predictive metrics--such as out-of-sample
balanced accuracy--and inferential metrics--such as empirical coverage for
marginal rates of substitution interval estimates. Additionally, we present
results from two empirical case studies: one using revealed mode choice data in
NYC, and the other based on the widely used Swiss train choice stated
preference data.
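A minimal sketch of the SGLD update mentioned above, applied to a plain binary logit with a Gaussian prior rather than the paper's architecture; the simulated data, prior variance, step size, and batch size are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(6)

    # Simulated binary choice data: P(y = 1 | x) = logistic(x' beta).
    n, d = 5000, 3
    X = rng.standard_normal((n, d))
    beta_true = np.array([1.0, -0.5, 0.25])
    y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

    def grad_log_post(beta, Xb, yb, n_total, prior_var=10.0):
        """Stochastic gradient of the log posterior: minibatch logit likelihood
        rescaled to the full sample, plus an isotropic Gaussian prior."""
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        return Xb.T @ (yb - p) * (n_total / len(yb)) - beta / prior_var

    # SGLD update: beta <- beta + (eps/2) * grad_log_post + N(0, eps * I).
    eps, batch, n_iter, burn = 1e-4, 256, 20000, 10000
    beta, draws = np.zeros(d), []
    for it in range(n_iter):
        idx = rng.integers(0, n, size=batch)
        grad = grad_log_post(beta, X[idx], y[idx], n)
        beta = beta + 0.5 * eps * grad + np.sqrt(eps) * rng.standard_normal(d)
        if it >= burn:
            draws.append(beta.copy())
    print("posterior means:", np.mean(draws, axis=0).round(2))   # approx. beta_true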
arXiv link: http://arxiv.org/abs/2505.18077v1
Twin-2K-500: A dataset for building digital twins of over 2,000 people based on their answers to over 500 questions
emulate individual human behavior, holds great promise for research in AI,
social science, and digital experimentation. However, progress in this area has
been hindered by the scarcity of real, individual-level datasets that are both
large and publicly available. This lack of high-quality ground truth limits
both the development and validation of digital twin methodologies. To address
this gap, we introduce a large-scale, public dataset designed to capture a rich
and holistic view of individual human behavior. We survey a representative
sample of $N = 2,058$ participants (average 2.42 hours per person) in the US
across four waves with 500 questions in total, covering a comprehensive battery
of demographic, psychological, economic, personality, and cognitive measures,
as well as replications of behavioral economics experiments and a pricing
survey. The final wave repeats tasks from earlier waves to establish a
test-retest accuracy baseline. Initial analyses suggest the data are of high
quality and show promise for constructing digital twins that predict human
behavior well at the individual and aggregate levels. By making the full
dataset publicly available, we aim to establish a valuable testbed for the
development and benchmarking of LLM-based persona simulations. Beyond LLM
applications, due to its unique breadth and scale, the dataset also enables
broad social science research, including studies of cross-construct
correlations and heterogeneous treatment effects.
arXiv link: http://arxiv.org/abs/2505.17479v1
Analysis of Distributional Dynamics for Repeated Cross-Sectional and Intra-Period Observations
distributions, which accommodate both cross-sectional distributions of repeated
panels and intra-period distributions of a time series observed at high
frequency. In our approach, densities of the state distributions are regarded
as functional elements in a Hilbert space, and are assumed to follow a
functional autoregressive model. We propose an estimator for the autoregressive
operator, establish its consistency, and provide tools and asymptotics to
analyze the forecast of state density and the moment dynamics of state
distributions. We apply our methodology to study the time series of
distributions of the GBP/USD exchange rate intra-month returns and the time
series of cross-sectional distributions of the NYSE stocks monthly returns.
Finally, we conduct simulations to evaluate the density forecasts based on our
model.
arXiv link: http://arxiv.org/abs/2505.15763v1
SplitWise Regression: Stepwise Modeling with Adaptive Dummy Encoding
remains a persistent challenge in regression modeling. We introduce SplitWise,
a novel framework that enhances stepwise regression. It adaptively transforms
numeric predictors into threshold-based binary features using shallow decision
trees, but only when such transformations improve model fit, as assessed by the
Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC).
This approach preserves the transparency of linear models while flexibly
capturing nonlinear effects. Implemented as a user-friendly R package,
SplitWise is evaluated on both synthetic and real-world datasets. The results
show that it consistently produces more parsimonious and generalizable models
than traditional stepwise and penalized regression techniques.
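A rough sketch of the adaptive dummy-encoding idea under stated assumptions: for each numeric predictor, a depth-1 regression tree supplies a candidate threshold, and the binarized feature is kept only if it lowers a Gaussian AIC in a univariate linear fit. This is not the R package's exact algorithm (which operates inside a stepwise search); the function names and simulated data are illustrative.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(7)

    def gaussian_aic(y, X):
        """Gaussian AIC of an OLS fit: n*log(RSS/n) + 2*(number of coefficients + 1)."""
        Xc = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
        rss = np.sum((y - Xc @ beta) ** 2)
        return len(y) * np.log(rss / len(y)) + 2 * (Xc.shape[1] + 1)

    def splitwise_encode(y, X):
        """For each numeric column, get a depth-1 tree threshold and keep the
        binarized version only if it improves the AIC of a univariate linear fit."""
        X_out = X.copy()
        for j in range(X.shape[1]):
            xj = X[:, [j]]
            threshold = DecisionTreeRegressor(max_depth=1).fit(xj, y).tree_.threshold[0]
            dummy = (xj[:, 0] > threshold).astype(float)
            if gaussian_aic(y, dummy[:, None]) < gaussian_aic(y, xj):
                X_out[:, j] = dummy          # adopt the threshold-based binary feature
        return X_out

    # Example: x0 has a genuine threshold effect, x1 enters linearly.
    n = 500
    X = rng.uniform(-2, 2, size=(n, 2))
    y = 2.0 * (X[:, 0] > 0.5) + 1.0 * X[:, 1] + rng.standard_normal(n)
    X_enc = splitwise_encode(y, X)
    print("column 0 binarized:", set(np.unique(X_enc[:, 0])) == {0.0, 1.0})
    print("column 1 binarized:", set(np.unique(X_enc[:, 1])) == {0.0, 1.0})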
arXiv link: http://arxiv.org/abs/2505.15423v1
Dynamic Decision-Making under Model Misspecification
parameter space when the functional form of conditional expected rewards is
misspecified. Traditional algorithms, such as Thompson Sampling, guarantee
neither an $O(e^{-T})$ rate of posterior parameter concentration nor an
$O(T^{-1})$ rate of average regret. However, under mild conditions, we can
still achieve an exponential convergence rate of the parameter to a pseudo
truth set, an extension of the pseudo truth parameter concept introduced by
White (1982). I further characterize the necessary conditions for the
convergence of the expected posterior within this pseudo-truth set. Simulations
demonstrate that while the maximum a posteriori (MAP) estimate of the
parameters fails to converge under misspecification, the algorithm's average
regret remains relatively robust compared to the correctly specified case.
These findings suggest opportunities to design simple yet robust algorithms
that achieve desirable outcomes even in the presence of model
misspecifications.
arXiv link: http://arxiv.org/abs/2505.14913v1
Bubble Detection with Application to Green Bubbles: A Noncausal Approach
and noncausal processes and their tail process representation during explosive
episodes. Departing from traditional definitions of bubbles as nonstationary
and temporarily explosive processes, we adopt a perspective in which prices are
viewed as following a strictly stationary process, with the bubble considered
an intrinsic component of its non-linear dynamics. We illustrate our approach
on the phenomenon referred to as the "green bubble" in the field of renewable
energy investment.
arXiv link: http://arxiv.org/abs/2505.14911v1
The Post Double LASSO for Efficiency Analysis
milieus. One area that has not yet seen as much attention to these important topics is efficiency analysis. We show how the availability of big (wide) data can
actually make detection of inefficiency more challenging. We then show how
machine learning methods can be leveraged to adequately estimate the primitives
of the frontier itself as well as inefficiency using the `post double LASSO' by
deriving Neyman orthogonal moment conditions for this problem. Finally, an
application is presented to illustrate key differences between the post-double LASSO and other approaches.
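As background, a minimal generic double-Lasso (partialling-out) sketch for a single coefficient of interest with many controls; it does not implement the paper's Neyman-orthogonal moments for frontier and inefficiency estimation, and the simulated design is an illustrative assumption.

    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(8)

    # Simulated data: outcome y, variable of interest d, many controls X
    # (only a few of which matter). The true coefficient on d is 1.
    n, p = 500, 100
    X = rng.standard_normal((n, p))
    d = X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(n)
    y = 1.0 * d + 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.standard_normal(n)

    # Step 1: cross-validated Lasso of y on the controls and of d on the controls.
    res_y = y - LassoCV(cv=5).fit(X, y).predict(X)
    res_d = d - LassoCV(cv=5).fit(X, d).predict(X)

    # Step 2: OLS of the y-residuals on the d-residuals (partialling out).
    theta = (res_d @ res_y) / (res_d @ res_d)
    eps_hat = res_y - theta * res_d
    var_theta = np.mean(eps_hat ** 2 * res_d ** 2) / np.mean(res_d ** 2) ** 2 / n
    print(f"double-Lasso estimate: {theta:.2f} (robust s.e. {np.sqrt(var_theta):.2f})")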
arXiv link: http://arxiv.org/abs/2505.14282v1
The Impact of Research and Development (R&D) Expenditures on the Value Added in the Agricultural Sector of Iran
the value added of the agricultural sector in Iran was investigated for the
period 1971-2021. For data analysis, the researchers utilized the ARDL
econometric model and EViews software. The results indicated that R&D
expenditures, both in the short and long run, have a significant positive
effect on the value added in the agricultural sector. The estimated elasticity
coefficient for R&D expenditures in the short run was 0.45 and in the long run
was 0.35, indicating that with a 1 percent increase in research and development
expenditures, the value added in the agricultural sector would increase by 0.45
percent in the short run and by 0.35 percent in the long run. Moreover,
variables such as capital stock, number of employees in the agricultural
sector, and working days also had a significant and positive effect on the
value added in the agricultural sector.
arXiv link: http://arxiv.org/abs/2505.14746v1
Adaptive stable distribution and Hurst exponent by method of moments moving estimator for nonstationary time series
classical approaches like ARMA-ARCH there is assumed some arbitrarily chosen
dependence type. To avoid their bias, we focus on a novel, more agnostic approach: the moving estimator, which estimates parameters separately for every time $t$ by optimizing the local log-likelihood $F_t=\sum_{\tau<t} (1-\eta)^{t-\tau} \ln(\rho_\theta(x_\tau))$ with exponentially decaying weights on older values. In practice, such moving estimates can be obtained by an exponential moving average (EMA) of selected statistics, such as the absolute central moments $m_p=E[|x-\mu|^p]$, updated by $m_{p,t+1} = m_{p,t} + \eta (|x_t-\mu_t|^p-m_{p,t})$. We focus here on applications to the alpha-stable distribution, which also influences the Hurst exponent and hence can be used for its adaptive estimation. The approach is illustrated on financial data, namely the DJIA time series: besides the standard estimation of the evolution of the center $\mu$ and scale parameter $\sigma$, we also estimate the evolution of the $\alpha$ parameter, allowing continuous evaluation of market stability, since tails with $\rho(x) \sim 1/|x|^{\alpha+1}$ behavior control the probability of potentially dangerous extreme events.
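A minimal sketch of the EMA moving-moment updates quoted above; the mapping from these moments to the stable parameters and the Hurst exponent is the subject of the paper and is not reproduced here. The learning rate eta, the chosen moment orders, and the simulated series are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(9)

    def moving_moments(x, eta=0.02, ps=(0.5, 1.0)):
        """EMA estimates of the center mu_t and of absolute central moments
        m_p = E|x - mu|^p, updated as m_{p,t+1} = m_{p,t} + eta*(|x_t - mu_t|^p - m_{p,t})."""
        mu, m = x[0], {p: 1.0 for p in ps}
        mu_path = np.empty(len(x))
        m_path = {p: np.empty(len(x)) for p in ps}
        for t, xt in enumerate(x):
            mu_path[t] = mu
            for p in ps:
                m_path[p][t] = m[p]
                m[p] += eta * (abs(xt - mu) ** p - m[p])
            mu += eta * (xt - mu)                  # EMA update of the center
        return mu_path, m_path

    # Example: returns whose scale doubles halfway through the sample; the moving
    # moments adapt to the new regime.
    x = np.concatenate([rng.standard_normal(2000), 2.0 * rng.standard_normal(2000)])
    mu_path, m_path = moving_moments(x)
    print("estimated E|x - mu| early vs late:",
          round(float(m_path[1.0][1500]), 2), round(float(m_path[1.0][3500]), 2))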
arXiv link: http://arxiv.org/abs/2506.05354v1
Quantum Reservoir Computing for Realized Volatility Forecasting
significantly enhance the analysis and forecasting of complex classical data.
Among these, quantum reservoir computing has emerged as a particularly powerful
approach, combining quantum computation with machine learning for modeling
nonlinear temporal dependencies in high-dimensional time series. As with many
data-driven disciplines, quantitative finance and econometrics can hugely
benefit from emerging quantum technologies. In this work, we investigate the
application of quantum reservoir computing for realized volatility forecasting.
Our model employs a fully connected transverse-field Ising Hamiltonian as the
reservoir with distinct input and memory qubits to capture temporal
dependencies. The quantum reservoir computing approach is benchmarked against
several econometric models and standard machine learning algorithms. The models
are evaluated using multiple error metrics and the model confidence set
procedures. To enhance interpretability and mitigate current quantum hardware
limitations, we utilize wrapper-based forward selection for feature selection,
identifying optimal subsets, and quantifying feature importance via Shapley
values. Our results indicate that the proposed quantum reservoir approach
consistently outperforms benchmark models across various metrics, highlighting
its potential for financial forecasting despite existing quantum hardware
constraints. This work serves as a proof-of-concept for the applicability of
quantum computing in econometrics and financial analysis, paving the way for
further research into quantum-enhanced predictive modeling as quantum hardware
capabilities continue to advance.
arXiv link: http://arxiv.org/abs/2505.13933v1
Valid Post-Contextual Bandit Inference
stochastic contextual multi-armed bandit problem (CMAB), which is widely
employed in adaptively randomized experiments across various fields. While
algorithms for maximizing rewards or, equivalently, minimizing regret have
received considerable attention, our focus centers on statistical inference
with adaptively collected data under the CMAB model. To this end we derive the
limit experiment (in the Hajek-Le Cam sense). This limit experiment is highly
nonstandard and, applying Girsanov's theorem, we obtain a structural
representation in terms of stochastic differential equations. This structural
representation, and a general weak convergence result we develop, allow us to
obtain the asymptotic distribution of statistics for the CMAB problem. In
particular, we obtain the asymptotic distributions for the classical t-test
(non-Gaussian), Adaptively Weighted tests, and Inverse Propensity Weighted
tests (non-Gaussian). We show that, when comparing both arms, validity of these
tests requires the sampling scheme to be translation invariant in a way we make
precise. We propose translation-invariant versions of Thompson, tempered
greedy, and tempered Upper Confidence Bound sampling. Simulation results
corroborate our asymptotic analysis.
arXiv link: http://arxiv.org/abs/2505.13897v1
Characterization of Efficient Influence Function for Off-Policy Evaluation Under Optimal Policies
value of a counterfactual policy using observational data, without the need for
additional experimentation. Despite recent progress in robust and efficient OPE
across various settings, rigorous efficiency analysis of OPE under an estimated
optimal policy remains limited. In this paper, we establish a concise
characterization of the efficient influence function (EIF) for the value
function under optimal policy within canonical Markov decision process models.
Specifically, we provide the sufficient conditions for the existence of the EIF
and characterize its expression. We also give the conditions under which the
EIF does not exist.
arXiv link: http://arxiv.org/abs/2505.13809v3
Machine learning the first stage in 2SLS: Practical guidance from bias decomposition and simulation
first stage of two-stage least squares (2SLS) is a prediction problem,
suggesting potential gains from ML first-stage assistance. However, little
guidance exists on when ML helps 2SLS, or when it hurts. We investigate the implications of inserting ML into 2SLS, decomposing the bias into three informative components. Mechanically, ML-in-2SLS procedures face issues common to prediction and causal-inference settings, and to their interaction. Through simulation, we show linear ML methods (e.g., post-Lasso) work well, while nonlinear methods (e.g., random forests, neural nets) generate substantial bias in second-stage estimates, potentially exceeding the bias of endogenous OLS.
arXiv link: http://arxiv.org/abs/2505.13422v1
From What Ifs to Insights: Counterfactuals in Causal Inference vs. Explainable AI
of causal inference (CI) and explainable artificial intelligence (XAI). While
the core idea behind counterfactuals remains the same in both fields--the
examination of what would have happened under different circumstances--there
are key differences in how they are used and interpreted. We introduce a formal
definition that encompasses the multi-faceted concept of the counterfactual in
CI and XAI. We then discuss how counterfactuals are used, evaluated, generated,
and operationalized in CI vs. XAI, highlighting conceptual and practical
differences. By comparing and contrasting the two, we hope to identify
opportunities for cross-fertilization across CI and XAI.
arXiv link: http://arxiv.org/abs/2505.13324v1
CATS: Clustering-Aggregated and Time Series for Business Customer Purchase Intention Prediction
success of a business strategy. Current research mainly focuses on analyzing the specific types of products that customers are likely to purchase in the future, while little attention has been paid to the critical factor of whether
customers will engage in repurchase behavior. Predicting whether a customer
will make the next purchase is a classic time series forecasting task. However,
in real-world purchasing behavior, customer groups typically exhibit imbalance: there are a large number of occasional buyers and a small number of
loyal customers. This head-to-tail distribution makes traditional time series
forecasting methods face certain limitations when dealing with such problems.
To address the above challenges, this paper proposes a unified Clustering and
Attention mechanism GRU model (CAGRU) that leverages multi-modal data for
customer purchase intention prediction. The framework first performs customer
profiling with respect to the customer characteristics and clusters the
customers to delineate the different customer clusters that contain similar
features. Then, the time series features of different customer clusters are
extracted by GRU neural network and an attention mechanism is introduced to
capture the significance of sequence locations. Furthermore, to mitigate the head-to-tail distribution of customer segments, we train the model separately for each segment, so as to capture more accurately both the differences in behavioral characteristics across segments and the similarities among customers within the same segment.
We constructed four datasets and conducted extensive experiments to demonstrate
the superiority of the proposed CAGRU approach.
arXiv link: http://arxiv.org/abs/2505.13558v1
Opening the Black Box of Local Projections
estimate impulse responses to policy interventions. Yet, in many ways, they are
black boxes. It is often unclear what mechanism or historical episodes drive a
particular estimate. We introduce a new decomposition of LP estimates into the
sum of contributions of historical events, which is the product, for each time
stamp, of a weight and the realization of the response variable. In the least
squares case, we show that these weights admit two interpretations. First, they
represent purified and standardized shocks. Second, they serve as proximity
scores between the projected policy intervention and past interventions in the
sample. Notably, this second interpretation extends naturally to machine
learning methods, many of which yield impulse responses that, while nonlinear
in predictors, still aggregate past outcomes linearly via proximity-based
weights. Applying this framework to shocks in monetary and fiscal policy,
global temperature, and the excess bond premium, we find that easily
identifiable events, such as Nixon's interference with the Fed, stagflation, World War II, and the Mount Agung volcanic eruption, emerge as dominant drivers
of often heavily concentrated impulse response estimates.
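A minimal numerical check of the least-squares decomposition described above: the local projection coefficient at horizon h equals a weighted sum of realized future outcomes, with one weight per historical observation given by the relevant row of (X'X)^{-1}X'. The simulated shock and control set are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(10)

    # Simulated data: a shock series s_t and an outcome y_t that responds to it.
    T, h = 300, 4
    s = rng.standard_normal(T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = 0.5 * y[t - 1] + 0.8 * s[t] + rng.standard_normal()

    # Local projection at horizon h: regress y_{t+h} on (1, s_t, y_{t-1}), t = 1..T-h-1.
    X = np.column_stack([np.ones(T - h - 1), s[1:T - h], y[0:T - h - 1]])
    y_lead = y[1 + h:T]

    beta = np.linalg.solve(X.T @ X, X.T @ y_lead)
    irf_h = beta[1]                               # LP estimate of the horizon-h response

    # Decomposition: the same estimate is a weighted sum of realized future outcomes,
    # with weights given by the "shock" row of (X'X)^{-1} X'.
    weights = np.linalg.solve(X.T @ X, X.T)[1]
    print(np.allclose(irf_h, weights @ y_lead))   # True: estimate = sum_t w_t * y_{t+h}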
arXiv link: http://arxiv.org/abs/2505.12422v2
Multivariate Affine GARCH with Heavy Tails: A Unified Framework for Portfolio Optimization and Option Valuation
Normal Inverse Gaussian innovations that captures time-varying volatility,
heavy tails, and dynamic correlation across asset returns. We generalize the
Heston-Nandi framework to a multivariate setting and apply it to 30 Dow Jones
Industrial Average stocks. The model jointly supports three core financial
applications: dynamic portfolio optimization, wealth path simulation, and
option pricing. Closed-form solutions are derived for a Constant Relative Risk
Aversion (CRRA) investor's intertemporal asset allocation, and we implement a
forward-looking risk-adjusted performance comparison against Merton-style
constant strategies. Using the model's conditional volatilities, we also
construct implied volatility surfaces for European options, capturing skew and
smile features. Empirically, we document substantial wealth-equivalent utility
losses from ignoring time-varying correlation and tail risk. These findings
underscore the value of a unified econometric framework for analyzing joint
asset dynamics and for managing portfolio and derivative exposures under
non-Gaussian risks.
arXiv link: http://arxiv.org/abs/2505.12198v1
(Visualizing) Plausible Treatment Effect Paths
policy. Examples include dynamic treatment effects in microeconomics, impulse
response functions in macroeconomics, and event study paths in finance. We
present two sets of plausible bounds to quantify and visualize the uncertainty
associated with this object. Both plausible bounds are often substantially
tighter than traditional confidence intervals, and can provide useful insights
even when traditional (uniform) confidence bands appear uninformative. Our
bounds can also lead to markedly different conclusions when there is
significant correlation in the estimates, reflecting the fact that traditional
confidence bands can be ineffective at visualizing the impact of such
correlation. Our first set of bounds covers the average (or overall) effect
rather than the entire treatment path. Our second set of bounds imposes
data-driven smoothness restrictions on the treatment path. Post-selection
Inference (Berk et al. [2013]) provides formal coverage guarantees for these
bounds. The chosen restrictions also imply novel point estimates that perform
well across our simulations.
arXiv link: http://arxiv.org/abs/2505.12014v1
A New Bayesian Bootstrap for Quantitative Trade and Spatial Models
predictions. Because such predictions often inform policy decisions, it is
important to communicate the uncertainty surrounding them. Three key challenges
arise in this setting: the data are dyadic and exhibit complex dependence; the
number of interacting units is typically small; and counterfactual predictions
depend on the data in two distinct ways: through the estimation of structural
parameters and through their role as inputs into the model's counterfactual
equilibrium. I address these challenges by proposing a new Bayesian bootstrap
procedure tailored to this context. The method is simple to implement and
provides both finite-sample Bayesian and asymptotic frequentist guarantees.
Revisiting the results in Waugh (2010), Caliendo and Parro (2015), and
Artuc et al. (2010) illustrates the practical advantages of the approach.
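For orientation, a minimal sketch of the generic Bayesian bootstrap (Rubin, 1981) with Dirichlet weights, applied to a toy gravity-style dyadic regression; the paper's treatment of equilibrium counterfactuals and dyadic dependence is not reproduced, and all names and the simulated data are illustrative.

    import numpy as np

    rng = np.random.default_rng(11)

    # Toy "structural parameter": the distance elasticity in a gravity-style
    # regression of log trade flows on log distance (simulated dyadic data).
    n_pairs = 200
    log_dist = rng.uniform(0, 3, n_pairs)
    log_trade = 2.0 - 1.5 * log_dist + 0.5 * rng.standard_normal(n_pairs)
    X = np.column_stack([np.ones(n_pairs), log_dist])

    def weighted_slope(w):
        """Weighted least-squares slope with observation weights w."""
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * log_trade))
        return beta[1]

    # Bayesian bootstrap (Rubin, 1981): redraw Dirichlet(1, ..., 1) weights and re-solve.
    draws = [weighted_slope(rng.dirichlet(np.ones(n_pairs))) for _ in range(2000)]
    lo, hi = np.percentile(draws, [2.5, 97.5])
    print(f"posterior mean {np.mean(draws):.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")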
arXiv link: http://arxiv.org/abs/2505.11967v1
IISE PG&E Energy Analytics Challenge 2025: Hourly-Binned Regression Models Beat Transformers in Load Forecasting
resource optimization, and renewable energy integration. While
transformer-based deep learning models like TimeGPT have gained traction in
time-series forecasting, their effectiveness in long-term electricity load
prediction remains uncertain. This study evaluates forecasting models ranging
from classical regression techniques to advanced deep learning architectures
using data from the ESD 2025 competition. The dataset includes two years of
historical electricity load data, alongside temperature and global horizontal
irradiance (GHI) across five sites, with a one-day-ahead forecasting horizon.
Since actual test set load values remain undisclosed, leveraging predicted
values would accumulate errors, making this a long-term forecasting challenge.
We (i) employ Principal Component Analysis (PCA) for dimensionality reduction, (ii) frame the task as a regression problem, using temperature and GHI as covariates to predict load for each hour, and (iii) ultimately stack 24 hourly models to generate yearly forecasts.
Our results reveal that deep learning models, including TimeGPT, fail to
consistently outperform simpler statistical and machine learning approaches due
to the limited availability of training data and exogenous variables. In
contrast, XGBoost, with minimal feature engineering, delivers the lowest error
rates across all test cases while maintaining computational efficiency. This
highlights the limitations of deep learning in long-term electricity
forecasting and reinforces the importance of model selection based on dataset
characteristics rather than complexity. Our study provides insights into
practical forecasting applications and contributes to the ongoing discussion on
the trade-offs between traditional and modern forecasting methods.
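A minimal sketch of the hourly-binned regression idea, assuming simulated data and plain linear regressions in place of the gradient-boosting models used in the study; temperature and GHI enter as covariates and one model is fitted per hour of the day.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(12)

    # Simulated hourly data: load depends on hour of day, temperature, and GHI.
    n_days = 365
    hours = np.tile(np.arange(24), n_days)
    temp = 15 + 10 * np.sin(2 * np.pi * hours / 24) + rng.standard_normal(hours.size)
    ghi = 800 * np.clip(np.sin(np.pi * (hours - 6) / 12), 0, None) + 20 * rng.standard_normal(hours.size)
    load = 100 + 3 * temp + 0.05 * ghi + 10 * (hours >= 18) + rng.standard_normal(hours.size)

    # Hourly-binned regression: one model per hour of the day, covariates (temp, GHI).
    models = {}
    for h in range(24):
        mask = hours == h
        models[h] = LinearRegression().fit(np.column_stack([temp[mask], ghi[mask]]), load[mask])

    def predict_load(hour, temperature, irradiance):
        """Route each observation to the model fitted for its hour of day."""
        return models[hour].predict(np.array([[temperature, irradiance]]))[0]

    print(round(predict_load(12, 25.0, 700.0), 1))   # illustrative noon forecast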
arXiv link: http://arxiv.org/abs/2505.11390v1
A Cautionary Tale on Integrating Studies with Disparate Outcome Measures for Causal Inference
and generalizability of studies. However, a key limitation of these methods is
the assumption that outcome measures are identical across datasets -- an
assumption that often does not hold in practice. Consider the following opioid
use disorder (OUD) studies: the XBOT trial and the POAT study, both evaluating
the effect of medications for OUD on withdrawal symptom severity (not the
primary outcome of either trial). While XBOT measures withdrawal severity using
the subjective opiate withdrawal scale (SOW), POAT uses the clinical opiate withdrawal scale (COW). We analyze this realistic yet challenging setting where
outcome measures differ across studies and where neither study records both
types of outcomes. Our paper studies whether and when integrating studies with
disparate outcome measures leads to efficiency gains. We introduce three sets
of assumptions -- with varying degrees of strength -- linking both outcome
measures. Our theoretical and empirical results highlight a cautionary tale:
integration can improve asymptotic efficiency only under the strongest
assumption linking the outcomes. However, misspecification of this assumption
leads to bias. In contrast, a milder assumption may yield finite-sample
efficiency gains, yet these benefits diminish as sample size increases. We
illustrate these trade-offs via a case study integrating the XBOT and POAT
datasets to estimate the comparative effect of two medications for opioid use
disorder on withdrawal symptoms. By systematically varying the assumptions
linking the SOW and COW scales, we show potential efficiency gains and the
risks of bias. Our findings emphasize the need for careful assumption selection
when fusing datasets with differing outcome measures, offering guidance for
researchers navigating this common challenge in modern data integration.
arXiv link: http://arxiv.org/abs/2505.11014v1
Tractable Unified Skew-t Distribution and Copula for Heterogeneous Asymmetries
important building blocks in many econometric and statistical models. The
Unified Skew-t (UST) is a promising choice because it is both scalable and
allows for a high level of flexibility in the asymmetry in the distribution.
However, it suffers from parameter identification and computational hurdles
that have to date inhibited its use for modeling data. In this paper we propose
a new tractable variant of the unified skew-t (TrUST) distribution that
addresses both challenges. Moreover, the copula of this distribution is shown
to also be tractable, while allowing for greater heterogeneity in asymmetric
dependence over variable pairs than the popular skew-t copula. We show how
Bayesian posterior inference for both the distribution and its copula can be
computed using an extended likelihood derived from a generative representation
of the distribution. The efficacy of this Bayesian method, and the enhanced
flexibility of both the TrUST distribution and its implicit copula, is first
demonstrated using simulated data. Applications of the TrUST distribution to
highly skewed regional Australian electricity prices, and the TrUST copula to
intraday U.S. equity returns, demonstrate how our proposed distribution and its
copula can provide substantial increases in accuracy over the popular skew-t
and its copula in practice.
arXiv link: http://arxiv.org/abs/2505.10849v1
Distribution Regression with Censored Selection
offering a semi-parametric generalization of the Heckman selection model. Our
approach applies to the entire distribution, extending beyond the mean or
median, accommodates non-Gaussian error structures, and allows for
heterogeneous effects of covariates on both the selection and outcome
distributions. By employing a censored selection rule, our model can uncover
richer selection patterns according to both outcome and selection variables,
compared to the binary selection case. We analyze identification, estimation,
and inference of model functionals such as sorting parameters and distributions
purged of sample selection. An application to labor supply using data from the
UK reveals different selection patterns into full-time and overtime work across
gender, marital status, and time. Additionally, decompositions of wage
distributions by gender show that selection effects contribute to a decrease in
the observed gender wage gap at low quantiles and an increase in the gap at
high quantiles for full-time workers. The observed gender wage gap among
overtime workers is smaller, which may be driven by different selection
behaviors into overtime work across genders.
arXiv link: http://arxiv.org/abs/2505.10814v1
Statistically Significant Linear Regression Coefficients Solely Driven By Outliers In Finite-sample Inference
significance of coefficients in linear regression. We demonstrate, through
numerical simulation using R, that a single outlier can cause an otherwise
insignificant coefficient to appear statistically significant. We compare this
with robust Huber regression, which reduces the effects of outliers.
Afterwards, we approximate the influence of a single outlier on estimated
regression coefficients and discuss common diagnostic statistics to detect
influential observations in regression (e.g., studentized residuals).
Furthermore, we relate this issue to the optional normality assumption in
simple linear regression [14], required for exact finite-sample inference but
asymptotically justified for large n by the Central Limit Theorem (CLT). We
also address the general dangers of relying solely on p-values without
performing adequate regression diagnostics. Finally, we provide a brief
overview of regression methods and discuss how they relate to the assumptions
of the Gauss-Markov theorem.
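A minimal sketch of the kind of simulation the abstract describes, written here in Python rather than R and assuming a small sample contaminated by a single high-leverage outlier; for most random draws the clean-data slope is insignificant, the contaminated OLS slope appears highly significant, and Huber regression downweights the offending observation.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(13)

    # A predictor with no true effect on y in a small sample (true slope is zero).
    n = 30
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)
    print("clean OLS p-value:", round(sm.OLS(y, sm.add_constant(x)).fit().pvalues[1], 3))

    # Add a single high-leverage outlier and re-estimate.
    x_out, y_out = np.append(x, 8.0), np.append(y, 12.0)
    X_out = sm.add_constant(x_out)
    ols = sm.OLS(y_out, X_out).fit()
    print("with outlier: OLS slope", round(ols.params[1], 2),
          "p-value", round(ols.pvalues[1], 4))

    # Huber's robust regression downweights the outlier.
    rlm = sm.RLM(y_out, X_out, M=sm.robust.norms.HuberT()).fit()
    print("Huber slope", round(rlm.params[1], 2),
          "weight on the outlier", round(rlm.weights[-1], 2))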
arXiv link: http://arxiv.org/abs/2505.10738v2
Optimal Post-Hoc Theorizing
they are strong. For these questions, theorizing before the results are known
is not always optimal. Instead, the optimal sequencing of theory and empirics
trades off a “Darwinian Learning” effect from theorizing first with a
“Statistical Learning” effect from examining the data first. This short paper
formalizes the tradeoff in a Bayesian model. In the modern era of mature
economic theory and enormous datasets, I argue that post hoc theorizing is
typically optimal.
arXiv link: http://arxiv.org/abs/2505.10370v2
Better Understanding Triple Differences Estimators
parallel trends assumptions in Difference-in-Differences (DiD) settings. This
paper highlights that common DDD implementations -- such as taking the
difference between two DiDs or applying three-way fixed effects regressions --
are generally invalid when identification requires conditioning on covariates.
In staggered adoption settings, the common DiD practice of pooling all
not-yet-treated units as a comparison group can introduce additional bias, even
when covariates are not required for identification. These insights challenge
conventional empirical strategies and underscore the need for estimators
tailored specifically to DDD structures. We develop regression adjustment,
inverse probability weighting, and doubly robust estimators that remain valid
under covariate-adjusted DDD parallel trends. For staggered designs, we
demonstrate how to effectively utilize multiple comparison groups to obtain
more informative inferences. Simulations and three empirical applications
highlight bias reductions and precision gains relative to standard approaches.
A companion R package is available.
arXiv link: http://arxiv.org/abs/2505.09942v3
Sequential Scoring Rule Evaluation for Forecast Method Selection
generalised to the problem of selecting between alternative forecasting methods
using scoring rules. A return to basic principles is necessary in order to show
that ideas and concepts from sequential statistical methods can be adapted and
applied to sequential scoring rule evaluation (SSRE). One key technical
contribution of this paper is the development of a large deviations type result
for SSRE schemes using a change of measure that parallels a traditional
exponential tilting form. Further, we also show that SSRE will terminate in
finite time with probability one, and that the moments of the SSRE stopping
time exist. A second key contribution is to show that the exponential tilting
form underlying our large deviations result allows us to cast SSRE within the
framework of generalised e-values. Relying on this formulation, we devise
sequential testing approaches that are both powerful and maintain control on
error probabilities underlying the analysis. Through several simulated
examples, we demonstrate that our e-values based SSRE approach delivers
reliable results and is more powerful than commonly applied testing methods
precisely in the situations where those methods can be expected to fail.
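The paper's own e-value construction rests on its exponential-tilting form, which is not reproduced here, but the basic mechanics of anytime-valid testing with e-values can be sketched generically. The Python toy below assumes, purely for illustration, that standardized score differentials between two forecast methods are i.i.d. N(0,1) under the null of equal predictive ability, and it rejects as soon as a likelihood-ratio test martingale exceeds $1/\alpha$ (Ville's inequality).
    import numpy as np
    from scipy.stats import norm

    def sequential_e_test(d, mu1=0.5, alpha=0.05):
        """Anytime-valid test: reject H0 as soon as the running product of
        likelihood ratios (a nonnegative martingale with mean 1 under H0,
        here N(0,1) vs N(mu1,1)) exceeds 1/alpha."""
        e = 1.0
        for t, dt in enumerate(d, start=1):
            e *= norm.pdf(dt, loc=mu1) / norm.pdf(dt, loc=0.0)
            if e >= 1.0 / alpha:
                return t, e          # early stopping with a rejection
        return None, e               # no rejection within the sample

    rng = np.random.default_rng(1)
    print(sequential_e_test(rng.normal(loc=0.5, size=200)))  # differentials favour method 1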
arXiv link: http://arxiv.org/abs/2505.09090v1
Assumption-robust Causal Inference
adjustment sets that appear equally plausible. It is often untestable which of
these adjustment sets are valid to adjust for (i.e., satisfy ignorability).
This discrepancy can pose practical challenges as it is typically unclear how
to reconcile multiple, possibly conflicting estimates of the average treatment
effect (ATE). A naive approach is to report the whole range (convex hull of the
union) of the resulting confidence intervals. However, the width of this
interval might not shrink to zero in large samples and can be unnecessarily
wide in real applications. To address this issue, we propose a summary
procedure that generates a single estimate, one confidence interval, and
identifies a set of units for which the causal effect estimate remains valid,
provided at least one adjustment set is valid. The width of our proposed
confidence interval shrinks to zero with sample size at the $n^{-1/2}$ rate, unlike
the original range which is of constant order. Thus, our assumption-robust
approach enables reliable causal inference on the ATE even in scenarios where
most of the adjustment sets are invalid. Admittedly, this robustness comes at a
cost: our inferential guarantees apply to a target population close to, but
different from, the one originally intended. We use synthetic and real-data
examples to demonstrate that our proposed procedure provides substantially
tighter confidence intervals for the ATE as compared to the whole range.
arXiv link: http://arxiv.org/abs/2505.08729v2
An Efficient Multi-scale Leverage Effect Estimator under Dependent Microstructure Noise
challenged by complex, dependent microstructure noise, often exhibiting
non-Gaussian higher-order moments. This paper introduces a novel multi-scale
framework for efficient and robust leverage effect estimation under such
flexible noise structures. We develop two new estimators, the
Subsampling-and-Averaging Leverage Effect (SALE) and the Multi-Scale Leverage
Effect (MSLE), which adapt subsampling and multi-scale approaches holistically
using a unique shifted window technique. This design simplifies the multi-scale
estimation procedure and enhances noise robustness without requiring the
pre-averaging approach. We establish central limit theorems and stable
convergence, with MSLE achieving convergence rates of an optimal $n^{-1/4}$ and
a near-optimal $n^{-1/9}$ for the noise-free and noisy settings, respectively.
A cornerstone of our framework's efficiency is a specifically designed MSLE
weighting strategy that leverages covariance structures across scales. This
significantly reduces asymptotic variance and, critically, yields substantially
smaller finite-sample errors than existing methods under both noise-free and
realistic noisy settings. Extensive simulations and empirical analyses confirm
the superior efficiency, robustness, and practical advantages of our approach.
arXiv link: http://arxiv.org/abs/2505.08654v2
On Selection of Cross-Section Averages in Non-stationary Environments
an unknown number of latent factors. It has recently been shown that IC perform
well in Common Correlated Effects (CCE) and related setups in selecting a set
of cross-section averages (CAs) sufficient for the factor space under
stationary factors. As CAs can proxy non-stationary factors, it is tempting to
claim such generality of IC, too. We show formally and in simulations that IC
have a severe underselection issue even under very mild forms of factor
non-stationarity, which goes against the sentiment in the literature.
arXiv link: http://arxiv.org/abs/2505.08615v4
Team Networks with Partially Observed Links
links. In the model, heterogeneous workers, represented as nodes, produce
jointly and repeatedly within teams, represented as links. Links are omitted
when their associated outcome variables fall below a threshold, resulting in
partial observability of the network. To address this, I propose a Generalized
Method of Moments estimator under normally distributed errors and develop a
distribution-free test for detecting link truncation. Applied to academic
publication data, the estimator reveals and corrects a substantial downward
bias in the estimated scaling factor that aggregates individual fixed effects
into team-specific fixed effects. This finding suggests that the collaboration
premium may be systematically underestimated when missing links are not
properly accounted for.
arXiv link: http://arxiv.org/abs/2505.08405v2
rd2d: Causal Inference in Boundary Discontinuity Designs
Discontinuity (RD) designs, with Geographic RD designs as a prominent example
-- are often used in empirical research to learn about causal treatment effects
along a continuous assignment boundary defined by a bivariate score. This
article introduces the R package rd2d, which implements and extends the
methodological results developed in Cattaneo, Titiunik and Yu (2025) for
boundary discontinuity designs. The package employs local polynomial estimation
and inference using either the bivariate score or a univariate
distance-to-boundary metric. It features novel data-driven bandwidth selection
procedures, and offers both pointwise and uniform estimation and inference
along the assignment boundary. The numerical performance of the package is
demonstrated through a simulation study.
arXiv link: http://arxiv.org/abs/2505.07989v2
Exploring Monetary Policy Shocks with Large-Scale Bayesian VARs
framework designed to estimate the effects of conventional monetary policy
shocks. The model captures structural shocks as latent factors, enabling
computationally efficient estimation in high-dimensional settings through a
straightforward Gibbs sampler. By incorporating time variation in the effects
of monetary policy while maintaining tractability, the methodology offers a
flexible and scalable approach to empirical macroeconomic analysis using BVARs,
well-suited to handle data irregularities observed in recent times. Applied to
the U.S. economy, I identify monetary shocks using a combination of
high-frequency surprises and sign restrictions, yielding results that are
robust across a wide range of specification choices. The findings indicate that
the Federal Reserve's influence on disaggregated consumer prices fluctuated
significantly during the 2022-24 high-inflation period, shedding new light on
the evolving dynamics of monetary policy transmission.
arXiv link: http://arxiv.org/abs/2505.06649v1
Beyond the Mean: Limit Theory and Tests for Infinite-Mean Autoregressive Conditional Durations
counterparts to the well-known integrated GARCH models used for financial
returns. However, despite their resemblance, asymptotic theory for ACD is
challenging and also not complete, in particular for integrated ACD. Central
challenges arise from the facts that (i) integrated ACD processes imply
durations with infinite expectation, and (ii) even in the non-integrated case,
conventional asymptotic approaches break down due to the randomness in the
number of durations within a fixed observation period. Addressing these
challenges, we provide here unified asymptotic theory for the (quasi-) maximum
likelihood estimator for ACD models; a unified theory which includes integrated
ACD models. Based on the new results, we also provide a novel framework for
hypothesis testing in duration models, enabling inference on a key empirical
question: whether durations possess a finite or infinite expectation. We apply
our results to high-frequency cryptocurrency ETF trading data. Motivated by
parameter estimates near the integrated ACD boundary, we assess whether
durations between trades in these markets have finite expectation, an
assumption often made implicitly in the literature on point process models. Our
empirical findings indicate infinite-mean durations for all five
cryptocurrencies examined, with the integrated ACD hypothesis rejected --
against alternatives with tail index less than one -- for four out of the five
cryptocurrencies considered.
arXiv link: http://arxiv.org/abs/2505.06190v1
Estimation and Inference in Boundary Discontinuity Designs
along a continuous boundary that splits units into control and treatment groups
according to a bivariate score variable. These research designs are also called
Multi-Score Regression Discontinuity Designs, a leading special case being
Geographic Regression Discontinuity Designs. We study the statistical
properties of commonly used local polynomial treatment effects estimators along
the continuous treatment assignment boundary. We consider two distinct
approaches: one based explicitly on the bivariate score variable for each unit,
and the other based on their univariate distance to the boundary. For each
approach, we present pointwise and uniform estimation and inference methods for
the treatment effect function over the assignment boundary. Notably, we show
that methods based on univariate distance to the boundary exhibit an
irreducible large misspecification bias when the assignment boundary has kinks
or other irregularities, making the distance-based approach unsuitable for
empirical work in those settings. In contrast, methods based on the bivariate
score variable do not suffer from that drawback. We illustrate our methods with
an empirical application. Companion general-purpose software is provided.
arXiv link: http://arxiv.org/abs/2505.05670v1
Comparative Evaluation of VaR Models: Historical Simulation, GARCH-Based Monte Carlo, and Filtered Historical Simulation
modeling approaches: Historical Simulation (HS), GARCH with Normal
approximation (GARCH-N), and GARCH with Filtered Historical Simulation (FHS),
using both in-sample and multi-day forecasting frameworks. We compute daily 5
percent VaR estimates using each method and assess their accuracy via empirical
breach frequencies and visual breach indicators. Our findings reveal severe
miscalibration in the HS and GARCH-N models, with empirical breach rates far
exceeding theoretical levels. In contrast, the FHS method consistently aligns
with theoretical expectations and exhibits desirable statistical and visual
behavior. We further simulate 5-day cumulative returns under both GARCH-N and
GARCH-FHS frameworks to compute multi-period VaR and Expected Shortfall.
Results show that GARCH-N underestimates tail risk due to its reliance on the
Gaussian assumption, whereas GARCH-FHS provides more robust and conservative
tail estimates. Overall, the study demonstrates that the GARCH-FHS model offers
superior performance in capturing fat-tailed risks and provides more reliable
short-term risk forecasts.
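For readers who want the baseline in code, the sketch below implements only the Historical Simulation leg of the comparison: a rolling empirical quantile as the one-day 5 percent VaR, backtested through its realized breach rate. The window length, the heavy-tailed toy returns, and the seed are arbitrary, and the GARCH-based variants are not reproduced.
    import numpy as np

    def historical_var(returns, window=250, level=0.05):
        """One-day-ahead VaR_t: empirical `level` quantile of the previous `window` returns."""
        var = np.full(returns.shape, np.nan)
        for t in range(window, len(returns)):
            var[t] = np.quantile(returns[t - window:t], level)
        return var

    rng = np.random.default_rng(42)
    returns = 0.01 * rng.standard_t(df=4, size=2000)   # heavy-tailed toy return series

    var = historical_var(returns)
    mask = ~np.isnan(var)
    breach_rate = np.mean(returns[mask] < var[mask])   # should be near 0.05 if well calibrated
    print(f"empirical breach rate: {breach_rate:.3f} (target 0.05)")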
arXiv link: http://arxiv.org/abs/2505.05646v1
Nonparametric Testability of Slutsky Symmetry
behavior are considered rational. Rationality implies that the Slutsky matrix,
which captures the substitution effects of compensated price changes on demand
for different goods, is symmetric and negative semi-definite. While empirically
informed versions of negative semi-definiteness have been shown to be
nonparametrically testable, the analogous question for Slutsky symmetry has
remained open. Recently, it has even been shown that the symmetry condition is
not testable via the average Slutsky matrix, prompting conjectures about its
non-testability. We settle this question by deriving nonparametric conditional
quantile restrictions on observable data that permit construction of a fully
nonparametric test for Slutsky symmetry in an empirical setting with individual
heterogeneity and endogeneity. The theoretical contribution is a multivariate
generalization of identification results for partial effects in nonseparable
models without monotonicity, which is of independent interest. This result has
implications for different areas in econometric theory, including nonparametric
welfare analysis with individual heterogeneity for which, in the case of more
than two goods, the symmetry condition introduces a nonlinear correction
factor.
arXiv link: http://arxiv.org/abs/2505.05603v2
Measuring the Euro Area Output Gap
non-stationary dynamic factor model estimated on a large dataset of
macroeconomic and financial variables. From 2012 to 2024, we estimate that the
EA economy was tighter than policy institutions estimate, suggesting that the
slow EA growth results from a potential output issue, not a business cycle
issue. Moreover, we find that a decline in trend inflation, not slack in the
economy, kept core inflation below 2% before the pandemic and that demand
forces account for at least 30% of the post-pandemic increase in core
inflation.
arXiv link: http://arxiv.org/abs/2505.05536v1
Forecasting Thai inflation from univariate Bayesian regression perspective
priors in predicting Thai inflation in a univariate setup, with a particular
interest in comparing more advanced shrinkage priors to a likelihood-dominated
(noninformative) prior. Our forecasting exercises are evaluated using
Root Mean Squared Error (RMSE), Quantile-Weighted Continuous Ranked Probability
Scores (qwCRPS), and Log Predictive Likelihood (LPL). The empirical results
reveal several interesting findings: SV-augmented models consistently
underperform compared to their non-SV counterparts, particularly in large
predictor settings. Notably, HS, DL and LASSO in the large model setting
without SV exhibit superior performance across multiple horizons. This
indicates that a broader range of predictors captures economic dynamics more
effectively than modeling time-varying volatility. Furthermore, while left-tail
risks (deflationary pressures) are well-controlled by advanced priors (HS, HS+,
and DL), right-tail risks (inflationary surges) remain challenging to forecast
accurately. The results underscore the trade-off between model complexity and
forecast accuracy, with simpler models delivering more reliable predictions in
both normal and crisis periods (e.g., the COVID-19 pandemic). This study
contributes to the literature by highlighting the limitations of SV models in
high-dimensional environments and advocating for a balanced approach that
combines advanced shrinkage techniques with broad predictor coverage. These
insights are crucial for policymakers and researchers aiming to enhance the
precision of inflation forecasts in emerging economies.
arXiv link: http://arxiv.org/abs/2505.05334v2
Scenario Synthesis and Macroeconomic Risk
forecasting, leveraging their respective strengths in policy settings. Our
Bayesian framework addresses the fundamental challenge of reconciling
judgmental narrative approaches with statistical forecasting. Analysis
evaluates explicit measures of concordance of scenarios with a reference
forecasting model, delivers Bayesian predictive synthesis of the scenarios to
best match that reference, and addresses scenario set incompleteness. This
underlies systematic evaluation and integration of risks from different
scenarios, and quantifies relative support for scenarios modulo the defined
reference forecasts. The framework offers advances in forecasting in policy
institutions that support clear and rigorous communication of evolving risks.
We also discuss broader questions of integrating judgmental information with
statistical model-based forecasts in the face of unexpected circumstances.
arXiv link: http://arxiv.org/abs/2505.05193v1
A Powerful Chi-Square Specification Test with Support Vectors
Conditional Moment (KCM) tests, are crucial for model validation but often lack
power in finite samples. This paper proposes a novel framework to enhance
specification test performance using Support Vector Machines (SVMs) for
direction learning. We introduce two alternative SVM-based approaches: one
maximizes the discrepancy between nonparametric and parametric classes, while
the other maximizes the separation between residuals and the origin. Both
approaches lead to a $t$-type test statistic that converges to a standard
chi-square distribution under the null hypothesis. Our method is
computationally efficient and capable of detecting any arbitrary alternative.
Simulation studies demonstrate its superior performance compared to existing
methods, particularly in large-dimensional settings.
arXiv link: http://arxiv.org/abs/2505.04414v1
Shocking concerns: public perception about climate change and the macroeconomy
adaptation and support for policy intervention. In this paper, we propose a
novel Climate Concern Index (CCI), based on disaggregated web-search volumes
related to climate change topics, to gauge the intensity and dynamic evolution
of collective climate perceptions, and evaluate its impacts on the business
cycle. Using data from the United States over the 2004-2024 span, we capture
widespread shifts in perceived climate-related risks, particularly those
consistent with the postcognitive interpretation of affective responses to
extreme climate events. To assess the aggregate implications of evolving public
concerns about the climate, we estimate a proxy-SVAR model and find that
exogenous variation in the CCI entails a statistically significant drop in both
employment and private consumption and a persistent surge in stock market
volatility, while core inflation remains largely unaffected. These results
suggest that, even in the absence of direct physical risks, heightened concerns
for climate-related phenomena can trigger behavioral adaptation with nontrivial
consequences for the macroeconomy, thereby demanding attention from
institutional players in the macro-financial field.
arXiv link: http://arxiv.org/abs/2505.04669v1
Causal Inference in Counterbalanced Within-Subjects Designs
fields, within-subjects designs, which expose participants to both control and
treatment at different time periods, are used to address practical and
logistical concerns. Counterbalancing, a common technique in within-subjects
designs, aims to remove carryover effects by randomizing treatment sequences.
Despite its appeal, counterbalancing relies on the assumption that carryover
effects are symmetric and cancel out, which is often unverifiable a priori. In
this paper, we formalize the challenges of counterbalanced within-subjects
designs using the potential outcomes framework. We introduce sequential
exchangeability as an additional identification assumption necessary for valid
causal inference in these designs. To address identification concerns, we
propose diagnostic checks, washout periods, and covariate adjustments, as well
as experimental designs that can serve as alternatives to the counterbalanced
within-subjects design. Our findings demonstrate the limitations of
counterbalancing and provide guidance on when and how within-subjects designs
can be appropriately used for causal inference.
arXiv link: http://arxiv.org/abs/2505.03937v1
Slope Consistency of Quasi-Maximum Likelihood Estimator for Binary Choice Models
binary choice model (BCM) with logistic errors is widely used, especially in
machine learning contexts with many covariates and high-dimensional slope
coefficients. This paper revisits the slope consistency of QMLE for BCMs. Ruud
(1983) introduced a set of conditions under which QMLE may yield a constant
multiple of the slope coefficient of BCMs asymptotically. However, he did not
fully establish slope consistency of QMLE, which requires the existence of a
positive multiple of the slope coefficient identified as an interior maximizer of
the population QMLE likelihood function over an appropriately restricted
parameter space. We fill this gap by providing a formal proof of slope
consistency under the same set of conditions for any binary choice model
identified as in Manski (1975, 1985). Our result implies that logistic
regression yields a consistent estimate for the slope coefficient of BCMs under
suitable conditions.
arXiv link: http://arxiv.org/abs/2505.02327v2
Latent Variable Estimation in Bayesian Black-Litterman Models
reliance on subjective investor views. Classical BL requires an investor
"view": a forecast vector $q$ and its uncertainty matrix $\Omega$ that describe
how much a chosen portfolio should outperform the market. Our key idea is to
treat $(q,\Omega)$ as latent variables and learn them from market data within a
single Bayesian network. Consequently, the resulting posterior estimation
admits closed-form expression, enabling fast inference and stable portfolio
weights. Building on these, we propose two mechanisms to capture how features
interact with returns: shared-latent parametrization and feature-influenced
views; both recover classical BL and Markowitz portfolios as special cases.
Empirically, on 30-year Dow-Jones and 20-year sector-ETF data, we improve
Sharpe ratios by 50% and cut turnover by 55% relative to Markowitz and the
index baselines. This work turns BL into a fully data-driven, view-free, and
coherent Bayesian framework for portfolio optimization.
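For context, the classical Black-Litterman posterior mean that the paper makes data-driven by treating $(q,\Omega)$ as latent can be computed directly; the sketch below uses the textbook formula with hypothetical numbers, where $\pi$ is the equilibrium (market-implied) return vector, $P$ the view-selection matrix, and $\tau$ the usual scaling constant.
    import numpy as np

    def bl_posterior_mean(pi, Sigma, P, q, Omega, tau=0.05):
        """Textbook Black-Litterman posterior mean:
        [(tau*Sigma)^-1 + P' Omega^-1 P]^-1 [(tau*Sigma)^-1 pi + P' Omega^-1 q]."""
        A = np.linalg.inv(tau * Sigma)
        B = P.T @ np.linalg.inv(Omega) @ P
        rhs = A @ pi + P.T @ np.linalg.inv(Omega) @ q
        return np.linalg.solve(A + B, rhs)

    pi = np.array([0.04, 0.05, 0.06])                  # equilibrium returns (hypothetical)
    Sigma = np.array([[0.04, 0.01, 0.00],
                      [0.01, 0.09, 0.02],
                      [0.00, 0.02, 0.16]])             # return covariance (hypothetical)
    P = np.array([[1.0, -1.0, 0.0]])                   # one view: asset 1 outperforms asset 2
    q = np.array([0.02])                               # ... by 2 percent
    Omega = np.array([[0.0025]])                       # uncertainty of that view
    print(bl_posterior_mean(pi, Sigma, P, q, Omega))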
arXiv link: http://arxiv.org/abs/2505.02185v1
Unemployment Dynamics Forecasting with Machine Learning Regression Models
techniques can be applied to monthly U.S. unemployment data to produce timely
forecasts. I compared seven models: Linear Regression, SGDRegressor, Random
Forest, XGBoost, CatBoost, Support Vector Regression, and an LSTM network,
training each on a historical span of data and then evaluating on a later
hold-out period. Input features include macro indicators (GDP growth, CPI),
labor market measures (job openings, initial claims), financial variables
(interest rates, equity indices), and consumer sentiment.
I tuned model hyperparameters via cross-validation and assessed performance
with standard error metrics and the ability to predict the correct unemployment
direction. Across the board, tree-based ensembles (and CatBoost in particular)
deliver noticeably better forecasts than simple linear approaches, while the
LSTM captures underlying temporal patterns more effectively than other
nonlinear methods. SVR and SGDRegressor yield modest gains over standard
regression but do not match the consistency of the ensemble and deep-learning
models.
Interpretability tools, such as feature importance rankings and SHAP values, point to
job openings and consumer sentiment as the most influential predictors across
all methods. By directly comparing linear, ensemble, and deep-learning
approaches on the same dataset, our study shows how modern machine-learning
techniques can enhance real-time unemployment forecasting, offering economists
and policymakers richer insights into labor market trends.
In the comparative evaluation of the models, I employed a dataset comprising
thirty distinct features over the period from January 2020 through December
2024.
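A minimal sketch of the comparison exercise, on synthetic data, could look as follows with scikit-learn; GradientBoostingRegressor stands in for the external XGBoost and CatBoost packages, the feature count mirrors the thirty features mentioned above, and everything else (sample length, data-generating process, hyperparameters) is arbitrary.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(0)
    n, p = 240, 30                                     # toy monthly sample with 30 features
    X = rng.normal(size=(n, p))
    y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)  # stand-in unemployment series

    split = int(0.8 * n)                               # chronological hold-out, no shuffling
    X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

    for name, model in [("linear", LinearRegression()),
                        ("random forest", RandomForestRegressor(random_state=0)),
                        ("gradient boosting", GradientBoostingRegressor(random_state=0))]:
        pred = model.fit(X_tr, y_tr).predict(X_te)
        direction = np.mean(np.sign(np.diff(pred)) == np.sign(np.diff(y_te)))
        print(f"{name:17s} MAE={mean_absolute_error(y_te, pred):.3f} "
              f"direction accuracy={direction:.2f}")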
arXiv link: http://arxiv.org/abs/2505.01933v1
Identification and estimation of dynamic random coefficient models
lagged dependent variables) where coefficients are individual-specific,
allowing for heterogeneity in the effects of the regressors on the dependent
variable. I show that the model is not point-identified in a short panel
context but rather partially identified, and I characterize the identified sets
for the mean, variance, and CDF of the coefficient distribution. This
characterization is general, accommodating discrete, continuous, and unbounded
data, and it leads to computationally tractable estimation and inference
procedures. I apply the method to study lifecycle earnings dynamics among U.S.
households using the Panel Study of Income Dynamics (PSID) dataset. The results
suggest substantial unobserved heterogeneity in earnings persistence, implying
that households face varying levels of earnings risk which, in turn, contribute
to heterogeneity in their consumption and savings behaviors.
arXiv link: http://arxiv.org/abs/2505.01600v1
Asset Pricing in Pre-trained Transformer
representative from Transformer (SERT), for US large capital stock pricing. It
also innovatively applies pre-trained Transformer models in the context of
stock pricing and factor investing. These models are compared with standard
Transformer models and encoder-only Transformer models over three periods
covering the entire COVID-19 pandemic, chosen to examine model adaptivity and
suitability during extreme market fluctuations: the pre-COVID-19 period (mild
up-trend), the COVID-19 period (sharp up-trend with a deep downward shock), and
the year after COVID-19 (high-fluctuation sideways movement). The best proposed
SERT model achieves the highest out-of-sample R2 (11.2% and 10.91%,
respectively) when extreme market fluctuations occur, followed by the
pre-trained Transformer models (10.38% and 9.15%). The performance of their
trend-following strategies also demonstrates a strong capability for hedging
downside risk during market shocks: during the pandemic period, the proposed
SERT model achieves a Sortino ratio 47% higher than the buy-and-hold benchmark
in the equal-weighted portfolio and 28% higher in the value-weighted portfolio.
These results show that Transformer models are well suited to capturing
patterns in temporally sparse data in asset pricing factor models, especially
under considerable volatility. We also find that the softmax signal filter, a
common configuration of Transformer models in other contexts, merely eliminates
differences between models without improving strategy performance; increasing
the number of attention heads improves model performance only insignificantly,
and applying the 'layer norm first' method does not boost model performance in
our case.
arXiv link: http://arxiv.org/abs/2505.01575v2
Multiscale Causal Analysis of Market Efficiency via News Uncertainty Networks and the Financial Chaos Index
markets using the Financial Chaos Index, a tensor-eigenvalue-based measure of
realized volatility. Incorporating Granger causality and network-theoretic
analysis across a range of economic, policy, and news-based uncertainty
indices, we assess whether public information is efficiently incorporated into
asset price fluctuations. Based on a 34-year time period from 1990 to 2023, at
the daily frequency, the semi-strong form of the Efficient Market Hypothesis is
rejected at the 1% level of significance, indicating that asset price changes
respond predictably to lagged news-based uncertainty. In contrast, at the
monthly frequency, such predictive structure largely vanishes, supporting
informational efficiency at coarser temporal resolutions. A structural analysis
of the Granger causality network reveals that fiscal and monetary policy
uncertainties act as core initiators of systemic volatility, while peripheral
indices, such as those related to healthcare and consumer prices, serve as
latent bridges that become activated under crisis conditions. These findings
underscore the role of time-scale decomposition and structural asymmetries in
diagnosing market inefficiencies and mapping the propagation of macro-financial
uncertainty.
arXiv link: http://arxiv.org/abs/2505.01543v1
Predicting the Price of Gold in the Financial Markets Using Hybrid Models
highest accuracy has been one of the most challenging issues and one of the
most critical concerns among capital market activists and researchers.
Therefore, a model that can solve problems and provide results with high
accuracy is one of the topics of interest among researchers. In this project,
we use time series prediction models such as ARIMA to estimate the price,
together with variables and indicators from technical analysis that capture the
behavior of traders and the psychological factors involved. By feeding all of
these variables into a stepwise regression, we identify the variables most
influential for the prediction. Finally, we enter the selected variables as
inputs to an artificial neural network. We call this whole prediction process
the "ARIMA_Stepwise Regression_Neural Network" model and use it to predict the
price of gold in international financial markets. This approach is expected to
be applicable to predicting the prices of stocks, commodities, currency pairs,
financial market indices, and other instruments traded in local and
international financial markets. Moreover, a comparison between the results of
this method and pure time series methods is also presented. Based on the
results, the hybrid model achieves the highest accuracy compared to the time
series method, regression, and stepwise regression.
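A hedged sketch of such a pipeline, with scikit-learn's SequentialFeatureSelector standing in for classical stepwise regression and synthetic series in place of gold prices and technical indicators, might look like this.
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LinearRegression
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    n = 300
    price = np.cumsum(rng.normal(size=n)) + 1800       # toy price series
    indicators = rng.normal(size=(n, 10))              # toy technical indicators

    # Step 1: ARIMA captures the time-series component of the price.
    arima_component = np.asarray(ARIMA(price, order=(1, 1, 1)).fit().fittedvalues)

    # Step 2: stepwise-style selection of the most informative indicators.
    sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3)
    selected = indicators[:, sfs.fit(indicators, price).get_support()]

    # Step 3: a neural network combines the ARIMA component with the selected indicators.
    X = np.column_stack([arima_component, selected])
    nn = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    nn.fit(X[:-50], price[:-50])
    print("hold-out MAE:", np.mean(np.abs(nn.predict(X[-50:]) - price[-50:])))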
arXiv link: http://arxiv.org/abs/2505.01402v1
Design-Based Inference under Random Potential Outcomes via Riesz Representation
random potential outcomes, thereby extending the classical Neyman-Rubin model
in which outcomes are treated as fixed. Each unit's potential outcome is
modelled as a structural mapping $y_i(z, \omega)$, where $z$ denotes
the treatment assignment and \(\omega\) represents latent outcome-level
randomness. Inspired by recent connections between design-based inference and
the Riesz representation theorem, we embed potential outcomes in a Hilbert
space and define treatment effects as linear functionals, yielding estimators
constructed via their Riesz representers. This approach preserves the core
identification logic of randomised assignment while enabling valid inference
under stochastic outcome variation. We establish large-sample properties under
local dependence and develop consistent variance estimators that remain valid
under weaker structural assumptions, including partially known dependence. A
simulation study illustrates the robustness and finite-sample behaviour of the
estimators. Overall, the framework unifies design-based reasoning with
stochastic outcome modelling, broadening the scope of causal inference in
complex experimental settings.
arXiv link: http://arxiv.org/abs/2505.01324v5
Detecting multiple change points in linear models with heteroscedastic errors
linear regression model with errors and covariates exhibiting
heteroscedasticity is considered. Asymptotic results for weighted functionals
of the cumulative sum (CUSUM) processes of model residuals are established when
the model errors are weakly dependent and non-stationary, allowing for either
abrupt or smooth changes in their variance. These theoretical results
illuminate how to adapt standard change point test statistics for linear models
to this setting. We study such adapted change point tests in simulation
experiments, along with a finite sample adjustment to the proposed testing
procedures. The results suggest that these methods perform well in practice for
detecting multiple change points in the linear model parameters and controlling
the Type I error rate in the presence of heteroscedasticity. We illustrate the
use of these approaches in applications to test for instability in predictive
regression models and explanatory asset pricing models.
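To fix ideas, a plain (unweighted) CUSUM statistic built from OLS residuals can be computed as below; the paper's weighted functionals, the asymptotics under heteroscedastic and dependent errors, and the finite-sample adjustment are not reproduced in this sketch.
    import numpy as np

    def cusum_statistic(y, X):
        """Maximum absolute normalized partial sum of OLS residuals."""
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        e = y - X @ beta
        n = len(e)
        partial = np.cumsum(e) / (np.std(e, ddof=X.shape[1]) * np.sqrt(n))
        return np.max(np.abs(partial))

    rng = np.random.default_rng(3)
    n = 400
    x = rng.normal(size=n)
    slope = np.where(np.arange(n) < n // 2, 1.0, 2.0)    # slope changes mid-sample
    sigma = 1.0 + 0.5 * (np.arange(n) > 0.7 * n)         # error variance also changes
    y = slope * x + sigma * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    print("CUSUM statistic:", cusum_statistic(y, X))     # compare with a suitable critical value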
arXiv link: http://arxiv.org/abs/2505.01296v2
Model Checks in a Kernel Ridge Regression Framework
conditional moment restriction models. By regressing estimated residuals on
kernel functions via kernel ridge regression (KRR), we obtain a coefficient
function in a reproducing kernel Hilbert space (RKHS) that is zero if and only
if the model is correctly specified. We introduce two classes of test
statistics: (i) projection-based tests, using RKHS inner products to capture
global deviations, and (ii) random location tests, evaluating the KRR estimator
at randomly chosen covariate points to detect local departures. The tests are
consistent against fixed alternatives and sensitive to local alternatives at
the $n^{-1/2}$ rate. When nuisance parameters are estimated, Neyman
orthogonality projections ensure valid inference without repeated estimation in
bootstrap samples. The random location tests are interpretable and can
visualize model misspecification. Simulations show strong power and size
control, especially in higher dimensions, outperforming existing methods.
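The core construction, regressing estimated residuals on kernel functions and then measuring the size of the fitted RKHS element, can be sketched in a few lines; the Gaussian kernel, the ridge penalty, and the exact scaling of the statistic below are illustrative choices, and the paper's projection-based and random-location tests with their critical values are more involved.
    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    rng = np.random.default_rng(0)
    n = 300
    x = rng.uniform(-2, 2, size=(n, 1))
    y = np.sin(2 * x[:, 0]) + 0.3 * rng.normal(size=n)   # true regression is nonlinear

    # Step 1: residuals from the (misspecified) parametric model, here a linear fit.
    X = np.column_stack([np.ones(n), x[:, 0]])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

    # Step 2: kernel ridge regression of the residuals on the covariates.
    K = rbf_kernel(x, gamma=1.0)
    lam = 1e-2
    alpha = np.linalg.solve(K + n * lam * np.eye(n), resid)

    # Step 3: a projection-type statistic based on the squared RKHS norm
    # of the fitted coefficient function, ||f||_H^2 = alpha' K alpha.
    print("RKHS-norm statistic:", n * alpha @ K @ alpha)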
arXiv link: http://arxiv.org/abs/2505.01161v1
Proper Correlation Coefficients for Nominal Random Variables
variables of which at least one has a nominal scale that is attainable for all
marginal distributions and proposes a set of dependence measures that are 1 if
and only if this perfect dependence is satisfied. The advantages of these
dependence measures relative to classical dependence measures like contingency
coefficients, Goodman-Kruskal's lambda and tau and the so-called uncertainty
coefficient are twofold. Firstly, they are defined if one of the variables is
real-valued and exhibits continuities. Secondly, they satisfy the property of
attainability. That is, they can take all values in the interval [0,1]
irrespective of the marginals involved. Both properties are not shared by the
classical dependence measures which need two discrete marginal distributions
and can in some situations yield values close to 0 even though the dependence
is strong or even perfect.
Additionally, I provide a consistent estimator for one of the new dependence
measures together with its asymptotic distribution under independence as well
as in the general case. This allows the construction of confidence intervals and an
independence test, whose finite sample performance I subsequently examine in a
simulation study. Finally, I illustrate the use of the new dependence measure
in two applications on the dependence between the variables country and income
or country and religion, respectively.
arXiv link: http://arxiv.org/abs/2505.00785v1
Explainable AI in Spatial Analysis
Intelligence (XAI) within the realm of spatial analysis. A key objective in
spatial analysis is to model spatial relationships and infer spatial processes
to generate knowledge from spatial data, which has been largely based on
spatial statistical methods. More recently, machine learning offers scalable
and flexible approaches that complement traditional methods and has been
increasingly applied in spatial data science. Despite its advantages, machine
learning is often criticized for being a black box, which limits our
understanding of model behavior and output. Recognizing this limitation, XAI
has emerged as a pivotal field in AI that provides methods to explain the
output of machine learning models to enhance transparency and understanding.
These methods are crucial for model diagnosis, bias detection, and ensuring the
reliability of results obtained from machine learning models. This chapter
introduces key concepts and methods in XAI with a focus on Shapley value-based
approaches, arguably the most popular class of XAI methods, and their
integration with spatial analysis. An empirical example of county-level voting
behaviors in the 2020 Presidential election is presented to demonstrate the use
of Shapley values and spatial analysis with a comparison to multi-scale
geographically weighted regression. The chapter concludes with a discussion on
the challenges and limitations of current XAI techniques and proposes new
directions.
arXiv link: http://arxiv.org/abs/2505.00591v1
Pre-Training Estimators for Structural Models: Application to Consumer Search
estimator is "pretrained" in the sense that the bulk of the computational cost
and researcher effort occur during the construction of the estimator.
Subsequent applications of the estimator to different datasets require little
computational cost or researcher effort. The estimation leverages a neural net
to recognize the structural model's parameter from data patterns. As an initial
trial, this paper builds a pretrained estimator for a sequential search model
that is known to be difficult to estimate. We evaluate the pretrained estimator
on 12 real datasets. The estimation takes seconds to run and shows high
accuracy. We provide the estimator at pnnehome.github.io. More generally,
pretrained, off-the-shelf estimators can make structural models more accessible
to researchers and practitioners.
arXiv link: http://arxiv.org/abs/2505.00526v3
A Unifying Framework for Robust and Efficient Inference with Unstructured Data
parameters derived from unstructured data, which include text, images, audio,
and video. Economists have long used unstructured data by first extracting
low-dimensional structured features (e.g., the topic or sentiment of a text),
since the raw data are too high-dimensional and uninterpretable to include
directly in empirical analyses. The rise of deep neural networks has
accelerated this practice by greatly reducing the costs of extracting
structured data at scale, but neural networks do not make generically unbiased
predictions. This potentially propagates bias to the downstream estimators that
incorporate imputed structured data, and the availability of different
off-the-shelf neural networks with different biases moreover raises p-hacking
concerns. To address these challenges, we reframe inference with unstructured
data as a problem of missing structured data, where structured variables are
imputed from high-dimensional unstructured inputs. This perspective allows us
to apply classic results from semiparametric inference, leading to estimators
that are valid, efficient, and robust. We formalize this approach with MAR-S, a
framework that unifies and extends existing methods for debiased inference
using machine learning predictions, connecting them to familiar problems such
as causal inference. Within this framework, we develop robust and efficient
estimators for both descriptive and causal estimands and address challenges
like inference with aggregated and transformed missing structured data, a
common scenario that is not covered by existing work. These methods, together
with the accompanying implementation package, provide economists with accessible tools
for constructing unbiased estimators using unstructured data in a wide range of
applications, as we demonstrate by re-analyzing several influential studies.
arXiv link: http://arxiv.org/abs/2505.00282v2
Policy Learning with $α$-Expected Welfare
worst-off $\alpha$-fraction of the post-treatment outcome distribution. We
refer to this policy as the $\alpha$-Expected Welfare Maximization
($\alpha$-EWM) rule, where $\alpha \in (0,1]$ denotes the size of the
subpopulation of interest. The $\alpha$-EWM rule interpolates between the
expected welfare ($\alpha=1$) and the Rawlsian welfare ($\alpha\rightarrow 0$).
For $\alpha\in (0,1)$, an $\alpha$-EWM rule can be interpreted as a
distributionally robust EWM rule that allows the target population to have a
different distribution than the study population. Using the dual formulation of
our $\alpha$-expected welfare function, we propose a debiased estimator for the
optimal policy and establish its asymptotic upper regret bounds. In addition,
we develop asymptotically valid inference for the optimal welfare based on the
proposed debiased estimator. We examine the finite sample performance of the
debiased estimator and inference via both real and synthetic data.
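Although the paper's exact dual formulation is not reproduced here, the welfare criterion described above can be written in the standard lower-tail (CVaR-type) form
$$ W_\alpha(d) \;=\; \frac{1}{\alpha}\int_0^{\alpha} F_{Y(d)}^{-1}(u)\,du \;=\; \max_{t\in\mathbb{R}}\Big\{\, t - \frac{1}{\alpha}\,E\big[(t - Y(d))_{+}\big] \Big\}, $$
where $F_{Y(d)}^{-1}$ is the quantile function of the post-treatment outcome under policy $d$; setting $\alpha=1$ recovers expected welfare and letting $\alpha\rightarrow 0$ approaches the Rawlsian criterion, consistent with the interpolation described above.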
arXiv link: http://arxiv.org/abs/2505.00256v1
On the Robustness of Mixture Models in the Presence of Hidden Markov Regimes with Covariate-Dependent Transition Probabilities
estimation in hidden Markov models (HMMs) when the regime-switching structure
is misspecified. Specifically, we examine the case where the true
data-generating process features a hidden Markov regime sequence with
covariate-dependent transition probabilities, but estimation proceeds under a
simplified mixture model that assumes regimes are independent and identically
distributed. We show that the parameters governing the conditional distribution
of the observables can still be consistently estimated under this
misspecification, provided certain regularity conditions hold. Our results
highlight a practical benefit of using computationally simpler mixture models
in settings where regime dependence is complex or difficult to model directly.
arXiv link: http://arxiv.org/abs/2504.21669v1
Real-time Program Evaluation using Anytime-valid Rank Tests
synthetic control have grown into workhorse tools for program evaluation.
Inference for these estimators is well-developed in settings where all
post-treatment data is available at the time of analysis. However, in settings
where data arrives sequentially, these tests do not permit real-time inference,
as they require a pre-specified sample size T. We introduce real-time inference
for program evaluation through anytime-valid rank tests. Our methodology relies
on interpreting the absence of a treatment effect as exchangeability of the
treatment estimates. We then convert these treatment estimates into sequential
ranks, and construct optimal finite-sample valid sequential tests for
exchangeability. We illustrate our methods in the context of
difference-in-differences and synthetic control. In simulations, they control
size even under mild exchangeability violations. While our methods suffer
slight power loss at T, they allow for early rejection (before T) and preserve
the ability to reject later (after T).
arXiv link: http://arxiv.org/abs/2504.21595v1
Publication Design with Incentives in Mind
attention, and influences the supply of research through its impact on
researchers' private incentives. We introduce a framework to study optimal
publication decisions when researchers can choose (i) whether or how to conduct
a study and (ii) whether or how to manipulate the research findings (e.g., via
selective reporting or data manipulation). When manipulation is not possible,
but research entails substantial private costs for the researchers, it may be
optimal to incentivize cheaper research designs even if they are less accurate.
When manipulation is possible, it is optimal to publish some manipulated
results, as well as results that would not have received attention in the
absence of manipulability. Even if it is possible to deter manipulation, such
as by requiring pre-registered experiments instead of (potentially manipulable)
observational studies, it is suboptimal to do so when experiments entail high
research costs. We illustrate the implications of our model in an application
to medical studies.
arXiv link: http://arxiv.org/abs/2504.21156v2
An Axiomatic Approach to Comparing Sensitivity Parameters
These methods typically make different, non-falsifiable assumptions. Hence the
data alone cannot tell us which method is most appropriate. Since it is
unreasonable to expect results to be robust against all possible robustness
checks, researchers often use methods deemed "interpretable", a subjective
criterion with no formal definition. In contrast, we develop the first formal,
axiomatic framework for comparing and selecting among these methods. Our
framework is analogous to the standard approach for comparing estimators based
on their sampling distributions. We propose that sensitivity parameters be
selected based on their covariate sampling distributions, a design distribution
of parameter values induced by an assumption on how covariates are assigned to
be observed or unobserved. Using this idea, we define a new concept of
parameter consistency, and argue that a reasonable sensitivity parameter should
be consistent. We prove that the literature's most popular approach is
inconsistent, while several alternatives are consistent.
arXiv link: http://arxiv.org/abs/2504.21106v2
Construct to Commitment: The Effect of Narratives on Economic Growth
construct, a mechanism for framing expectations, into commitment, a sustainable
pillar for growth. We propose the "Narratives-Construct-Commitment (NCC)"
framework outlining the mechanism and institutionalization of narratives, and
formalize it as a dynamic Bayesian game. Using the Innovation-Driven
Development Strategy (2016) as a case study, we identify the narrative shock
from high-frequency financial data and trace its impact using the local
projection method. By shaping expectations, credible narratives institutionalize
investment incentives, channel resources into R&D, and facilitate sustained
improvements in total factor productivity (TFP). These findings offer
insights into the New Quality Productive Forces initiative, highlighting the
role of narratives in transforming vision into tangible economic growth.
arXiv link: http://arxiv.org/abs/2504.21060v3
Inference with few treated units
of units) are treated. An important challenge in such settings is that standard
inference methods that rely on asymptotic theory may be unreliable, even when
the total number of units is large. This survey reviews and categorizes
inference methods that are designed to accommodate few treated units,
considering both cross-sectional and panel data methods. We discuss trade-offs
and connections between different approaches. In doing so, we propose slight
modifications to improve the finite-sample validity of some methods, and we
also provide theoretical justifications for existing heuristic approaches that
have been proposed in the literature.
arXiv link: http://arxiv.org/abs/2504.19841v2
Assignment at the Frontier: Identifying the Frontier Structural Function and Bounding Mean Deviations
inputs minus a nonnegative unobserved deviation. We allow the distribution of
the deviation to depend on inputs. If zero lies in the support of the deviation
given inputs -- an assumption we term assignment at the frontier -- then the
frontier is identified by the supremum of the outcome at those inputs,
obviating the need for instrumental variables. We then estimate the frontier,
allowing for random error whose distribution may also depend on inputs.
Finally, we derive a lower bound on the mean deviation, using only variance and
skewness, that is robust to a scarcity of data near the frontier. We apply our
methods to estimate a firm-level frontier production function and mean
inefficiency.
arXiv link: http://arxiv.org/abs/2504.19832v5
Finite-Sample Properties of Generalized Ridge Estimators in Nonlinear Models
error (MSE) of ridge-type estimators in nonlinear models, including duration,
Poisson, and multinomial choice models, where theoretical results have been
scarce. Using a finite-sample approximation technique from the econometrics
literature, we derive new results showing that the generalized ridge maximum
likelihood estimator (MLE) with a sufficiently small penalty achieves lower
finite-sample MSE for both estimation and prediction than the conventional MLE,
regardless of whether the hypotheses incorporated in the penalty are valid. A
key theoretical contribution is to demonstrate that generalized ridge
estimators generate a variance-bias trade-off in the first-order MSE of
nonlinear likelihood-based models -- a feature absent for the conventional MLE
-- which enables ridge-type estimators to attain smaller MSE when the penalty
is properly selected. Extensive simulations and an empirical application to the
estimation of marginal mean and quantile treatment effects further confirm the
superior performance and practical relevance of the proposed method.
arXiv link: http://arxiv.org/abs/2504.19018v2
Inference in High-Dimensional Panel Models: Two-Way Dependence and Unobserved Heterogeneity
raising the number of nuisance parameters and making high dimensionality a
practical issue. Meanwhile, temporal and cross-sectional dependence in panel
data further complicates high-dimensional estimation and inference. This paper
proposes a toolkit for high-dimensional panel models with large cross-sectional
and time sample sizes. To reduce the dimensionality, I propose a weighted LASSO
using two-way cluster-robust penalty weights. Although consistent, the
convergence rate of LASSO is slow due to the cluster dependence, rendering
inference challenging in general. Nevertheless, asymptotic normality can be
established in a semiparametric moment-restriction model by leveraging a
clustered-panel cross-fitting approach and, as a special case, in a partial
linear model using the full sample. In a panel estimation of the government
spending multiplier, I demonstrate how high dimensionality could be hidden and
how the proposed toolkit enables flexible modeling and robust inference.
arXiv link: http://arxiv.org/abs/2504.18772v1
Regularized Generalized Covariance (RGCov) Estimator
extension of the GCov estimator to a high-dimensional setting that results either
from high-dimensional data or a large number of nonlinear transformations used
in the objective function. The approach relies on a ridge-type regularization
for high-dimensional matrix inversion in the objective function of the GCov.
The RGCov estimator is consistent and asymptotically normally distributed. We
provide the conditions under which it can reach semiparametric efficiency and
discuss the selection of the optimal regularization parameter. We also examine
the diagonal GCov estimator, which simplifies the computation of the objective
function. The GCov-based specification test, and the test for nonlinear serial
dependence (NLSD) are extended to the regularized RGCov specification and RNLSD
tests with asymptotic Chi-square distributions. Simulation studies show that
the RGCov estimator and the regularized tests perform well in the high
dimensional setting. We apply the RGCov to estimate the mixed causal and
noncausal VAR model of stock prices of green energy companies.
arXiv link: http://arxiv.org/abs/2504.18678v1
Common Functional Decompositions Can Mis-attribute Differences in Outcomes Between Populations
different in two populations. For instance, if a jobs program benefits members
of one city more than another, is that due to differences in program
participants (particular covariates) or the local labor markets (outcomes given
covariates)? The Kitagawa-Oaxaca-Blinder (KOB) decomposition is a standard tool
in econometrics that explains the difference in the mean outcome across two
populations. However, the KOB decomposition assumes a linear relationship
between covariates and outcomes, while the true relationship may be
meaningfully nonlinear. Modern machine learning boasts a variety of nonlinear
functional decompositions for the relationship between outcomes and covariates
in one population. It seems natural to extend the KOB decomposition using these
functional decompositions. We observe that a successful extension should not
attribute the differences to covariates -- or, respectively, to outcomes given
covariates -- if those are the same in the two populations. Unfortunately, we
demonstrate that, even in simple examples, two common decompositions --
functional ANOVA and Accumulated Local Effects -- can attribute differences to
outcomes given covariates, even when they are identical in two populations. We
provide a characterization of when functional ANOVA misattributes, as well as a
general property that any discrete decomposition must satisfy to avoid
misattribution. We show that if the decomposition is independent of its input
distribution, it does not misattribute. We further conjecture that
misattribution arises in any reasonable additive decomposition that depends on
the distribution of the covariates.
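For reference, the two-fold KOB decomposition discussed above, taking population $B$'s coefficients as the reference, is
$$ E[Y_A] - E[Y_B] \;=\; \big(E[X_A] - E[X_B]\big)'\beta_B \;+\; E[X_A]'\big(\beta_A - \beta_B\big), $$
where the first term attributes the gap to differences in covariates and the second to differences in outcomes given covariates; the choice of reference coefficients (and three-fold variants with an interaction term) differs across applications.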
arXiv link: http://arxiv.org/abs/2504.16864v1
MLOps Monitoring at Scale for Digital Platforms
forecasting. To keep that performance in streaming data settings, they have to
be monitored and frequently re-trained. This can be done with machine learning
operations (MLOps) techniques under supervision of an MLOps engineer. However,
in digital platform settings where the number of data streams is typically
large and unstable, standard monitoring becomes either suboptimal or too labor
intensive for the MLOps engineer. As a consequence, companies often fall back
on very simple, worse-performing ML models without monitoring. We solve this
problem by adopting a design science approach and introducing a new monitoring
framework, the Machine Learning Monitoring Agent (MLMA), that is designed to
work at scale for any ML model with reasonable labor cost. A key feature of our
framework concerns test-based automated re-training based on a data-adaptive
reference loss batch. The MLOps engineer is kept in the loop via key metrics
and also acts, pro-actively or retrospectively, to maintain performance of the
ML model in the production stage. We conduct a large-scale test at a last-mile
delivery platform to empirically validate our monitoring framework.
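As a purely hypothetical illustration of test-based re-training against a reference loss batch, one could write a routine like the one below; the function names, the squared-error loss, and the Welch t-test trigger are stand-ins and not the MLMA's actual implementation, which also keeps the MLOps engineer in the loop via key metrics.
    import numpy as np
    from scipy.stats import ttest_ind

    def monitor_and_maybe_retrain(model, fit_fn, X_new, y_new, reference_losses, alpha=0.01):
        """Retrain when losses on the new batch are significantly worse than the
        (data-adaptive) reference loss batch; otherwise keep the current model."""
        current_losses = (model.predict(X_new) - y_new) ** 2
        _, pvalue = ttest_ind(current_losses, reference_losses,
                              equal_var=False, alternative="greater")
        if pvalue < alpha:                     # drift detected: retrain and refresh the reference
            model = fit_fn(X_new, y_new)
            reference_losses = (model.predict(X_new) - y_new) ** 2
        return model, reference_losses, pvalue
In a streaming deployment, a routine of this kind would be called once per incoming batch and per data stream.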
arXiv link: http://arxiv.org/abs/2504.16789v1
Evaluating Meta-Regression Techniques: A Simulation Study on Heterogeneity in Location and Time
evaluate conventional meta-regression approaches (study-level random, fixed,
and mixed effects) against seven methodology specifications new to
meta-regressions that control joint heterogeneity in location and time
(including a new one that we introduce). We systematically vary heterogeneity
levels to assess statistical power, estimator bias and model robustness for
each methodology specification. This assessment focuses on three aspects:
performance under joint heterogeneity in location and time, the effectiveness
of our proposed settings incorporating location fixed effects and study-level
fixed effects with a time trend, as well as guidelines for model selection. The
results show that jointly modeling heterogeneity when heterogeneity is in both
dimensions improves performance compared to modeling only one type of
heterogeneity.
arXiv link: http://arxiv.org/abs/2504.16696v2
Dynamic Discrete-Continuous Choice Models: Identification and Conditional Choice Probability Estimation
individuals simultaneously make both discrete and continuous choices. The
framework incorporates a wide range of unobserved heterogeneity. I show that
such models are nonparametrically identified. Based on constructive
identification arguments, I build a novel two-step estimation method in the
lineage of Hotz and Miller (1993) and Arcidiacono and Miller (2011) but
extended to simultaneous discrete-continuous choice. In the first step, I
recover the (type-dependent) optimal choices with an expectation-maximization
algorithm and instrumental variable quantile regression. In the second step, I
estimate the primitives of the model taking the estimated optimal choices as
given. The method is especially attractive for complex dynamic models because
it significantly reduces the computational burden associated with their
estimation compared to alternative full solution methods.
arXiv link: http://arxiv.org/abs/2504.16630v1
Global identification of dynamic panel models with interactive effects
models with interactive effects, a fundamental issue in econometric theory. We
focus on the setting where the number of cross-sectional units (N) is large,
but the time dimension (T) remains fixed. While local identification based on
the Jacobian matrix is well understood and relatively straightforward to
establish, achieving global identification remains a significant challenge.
Under a set of mild and easily satisfied conditions, we demonstrate that the
parameters of the model are globally identified, ensuring that no two distinct
parameter values generate the same probability distribution of the observed
data. Our findings contribute to the broader literature on identification in
panel data models and have important implications for empirical research that
relies on interactive effects.
arXiv link: http://arxiv.org/abs/2504.14354v1
Finite Population Identification and Design-Based Sensitivity Analysis
by using design distributions to calibrate sensitivity parameters in finite
population identified sets. This yields uncertainty intervals that can be
interpreted as identified sets, Bayesian credible sets, or frequentist
design-based confidence sets. We focus on quantifying uncertainty about the
average treatment effect (ATE) due to missing potential outcomes in a
randomized experiment, where our approach (1) yields design-based confidence
intervals for ATE which allow for heterogeneous treatment effects but do not
rely on asymptotics, (2) provides a new motivation for examining covariate
balance, and (3) gives a new formal analysis of the role of randomized
treatment assignment. We illustrate our approach in three empirical
applications.
arXiv link: http://arxiv.org/abs/2504.14127v2
Projection Inference for set-identified SVARs
Vector Autoregressions. A nominal $1-\alpha$ projection region collects the
structural parameters that are compatible with a $1-\alpha$ Wald ellipsoid for
the model's reduced-form parameters (autoregressive coefficients and the
covariance matrix of residuals).
We show that projection inference can be applied to a general class of
stationary models, is computationally feasible, and -- as the sample size grows
large -- it produces regions for the structural parameters and their identified
set with both frequentist coverage and robust Bayesian credibility of at
least $1-\alpha$.
A drawback of the projection approach is that both coverage and robust
credibility may be strictly above their nominal level. Following the work of
Kaido, Molinari, and Stoye (2014), we `calibrate' the radius of the Wald
ellipsoid to guarantee that -- for a given posterior on the reduced-form
parameters -- the robust Bayesian credibility of the projection method is
exactly $1-\alpha$. If the bounds of the identified set are differentiable, our
calibrated projection also covers the identified set with probability
$1-\alpha$.
We illustrate the main results of the paper using the demand/supply model for
the U.S. labor market in Baumeister and Hamilton (2015).
arXiv link: http://arxiv.org/abs/2504.14106v2
Bayesian Model Averaging in Causal Instrumental Variable Models
unobserved confounding, but choosing suitable instruments is challenging in
practice. We propose gIVBMA, a Bayesian model averaging procedure that
addresses this challenge by averaging across different sets of instrumental
variables and covariates in a structural equation model. Our approach extends
previous work through a scale-invariant prior structure and accommodates
non-Gaussian outcomes and treatments, offering greater flexibility than
existing methods. The computational strategy uses conditional Bayes factors to
update models separately for the outcome and treatments. We prove that this
model selection procedure is consistent. By explicitly accounting for model
uncertainty, gIVBMA allows instruments and covariates to switch roles and
provides robustness against invalid instruments. In simulation experiments,
gIVBMA outperforms current state-of-the-art methods. We demonstrate its
usefulness in two empirical applications: the effects of malaria and
institutions on income per capita and the returns to schooling. A software
implementation of gIVBMA is available in Julia.
arXiv link: http://arxiv.org/abs/2504.13520v4
Using Multiple Outcomes to Adjust Standard Errors for Spatial Correlation
in a geographic space. In such cases, statistical inference is complicated by
the interdependence of economic outcomes across locations. A common approach to
account for this dependence is to cluster standard errors based on a predefined
geographic partition. A second strategy is to model dependence in terms of the
distance between units. Dependence, however, does not necessarily stop at
borders and is typically not determined by distance alone. This paper
introduces a method that leverages observations of multiple outcomes to adjust
standard errors for cross-sectional dependence. Specifically, a researcher,
while interested in a particular outcome variable, often observes dozens of
other variables for the same units. We show that these outcomes can be used to
estimate dependence under the assumption that the cross-sectional correlation
structure is shared across outcomes. We develop a procedure, which we call
Thresholding Multiple Outcomes (TMO), that uses this estimate to adjust
standard errors in a given regression setting. We show that adjustments of this
form can lead to sizable reductions in the bias of standard errors in
calibrated U.S. county-level regressions. Re-analyzing nine recent papers, we
find that the proposed correction can make a substantial difference in
practice.
arXiv link: http://arxiv.org/abs/2504.13295v1
How Much Weak Overlap Can Doubly Robust T-Statistics Handle?
root-n-consistent estimators exist and standard estimators may fail to be
asymptotically normal. This paper shows that a thresholded version of the
standard doubly robust estimator is asymptotically normal with well-calibrated
Wald confidence intervals even when constructed using nonparametric estimates
of the propensity score and conditional mean outcome. The analysis implies a
cost of weak overlap in terms of black-box nuisance rates, borne when the
semiparametric bound is infinite, and the contribution of outcome smoothness to
the outcome regression rate, which is incurred even when the semiparametric
bound is finite. As a byproduct of this analysis, I show that under weak
overlap, the optimal global regression rate is the same as the optimal
pointwise regression rate, without the usual polylogarithmic penalty. The
high-level conditions yield new rules of thumb for thresholding in practice. In
simulations, thresholded AIPW can exhibit moderate overrejection in small
samples, but I am unable to reject a null hypothesis of exact coverage in large
samples. In an empirical application, the clipped AIPW estimator that targets
the standard average treatment effect yields similar precision to a heuristic
10% fixed-trimming approach that changes the target sample.
arXiv link: http://arxiv.org/abs/2504.13273v2
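A minimal numpy/scikit-learn sketch of the clipping idea described above, on
simulated data: the logistic and linear fits stand in for the nonparametric
nuisance estimates discussed in the abstract, and the threshold eps is an
arbitrary illustrative choice rather than the paper's data-driven rule.
    import numpy as np
    from sklearn.linear_model import LogisticRegression, LinearRegression

    rng = np.random.default_rng(0)
    n = 2000
    x = rng.normal(size=(n, 3))
    p_true = 1 / (1 + np.exp(-2.0 * x[:, 0]))      # weak overlap in the tails
    d = rng.binomial(1, p_true)
    y = x @ np.array([1.0, 0.5, -0.5]) + d + rng.normal(size=n)

    # Nuisance estimates (simple stand-ins for nonparametric fits).
    e_hat = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]
    mu1 = LinearRegression().fit(x[d == 1], y[d == 1]).predict(x)
    mu0 = LinearRegression().fit(x[d == 0], y[d == 0]).predict(x)

    # Threshold (clip) the propensity scores before forming the AIPW scores.
    eps = 0.01                                      # hypothetical threshold
    e_clip = np.clip(e_hat, eps, 1 - eps)
    psi = (mu1 - mu0
           + d * (y - mu1) / e_clip
           - (1 - d) * (y - mu0) / (1 - e_clip))
    ate = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(n)
    print(f"clipped AIPW ATE: {ate:.3f} +/- {1.96 * se:.3f}")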
Anemia, weight, and height among children under five in Peru from 2007 to 2022: A Panel Data analysis
crucial in Public Health Economics and Social Policy analysis. In this
discussion paper, we employ a Feasible Generalized Least Squares (FGLS)
approach to assess whether there are statistically relevant relationships
between hemoglobin (adjusted to sea-level), weight, and height from 2007 to
2022 in children up to five years of age in Peru. This method provides a tool
to confirm whether the relationships between the target variables assumed by
Peruvian agencies and authorities point in the right direction in the fight
against chronic malnutrition and stunting.
arXiv link: http://arxiv.org/abs/2504.12888v1
The heterogeneous causal effects of the EU's Cohesion Fund
output and investment focusing on one of its least studied instruments, i.e.,
the Cohesion Fund (CF). We employ modern causal inference methods to estimate
not only the local average treatment effect but also its time-varying and
heterogeneous effects across regions. Utilizing this method, we propose a novel
framework for evaluating the effectiveness of CF as an EU cohesion policy tool.
Specifically, we estimate the time varying distribution of the CF's causal
effects across EU regions and derive key distribution metrics useful for policy
evaluation. Our analysis shows that relying solely on average treatment effects
masks significant heterogeneity and can lead to misleading conclusions about
the effectiveness of the EU's cohesion policy. We find that the impact of the
CF is frontloaded, peaking within the first seven years after a region's
initial inclusion in the program. The distribution of the effects during this
first seven-year cycle of funding is right-skewed with relatively thick tails.
This indicates positive effects that are, however, unevenly distributed across
regions.
Moreover, the magnitude of the CF effect is inversely related to a region's
relative position in the initial distribution of output, i.e., relatively
poorer recipient regions experience higher effects compared to relatively
richer regions. Finally, we find a non-linear relationship with diminishing
returns, whereby the impact of CF declines as the ratio of CF funds received to
a region's gross value added (GVA) increases.
arXiv link: http://arxiv.org/abs/2504.13223v1
Can Moran Eigenvectors Improve Machine Learning of Spatial Data? Insights from Synthetic Data Validation
accounting for spatial effects in statistical models. Can this extend to
machine learning? This paper examines the effectiveness of using Moran
Eigenvectors as additional spatial features in machine learning models. We
generate synthetic datasets with known processes involving spatially varying
and nonlinear effects across two different geometries. Moran Eigenvectors
calculated from different spatial weights matrices, with and without a priori
eigenvector selection, are tested. We assess the performance of popular machine
learning models, including Random Forests, LightGBM, XGBoost, and TabNet, and
benchmark their accuracies in terms of cross-validated R2 values against models
that use only coordinates as features. We also extract coefficients and
functions from the models using GeoShapley and compare them with the true
processes. Results show that machine learning models using only location
coordinates achieve better accuracies than eigenvector-based approaches across
various experiments and datasets. Furthermore, we discuss that while these
findings are relevant for spatial processes that exhibit positive spatial
autocorrelation, they do not necessarily apply when modeling network
autocorrelation and cases with negative spatial autocorrelation, where Moran
Eigenvectors would still be useful.
arXiv link: http://arxiv.org/abs/2504.12450v1
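A hedged sketch of the comparison described above: Moran eigenvectors are
obtained by double-centering a symmetrized spatial weights matrix and taking
its leading eigenvectors, then appended to a Random Forest feature set. The
k-nearest-neighbour weights, the number of retained eigenvectors, and the
synthetic outcome are all illustrative choices, not the paper's setup.
    import numpy as np
    from sklearn.neighbors import kneighbors_graph
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    n = 500
    coords = rng.uniform(size=(n, 2))
    y = np.sin(3 * coords[:, 0]) + coords[:, 1] ** 2 + rng.normal(scale=0.1, size=n)

    # Spatial weights from k nearest neighbours (illustrative), symmetrized.
    W = kneighbors_graph(coords, n_neighbors=8, mode="connectivity").toarray()
    W = 0.5 * (W + W.T)

    # Moran eigenvectors: eigenvectors of the doubly centered weights matrix MWM.
    M = np.eye(n) - np.ones((n, n)) / n
    evals, evecs = np.linalg.eigh(M @ W @ M)
    order = np.argsort(evals)[::-1]
    E = evecs[:, order[:20]]            # keep 20 leading eigenvectors (illustrative)

    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    r2_coords = cross_val_score(rf, coords, y, cv=5, scoring="r2").mean()
    r2_eig = cross_val_score(rf, np.hstack([coords, E]), y, cv=5, scoring="r2").mean()
    print("coordinates only:", round(r2_coords, 3), "with eigenvectors:", round(r2_eig, 3))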
Ordinary Least Squares as an Attention Mechanism
output of a restricted attention module, akin to those forming the backbone of
large language models. This connection offers an alternative perspective on
attention beyond the conventional information retrieval framework, making it
more accessible to researchers and analysts with a background in traditional
statistics. It falls into place when OLS is framed as a similarity-based method
in a transformed regressor space, distinct from the standard view based on
partial correlations. In fact, the OLS solution can be recast as the outcome of
an alternative problem: minimizing squared prediction errors by optimizing the
embedding space in which training and test vectors are compared via inner
products. Rather than estimating coefficients directly, we equivalently learn
optimal encoding and decoding operations for predictors. From this vantage
point, OLS maps naturally onto the query-key-value structure of attention
mechanisms. Building on this foundation, I discuss key elements of
Transformer-style attention and draw connections to classic ideas from time
series econometrics.
arXiv link: http://arxiv.org/abs/2504.09663v1
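The similarity-based reading of OLS summarized above can be made concrete in a
few lines: the fitted value at a test point is a weighted sum of training
outcomes, with weights given by inner products in the space transformed by
(X'X)^{-1}. The data below are simulated purely for illustration.
    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 200, 4
    X = rng.normal(size=(n, k))
    beta = rng.normal(size=k)
    y = X @ beta + rng.normal(scale=0.5, size=n)
    x_test = rng.normal(size=(1, k))

    # Standard view: estimate coefficients, then predict.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    pred_coef = x_test @ beta_hat

    # Attention view: the "query" x_test is compared with the training "keys"
    # X through the inner product induced by (X'X)^{-1}; the prediction is the
    # resulting weighted sum of the training "values" y.
    weights = x_test @ np.linalg.inv(X.T @ X) @ X.T     # 1 x n similarity weights
    pred_attn = weights @ y

    print(np.allclose(pred_coef, pred_attn))            # True: identical predictions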
Robust Tests for Factor-Augmented Regressions with an Application to the novel EA-MD Dataset
out-of-sample forecasts based on factor-augmented regression. We extend the
work of Pitarakis (2023a,b) to develop the inferential theory of predictive
regressions with generated regressors which are estimated by using Common
Correlated Effects (henceforth CCE) - a technique that utilizes cross-sectional
averages of grouped series. It is particularly useful since large datasets of
such structure are becoming increasingly popular. Under our framework,
CCE-based tests are asymptotically normal and robust to overspecification of
the number of factors, which is in stark contrast to existing methodologies in
the CCE context. Our tests are highly applicable in practice as they
accommodate different predictor types (e.g., stationary and highly
persistent factors), and remain invariant to the location of structural breaks
in loadings. Extensive Monte Carlo simulations indicate that our tests exhibit
excellent local power properties. Finally, we apply our tests to a novel
EA-MD-QD dataset by Barigozzi et al. (2024b), which covers the Euro Area as a whole
and primary member countries. We demonstrate that CCE factors offer a
substantial predictive power even under varying data persistence and structural
breaks.
arXiv link: http://arxiv.org/abs/2504.08455v1
An Introduction to Double/Debiased Machine Learning
Learning (DML). DML provides a general approach to performing inference about a
target parameter in the presence of nuisance parameters. The aim of DML is to
reduce the impact of nuisance parameter estimation on estimators of the
parameter of interest. We describe DML and its two essential components: Neyman
orthogonality and cross-fitting. We highlight that DML reduces functional form
dependence and accommodates the use of complex data types, such as text data.
We illustrate its application through three empirical examples that demonstrate
DML's applicability in cross-sectional and panel settings.
arXiv link: http://arxiv.org/abs/2504.08324v1
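A minimal sketch of the two ingredients highlighted above, Neyman-orthogonal
(partialling-out) moments and cross-fitting, in a partially linear model.
Random forests stand in for any machine learner, and the data-generating
process is made up for illustration.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(3)
    n = 1000
    x = rng.normal(size=(n, 5))
    d = np.sin(x[:, 0]) + rng.normal(scale=0.5, size=n)       # treatment
    y = 0.5 * d + np.cos(x[:, 1]) + rng.normal(scale=0.5, size=n)

    y_res = np.zeros(n)
    d_res = np.zeros(n)
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(x):
        # Cross-fitting: nuisances are fit on one fold and evaluated on the other.
        my = RandomForestRegressor(n_estimators=200, random_state=0).fit(x[train], y[train])
        md = RandomForestRegressor(n_estimators=200, random_state=0).fit(x[train], d[train])
        y_res[test] = y[test] - my.predict(x[test])
        d_res[test] = d[test] - md.predict(x[test])

    # Neyman-orthogonal (partialling-out) estimate of the treatment coefficient.
    theta = (d_res @ y_res) / (d_res @ d_res)
    psi = d_res * (y_res - d_res * theta)
    se = np.sqrt((psi ** 2).mean() / (d_res ** 2).mean() ** 2 / n)
    print(f"theta = {theta:.3f}, se = {se:.3f}")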
Double Machine Learning for Causal Inference under Shared-State Interference
settings where units interact via markets and recommendation systems. In these
settings, units are affected by certain shared states, like prices, algorithmic
recommendations or social signals. We formalize this structure, calling it
shared-state interference, and argue that our formulation captures many
relevant applied settings. Our key modeling assumption is that individuals'
potential outcomes are independent conditional on the shared state. We then
prove an extension of a double machine learning (DML) theorem providing
conditions for achieving efficient inference under shared-state interference.
We also instantiate our general theorem in several models of interest where it
is possible to efficiently estimate the average direct effect (ADE) or global
average treatment effect (GATE).
arXiv link: http://arxiv.org/abs/2504.08836v1
Causal Inference under Interference through Designed Markets
individual-level treatment on outcomes in a single market, even with data from
a randomized trial. In some markets, however, a centralized mechanism allocates
goods and imposes useful structure on spillovers. For a class of strategy-proof
"cutoff" mechanisms, we propose an estimator for global treatment effects using
individual-level data from one market, where treatment assignment is
unconfounded. Algorithmically, we re-run a weighted and perturbed version of
the mechanism. Under a continuum market approximation, the estimator is
asymptotically normal and semi-parametrically efficient. We extend this
approach to learn spillover-aware treatment rules with vanishing asymptotic
regret. Empirically, adjusting for equilibrium effects notably diminishes the
estimated effect of information on inequality in the Chilean school system.
arXiv link: http://arxiv.org/abs/2504.07217v1
Randomization Inference in Two-Sided Market Experiments
as buyer-seller platforms, to evaluate treatment effects from marketplace
interventions. These experiments must reflect the underlying two-sided market
structure in their design (e.g., sellers and buyers), making them particularly
challenging to analyze. In this paper, we propose a randomization inference
framework to analyze outcomes from such two-sided experiments. Our approach is
finite-sample valid under sharp null hypotheses for any test statistic and
maintains asymptotic validity under weak null hypotheses through
studentization. Moreover, we provide heuristic guidance for choosing among
multiple valid randomization tests to enhance statistical power, which we
demonstrate empirically. Finally, we demonstrate the performance of our
methodology through a series of simulation studies.
arXiv link: http://arxiv.org/abs/2504.06215v1
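A stylized sketch of a Fisher-style randomization test in a two-sided setting:
treatment is assigned at the seller level, outcomes are observed at the
buyer-seller level, and re-randomization respects the seller-side design. The
data, the test statistic, and the number of draws are all illustrative and do
not reproduce the paper's studentized or heuristic power-maximizing choices.
    import numpy as np

    rng = np.random.default_rng(4)
    n_sellers, n_buyers = 50, 200
    seller_treat = rng.binomial(1, 0.5, size=n_sellers)
    pairs = rng.integers(0, n_sellers, size=n_buyers)           # buyer -> seller
    y = 0.3 * seller_treat[pairs] + rng.normal(size=n_buyers)   # pair-level outcome

    def test_stat(treat):
        t = treat[pairs]
        return y[t == 1].mean() - y[t == 0].mean()

    obs = test_stat(seller_treat)

    # Re-randomize on the seller side only, as in the original design, and
    # recompute the statistic under the sharp null of no effect for anyone.
    null_draws = np.array([
        test_stat(rng.permutation(seller_treat)) for _ in range(2000)
    ])
    p_value = (np.abs(null_draws) >= np.abs(obs)).mean()
    print(f"observed difference: {obs:.3f}, randomization p-value: {p_value:.3f}")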
Optimizing Data-driven Weights In Multidimensional Indexes
non-negligible normative choices when it comes to attributing weights to their
dimensions. This paper provides a more rigorous approach to the choice of
weights by defining a set of desirable properties that weighting models should
meet. It shows that Bayesian networks are the only model, among the
statistical, econometric, and machine learning models considered, that meets
these properties. An example with EU-SILC data illustrates this new approach,
highlighting its potential for policy applications.
arXiv link: http://arxiv.org/abs/2504.06012v1
Bayesian Shrinkage in High-Dimensional VAR Models: A Comparative Study
framework for multivariate time series analysis, yet face critical challenges
from over-parameterization and uncertain lag order. In this paper, we
systematically compare three Bayesian shrinkage priors (horseshoe, lasso, and
normal) and two frequentist regularization approaches (ridge and nonparametric
shrinkage) under three carefully crafted simulation scenarios. These scenarios
encompass (i) overfitting in a low-dimensional setting, (ii) sparse
high-dimensional processes, and (iii) a combined scenario where both large
dimension and overfitting complicate inference.
We evaluate each method on the quality of parameter estimation (root mean
squared error, coverage, and interval length) and out-of-sample forecasting
(one-step-ahead forecast RMSE). Our findings show that local-global Bayesian
methods, particularly the horseshoe, dominate in maintaining accurate coverage
and minimizing parameter error, even when the model is heavily
over-parameterized. Frequentist ridge often yields competitive point forecasts
but underestimates uncertainty, leading to sub-nominal coverage. A real-data
application using macroeconomic variables from Canada illustrates how these
methods perform in practice, reinforcing the advantages of local-global priors
in stabilizing inference when dimension or lag order is inflated.
arXiv link: http://arxiv.org/abs/2504.05489v2
Eigenvalue-Based Randomness Test for Residual Diagnostics in Panel Data Models
approach rooted in the Tracy-Widom law from random matrix theory - and applies
it to the context of residual analysis in panel data models. Unlike traditional
methods, which target specific issues like cross-sectional dependence or
autocorrelation, the EBR test simultaneously examines multiple assumptions by
analyzing the largest eigenvalue of a symmetrized residual matrix. Monte Carlo
simulations demonstrate that the EBR test is particularly robust in detecting
not only standard violations such as autocorrelation and linear cross-sectional
dependence (CSD) but also more intricate non-linear and non-monotonic
dependencies, making it a comprehensive and highly flexible tool for enhancing
the reliability of panel data analyses.
arXiv link: http://arxiv.org/abs/2504.05297v1
Rationalizing dynamic choices
have access to the agent's information and ponders whether the observed actions
could be justified through a rational Bayesian model with a known utility
function. We show that the observed actions cannot be justified if and only if
there is a single deviation argument that leaves the agent better off,
regardless of the information. The result is then extended to allow for
distributions over possible action sequences. Four applications are presented:
monotonicity of rationalization with risk aversion, a potential rejection of
the Bayesian model with observable data, feasible outcomes in dynamic
information design, and partial identification of preferences without
assumptions on information.
arXiv link: http://arxiv.org/abs/2504.05251v1
Non-linear Phillips Curve for India: Evidence from Explainable Machine Learning
policymaking, often struggles to deliver accurate forecasts in the presence of
structural breaks and inherent nonlinearities. This paper addresses these
limitations by leveraging machine learning methods within a New Keynesian
Phillips Curve framework to forecast and explain headline inflation in India, a
major emerging economy. Our analysis demonstrates that machine learning-based
approaches significantly outperform standard linear models in forecasting
accuracy. Moreover, by employing explainable machine learning techniques, we
reveal that the Phillips curve relationship in India is highly nonlinear,
characterized by thresholds and interaction effects among key variables.
Headline inflation is primarily driven by inflation expectations, followed by
past inflation and the output gap, while supply shocks, except rainfall, exert
only a marginal influence. These findings highlight the ability of machine
learning models to improve forecast accuracy and uncover complex, nonlinear
dynamics in inflation data, offering valuable insights for policymakers.
arXiv link: http://arxiv.org/abs/2504.05350v1
Estimating Demand with Recentered Instruments
supply-side shocks. Our approach avoids conventional assumptions of exogenous
product characteristics, putting no restrictions on product entry, despite
using instrumental variables that incorporate characteristic variation. The
proposed instruments are model-predicted responses of endogenous variables to
the exogenous shocks, recentered to avoid bias from endogenous characteristics.
We illustrate the approach in a series of Monte Carlo simulations.
arXiv link: http://arxiv.org/abs/2504.04056v1
Regression Discontinuity Design with Distribution-Valued Outcomes
Distribution-Valued Outcomes (R3D), extending the standard RDD framework to
settings where the outcome is a distribution rather than a scalar. Such
settings arise when treatment is assigned at a higher level of aggregation than
the outcome; for example, when a subsidy is allocated based on a firm-level
revenue cutoff while the outcome of interest is the distribution of employee
wages within the firm. Since standard RDD methods cannot accommodate such
two-level randomness, I propose a novel approach based on random distributions.
The target estimand is a "local average quantile treatment effect", which
averages across random quantiles. To estimate this target, I introduce two
related approaches: one that extends local polynomial regression to random
quantiles and another based on local Fr\'echet regression, a form of functional
regression. For both estimators, I establish asymptotic normality and develop
uniform, debiased confidence bands together with a data-driven bandwidth
selection procedure. Simulations validate these theoretical properties and show
existing methods to be biased and inconsistent in this setting. I then apply
the proposed methods to study the effects of gubernatorial party control on
within-state income distributions in the US, using a close-election design. The
results suggest a classic equality-efficiency tradeoff under Democratic
governorship, driven by reductions in income at the top of the distribution.
arXiv link: http://arxiv.org/abs/2504.03992v1
Flatness-Robust Critical Bandwidth
regression functions, as well as for clustering methods. CB tests are known to
be inconsistent if the function of interest is constant ("flat") over even a
small interval, and to suffer from low power and incorrect size in finite
samples if the function has a relatively small derivative over an interval.
This paper proposes a solution, flatness-robust CB (FRCB), that exploits the
novel observation that the inconsistency manifests only from regions consistent
with the null hypothesis, and thus identifying and excluding them does not
alter the null or alternative sets. I provide sufficient conditions for
consistency of FRCB, and simulations of a test of regression monotonicity
demonstrate the finite-sample properties of FRCB compared with CB for various
regression functions. Surprisingly, FRCB performs better than CB in some cases
where there are no flat regions, which can be explained by FRCB essentially
giving more importance to parts of the function where there are larger
violations of the null hypothesis. I illustrate the usefulness of FRCB with an
empirical analysis of the monotonicity of the conditional mean function of
radiocarbon age with respect to calendar age.
arXiv link: http://arxiv.org/abs/2504.03594v1
Weak instrumental variables due to nonlinearities in panel data: A Super Learner Control Function estimator
individual-specific effects is used to model the causal effect of a covariate
on an outcome variable when there are unobservable confounders with some of
them time-invariant. In this setup, a linear reduced-form equation might be
problematic when the conditional mean of the endogenous covariate and the
instrumental variables is nonlinear. The reason is that ignoring the
nonlinearity could lead to weak instruments. As a solution, we propose a
triangular simultaneous equation model for panel data with additive separable
individual-specific fixed effects composed of a linear structural equation with
a nonlinear reduced form equation. The parameter of interest is the structural
parameter of the endogenous variable. The identification of this parameter is
obtained under the assumption of available exclusion restrictions and using a
control function approach. Estimating the parameter of interest is done using
an estimator that we call the Super Learner Control Function estimator (SLCFE).
The estimation procedure consists of two main steps combined with sample
splitting. We first estimate the control function with a super learner on split
samples. In
the following step, we use the estimated control function to control for
endogeneity in the structural equation. Sample splitting is done across the
individual dimension. We perform a Monte Carlo simulation to test the
performance of the estimators proposed. We conclude that the Super Learner
Control Function Estimators significantly outperform Within 2SLS estimators.
arXiv link: http://arxiv.org/abs/2504.03228v4
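A simplified two-step sketch of the control-function idea with sample
splitting: a flexible first stage (gradient boosting stands in here for the
super learner) predicts the endogenous regressor from the instruments, and the
first-stage residual enters the structural equation as a control. Panel fixed
effects and the exact SLCFE construction are omitted; everything is simulated.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(5)
    n = 2000
    z = rng.normal(size=(n, 2))
    u = rng.normal(size=n)                                # unobserved confounder
    d = np.sin(z[:, 0]) + z[:, 1] ** 2 + u + rng.normal(scale=0.5, size=n)
    y = 1.0 * d + u + rng.normal(scale=0.5, size=n)

    # Sample splitting across units: fit the nonlinear first stage on one half,
    # form residuals (the control function) on the other half, and vice versa.
    idx = rng.permutation(n)
    halves = np.array_split(idx, 2)
    v_hat = np.zeros(n)
    for fit_ids, res_ids in [(halves[0], halves[1]), (halves[1], halves[0])]:
        fs = GradientBoostingRegressor(random_state=0).fit(z[fit_ids], d[fit_ids])
        v_hat[res_ids] = d[res_ids] - fs.predict(z[res_ids])

    # Second step: structural equation with the estimated control function.
    X = np.column_stack([d, v_hat])
    second = LinearRegression().fit(X, y)
    print("structural coefficient on d:", round(second.coef_[0], 3))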
Online Multivariate Regularized Distributional Regression for High-dimensional Probabilistic Electricity Price Forecasting
electricity markets, yet the multivariate nature of day-ahead prices - spanning
24 consecutive hours - remains underexplored. At the same time, real-time
decision-making requires methods that are both accurate and fast. We introduce
an online algorithm for multivariate distributional regression models, allowing
an efficient modelling of the conditional means, variances, and dependence
structures of electricity prices. The approach combines multivariate
distributional regression with online coordinate descent and LASSO-type
regularization, enabling scalable estimation in high-dimensional covariate
spaces. Additionally, we propose a regularized estimation path over
increasingly complex dependence structures, allowing for early stopping and
avoiding overfitting. In a case study of the German day-ahead market, our
method outperforms a wide range of benchmarks, showing that modeling dependence
improves both calibration and predictive accuracy. Furthermore, we analyse the
trade-off between predictive accuracy and computational costs for batch and
online estimation and provide a high-performing open-source Python
implementation in the ondil package.
arXiv link: http://arxiv.org/abs/2504.02518v2
Estimation of the complier causal hazard ratio under dependent censoring
endogenous binary treatment on a dependently censored duration outcome. By
dependent censoring, it is meant that the duration time ($T$) and right
censoring time ($C$) are not statistically independent of each other, even
after conditioning on the measured covariates. The endogeneity issue is handled
by making use of a binary instrumental variable for the treatment. To deal with
the dependent censoring problem, it is assumed that on the stratum of
compliers: (i) $T$ follows a semiparametric proportional hazards model; (ii)
$C$ follows a fully parametric model; and (iii) the relation between $T$ and
$C$ is modeled by a parametric copula, such that the association parameter can
be left unspecified. In this framework, the treatment effect of interest is the
complier causal hazard ratio (CCHR). We devise an estimation procedure that is
based on a weighted maximum likelihood approach, where the weights are the
probabilities of an observation coming from a complier. The weights are
estimated non-parametrically in a first stage, followed by the estimation of
the CCHR. Novel conditions under which the model is identifiable are given, a
two-step estimation procedure is proposed and some important asymptotic
properties are established. Simulations are used to assess the validity and
finite-sample performance of the estimation procedure. Finally, we apply the
approach to estimate the CCHR of both job training programs on unemployment
duration and periodic screening examinations on time until death from breast
cancer. The data come from the National Job Training Partnership Act study and
the Health Insurance Plan of Greater New York experiment respectively.
arXiv link: http://arxiv.org/abs/2504.02096v1
Non-parametric Quantile Regression and Uniform Inference with Unknown Error Distribution
the conditional quantile regression function (CQRF) with covariates exposed to
measurement errors. We consider the case that the distribution of the
measurement error is unknown and allowed to be either ordinary or super smooth.
We estimate the density of the measurement error by the repeated measurements
and propose the deconvolution kernel estimator for the CQRF. We derive the
uniform Bahadur representation of the proposed estimator and construct the
uniform confidence bands for the CQRF, uniformly in the sense for all
covariates and a set of quantile indices, and establish the theoretical
validity of the proposed inference. A data-driven approach for selecting the
tuning parameter is also included. Monte Carlo simulations and a real data
application demonstrate the usefulness of the proposed method.
arXiv link: http://arxiv.org/abs/2504.01761v1
A Causal Inference Framework for Data Rich Environments
confounding in "data-rich" settings, i.e., where there are a large number of
units and a large number of measurements per unit. Our model provides a bridge
between the structural causal model view of causal inference common in the
graphical models literature with that of the latent factor model view common in
the potential outcomes literature. We show how classic models for potential
outcomes and treatment assignments fit within our framework. We provide an
identification argument for the average treatment effect, the average treatment
effect on the treated, and the average treatment effect on the untreated. For
any estimator that has a fast enough estimation error rate for a certain
nuisance parameter, we establish it is consistent for these various causal
parameters. We then show principal component regression is one such estimator
that leads to consistent estimation, and we analyze the minimal smoothness
required of the potential outcomes function for consistency.
arXiv link: http://arxiv.org/abs/2504.01702v1
On Robust Empirical Likelihood for Nonparametric Regression with Application to Regression Discontinuity Designs
intervals in nonparametric regression and regression discontinuity designs
(RDD). The original empirical likelihood framework can be naturally extended to
these settings using local linear smoothers, with Wilks' theorem holding only
when an undersmoothed bandwidth is selected. However, the generalization of
bias-corrected versions of empirical likelihood under more realistic conditions
is non-trivial and has remained an open challenge in the literature. This paper
provides a satisfactory solution by proposing a novel approach, referred to as
robust empirical likelihood, designed for nonparametric regression and RDD. The
core idea is to construct robust weights which simultaneously achieve bias
correction and account for the additional variability introduced by the
estimated bias, thereby enabling valid confidence interval construction without
extra estimation steps involved. We demonstrate that the Wilks' phenomenon
still holds under weaker conditions in nonparametric regression, sharp and
fuzzy RDD settings. Extensive simulation studies confirm the effectiveness of
our proposed approach, showing superior performance over existing methods in
terms of coverage probabilities and interval lengths. Moreover, the proposed
procedure exhibits robustness to bandwidth selection, making it a flexible and
reliable tool for empirical analyses. The practical usefulness is further
illustrated through applications to two real datasets.
arXiv link: http://arxiv.org/abs/2504.01535v1
Locally- but not Globally-identified SVARs
identification of structural parameters holds locally but not globally. In this
case there exists a set of isolated structural parameter points that are
observationally equivalent under the imposed restrictions. Although the data do
not inform us which observationally equivalent point should be selected, the
common frequentist practice is to obtain one as a maximum likelihood estimate
and perform impulse response analysis accordingly. For Bayesians, the lack of
global identification translates to non-vanishing sensitivity of the posterior
to the prior, and the multi-modal likelihood gives rise to computational
challenges as posterior sampling algorithms can fail to explore all the modes.
This paper overcomes these challenges by proposing novel estimation and
inference procedures. We characterize a class of identifying restrictions and
circumstances that deliver local but non-global identification, and the
resulting number of observationally equivalent parameter values. We propose
algorithms to exhaustively compute all admissible structural parameters given
reduced-form parameters and utilize them to sample from the multi-modal
posterior. In addition, viewing the set of observationally equivalent parameter
points as the identified set, we develop Bayesian and frequentist procedures
for inference on the corresponding set of impulse responses. An empirical
example illustrates our proposal.
arXiv link: http://arxiv.org/abs/2504.01441v1
A Practical Guide to Estimating Conditional Marginal Effects: Modern Approaches
effects (how treatment effects vary with a moderating variable) using modern
statistical methods. Commonly used approaches, such as linear interaction
models, often suffer from unclarified estimands, limited overlap, and
restrictive functional forms. This guide begins by clearly defining the
estimand and presenting the main identification results. It then reviews and
improves upon existing solutions, such as the semiparametric kernel estimator,
and introduces robust estimation strategies, including augmented inverse
propensity score weighting with Lasso selection (AIPW-Lasso) and double machine
learning (DML) with modern algorithms. Each method is evaluated through
simulations and empirical examples, with practical recommendations tailored to
sample size and research context. All tools are implemented in the accompanying
interflex package for R.
arXiv link: http://arxiv.org/abs/2504.01355v1
Partial Identification of Mean Achievement in ILSA Studies with Multi-Stage Stratified Sample Design and Student Non-Participation
across education systems with the objective of learning about the
population-wide distribution of student achievement in the assessment. In this
article, we study one of the most fundamental threats that these studies face
when justifying the conclusions reached about these distributions: the
identification problem that arises from student non-participation during data
collection. Recognizing that ILSA studies have traditionally employed a narrow
range of strategies to address non-participation, we examine this problem using
tools developed within the framework of partial identification of probability
distributions. We tailor this framework to the problem of non-participation
when data are collected using a multi-stage stratified random sample design, as
in most ILSA studies. We demonstrate this approach with application to the
International Computer and Information Literacy Study in 2018. We show how to
use the framework to assess mean achievement under reasonable and credible sets
of assumptions about the non-participating population. We also provide examples
of how these results may be reported by agencies that administer ILSA studies.
By doing so, we bring to the field of ILSA an alternative strategy for
identification, estimation, and reporting of population parameters of interest.
arXiv link: http://arxiv.org/abs/2504.01209v2
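The core partial-identification step can be illustrated with worst-case
(no-assumption) bounds on mean achievement: nonparticipants' scores are
replaced by the smallest and largest feasible scores. Survey weights and the
multi-stage stratified design, which the paper handles explicitly, are ignored
in this toy sketch, and the score range is hypothetical.
    import numpy as np

    rng = np.random.default_rng(6)
    n = 3000
    participated = rng.binomial(1, 0.8, size=n).astype(bool)
    scores = np.where(participated, rng.normal(500, 100, size=n), np.nan)

    y_min, y_max = 100.0, 900.0        # hypothetical bounds of the score scale
    p = participated.mean()            # participation rate
    mean_obs = np.nanmean(scores)      # mean among participants

    # Worst-case bounds: nonparticipants all score y_min (lower) or y_max (upper).
    lower = p * mean_obs + (1 - p) * y_min
    upper = p * mean_obs + (1 - p) * y_max
    print(f"identified set for mean achievement: [{lower:.1f}, {upper:.1f}]")
The paper's credible sets of assumptions about the non-participating population
narrow this interval; the sketch only shows the assumption-free endpoints.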
Nonlinearity in Dynamic Causal Effects: Making the Bad into the Good, and the Good into the Great?
Nonlinear World: the Good, the Bad, and the Ugly" by Michal Koles\'{a}r and
Mikkel Plagborg-M{\o}ller. We make three comments, including a novel
contribution to the literature, showing how a reasonable economic
interpretation can potentially be restored for average-effect estimators with
negative weights.
arXiv link: http://arxiv.org/abs/2504.01140v2
Quantile Treatment Effects in High Dimensional Panel Data
dimensional panel data (large $N$ and $T$), where only one or a few units are
affected by the intervention or policy. Our method extends the generalized
synthetic control method of Xu (2017) from average treatment effects on the
treated to quantile treatment effects on the treated, allowing the underlying
factor structure to change across quantiles of the outcome distribution of
interest. Our method involves estimating the quantile-dependent factors
using the control group, followed by a quantile regression to estimate the
quantile treatment effect using the treated units. We establish the asymptotic
properties of our estimators and propose a bootstrap procedure for statistical
inference, supported by simulation studies. An empirical application of the
2008 China Stimulus Program is provided.
arXiv link: http://arxiv.org/abs/2504.00785v2
Where the Trees Fall: Macroeconomic Forecasts for Forest-Reliant States
sawtimber as well as pulp and paper mill closures, which raises an important
policy question: how have key macroeconomic and industry-specific indicators
within the U.S. forest sector changed, and how are they likely to change over
time? This study
provides empirical evidence to support forest-sector policy design by using a
vector error correction (VEC) model to forecast economic trends in three major
industries - forestry and logging, wood manufacturing, and paper manufacturing
- across six of the most forest-dependent states found by the location quotient
(LQ) measure: Alabama, Arkansas, Maine, Mississippi, Oregon, and Wisconsin.
Overall, the results suggest a general decline in employment and the number of
firms in the forestry and logging industry as well as the paper manufacturing
industry, while wood manufacturing is projected to see modest employment gains.
These results also offer key insights for regional policymakers, industry
leaders, and local economic development officials: communities dependent on
timber-based manufacturing may be more resilient than other forestry-based
industries in the face of economic disruptions. Our findings can help
prioritize targeted policy interventions and inform regional economic
resilience strategies. We show distinct differences across forest-dependent
industries and/or state sectors and geographies, highlighting that policies may
have to be specific to each sector and/or geographical area. Finally, our VEC
modeling framework is adaptable to other resource-dependent industries that
serve as regional economic pillars, such as mining, agriculture, and energy
production, offering a transferable tool for policy analysis in regions with
similar economic structures.
arXiv link: http://arxiv.org/abs/2503.23569v2
Reinterpreting demand estimation
interpreting nonparametric structural assumptions as restrictions on
counterfactual outcomes. It offers nontrivial and equivalent restatements of
key demand estimation assumptions in the Neyman-Rubin potential outcomes model,
for both settings with market-level data (Berry and Haile, 2014) and settings
with demographic-specific market shares (Berry and Haile, 2024). The
reformulation highlights a latent homogeneity assumption underlying structural
demand models: The relationship between counterfactual outcomes is assumed to
be identical across markets. This assumption is strong, but necessary for
identification of market-level counterfactuals. Viewing structural demand
models as misspecified but approximately correct reveals a tradeoff between
specification flexibility and robustness to latent homogeneity.
arXiv link: http://arxiv.org/abs/2503.23524v2
Forward Selection Fama-MacBeth Regression with Higher-Order Asset Pricing Factors
linear factors can effectively subsume the factor zoo. To this end, we
propose a forward selection Fama-MacBeth procedure as a method to estimate a
high-dimensional stochastic discount factor model, isolating the most relevant
higher-order factors. Applying this approach to terms derived from six widely
used factors (the Fama-French five-factor model and the momentum factor), we
show that the resulting higher-order model with only a small number of selected
higher-order terms significantly outperforms traditional benchmarks both
in-sample and out-of-sample. Moreover, it effectively subsumes a majority of
the factors from the extensive factor zoo, suggesting that the pricing power of
most zoo factors is attributable to their exposure to higher-order terms of
common linear factors.
arXiv link: http://arxiv.org/abs/2503.23501v1
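A compact sketch of a forward-selection Fama-MacBeth loop on simulated data: at
each step the candidate factor exposure that most reduces the average
cross-sectional fitting error is added, and FM t-statistics come from the time
series of cross-sectional slopes. The selection criterion, the fixed number of
selected factors, and the treatment of exposures as known are illustrative
simplifications, not the paper's exact procedure.
    import numpy as np

    rng = np.random.default_rng(7)
    T, N, K = 240, 100, 12                  # months, assets, candidate factors
    betas = rng.normal(size=(N, K))         # asset exposures (treated as known)
    prem = np.zeros(K)
    prem[:3] = [0.4, 0.3, 0.2]              # only three factors are priced
    R = (betas @ (prem + rng.normal(scale=0.2, size=(T, K))).T).T
    R = R + rng.normal(scale=1.0, size=(T, N))

    def fm_slopes(cols):
        # Period-by-period cross-sectional OLS of returns on selected exposures.
        X = np.column_stack([np.ones(N), betas[:, cols]])
        return np.array([np.linalg.lstsq(X, R[t], rcond=None)[0] for t in range(T)])

    selected, remaining = [], list(range(K))
    for _ in range(3):                      # illustrative: add three factors
        best, best_sse = None, np.inf
        for j in remaining:
            cols = selected + [j]
            Xc = np.column_stack([np.ones(N), betas[:, cols]])
            fitted = Xc @ fm_slopes(cols).mean(0)   # fit at average slopes
            sse = ((R - fitted) ** 2).sum()
            if sse < best_sse:
                best, best_sse = j, sse
        selected.append(best)
        remaining.remove(best)

    slopes = fm_slopes(selected)[:, 1:]     # drop the intercept column
    t_stats = slopes.mean(0) / (slopes.std(0, ddof=1) / np.sqrt(T))
    print("selected factors:", selected, "FM t-stats:", np.round(t_stats, 2))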
Estimation of Latent Group Structures in Time-Varying Panel Data Models
the cross-section. Slope coefficients change smoothly over time and follow a
latent group structure, being homogeneous within but heterogeneous across
groups. The group structure is identified using a pairwise adaptive group
fused-Lasso penalty. The trajectories of time-varying coefficients are
estimated via polynomial spline functions. We derive the asymptotic
distributions of the penalized and post-selection estimators and show their
oracle efficiency. A simulation study demonstrates excellent finite sample
properties. An application to the emission intensity of GDP highlights the
relevance of addressing cross-sectional heterogeneity and time-variance in
empirical settings.
arXiv link: http://arxiv.org/abs/2503.23165v1
Inference on effect size after multiple hypothesis testing
summarizing empirical findings in studies that estimate multiple, possibly
many, treatment effects. Under this kind of selective reporting, conventional
treatment effect estimates may be biased and their corresponding confidence
intervals may undercover the true effect sizes. We propose new estimators and
confidence intervals that provide valid inferences on the effect sizes of the
significant effects after multiple hypothesis testing. Our methods are based on
the principle of selective conditional inference and complement a wide range of
tests, including step-up tests and bootstrap-based step-down tests. Our
approach is scalable, allowing us to study an application with over 370
estimated effects. We justify our procedure for asymptotically normal treatment
effect estimators. We provide two empirical examples that demonstrate bias
correction and confidence interval adjustments for significant effects. The
magnitude and direction of the bias correction depend on the correlation
structure of the estimated effects and whether the interpretation of the
significant effects depends on the (in)significance of other effects.
arXiv link: http://arxiv.org/abs/2503.22369v2
tempdisagg: A Python Framework for Temporal Disaggregation of Time Series Data
temporal disaggregation of time series data. It transforms low-frequency
aggregates into consistent, high-frequency estimates using a wide array of
econometric techniques, including Chow-Lin, Denton, Litterman, Fernandez, and
uniform interpolation, as well as enhanced variants with automated estimation of
uniform interpolation-as well as enhanced variants with automated estimation of
key parameters such as the autocorrelation coefficient rho. The package
introduces features beyond classical methods, including robust ensemble
modeling via non-negative least squares optimization, post-estimation
correction of negative values under multiple aggregation rules, and optional
regression-based imputation of missing values through a dedicated
Retropolarizer module. Architecturally, it follows a modular design inspired by
scikit-learn, offering a clean API for validation, modeling, visualization, and
result interpretation.
arXiv link: http://arxiv.org/abs/2503.22054v1
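This is not the tempdisagg API (which is not documented here); it is a generic
Chow-Lin-type sketch of the underlying mechanics: GLS of annual aggregates on
annually aggregated indicators with an AR(1) disturbance, followed by
distribution of the residuals back to quarters. The AR(1) coefficient rho is
fixed rather than estimated, which the package automates.
    import numpy as np

    rng = np.random.default_rng(8)
    n_years, s = 10, 4                          # s quarters per year
    n_q = n_years * s
    x_q = np.column_stack([np.ones(n_q), np.cumsum(rng.normal(1, 0.5, n_q))])
    y_q_true = x_q @ np.array([2.0, 1.5]) + rng.normal(scale=2.0, size=n_q)

    # Aggregation matrix C maps quarterly series to annual sums.
    C = np.kron(np.eye(n_years), np.ones((1, s)))
    y_a = C @ y_q_true                          # observed low-frequency aggregate

    rho = 0.7                                   # fixed AR(1) coefficient (illustrative)
    Sigma = rho ** np.abs(np.subtract.outer(np.arange(n_q), np.arange(n_q)))

    # Chow-Lin: GLS on the aggregated model, then distribute residuals back.
    V = C @ Sigma @ C.T
    Vinv = np.linalg.inv(V)
    Xa = C @ x_q
    beta = np.linalg.solve(Xa.T @ Vinv @ Xa, Xa.T @ Vinv @ y_a)
    resid_a = y_a - Xa @ beta
    y_q_hat = x_q @ beta + Sigma @ C.T @ Vinv @ resid_a
    print(np.allclose(C @ y_q_hat, y_a))        # aggregation consistency holds: True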
An Artificial Trend Index for Private Consumption Using Google Trends
to make economic projections or create indicators has gained significant
popularity, particularly with the Google Trends platform. This article explores
the potential of Google search data to develop a new index that improves
economic forecasts, with a particular focus on one of the key components of
economic activity: private consumption (64% of GDP in Peru). By selecting and
estimating categorized variables, machine learning techniques are applied,
demonstrating that Google data can identify patterns to generate a leading
indicator in real time and improve the accuracy of forecasts. Finally, the
results show that Google's "Food" and "Tourism" categories significantly reduce
projection errors, highlighting the importance of using this information in a
segmented manner to improve macroeconomic forecasts.
arXiv link: http://arxiv.org/abs/2503.21981v1
Identification and estimation of treatment effects in a linear factor model with fixed number of time periods
Treatment Effect on the Treated under a linear factor model that allows for
multiple time-varying unobservables. Unlike the majority of the literature on
treatment effects in linear factor models, our approach does not require the
number of pre-treatment periods to go to infinity to obtain a valid estimator.
Our identification approach employs certain nonlinear transformations of the
time-invariant observed covariates that are sufficiently correlated with the
unobserved variables. This relevance condition can be checked with the
available data on pre-treatment periods by validating the correlation between
the transformed covariates and the pre-treatment outcomes. Based on our
identification approach, we provide an asymptotically unbiased estimator of the
effect of participating in the treatment when there is only one treated unit
and the number of control units is large.
arXiv link: http://arxiv.org/abs/2503.21763v1
A Powerful Bootstrap Test of Independence in High Dimensions
random variable from a large pool of other random variables. The test statistic
is the maximum of several Chatterjee's rank correlations and critical values
are computed via a block multiplier bootstrap. The test is shown to
asymptotically control size uniformly over a large class of data-generating
processes, even when the number of variables is much larger than sample size.
The test is consistent against any fixed alternative. It can be combined with a
stepwise procedure for selecting those variables from the pool that violate
independence, while controlling the family-wise error rate. All formal results
leave the dependence among variables in the pool completely unrestricted. In
simulations, we find that our test is very powerful, outperforming existing
tests in most scenarios considered, particularly in high dimensions and/or when
the variables in the pool are dependent.
arXiv link: http://arxiv.org/abs/2503.21715v2
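A sketch of Chatterjee's rank correlation and the maximum statistic over a pool
of candidate variables, on simulated data. Critical values here come from a
simple i.i.d. permutation null rather than the block multiplier bootstrap
developed in the paper, so this only illustrates the test statistic itself.
    import numpy as np

    def chatterjee_xi(x, y):
        # Chatterjee's rank correlation of y on x (no ties assumed for simplicity).
        n = len(x)
        order = np.argsort(x)
        r = np.argsort(np.argsort(y[order])) + 1     # ranks of y, sorted by x
        return 1 - 3 * np.abs(np.diff(r)).sum() / (n ** 2 - 1)

    rng = np.random.default_rng(9)
    n, p = 500, 50
    X = rng.normal(size=(n, p))
    y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=n)   # y depends only on X[:, 0]

    obs = max(chatterjee_xi(X[:, j], y) for j in range(p))

    # Null distribution of the max statistic via permutations of y (an
    # illustrative stand-in for the block multiplier bootstrap in the paper).
    null = []
    for _ in range(300):
        y_perm = rng.permutation(y)
        null.append(max(chatterjee_xi(X[:, j], y_perm) for j in range(p)))
    null = np.array(null)
    print("max xi:", round(obs, 3), "p-value:", (null >= obs).mean())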
Inferring Treatment Effects in Large Panels by Uncovering Latent Similarities
identifying treatment effects. In this paper, we propose a new approach to
causal inference using panel data with large large $N$ and $T$. Our approach
imputes the untreated potential outcomes for treated units using the outcomes
for untreated individuals with similar values of the latent confounders. In
order to find units with similar latent characteristics, we utilize long
pre-treatment histories of the outcomes. Our analysis is based on a
nonparametric, nonlinear, and nonseparable factor model for untreated potential
outcomes and treatments. The model satisfies minimal smoothness requirements.
We impute both missing counterfactual outcomes and propensity scores using
kernel smoothing based on the constructed measure of latent similarity between
units, and demonstrate that our estimates can achieve the optimal nonparametric
rate of convergence up to log terms. Using these estimates, we construct a
doubly robust estimator of the period-specific average treatment effect on the
treated (ATT), and provide conditions under which this estimator is
$\sqrt{N}$-consistent, asymptotically normal, and unbiased. Our simulation
study demonstrates that our method provides accurate inference for a wide range
of data generating processes.
arXiv link: http://arxiv.org/abs/2503.20769v2
Large Structural VARs with Multiple Sign and Ranking Restrictions
framework to study the impacts of multiple structural shocks simultaneously.
However, the concurrent identification of multiple shocks using sign and
ranking restrictions poses significant practical challenges to the point where
existing algorithms cannot be used with such large VARs. To address this, we
introduce a new numerically efficient algorithm that facilitates the estimation
of impulse responses and related measures in large structural VARs identified
with a large number of structural restrictions on impulse responses. The
methodology is illustrated using a 35-variable VAR with over 100 sign and
ranking restrictions to identify 8 structural shocks.
arXiv link: http://arxiv.org/abs/2503.20668v1
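For context, the sketch below shows the standard accept-reject baseline that
the paper improves upon, not the paper's new algorithm: draw uniform orthogonal
rotations of a Cholesky factor and keep those whose impact responses satisfy
the sign restrictions. With a 35-variable VAR and over 100 restrictions the
acceptance rate of this baseline collapses, which is what motivates the
proposed method. The covariance matrix and restrictions below are made up.
    import numpy as np

    rng = np.random.default_rng(10)
    n_vars = 3
    # Reduced-form residual covariance (illustrative fixed values).
    Sigma = np.array([[1.0, 0.3, 0.2],
                      [0.3, 1.0, 0.1],
                      [0.2, 0.1, 1.0]])
    P = np.linalg.cholesky(Sigma)

    accepted = []
    for _ in range(20000):
        # Draw a uniform orthogonal matrix via the QR decomposition of a Gaussian.
        Z = rng.normal(size=(n_vars, n_vars))
        Q, R = np.linalg.qr(Z)
        Q = Q @ np.diag(np.sign(np.diag(R)))
        B0 = P @ Q                              # candidate impact matrix
        # Sign restrictions on the impact responses to shock 1 (illustrative):
        # variable 1 responds positively, variable 2 negatively.
        if B0[0, 0] > 0 and B0[1, 0] < 0:
            accepted.append(B0)

    print(f"acceptance rate: {len(accepted) / 20000:.3f}")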
Quasi-Bayesian Local Projections: Simultaneous Inference and Extension to the Instrumental Variable Method
Bayesian methods face challenges due to the absence of a likelihood function.
Existing approaches rely on pseudo-likelihoods, which often result in poorly
calibrated posteriors. We propose a quasi-Bayesian method based on the
Laplace-type estimator, where a quasi-likelihood is constructed using a
generalized method of moments criterion. This approach avoids strict
distributional assumptions, ensures well-calibrated inferences, and supports
simultaneous credible bands. Additionally, it can be naturally extended to the
instrumental variable method. We validate our approach through Monte Carlo
simulations.
arXiv link: http://arxiv.org/abs/2503.20249v2
Treatment Effects Inference with High-Dimensional Instruments and Control Variables
when dealing with numerous instruments and non-sparse control variables. In
this paper, we propose a novel ridge regularization-based instrumental
variables method for estimation and inference in the presence of both
high-dimensional instrumental variables and high-dimensional control variables.
These methods are applicable both with and without sparsity assumptions. To
address the bias caused by high-dimensional instruments, we introduce a
two-step procedure incorporating a data-splitting strategy. We establish
statistical properties of the estimator, including consistency and asymptotic
normality. Furthermore, we develop statistical inference procedures by
providing a consistent estimator for the asymptotic variance of the estimator.
The finite sample performance of the proposed method is evaluated through
numerical simulations. Results indicate that the new estimator consistently
outperforms existing sparsity-based approaches across various settings,
offering valuable insights for more complex scenarios. Finally, we provide an
empirical application estimating the causal effect of schooling on earnings by
addressing potential endogeneity through the use of high-dimensional
instrumental variables and high-dimensional covariates.
arXiv link: http://arxiv.org/abs/2503.20149v1
EASI Drugs in the Streets of Colombia: Modeling Heterogeneous and Endogenous Drug Preferences
legalization, is heavily mediated by their demand behavior. Since individual
drug use is driven by many unobservable factors, accounting for unobserved
heterogeneity is crucial for modeling demand and designing targeted public
policies. This paper introduces a finite Gaussian mixture of Exact Affine Stone
Index (EASI) demand systems to estimate the joint demand for marijuana,
cocaine, and basuco (cocaine residual or "crack") in Colombia, accounting for
corner solutions and endogenous price variation. Our results highlight the
importance of unobserved heterogeneity in identifying reliable price
elasticities. The method reveals two regular consumer subpopulations: "safe"
(recreational) and "addict" users, with the majority falling into the first
group. For the "safe" group, whose estimates are precise and nationally
representative, all three drugs exhibit unitary price elasticities, with
cocaine being complementary to marijuana and basuco an inferior substitute to
cocaine. Given the low production cost of marijuana in Colombia, legalization
is likely to drive prices down significantly. Our counterfactual analysis
suggests that a 50% price decrease would result in a $363 USD gain in
utility-equivalent expenditure per representative consumer, $120 million USD
in government tax revenue, and a $127 million USD revenue loss for drug
dealers. Legalization, therefore, has the potential to reduce the incentive for
drug-related criminal activity, the current largest source of violent crime in
Colombia.
arXiv link: http://arxiv.org/abs/2503.20100v1
Identification of Average Treatment Effects in Nonparametric Panel Models
data setting. It introduces a novel nonparametric factor model and proves
identification of average treatment effects. The identification proof is based
on the introduction of a consistent estimator. Underlying the proof is a result
that there is a consistent estimator for the expected outcome in the absence of
the treatment for each unit and time period; this result can be applied more
broadly, for example in problems of decompositions of group-level differences
in outcomes, such as the much-studied gender wage gap.
arXiv link: http://arxiv.org/abs/2503.19873v1
Two Level Nested and Sequential Logit
equations in two-level nested and sequential logit models for analyzing
hierarchical choice structures. We present derivations of the Berry (1994)
inversion formula, nested inclusive values computation, and multi-level market
share equations, complementing existing literature. While conceptually
distinct, nested and sequential logit models share mathematical similarities
and, under specific distributional assumptions, yield identical inversion
formulas, offering valuable analytical insights. These notes serve as a
practical reference for researchers implementing multi-level discrete choice
models in empirical applications, particularly in industrial organization and
demand estimation contexts, and complement Mansley et al. (2019).
arXiv link: http://arxiv.org/abs/2503.21808v2
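The Berry (1994) inversion for a two-level nested logit, as summarized above,
recovers mean utilities from market shares, within-group shares, and the
nesting parameter sigma via delta_j = ln(s_j) - ln(s_0) - sigma ln(s_{j|g}).
The numbers below are made up purely to illustrate the formula.
    import numpy as np

    # Hypothetical market shares for two inside goods in one nest plus the
    # outside good, and a nesting parameter sigma.
    s = np.array([0.30, 0.20])      # inside-good shares
    s0 = 1.0 - s.sum()              # outside-good share
    sigma = 0.5

    s_within = s / s.sum()          # within-nest (conditional) shares
    # Berry (1994) inversion for nested logit:
    #   delta_j = ln(s_j) - ln(s_0) - sigma * ln(s_{j|g})
    delta = np.log(s) - np.log(s0) - sigma * np.log(s_within)
    print("mean utilities:", np.round(delta, 3))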
Bayesian Outlier Detection for Matrix-variate Models
impactful events -- are of theoretical interest, but can also severely distort
inference. Although outlier-robust methodologies can be used, many researchers
prefer pre-processing strategies that remove outliers. In this work, an
efficient sequential Bayesian framework is proposed for outlier detection based
on the predictive Bayes Factor (BF). The proposed method is specifically
designed for large, multidimensional datasets and extends univariate Bayesian
model outlier detection procedures to the matrix-variate setting. Leveraging
power-discounted priors, tractable predictive BF are obtained, thereby avoiding
computationally intensive techniques. The BF finite sample distribution, the
test critical region, and robust extensions of the test are introduced by
exploiting the sampling variability. The framework supports online detection
with analytical tractability, ensuring both accuracy and scalability. Its
effectiveness is demonstrated through simulations, and three applications to
reference datasets in macroeconomics and finance are provided.
arXiv link: http://arxiv.org/abs/2503.19515v2
Automatic Inference for Value-Added Regressions
estimates of teacher value-added. However, when the goal is to perform
inference on coefficients in the regression of long-term outcomes on
value-added, it is unclear whether shrinking the value-added estimators can help
or hurt. In this paper, we consider a general class of value-added estimators
and the properties of their corresponding regression coefficients. Our main
finding is that regressing long-term outcomes on shrinkage estimates of
value-added performs an automatic bias correction: the associated regression
estimator is asymptotically unbiased, asymptotically normal, and efficient in
the sense that it is asymptotically equivalent to regressing on the true
(latent) value-added. Further, OLS standard errors from regressing on shrinkage
estimates are consistent. As such, efficient inference is easy for
practitioners to implement: simply regress outcomes on shrinkage estimates of
value added.
arXiv link: http://arxiv.org/abs/2503.19178v1
Empirical Bayes shrinkage (mostly) does not correct the measurement error in regression
empirical Bayes shrinkage estimates corrects for the measurement error problem
in linear regression. We clarify the conditions needed; we argue that these
conditions are stronger than those needed for classical measurement error
correction, which we advocate for instead. Moreover, we show that the classical
estimator cannot be improved without stronger assumptions. We extend these
results to regressions on nonlinear transformations of the latent attribute and
find generically slow minimax estimation rates.
arXiv link: http://arxiv.org/abs/2503.19095v1
Forecasting Labor Demand: Predicting JOLT Job Openings using Deep Learning Model
forecasting future Job Openings and Labor Turnover Survey data in the United
States. Drawing on multiple economic indicators from various sources, the data
are fed directly into an LSTM model to predict JOLT job openings in subsequent
periods. The performance of the LSTM model is compared with conventional
autoregressive approaches, including ARIMA, SARIMA, and Holt-Winters. Findings
suggest that the LSTM model outperforms these traditional models in predicting
JOLT job openings, as it not only captures the dependent variable's trends but
also harmonizes with key economic factors. These results highlight the
potential of deep learning techniques in capturing complex temporal
dependencies in economic data, offering valuable insights for policymakers and
stakeholders in developing data-driven labor market strategies.
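The basic LSTM-versus-ARIMA setup is easy to reproduce in outline. The sketch below is a hypothetical Keras example on simulated data; the paper's actual architecture, features, and preprocessing are not specified here.

```python
# Minimal sketch of an LSTM one-step-ahead forecaster (illustrative only).
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(1)
series = rng.normal(size=300).cumsum()        # stand-in for JOLTS openings plus indicators

n_lags = 12
X = np.stack([series[i:i + n_lags] for i in range(len(series) - n_lags)])
y = series[n_lags:]
X = X[..., None]                              # shape: (samples, n_lags, 1 feature)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(n_lags, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:-24], y[:-24], epochs=20, verbose=0)
forecast = model.predict(X[-24:], verbose=0)  # pseudo out-of-sample forecasts
```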
arXiv link: http://arxiv.org/abs/2503.19048v1
Simultaneous Inference Bands for Autocorrelations
regions) for the null hypothesis of no temporal correlation. These bands have
two shortcomings. First, they build on pointwise intervals and suffer from
joint undercoverage (overrejection) under the null hypothesis. Second, if this
null is clearly violated one would rather prefer to see confidence bands to
quantify estimation uncertainty. We propose and discuss both simultaneous
significance bands and simultaneous confidence bands for time series and series
of regression residuals. They are as easy to construct as their pointwise
counterparts and at the same time provide an intuitive and visual
quantification of sampling uncertainty as well as valid statistical inference.
For regression residuals, we show that for static regressions the asymptotic
variances underlying the construction of the bands are the same as those for
observed time series, and for dynamic regressions (with lagged endogenous
regressors) we show how they need to be adjusted. We study theoretical
properties of simultaneous significance bands and two types of simultaneous
confidence bands (sup-t and Bonferroni) and analyse their finite-sample
performance in a simulation study. Finally, we illustrate the use of the bands
in an application to monthly US inflation and residuals from Phillips curve
regressions.
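The gap between pointwise and simultaneous bands is easy to see numerically. The sketch below assumes the textbook white-noise approximation (sample autocorrelations approximately independent N(0, 1/n)) and compares pointwise, Bonferroni, and a Sidak-form sup-t half-width; it does not implement the paper's residual-adjusted variances.

```python
# Minimal sketch: pointwise vs. simultaneous bands for sample autocorrelations
# under the white-noise null (independent N(0, 1/n) approximation; illustrative).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(size=400)
n, K, alpha = len(x), 20, 0.05

xc = x - x.mean()
acf = np.array([np.dot(xc[:-k], xc[k:]) for k in range(1, K + 1)]) / np.dot(xc, xc)

pointwise = norm.ppf(1 - alpha / 2) / np.sqrt(n)                  # jointly undercovers
bonferroni = norm.ppf(1 - alpha / (2 * K)) / np.sqrt(n)           # simultaneous, conservative
sup_t = norm.ppf((1 + (1 - alpha) ** (1 / K)) / 2) / np.sqrt(n)   # simultaneous (Sidak form)

print("pointwise %.3f  Bonferroni %.3f  sup-t %.3f" % (pointwise, bonferroni, sup_t))
print("any lag outside the sup-t band:", np.any(np.abs(acf) > sup_t))
```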
arXiv link: http://arxiv.org/abs/2503.18560v2
Spatiotemporal Impact of Trade Policy Variables on Asian Manufacturing Hubs: Bayesian Global Vector Autoregression Model
proposed in this research to analyze relationships among eight economy-wide
variables in varying market conditions. Employing Vector Autoregression (VAR)
and Granger causality, we explore trade policy effects on emerging
manufacturing hubs in China, India, Malaysia, Singapore, and Vietnam. A
Bayesian Global Vector Autoregression (BGVAR) model also assesses cross-unit
interactions and performs unconditional and conditional forecasts. Utilizing
time-series data from the Asian Development Bank, our study reveals multi-way
cointegration and dynamic connectedness relationships among key economy-wide
variables. This innovative framework enhances investment decisions and
policymaking through a data-driven approach.
arXiv link: http://arxiv.org/abs/2503.17790v1
Calibration Strategies for Robust Causal Estimation: Theoretical and Empirical Insights on Propensity Score-Based Estimators
the performance of propensity score based estimators like inverse probability
weighting (IPW) and double/debiased machine learning (DML) frameworks. We
extend recent advances in calibration techniques for propensity score
estimation, improving the robustness of propensity scores in challenging
settings such as limited overlap, small sample sizes, or unbalanced data. Our
contributions are twofold: First, we provide a theoretical analysis of the
properties of calibrated estimators in the context of DML. To this end, we
refine existing calibration frameworks for propensity score models, with a
particular emphasis on the role of sample-splitting schemes in ensuring valid
causal inference. Second, through extensive simulations, we show that
calibration reduces the variance of inverse propensity score-based estimators while
also mitigating bias in IPW, even in small-sample regimes. Notably, calibration
improves stability for flexible learners (e.g., gradient boosting) while
preserving the doubly robust properties of DML. A key insight is that, even
when methods perform well without calibration, incorporating a calibration step
does not degrade performance, provided that an appropriate sample-splitting
approach is chosen.
arXiv link: http://arxiv.org/abs/2503.17290v3
Local Projections or VARs? A Primer for Macroeconomists
vector autoregression (VAR) impulse response estimators? The two methods share
the same estimand, but in finite samples lie on opposite ends of a
bias-variance trade-off. While the low bias of LPs comes at quite a steep
variance cost, this cost must be paid to achieve robust uncertainty
assessments. Hence, when the goal is to convey what can be learned about
dynamic causal effects from the data, VARs should only be used with long lag
lengths, ensuring equivalence with LP. For LP estimation, we provide guidance
on selection of lag length and controls, bias correction, and confidence
interval construction.
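For readers who have not run a local projection before, the following minimal sketch shows the generic regression at horizon h with lagged controls and HAC (Newey-West) standard errors on simulated data; the lag length, controls, and HAC truncation are illustrative choices, not the paper's recommendations.

```python
# Minimal sketch of a local projection at horizon h with lagged controls and HAC SEs.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T, p, h = 400, 4, 8
shock = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + 0.5 * shock[t] + rng.normal()

rows = np.arange(p, T - h)
Y = y[rows + h]                                        # outcome h periods ahead
X = np.column_stack(
    [shock[rows]]                                      # impulse of interest
    + [y[rows - j] for j in range(1, p + 1)]           # lagged outcome controls
    + [shock[rows - j] for j in range(1, p + 1)]       # lagged shock controls
)
fit = sm.OLS(Y, sm.add_constant(X)).fit(cov_type="HAC", cov_kwds={"maxlags": h})
print("IRF at horizon", h, "=", fit.params[1], "+/-", 1.96 * fit.bse[1])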
arXiv link: http://arxiv.org/abs/2503.17144v2
Partial Identification in Moment Models with Incomplete Data--A Conditional Optimal Transport Approach
of a finite-dimensional parameter defined by a general moment model with
incomplete data. We establish a novel characterization of the identified set
for the true parameter in terms of a continuum of inequalities defined by
conditional optimal transport. For the special case of an affine moment model,
we show that the identified set is convex and that its support function can be
easily computed by solving a conditional optimal transport problem. For
parameters that may not satisfy the moment model, we propose a two-step
procedure to construct its identified set. Finally, we demonstrate the
generality and effectiveness of our approach through several running examples.
arXiv link: http://arxiv.org/abs/2503.16098v2
Has the Paris Agreement Shaped Emission Trends? A Panel VECM Analysis of Energy, Growth, and CO$_2$ in 106 Middle-Income Countries
middle-income countries where economic growth drives environmental degradation.
This study examines the long-run and short-run relationships between CO$_2$
emissions, energy use, GDP per capita, and population across 106 middle-income
countries from 1980 to 2023. Using a Panel Vector Error Correction Model
(VECM), we assess the impact of the Paris Agreement (2015) on emissions while
conducting cointegration tests to confirm long-run equilibrium relationships.
The findings reveal a strong long-run relationship among the variables, with
energy use as the dominant driver of emissions, while GDP per capita has a
moderate impact. However, the Paris Agreement has not significantly altered
emissions trends in middle-income economies. Granger causality tests indicate
that energy use strongly causes emissions, but GDP per capita and population do
not exhibit significant short-run causal effects. Variance decomposition
confirms that energy shocks have the most persistent effects, and impulse
response functions (IRFs) show emissions trajectories are primarily shaped by
economic activity rather than climate agreements. Robustness checks, including
autocorrelation tests, polynomial root stability, and Yamagata-Pesaran slope
homogeneity tests, validate model consistency. These results suggest that while
global agreements set emissions reduction goals, their effectiveness remains
limited without stronger national climate policies, sectoral energy reforms,
and financial incentives for clean energy adoption to ensure sustainable
economic growth.
arXiv link: http://arxiv.org/abs/2503.14946v2
Linear programming approach to partially identified econometric models
of linear programs (LPs). This paper introduces a novel estimator of the LP
value. Unlike existing procedures, our estimator is root-n-consistent,
pointwise in the probability measure, whenever the population LP is feasible
and finite. Our estimator is valid under point-identification, over-identifying
constraints, and solution multiplicity. Turning to uniformity properties, we
prove that the LP value cannot be uniformly consistently estimated without
restricting the set of possible distributions. We then show that our estimator
achieves uniform consistency under a condition that is minimal for the
existence of any such estimator. We obtain a computationally efficient,
asymptotically normal inference procedure with exact asymptotic coverage at any
fixed probability measure. To complement our estimation results, we derive LP
sharp bounds in a general identification setting. We apply our findings to
estimating returns to education. To that end, we propose the conditionally
monotone IV assumption (cMIV) that tightens the classical monotone IV (MIV)
bounds and is testable under a mild regularity condition. Under cMIV,
university education in Colombia is shown to increase the average wage by at
least 5.5%, whereas classical conditions fail to yield an informative bound.
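To fix ideas, the plug-in idea behind estimating an LP value can be sketched in a few lines: replace the unknown moment vector with sample means and solve the resulting linear program. The toy matrices below are illustrative and have nothing to do with the paper's returns-to-education application or its cMIV bounds.

```python
# Minimal sketch: plug-in estimate of an LP value whose constraint vector is
# estimated from data (illustrative problem, not the paper's application).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
# Population LP: min c'x  s.t.  A x = b,  x >= 0,  with b a vector of moments.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
c = np.array([1.0, -2.0, 1.0])

# Estimate b by sample means of iid data (root-n consistent moment estimates).
data = rng.normal(loc=[0.7, 0.4], scale=0.2, size=(500, 2))
b_hat = data.mean(axis=0)

res = linprog(c, A_eq=A, b_eq=b_hat, bounds=[(0, None)] * 3, method="highs")
print("estimated LP value:", res.fun)
```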
arXiv link: http://arxiv.org/abs/2503.14940v1
Testing Conditional Stochastic Dominance at Target Points
at specific values of the conditioning covariates, referred to as target
points. The test is relevant for analyzing income inequality, evaluating
treatment effects, and studying discrimination. We propose a
Kolmogorov--Smirnov-type test statistic that utilizes induced order statistics
from independent samples. Notably, the test features a data-independent
critical value, eliminating the need for resampling techniques such as the
bootstrap. Our approach avoids kernel smoothing and parametric assumptions,
instead relying on a tuning parameter to select relevant observations. We
establish the asymptotic properties of our test, showing that the induced order
statistics converge to independent draws from the true conditional
distributions and that the test is asymptotically of level $\alpha$ under weak
regularity conditions. While our results apply to both continuous and discrete
data, in the discrete case, the critical value only provides a valid upper
bound. To address this, we propose a refined critical value that significantly
enhances power, requiring only knowledge of the support size of the
distributions. Additionally, we analyze the test's behavior in the limit
experiment, demonstrating that it reduces to a problem analogous to testing
unconditional stochastic dominance in finite samples. This framework allows us
to prove the validity of permutation-based tests for stochastic dominance when
the random variables are continuous. Monte Carlo simulations confirm the strong
finite-sample performance of our method.
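The mechanics can be mimicked with a deliberately simplified stand-in: pick the k observations whose covariates are closest to the target point in each sample and compare the induced outcome subsamples with a one-sided KS statistic. The critical value below is the textbook asymptotic one for equal subsample sizes, not the paper's data-independent critical value, and the data-generating process is invented for illustration.

```python
# Hypothetical sketch: nearest-neighbor "induced" subsamples at a target point x0,
# one-sided two-sample KS statistic, and a textbook asymptotic critical value.
import numpy as np

rng = np.random.default_rng(5)
n, k, alpha, x0 = 1000, 60, 0.05, 0.5

def induced_subsample(x, y, x0, k):
    idx = np.argsort(np.abs(x - x0))[:k]    # k covariate values nearest the target
    return y[idx]

x1, x2 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
y1 = x1 + rng.normal(0, 0.3, n)             # group 1 outcomes
y2 = x2 + 0.3 + rng.normal(0, 0.3, n)       # group 2 shifted up near x0 (violates H0 below)

s1, s2 = induced_subsample(x1, y1, x0, k), induced_subsample(x2, y2, x0, k)
grid = np.sort(np.concatenate([s1, s2]))
F1 = np.searchsorted(np.sort(s1), grid, side="right") / k
F2 = np.searchsorted(np.sort(s2), grid, side="right") / k
# H0: group 1 conditionally dominates group 2 at x0, i.e. F1 <= F2 pointwise.
D_plus = np.max(F1 - F2)                    # measures the violation of H0

crit = np.sqrt(-np.log(alpha) / k)          # one-sided two-sample KS, n = m = k case
print("D+ = %.3f, reject H0: %s" % (D_plus, D_plus > crit))
```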
arXiv link: http://arxiv.org/abs/2503.14747v2
Bounds for within-household encouragement designs with interference
with strategic interaction and discrete treatments, outcome and independent
instruments. We consider a framework with two decision-makers who play
pure-strategy Nash equilibria in treatment take-up, whose outcomes are
determined by their joint take-up decisions. We obtain a latent-type
representation at the pair level. We enumerate all types that are consistent
with pure-strategy Nash equilibria and exclusion restrictions, and then impose
conditions such as symmetry, strategic complementarity/substitution, several
notions of monotonicity, and homogeneity. Under any combination of the above
restrictions, we provide sharp bounds for our parameters of interest via a
simple Python optimization routine. Our framework allows the empirical
researcher to tailor the above menu of assumptions to their empirical
application and to assess their individual and joint identifying power.
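The optimization step described above boils down to a pair of linear programs over the latent type distribution. The sketch below is a stripped-down, hypothetical version: a known mapping M sends type probabilities to observed cell probabilities, and the bounds on a target functional w'p are its min and max over distributions consistent with the observed cells. The matrices are illustrative, not the paper's pair-level type space.

```python
# Hypothetical sketch of the bounding step: sharp bounds on w'p over latent type
# distributions p consistent with observed cell probabilities q_obs = M p.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)
n_types, n_cells = 8, 4
M = (rng.random((n_cells, n_types)) > 0.5).astype(float)   # type -> observed-cell map
p_true = rng.dirichlet(np.ones(n_types))
q_obs = M @ p_true                                          # observed cell probabilities
w = rng.random(n_types)                                     # weights defining the target

A_eq = np.vstack([M, np.ones((1, n_types))])                # match cells, sum to one
b_eq = np.concatenate([q_obs, [1.0]])
lower = linprog(w,  A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * n_types, method="highs")
upper = linprog(-w, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * n_types, method="highs")
print("sharp bounds on the target: [%.3f, %.3f]" % (lower.fun, -upper.fun))
```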
arXiv link: http://arxiv.org/abs/2503.14314v1
How does Bike Absence Influence Mode Shifts Among Dockless Bike-Sharing Users? Evidence From Nanjing, China
available bikes at their preferred times and locations. This study examines the
determinants of the users' mode shifts in the context of bike absence, using
survey data from Nanjing, China. An integrated choice and latent variable model
based on multinomial logit was employed to investigate the impact of
socio-demographic attributes, trip characteristics, and psychological factors on travel
mode choices. Mode choice models were estimated with seven mode alternatives,
including bike-sharing related choices (waiting in place, picking up bikes on
the way, and picking up bikes on a detour), bus, taxi, ride-hailing, and
walking. The findings show that under shared-bike unavailability, users prefer to
pick up bikes on the way rather than take detours, with buses and walking as
favored alternatives to shared bikes. Lower-educated users tend to wait in
place, showing greater concern for waiting time compared to riding time.
Lower-income users, commuters, and females prefer picking up bikes on the way,
while non-commuters and males opt for detours. The insights gained in this
study can provide ideas for solving the problems of demand estimation, parking
area siting, and multi-modal synergies of bike sharing to enhance utilization
and user satisfaction.
arXiv link: http://arxiv.org/abs/2503.14265v1
A Note on the Asymptotic Properties of the GLS Estimator in Multivariate Regression with Heteroskedastic and Autocorrelated Errors
regression with heteroskedastic and autocorrelated errors. We derive Wald
statistics for linear restrictions and assess their performance. The statistics
remains robust to heteroskedasticity and autocorrelation.
arXiv link: http://arxiv.org/abs/2503.13950v1
Minnesota BART
macroeconomic analysis, yet they remain limited by their reliance on a linear
parameterization. Recent research has introduced nonparametric alternatives,
such as Bayesian additive regression trees (BART), which provide flexibility
without strong parametric assumptions. However, existing BART-based frameworks
do not account for time dependency or allow for sparse estimation in the
construction of regression tree priors, leading to noisy and inefficient
high-dimensional representations. This paper introduces a sparsity-inducing
Dirichlet hyperprior on the regression tree's splitting probabilities, allowing
for automatic variable selection and high-dimensional VARs. Additionally, we
propose a structured shrinkage prior that decreases the probability of
splitting on higher-order lags, aligning with the Minnesota prior's principles.
Empirical results demonstrate that our approach improves predictive accuracy
over the baseline BART prior and Bayesian VAR (BVAR), particularly in capturing
time-dependent relationships and enhancing density forecasts. These findings
highlight the potential of developing domain-specific nonparametric methods in
macroeconomic forecasting.
arXiv link: http://arxiv.org/abs/2503.13759v1
Treatment Effect Heterogeneity in Regression Discontinuity Designs
heterogeneous treatment effects based on pretreatment covariates, even though
no formal statistical methods exist for such analyses. This has led to the
widespread use of ad hoc approaches in applications. Motivated by common
empirical practice, we develop a unified, theoretically grounded framework for
RD heterogeneity analysis. We show that a fully interacted local linear (in
functional parameters) model effectively captures heterogeneity while still
being tractable and interpretable in applications. The model structure holds
without loss of generality for discrete covariates. Although our proposed model
is potentially restrictive for continuous covariates, it naturally aligns with
standard empirical practice and offers a causal interpretation for RD
applications. We establish principled bandwidth selection and robust
bias-corrected inference methods to analyze heterogeneous treatment effects and
test group differences. We provide companion software to facilitate
implementation of our results. An empirical application illustrates the
practical relevance of our methods.
arXiv link: http://arxiv.org/abs/2503.13696v3
Difference-in-Differences Designs: A Practitioner's Guide
quasi-experimental research design. Its canonical form, with two groups and two
periods, is well-understood. However, empirical practices can be ad hoc when
researchers go beyond that simple case. This article provides an organizing
framework for discussing different types of DiD designs and their associated
DiD estimators. It discusses covariates, weights, handling multiple periods,
and staggered treatments. The organizational framework, however, applies to
other extensions of DiD methods as well.
arXiv link: http://arxiv.org/abs/2503.13323v3
Tracking the Hidden Forces Behind Laos' 2022 Exchange Rate Crisis and Balance of Payments Instability
underlying factors contributing to the debt-induced economic crisis in the
People's Democratic Republic of Laos ('Laos'). The analysis aims to use the
latent macroeconomic insights to propose ways forward for forecasting. We focus
on Laos's historic structural weaknesses to identify when a balance of payments
crisis with either a persistent current account imbalance or rapid capital
outflows would occur. By extracting latent economic factors from macroeconomic
indicators, the model provides a starting point for analyzing the structural
vulnerabilities leading to the value of the kip in USD terms dropping and
contributing to inflation in the country. The findings of this working paper
contribute to the broader literature on exchange rate instability and external
sector vulnerabilities in emerging economies, offering insights into what
constitutes 'signals' as opposed to plain 'noise' from a macroeconomic
forecasting standpoint.
arXiv link: http://arxiv.org/abs/2503.13308v1
SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement
of learning decision policies that balance multiple objectives using offline
data. Often, they aim to develop policies that maximize goal outcomes, while
ensuring there are no undesirable changes in guardrail outcomes. To provide
credible recommendations, experimenters must not only identify policies that
satisfy the desired changes in goal and guardrail outcomes, but also offer
probabilistic guarantees about the changes these policies induce. In practice,
however, policy classes are often large, and digital experiments tend to
produce datasets with small effect sizes relative to noise. In this setting,
standard approaches such as data splitting or multiple testing often result in
unstable policy selection and/or insufficient statistical power. In this paper,
we provide safe noisy policy learning (SNPL), a novel approach that leverages
the concept of algorithmic stability to address these challenges. Our method
enables policy learning while simultaneously providing high-confidence
guarantees using the entire dataset, avoiding the need for data-splitting. We
present finite-sample and asymptotic versions of our algorithm that ensure the
recommended policy satisfies high-probability guarantees for avoiding guardrail
regressions and/or achieving goal outcome improvements. We test both variants
of our approach empirically on a real-world application of
personalizing SMS delivery. Our results on real-world data suggest that our
approach offers dramatic improvements in settings with large policy classes and
low signal-to-noise across both finite-sample and asymptotic safety guarantees,
offering up to 300% improvements in detection rates and 150% improvements in
policy gains at significantly smaller sample sizes.
arXiv link: http://arxiv.org/abs/2503.12760v2
Functional Factor Regression with an Application to Electricity Price Curve Modeling
curve data that is consistently estimated by imposing factor structures on the
regressors. An integral operator based on cross-covariances identifies two
components for each functional regressor: a predictive low-dimensional
component, along with associated factors that are guaranteed to be correlated
with the dependent variable, and an infinite-dimensional component that has no
predictive power. In order to consistently estimate the correct number of
factors for each regressor, we introduce a functional eigenvalue difference
test. While conventional estimators for functional linear models fail to
converge in distribution, we establish asymptotic normality, making it possible
to construct confidence bands and conduct statistical inference. The model is
applied to forecast electricity price curves in three different energy markets.
Its prediction accuracy is found to be comparable to popular machine learning
approaches, while providing statistically valid inference and interpretable
insights into the conditional correlation structures of electricity prices.
arXiv link: http://arxiv.org/abs/2503.12611v2
Identification and estimation of structural vector autoregressive models via LU decomposition
simultaneous relationships between multiple time-dependent data. Various
statistical inference methods have been studied to overcome the identification
problems of SVAR models. However, most of these methods impose strong
assumptions on the innovation processes, such as uncorrelatedness of the
components. In this study, we relax these assumptions and propose an
identification method for SVAR models under the zero-restrictions on the
coefficient matrices, which correspond to sufficient conditions for LU
decomposition of the coefficient matrices of the reduced form of the SVAR
models. Moreover, we establish asymptotically normal estimators for the
coefficient matrices and impulse responses, which enable us to construct test
statistics for the simultaneous relationships of time-dependent data. The
finite-sample performance of the proposed method is elucidated by numerical
simulations. We also present an example of an empirical study that analyzes the
impact of policy rates on unemployment and prices.
arXiv link: http://arxiv.org/abs/2503.12378v1
Nonlinear Forecast Error Variance Decompositions with Hermite Polynomials
nonlinear Structural Vector Autoregressive models with Gaussian innovations is
proposed, called the Hermite FEVD (HFEVD). This method employs a Hermite
polynomial expansion to approximate the future trajectory of a nonlinear
process. The orthogonality of Hermite polynomials under the Gaussian density
facilitates the construction of the decomposition, providing a separation of
shock effects by time horizon, by components of the structural innovation and
by degree of nonlinearity. A link between the HFEVD and nonlinear Impulse
Response Functions is established and distinguishes between marginal and
interaction contributions of shocks. Simulation results from standard nonlinear
models are provided as illustrations and an application to fiscal policy shocks
is examined.
arXiv link: http://arxiv.org/abs/2503.11416v2
Difference-in-Differences Meets Synthetic Control: Doubly Robust Identification and Estimation
methods for causal inference in panel data, each with distinct strengths and
limitations. We propose a novel method for short-panel causal inference that
integrates the advantages of both approaches. Our method delivers a doubly
robust identification strategy for the average treatment effect on the treated
(ATT) under either of two non-nested assumptions: parallel trends or a
group-level SC condition. Building on this identification result, we develop a
unified semiparametric framework for estimating the ATT. Notably, the
identification-robust moment function satisfies Neyman orthogonality under the
parallel trends assumption but not under the SC assumption, leading to
different asymptotic variances across the two identification strategies. To
ensure valid inference, we propose a multiplier bootstrap method that
consistently approximates the asymptotic distribution under either assumption.
Furthermore, we extend our methodology to accommodate repeated cross-sectional
data and staggered treatment designs. As an empirical application, we evaluate
the impact of the 2003 minimum wage increase in Alaska on family income.
Finally, in simulation studies based on empirically calibrated data-generating
processes, we demonstrate that the proposed estimation and inference methods
perform well in finite samples under either identification assumption.
arXiv link: http://arxiv.org/abs/2503.11375v2
On the numerical approximation of minimax regret rules via fictitious play
interest. To do so when potential outcomes are in {0,1} we discretize the
action space of nature and apply a variant of Robinson's (1951) algorithm for
iterative solutions of finite two-person zero-sum games. Our approach avoids
the need to evaluate regret of each treatment rule in each iteration. When
potential outcomes are in [0,1] we apply the so-called coarsening approach. We
consider a policymaker choosing between two treatments after observing data
with unequal sample sizes per treatment and the case of testing several
innovations against the status quo.
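Fictitious play itself is short to write down: each player repeatedly best-responds to the opponent's empirical mixture of past actions. The sketch below runs it on an arbitrary random payoff matrix, which stands in for the (discretized) regret matrix; it is an illustration of the algorithm, not of the paper's treatment-choice setup.

```python
# Minimal sketch of fictitious play for a finite two-person zero-sum game
# (Robinson, 1951): each player best-responds to the opponent's empirical mixture.
import numpy as np

rng = np.random.default_rng(7)
R = rng.random((5, 7))             # row player receives R[i, j]; column player pays it

counts_row = np.zeros(R.shape[0])
counts_col = np.zeros(R.shape[1])
counts_row[0] += 1
counts_col[0] += 1

for _ in range(20000):
    # Best responses against the opponent's empirical (historical) mixed strategy.
    i = np.argmax(R @ (counts_col / counts_col.sum()))
    j = np.argmin((counts_row / counts_row.sum()) @ R)
    counts_row[i] += 1
    counts_col[j] += 1

p = counts_row / counts_row.sum()  # approximate maximin strategy of the row player
q = counts_col / counts_col.sum()
print("approximate game value:", p @ R @ q)
```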
arXiv link: http://arxiv.org/abs/2503.10932v1
Constructing an Instrument as a Function of Covariates
causal relationship between an endogenous variable and an outcome while
controlling for covariates. When an exogenous variable is unavailable to serve
as the instrument for an endogenous treatment, a recurring empirical practice
is to construct one from a nonlinear transformation of the covariates. We
investigate how reliable these estimates are under mild forms of
misspecification. Our main result shows that for instruments constructed from
covariates, the IV estimand can be arbitrarily biased under mild forms of
misspecification, even when imposing constant linear treatment effects. We
perform a semi-synthetic exercise by calibrating data to alternative models
proposed in the literature and estimating the average treatment effect. Our
results show that IV specifications that use instruments constructed from
covariates are non-robust to nonlinearity in the true structural function.
arXiv link: http://arxiv.org/abs/2503.10929v2
A New Design-Based Variance Estimator for Finely Stratified Experiments
treatment effect in finely stratified experiments. Here, by "design-based" we
mean that the only source of uncertainty stems from the randomness in treatment
assignment; by "finely stratified” we mean that units are stratified into
groups of a fixed size according to baseline covariates and then, within each
group, a fixed number of units are assigned uniformly at random to treatment
and the remainder to control. In this setting we present a novel estimator of
the variance of the difference-in-means based on pairing "adjacent" strata.
Importantly, our estimator is well defined even in the challenging setting
where there is exactly one treated or control unit per stratum. We prove that
our estimator is upward-biased, and thus can be used for inference under mild
restrictions on the finite population. We compare our estimator with some
well-known estimators that have been proposed previously in this setting, and
demonstrate that, while these estimators are also upward-biased, our estimator
has smaller bias and therefore leads to more precise inferences whenever
adjacent strata are sufficiently similar. To further understand when our
estimator leads to more precise inferences, we introduce a framework motivated
by a thought experiment in which the finite population is modeled as having
been drawn once in an i.i.d. fashion from a well-behaved probability
distribution. In this framework, we argue that our estimator dominates the
others in terms of limiting bias and that these improvements are strict except
under strong restrictions on the treatment effects. Finally, we illustrate the
practical relevance of our theoretical results through a simulation study,
which reveals that our estimator can in fact lead to substantially more precise
inferences, especially when the quality of stratification is high.
arXiv link: http://arxiv.org/abs/2503.10851v3
Visual Polarization Measurement Using Counterfactual Image Generation
influencing public discourse, policy, and consumer behavior. While studies on
polarization in news media have extensively focused on verbal content,
non-verbal elements, particularly visual content, have received less attention
due to the complexity and high dimensionality of image data. Traditional
descriptive approaches often rely on feature extraction from images, leading to
biased polarization estimates due to information loss. In this paper, we
introduce the Polarization Measurement using Counterfactual Image Generation
(PMCIG) method, which combines economic theory with generative models and
multi-modal deep learning to fully utilize the richness of image data and
provide a theoretically grounded measure of polarization in visual content.
Applying this framework to a decade-long dataset featuring 30 prominent
politicians across 20 major news outlets, we identify significant polarization
in visual content, with notable variations across outlets and politicians. At
the news outlet level, we observe significant heterogeneity in visual slant.
Outlets such as Daily Mail, Fox News, and Newsmax tend to favor Republican
politicians in their visual content, while The Washington Post, USA Today, and
The New York Times exhibit a slant in favor of Democratic politicians. At the
politician level, our results reveal substantial variation in polarized
coverage, with Donald Trump and Barack Obama among the most polarizing figures,
while Joe Manchin and Susan Collins are among the least. Finally, we conduct a
series of validation tests demonstrating the consistency of our proposed
measures with external measures of media slant that rely on non-image-based
sources.
arXiv link: http://arxiv.org/abs/2503.10738v1
PLRD: Partially Linear Regression Discontinuity Inference
designs in empirical economics. We argue, however, that widely used approaches
to building confidence intervals in regression discontinuity designs exhibit
suboptimal behavior in practice: In a simulation study calibrated to
high-profile applications of regression discontinuity designs, existing methods
either have systematic under-coverage or have wider-than-necessary intervals.
We propose a new approach, partially linear regression discontinuity inference
(PLRD), and find it to address shortcomings of existing methods: Throughout our
experiments, confidence intervals built using PLRD are both valid and short. We
also provide large-sample guarantees for PLRD under smoothness assumptions.
arXiv link: http://arxiv.org/abs/2503.09907v1
Designing Graph Convolutional Neural Networks for Discrete Choice with Network Effects
into discrete choice problems, achieving higher predictive performance than
standard discrete choice models while offering greater interpretability than
general-purpose flexible model classes. Econometric discrete choice models aid
in studying individual decision-making, where agents select the option with the
highest reward from a discrete set of alternatives. Intuitively, the utility an
individual derives from a particular choice depends on their personal
preferences and characteristics, the attributes of the alternative, and the
value their peers assign to that alternative or their previous choices.
However, most applications ignore peer influence, and models that do consider
peer or network effects often lack the flexibility and predictive performance
of recently developed approaches to discrete choice, such as deep learning. We
propose a novel graph convolutional neural network architecture to model
network effects in discrete choices, achieving higher predictive performance
than standard discrete choice models while retaining the interpretability
necessary for inference--a quality often lacking in general-purpose deep
learning architectures. We evaluate our architecture using revealed commuting
choice data, extended with travel times and trip costs for each travel mode for
work-related trips in New York City, as well as 2016 U.S. election data
aggregated by county, to test its performance on datasets with highly
imbalanced classes. Given the interpretability of our models, we can estimate
relevant economic metrics, such as the value of travel time savings in New York
City. Finally, we compare the predictive performance and behavioral insights
from our architecture to those derived from traditional discrete choice and
general-purpose deep learning models.
arXiv link: http://arxiv.org/abs/2503.09786v1
On the Wisdom of Crowds (of Economists)
the number of survey respondents grows. Such averages are "portfolios" of
forecasts. We characterize the speed and pattern of the gains from
diversification and their eventual decrease with portfolio size (the number of
survey respondents) in both (1) the key real-world data-based environment of
the U.S. Survey of Professional Forecasters (SPF), and (2) the theoretical
model-based environment of equicorrelated forecast errors. We proceed by
proposing and comparing various direct and model-based "crowd size signature
plots," which summarize the forecasting performance of k-average forecasts as a
function of k, where k is the number of forecasts in the average. We then
estimate the equicorrelation model for growth and inflation forecast errors by
choosing model parameters to minimize the divergence between direct and
model-based signature plots. The results indicate near-perfect equicorrelation
model fit for both growth and inflation, which we explicate by showing
analytically that, under conditions, the direct and fitted equicorrelation
model-based signature plots are identical at a particular model parameter
configuration, which we characterize. We find that the gains from
diversification are greater for inflation forecasts than for growth forecasts,
but that both gains nevertheless decrease quite quickly, so that fewer SPF
respondents than currently used may be adequate.
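Under equicorrelated forecast errors with variance sigma^2 and pairwise correlation rho, the variance of the k-average error is sigma^2 * ((1 - rho)/k + rho), which flattens out at the floor rho * sigma^2. The snippet below evaluates this model-based signature curve for illustrative parameter values; it is not the paper's fitted model for SPF growth or inflation forecasts.

```python
# Minimal sketch: model-based "crowd size signature plot" under equicorrelated
# forecast errors, Var(k-average error) = sigma^2 * ((1 - rho) / k + rho).
import numpy as np

sigma2, rho = 1.0, 0.4             # illustrative values only
k = np.arange(1, 41)
mse_k = sigma2 * ((1 - rho) / k + rho)

print("k=1: %.2f  k=5: %.2f  k=40: %.2f  floor rho*sigma^2: %.2f"
      % (mse_k[0], mse_k[4], mse_k[-1], sigma2 * rho))
# Most of the reduction toward the floor is achieved with a modest number of
# respondents, which is why the gains from diversification taper off quickly.
```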
arXiv link: http://arxiv.org/abs/2503.09287v1
On a new robust method of inference for general time series models
estimation (LQMLE) for general parametric time series models. Compared to the
classical Gaussian QMLE and existing robust estimations, it enjoys many
distinctive advantages, such as robustness with respect to distributional
misspecification and heavy-tailedness of the innovations, greater resilience to
outliers, smoothness and strict concavity of the log-logistic quasi-likelihood
function, and boundedness of the influence function, among others. Under some
mild conditions, we establish the strong consistency and asymptotic normality
of the LQMLE. Moreover, we propose a new and vital parameter identifiability
condition to ensure desirable asymptotics of the LQMLE. Further, based on the
LQMLE, we consider the Wald test and the Lagrange multiplier test for the
unknown parameters, and derive the limiting distributions of the corresponding
test statistics. The applicability of our methodology is demonstrated by
several time series models, including DAR, GARCH, ARMA-GARCH, DTARMACH, and
EXPAR. Numerical simulation studies are carried out to assess the finite-sample
performance of our methodology, and an empirical example is analyzed to
illustrate its usefulness.
arXiv link: http://arxiv.org/abs/2503.08655v1
Functional Linear Projection and Impulse Response Analysis
respond to function-valued shocks. Our methods are developed based on linear
projection estimation of predictive regression models with a function-valued
predictor and other control variables. We show that the linear projection
coefficient associated with the functional variable allows for the impulse
response interpretation in a functional structural vector autoregressive model
under a certain identification scheme, similar to the well-known Sims (1972)
causal chain, but with nontrivial complications in our functional setup. A
novel estimator based on an operator Schur complement is proposed and its
asymptotic properties are studied. We illustrate its empirical applicability
with two examples involving functional variables: economy sentiment
distributions and functional monetary policy shocks.
arXiv link: http://arxiv.org/abs/2503.08364v2
A primer on optimal transport for causal inference with observational data
elegant framework for comparing probability distributions, with wide-ranging
applications in all areas of science. The fundamental idea of analyzing
probabilities by comparing their underlying state space naturally aligns with
the core idea of causal inference, where understanding and quantifying
counterfactual states is paramount. Despite this intuitive connection, explicit
research at the intersection of optimal transport and causal inference is only
beginning to develop. Yet, many foundational models in causal inference have
implicitly relied on optimal transport principles for decades, without
recognizing the underlying connection. Therefore, the goal of this review is to
offer an introduction to the surprisingly deep existing connections between
optimal transport and the identification of causal effects with observational
data -- where optimal transport is not just a set of potential tools, but
actually builds the foundation of model assumptions. As a result, this review
is intended to unify the language and notation between different areas of
statistics, mathematics, and econometrics, by pointing out these existing
connections, and to explore novel problems and directions for future work in
both areas derived from this realization.
arXiv link: http://arxiv.org/abs/2503.07811v2
Nonlinear Temperature Sensitivity of Residential Electricity Demand: Evidence from a Distributional Regression Approach
during extreme temperature events using the distribution-to-scalar regression
model. Rather than relying on simple averages or individual quantile statistics
of raw temperature data, we construct distributional summaries, such as
probability density, hazard rate, and quantile functions, to retain a more
comprehensive representation of temperature variation. This approach not only
utilizes richer information from the underlying temperature distribution but
also enables the examination of extreme temperature effects that conventional
models fail to capture. Additionally, recognizing that distribution functions
are typically estimated from limited discrete observations and may be subject
to measurement errors, our econometric framework explicitly addresses this
issue. Empirical findings from the hazard-to-demand model indicate that
residential electricity demand exhibits a stronger nonlinear response to cold
waves than to heat waves, while heat wave shocks demonstrate a more pronounced
incremental effect. Moreover, the temperature quantile-to-demand model produces
largely insignificant demand response estimates, attributed to the offsetting
influence of two counteracting forces.
arXiv link: http://arxiv.org/abs/2503.07213v1
Taxonomy and Estimation of Multiple Breakpoints in High-Dimensional Factor Models
which factor loadings undergo an unknown number of structural changes over
time. Given that a model with multiple changes in factor loadings can be
observationally indistinguishable from one with constant loadings but varying
factor variances, this reduces the high-dimensional structural change problem
to a lower-dimensional one. Due to the presence of multiple breakpoints, the
factor space may expand, potentially causing the pseudo factor covariance
matrix within some regimes to be singular. We define two types of breakpoints:
a singular change, where the number of factors in the combined regime
exceeds the minimum number of factors in the two separate regimes, and a
rotational change, where the number of factors in the combined regime equals
that in each separate regime. Under a singular change, we derive the properties
of the small eigenvalues and establish the consistency of the QML estimators.
Under a rotational change, unlike in the single-breakpoint case, the pseudo
factor covariance matrix within each regime can be either full rank or
singular, yet the QML estimation error for the breakpoints remains stably
bounded. We further propose an information criterion (IC) to estimate the
number of breakpoints and show that, with probability approaching one, it
accurately identifies the true number of structural changes. Monte Carlo
simulations confirm strong finite-sample performance. Finally, we apply our
method to the FRED-MD dataset, identifying five structural breaks in factor
loadings between 1959 and 2024.
arXiv link: http://arxiv.org/abs/2503.06645v2
Bayesian Synthetic Control with a Soft Simplex Constraint
constraint and how to implement it in a high-dimensional setting have been
widely discussed. To address both issues simultaneously, we propose a novel
Bayesian synthetic control method that integrates a soft simplex constraint
with spike-and-slab variable selection. Our model is featured by a hierarchical
prior capturing how well the data aligns with the simplex assumption, which
enables our method to efficiently adapt to the structure and information
contained in the data by utilizing the constraint in a more flexible and
data-driven manner. A unique computational challenge posed by our model is that
conventional Markov chain Monte Carlo sampling algorithms for Bayesian variable
selection are no longer applicable, since the soft simplex constraint results
in an intractable marginal likelihood. To tackle this challenge, we propose to
update the regression coefficients of two predictors simultaneously from their
full conditional posterior distribution, which has an explicit but highly
complicated characterization. This novel Gibbs updating scheme leads to an
efficient Metropolis-within-Gibbs sampler that enables effective posterior
sampling from our model and accurate estimation of the average treatment
effect. Simulation studies demonstrate that our method performs well across a
wide range of settings, in terms of both variable selection and treatment
effect estimation, even when the true data-generating process does not adhere
to the simplex constraint. Finally, application of our method to two empirical
examples in the economic literature yields interesting insights into the impact
of economic policies.
arXiv link: http://arxiv.org/abs/2503.06454v1
Bounding the Effect of Persuasion with Monotonicity Assumptions: Reassessing the Impact of TV Debates
exemplar of persuasive communication. Yet, recent evidence from Le Pennec and
Pons (2023) indicates that they may not sway voters as strongly as popular
belief suggests. We revisit their findings through the lens of the persuasion
rate and introduce a robust framework that does not require exogenous
treatment, parallel trends, or credible instruments. Instead, we leverage
plausible monotonicity assumptions to partially identify the persuasion rate
and related parameters. Our results reaffirm that the sharp upper bounds on the
persuasive effects of TV debates remain modest.
arXiv link: http://arxiv.org/abs/2503.06046v2
A Hybrid Framework Combining Autoregression and Common Factors for Matrix Time Series Modeling
but existing approaches such as matrix autoregressive and dynamic matrix factor
models often impose restrictive assumptions and fail to capture complex
dependencies. We propose a hybrid framework that integrates autoregressive
dynamics with a shared low-rank common factor structure, enabling flexible
modeling of temporal dependence and cross-sectional correlation while achieving
dimension reduction. The model captures dynamic relationships through lagged
matrix terms and leverages low-rank structures across predictor and response
matrices, with connections between their row and column subspaces established
via common latent bases to improve interpretability and efficiency. We develop
a computationally efficient gradient-based estimation method and establish
theoretical guarantees for statistical consistency and algorithmic convergence.
Extensive simulations show robust performance under various data-generating
processes, and in an application to multinational macroeconomic data, the model
outperforms existing methods in forecasting and reveals meaningful interactions
among economic factors and countries. The proposed framework provides a
practical, interpretable, and theoretically grounded tool for analyzing
high-dimensional matrix time series.
arXiv link: http://arxiv.org/abs/2503.05340v2
When can we get away with using the two-way fixed effects regression?
was historically motivated by folk wisdom that it uncovers the Average
Treatment Effect on the Treated (ATT) as in the canonical two-period two-group
case. This belief has recently come under scrutiny due to results in applied
econometrics showing that it fails to uncover meaningful averages of
heterogeneous treatment effects in the presence of effect heterogeneity over
time and across adoption cohorts, and several heterogeneity-robust alternatives
have been proposed. However, these estimators often have higher variance and
are therefore under-powered for many applications, which poses a bias-variance
tradeoff that is challenging for researchers to navigate. In this paper, we
propose simple tests of linear restrictions that can be used to test for
differences in dynamic treatment effects over cohorts, which allows us to test
for when the two-way fixed effects regression is likely to yield biased
estimates of the ATT. These tests are implemented as methods in the pyfixest
python library.
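As a rough illustration of the kind of test described, and not the pyfixest implementation referenced above, the sketch below builds explicit cohort-by-event-time dummies in a simulated panel and uses a statsmodels F-test of the linear restriction that the cohort-specific dynamic effects are equal; all variable names and the data-generating process are hypothetical.

```python
# Illustrative stand-in: F-test of equal cohort-specific dynamic effects at one
# event time in a two-way fixed effects event-study regression.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
units, periods = 60, 10
df = pd.DataFrame([(i, t) for i in range(units) for t in range(periods)],
                  columns=["unit", "time"])
df["cohort"] = np.where(df["unit"] < 30, 4, 7)             # adoption periods 4 and 7
df["rel"] = df["time"] - df["cohort"]
# Cohort-specific effect one period after adoption (hypothetical heterogeneity).
df["d_c4_r1"] = ((df["cohort"] == 4) & (df["rel"] == 1)).astype(float)
df["d_c7_r1"] = ((df["cohort"] == 7) & (df["rel"] == 1)).astype(float)
df["y"] = rng.normal(size=len(df)) + 1.0 * df["d_c4_r1"] + 0.3 * df["d_c7_r1"]

fit = smf.ols("y ~ d_c4_r1 + d_c7_r1 + C(unit) + C(time)", data=df).fit()
print(fit.f_test("d_c4_r1 = d_c7_r1"))                     # rejection flags heterogeneity
```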
arXiv link: http://arxiv.org/abs/2503.05125v1
Enhancing Poverty Targeting with Spatial Machine Learning: An application to Indonesia
of Proxy Means Testing (PMT) for poverty targeting in Indonesia. Conventional
PMT methodologies are prone to exclusion and inclusion errors due to their
inability to account for spatial dependencies and regional heterogeneity. By
integrating spatial contiguity matrices, SML models mitigate these limitations,
facilitating a more precise identification and comparison of geographical
poverty clusters. Utilizing household survey data from the Social Welfare
Integrated Data Survey (DTKS) for the periods 2016 to 2020 and 2016 to 2021,
this study examines spatial patterns in income distribution and delineates
poverty clusters at both provincial and district levels. Empirical findings
indicate that the proposed SML approach reduces exclusion errors from 28% to
20% compared to standard machine learning models, underscoring the critical
role of spatial analysis in refining machine learning-based poverty targeting.
These results highlight the potential of SML to inform the design of more
equitable and effective social protection policies, particularly in
geographically diverse contexts. Future research can explore the applicability
of spatiotemporal models and assess the generalizability of SML approaches
across varying socio-economic settings.
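One simple way to operationalize the spatial idea is to augment region-level features with spatially lagged averages built from a row-normalized contiguity matrix before fitting a standard learner. The sketch below uses a toy line-graph adjacency and simulated data; it is a generic illustration, not the paper's DTKS-based pipeline.

```python
# Minimal sketch: add spatially lagged features from a row-normalized contiguity
# matrix before fitting a standard machine learning model (illustrative data).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(9)
n_regions, n_feat = 100, 5
X = rng.normal(size=(n_regions, n_feat))

# Toy contiguity: regions on a line are neighbors of their immediate neighbors.
W = np.zeros((n_regions, n_regions))
for i in range(n_regions - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W = W / W.sum(axis=1, keepdims=True)          # row-normalize

X_lag = W @ X                                  # spatially lagged features
y = X[:, 0] + 0.5 * X_lag[:, 0] + rng.normal(0, 0.1, n_regions)  # spatial dependence

model = GradientBoostingRegressor().fit(np.hstack([X, X_lag]), y)
print("in-sample R^2 with spatial lags:", model.score(np.hstack([X, X_lag]), y))
```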
arXiv link: http://arxiv.org/abs/2503.04300v1
Optimal Policy Choices Under Uncertainty
unknown and must be inferred from statistical estimates in empirical studies.
In this paper I consider the problem of a planner who changes upfront spending
on a set of policies to maximize social welfare but faces statistical
uncertainty about the impact of those changes. I set up a local optimization
problem that is tractable under statistical uncertainty and solve for the local
change in spending that maximizes the posterior expected rate of increase in
welfare. I propose an empirical Bayes approach to approximating the optimal
local spending rule, which solves the planner's local problem with posterior
mean estimates of benefits and net costs. I show theoretically that the
empirical Bayes approach performs well by deriving rates of convergence for the
rate of increase in welfare. These rates converge for a large class of decision
problems, including those where rates from a sample plug-in approach do not.
arXiv link: http://arxiv.org/abs/2503.03910v3
Extrapolating the long-term seasonal component of electricity prices for forecasting in the day-ahead market
the long-term seasonal component (LTSC) and the remaining part, predicting both
separately, and then combining their forecasts can bring significant accuracy
gains in day-ahead electricity price forecasting. However, not much attention
has been paid to predicting the LTSC, and the last 24 hourly values of the
estimated pattern are typically copied for the target day. To address this gap,
we introduce a novel approach which extracts the trend-seasonal pattern from a
price series extrapolated using price forecasts for the next 24 hours. We
assess it using two 5-year long test periods from the German and Spanish power
markets, covering the Covid-19 pandemic, the 2021/2022 energy crisis, and the
war in Ukraine. Considering parsimonious autoregressive and LASSO-estimated
models, we find that improvements in predictive accuracy range from 3% to 15%
in terms of the root mean squared error and exceed 1% in terms of profits from
a realistic trading strategy involving day-ahead bidding and battery storage.
arXiv link: http://arxiv.org/abs/2503.02518v1
On the Realized Joint Laplace Transform of Volatilities with Application to Test the Volatility Dependence
Laplace transform of volatilities of two semi-martingales within a fixed time
interval [0, T] by using overlapped increments of high-frequency data. The
proposed estimator is robust to the presence of finite variation jumps in price
processes. The related functional central limit theorem for the proposed
estimator has been established. Compared with the estimator with non-overlapped
increments, the estimator with overlapped increments improves the asymptotic
estimation efficiency. Moreover, we study the asymptotic theory of the estimator
under a long-span setting and employ it to create a feasible test for the
dependence between volatilities. Finally, simulation and empirical studies
demonstrate the performance of the proposed estimators.
arXiv link: http://arxiv.org/abs/2503.02283v1
Enhancing Efficiency of Local Projections Estimation with Volatility Clustering in High-Frequency Data
inefficiency in high-frequency economic and financial data with volatility
clustering. We incorporate a generalized autoregressive conditional
heteroskedasticity (GARCH) process to resolve serial correlation issues and
extend the model with GARCH-X and GARCH-HAR structures. Monte Carlo simulations
show that exploiting serial dependence in LP error structures improves
efficiency across forecast horizons, remains robust to persistent volatility,
and yields greater gains as sample size increases. Our findings contribute to
refining LP estimation, enhancing its applicability in analyzing economic
interventions and financial market dynamics.
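A rough two-step version of the idea can be sketched with the `arch` package: estimate the local projection by OLS, fit a GARCH(1,1) to the residuals, and re-estimate by weighted least squares with inverse conditional-variance weights. This is a generic illustration under simulated data, not the paper's GARCH-X or GARCH-HAR specifications.

```python
# Hypothetical two-step sketch: OLS local projection, GARCH(1,1) on residuals,
# then WLS with inverse conditional-variance weights.
import numpy as np
import statsmodels.api as sm
from arch import arch_model

rng = np.random.default_rng(10)
T, h = 600, 4
shock = rng.normal(size=T)
vol = np.ones(T)
e = np.zeros(T)
for t in range(1, T):                         # simulate volatility clustering
    vol[t] = np.sqrt(0.1 + 0.1 * e[t - 1] ** 2 + 0.8 * vol[t - 1] ** 2)
    e[t] = vol[t] * rng.normal()

rows = np.arange(0, T - h)
Y = 0.5 * shock[rows] + e[rows + h]           # response h periods after the shock
X = sm.add_constant(shock[rows])

ols = sm.OLS(Y, X).fit()
garch = arch_model(ols.resid, vol="GARCH", p=1, q=1, rescale=False).fit(disp="off")
w = 1.0 / garch.conditional_volatility ** 2   # inverse conditional variances
wls = sm.WLS(Y, X, weights=w).fit()
print("OLS coefficient:", ols.params[1], " WLS coefficient:", wls.params[1])
```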
arXiv link: http://arxiv.org/abs/2503.02217v1
How Do Consumers Really Choose: Exposing Hidden Preferences with the Mixture of Experts Model
research, as firms increasingly seek to personalize offerings and optimize
customer engagement. Traditional choice modeling frameworks, such as
multinomial logit (MNL) and mixed logit models, impose rigid parametric
assumptions that limit their ability to capture the complexity of consumer
decision-making. This study introduces the Mixture of Experts (MoE) framework
as a machine learning-driven alternative that dynamically segments consumers
based on latent behavioral patterns. By leveraging probabilistic gating
functions and specialized expert networks, MoE provides a flexible,
nonparametric approach to modeling heterogeneous preferences.
Empirical validation using large-scale retail data demonstrates that MoE
significantly enhances predictive accuracy over traditional econometric models,
capturing nonlinear consumer responses to price variations, brand preferences,
and product attributes. The findings underscore MoE's potential to improve
demand forecasting, optimize targeted marketing strategies, and refine
segmentation practices. By offering a more granular and adaptive framework,
this study bridges the gap between data-driven machine learning approaches and
marketing theory, advocating for the integration of AI techniques in managerial
decision-making and strategic consumer insights.
arXiv link: http://arxiv.org/abs/2503.05800v1
Dynamic Factor Correlation Model
variation-free parametrization of factor loadings. The model is applicable to
high dimensions and can accommodate time-varying correlations, heterogeneous
heavy-tailed distributions, and dependent idiosyncratic shocks, such as those
observed in returns on stocks in the same subindustry. We apply the model to a
"small universe" with 12 asset returns and to a "large universe" with 323 asset
returns. The former facilitates a comprehensive empirical analysis and
comparisons and the latter demonstrates the flexibility and scalability of the
model.
arXiv link: http://arxiv.org/abs/2503.01080v1
Vector Copula Variational Inference and Dependent Block Posterior Approximations
Bayesian posterior. For large and complex models a common choice is to assume
independence between multivariate blocks in a partition of the parameter space.
While this simplifies the problem it can reduce accuracy. This paper proposes
using vector copulas to capture dependence between the blocks parsimoniously.
Tailored multivariate marginals are constructed using learnable transport maps.
We call the resulting joint distribution a “dependent block posterior”
approximation. Vector copula models are suggested that make tractable and
flexible variational approximations. They allow for differing marginals,
numbers of blocks, block sizes, and forms of between-block dependence. They also
allow for solution of the variational optimization using efficient stochastic
gradient methods. The approach is demonstrated using four different statistical
models and 16 datasets which have posteriors that are challenging to
approximate. This includes models that use global-local shrinkage priors for
regularization, and hierarchical models for smoothing and heteroscedastic time
series. In all cases, our method produces more accurate posterior
approximations than benchmark VI methods that either assume block independence
or factor-based dependence, at limited additional computational cost. A python
package implementing the method is available on GitHub at
https://github.com/YuFuOliver/VCVI_Rep_PyPackage.
arXiv link: http://arxiv.org/abs/2503.01072v2
Bayesian inference for dynamic spatial quantile models with interactive effects
systems, large-scale spatial panel data presents new methodological and
computational challenges. This paper introduces a dynamic spatial panel
quantile model that incorporates unobserved heterogeneity. The proposed model
captures the dynamic structure of panel data, high-dimensional cross-sectional
dependence, and allows for heterogeneous regression coefficients. To estimate
the model, we propose a novel Bayesian Markov Chain Monte Carlo (MCMC)
algorithm. Contributions to Bayesian computation include the development of
quantile randomization, a new Gibbs sampler for structural parameters, and
stabilization of the tail behavior of the inverse Gaussian random generator. We
establish Bayesian consistency for the proposed estimation method as both the
time and cross-sectional dimensions of the panel approach infinity. Monte Carlo
simulations demonstrate the effectiveness of the method. Finally, we illustrate
the applicability of the approach through a case study on the quantile
co-movement structure of the gasoline market.
arXiv link: http://arxiv.org/abs/2503.00772v2
Wikipedia Contributions in the Wake of ChatGPT
ChatGPT following its introduction? We estimate the impact using
differences-in-differences models, with dissimilar Wikipedia articles as a
baseline for comparison, to examine how changes in voluntary knowledge
contributions and information-seeking behavior differ by article content. Our
analysis reveals that newly created, popular articles whose content overlaps
with ChatGPT 3.5 saw a greater decline in editing and viewership after the
November 2022 launch of ChatGPT than dissimilar articles did. These findings
indicate heterogeneous substitution effects, where users selectively engage
less with existing platforms when AI provides comparable content. This points
to potential uneven impacts on the future of human-driven online knowledge
contributions.
arXiv link: http://arxiv.org/abs/2503.00757v1
Causal Inference on Outcomes Learned from Text
randomized trials. Based on a simple econometric framework in which text may
capture outcomes of interest, our procedure addresses three questions: First,
is the text affected by the treatment? Second, which outcomes is the effect on?
And third, how complete is our description of causal effects? To answer all
three questions, our approach uses large language models (LLMs) that suggest
systematic differences across two groups of text documents and then provides
valid inference based on costly validation. Specifically, we highlight the need
for sample splitting to allow for statistical validation of LLM outputs, as
well as the need for human labeling to validate substantive claims about how
documents differ across groups. We illustrate the tool in a proof-of-concept
application using abstracts of academic manuscripts.
arXiv link: http://arxiv.org/abs/2503.00725v1
The Uncertainty of Machine Learning Predictions in Asset Pricing
point estimates, ignoring uncertainty. We develop new methods to construct
forecast confidence intervals for expected returns obtained from neural
networks. We show that neural network forecasts of expected returns share the
same asymptotic distribution as classic nonparametric methods, enabling a
closed-form expression for their standard errors. We also propose a
computationally feasible bootstrap to obtain the asymptotic distribution. We
incorporate these forecast confidence intervals into an uncertainty-averse
investment framework. This provides an economic rationale for shrinkage
implementations of portfolio selection. Empirically, our methods improve
out-of-sample performance.
arXiv link: http://arxiv.org/abs/2503.00549v1
GMM and M Estimation under Network Dependence
network-dependent data. To this end, I build on Kojevnikov, Marmer, and Song
(KMS, 2021) and develop a novel uniform law of large numbers (ULLN), which is
essential to ensure desired asymptotic behaviors of nonlinear estimators (e.g.,
Newey and McFadden, 1994, Section 2). Using this ULLN, I establish the
consistency and asymptotic normality of both GMM and M estimators. For
practical convenience, complete estimation and inference procedures are also
provided.
arXiv link: http://arxiv.org/abs/2503.00290v2
Location Characteristics of Conditional Selective Confidence Intervals via Polyhedral Methods
interval constructed via the polyhedral method. The interval is derived from
the distribution of a test statistic conditional on the event of statistical
significance. For a one-sided test, its behavior depends on whether the
parameter is highly or only marginally significant. In the highly significant
case, the interval closely resembles the conventional confidence interval that
ignores selection. By contrast, when the parameter is only marginally
significant, the interval may shift far to the left of zero, potentially
excluding all a priori plausible parameter values. This "location problem" does
not arise if significance is determined by a two-sided test or by a one-sided
test with randomized response (e.g., data carving).
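A small numerical illustration (ours, not the paper's code) of the conditional interval and the "location problem" described above: for a unit-variance normal mean selected on a one-sided test Z > c, the polyhedral conditional distribution is a truncated normal, and the equal-tailed interval follows from test inversion.
```python
# Conditional CI for a normal mean mu after selecting on Z > c, computed from
# the truncated-normal conditional distribution (log-space for stability).
import numpy as np
from scipy import stats, optimize

def conditional_ci(z_obs, c, alpha=0.05):
    # log of S(mu) = P(Z >= z_obs | Z > c; mu), which is increasing in mu
    log_s = lambda mu: stats.norm.logsf(z_obs - mu) - stats.norm.logsf(c - mu)
    solve = lambda p: optimize.brentq(lambda mu: log_s(mu) - np.log(p), -200.0, 50.0)
    return solve(alpha / 2), solve(1 - alpha / 2)

c = stats.norm.ppf(0.95)              # one-sided 5% cutoff
print(conditional_ci(4.0, c))         # highly significant: roughly [1.8, 6.0]
print(conditional_ci(c + 0.05, c))    # marginally significant: shifts far below zero
```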
arXiv link: http://arxiv.org/abs/2502.20917v2
Structural breaks detection and variable selection in dynamic linear regression via the Iterative Fused LASSO in high dimension
high-dimensional environments, addressing two critical challenges: variable
selection from a large pool of candidates, and the detection of structural
break points, where the model's parameters shift. This effort centers on
formulating a least squares estimation problem with regularization constraints,
drawing on techniques such as Fused LASSO and AdaLASSO, which are
well-established in machine learning. Our primary achievement is the creation
of an efficient algorithm capable of handling high-dimensional cases within
practical time limits. By addressing these pivotal challenges, our methodology
holds the potential for widespread adoption. To validate its effectiveness, we
detail the iterative algorithm and benchmark its performance against the widely
recognized Path Algorithm for Generalized Lasso. Comprehensive simulations and
performance analyses highlight the algorithm's strengths. Additionally, we
demonstrate the methodology's applicability and robustness through simulated
case studies and a real-world example involving a stock portfolio dataset.
These examples underscore the methodology's practical utility and potential
impact across diverse high-dimensional settings.
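A toy sketch of the underlying estimation problem (not the authors' iterative algorithm): a single break in a regression slope is recovered by rewriting the fused-LASSO problem as an ordinary LASSO on coefficient increments; the penalty level and threshold below are arbitrary.
```python
# One structural break in a dynamic regression coefficient, detected via a
# LASSO on slope increments (a reparameterization of the fused LASSO).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
T = 300
x = rng.standard_normal(T)
beta = np.where(np.arange(T) < 150, 1.0, 2.5)     # slope shifts at t = 150
y = beta * x + 0.3 * rng.standard_normal(T)

# Column s of D equals x_t for t >= s and 0 otherwise, so its coefficient is the
# increment of the slope at time s; sparse increments = few break points.
D = np.tril(np.ones((T, T))) * x[:, None]
fit = Lasso(alpha=0.05, fit_intercept=False, max_iter=50_000).fit(D, y)
breaks = np.flatnonzero(np.abs(fit.coef_[1:]) > 1e-6) + 1
print("estimated break dates:", breaks)
```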
arXiv link: http://arxiv.org/abs/2502.20816v2
Economic Causal Inference Based on DML Framework: Python Implementation of Binary and Continuous Treatment Variables
Machine Learning (DML) using Anaconda's Jupyter Notebook and the DML software
package from GitHub. The research focuses on causal inference experiments for
both binary and continuous treatment variables. The findings reveal that the
DML model demonstrates relatively stable performance in calculating the Average
Treatment Effect (ATE) and its robustness metrics. However, the study also
highlights that the computation of Conditional Average Treatment Effect (CATE)
remains a significant challenge for future DML modeling, particularly in the
context of continuous treatment variables. This underscores the need for
further research and development in this area to enhance the model's
applicability and accuracy.
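For readers who want the mechanics behind such ATE computations, here is a minimal cross-fitted partialling-out DML sketch for a binary treatment under a constant-effect partially linear specification; it is illustrative only and is not the specific GitHub package used in the paper.
```python
# Cross-fitted partialling-out DML: residualize outcome and treatment on
# covariates with a flexible learner, then run a final-stage OLS.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
n, p = 2000, 10
X = rng.standard_normal((n, p))
d = (X[:, 0] + rng.standard_normal(n) > 0).astype(float)   # binary treatment
y = 0.5 * d + X[:, 0] + 0.25 * X[:, 1] ** 2 + rng.standard_normal(n)

res_y, res_d = np.empty(n), np.empty(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    m_y = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], y[train])
    m_d = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], d[train])
    res_y[test] = y[test] - m_y.predict(X[test])
    res_d[test] = d[test] - m_d.predict(X[test])

theta = np.sum(res_d * res_y) / np.sum(res_d ** 2)          # final-stage OLS
psi = res_d * (res_y - theta * res_d)
se = np.sqrt(np.mean(psi ** 2) / np.mean(res_d ** 2) ** 2 / n)
print(f"ATE estimate {theta:.3f} (se {se:.3f})")
```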
arXiv link: http://arxiv.org/abs/2502.19898v1
Semiparametric Triple Difference Estimators
well-known difference-in-differences framework. It relaxes the parallel trends
assumption of the difference-in-differences framework through leveraging data
from an auxiliary domain. Despite being commonly applied in empirical research,
the triple difference framework has received relatively limited attention in
the statistics literature. Specifically, investigating the intricacies of
identification and the design of robust and efficient estimators for this
framework has remained largely unexplored. This work aims to address these gaps
in the literature. From the identification standpoint, we present outcome
regression and weighting methods to identify the average treatment effect on
the treated in both panel data and repeated cross-section settings. For the
latter, we relax the commonly made assumption of time-invariant composition of
units. From the estimation perspective, we develop semiparametric estimators
for the triple difference framework in both panel data and repeated
cross-sections settings. These estimators are based on the cross-fitting
technique, and flexible machine learning tools can be used to estimate the
nuisance components. We characterize conditions under which our proposed
estimators are efficient, doubly robust, root-n consistent and asymptotically
normal. As an application of our proposed methodology, we examined the effect
of mandated maternity benefits on the hourly wages of women of childbearing age
and found that these mandates result in a 2.6% drop in hourly wages.
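To fix ideas about the estimand, the snippet below computes the canonical plug-in triple difference from cell averages on simulated data; it is only a baseline illustration, far simpler than the semiparametric, cross-fitted estimators the paper develops, and the variable names and effect size are made up.
```python
# Canonical triple-difference of cell means: (DD in the policy state) minus
# (DD in the comparison state), on a simulated cross-section.
import numpy as np
import pandas as pd

rng = np.random.default_rng(12)
n = 8000
df = pd.DataFrame({
    "state_treated": rng.integers(0, 2, n),   # policy state vs. comparison state
    "eligible": rng.integers(0, 2, n),        # demographic group targeted by policy
    "post": rng.integers(0, 2, n),
})
effect = -0.03
df["y"] = (0.1 * df.state_treated + 0.05 * df.eligible + 0.02 * df.post
           + effect * df.state_treated * df.eligible * df.post
           + 0.1 * rng.standard_normal(n))

cell = df.groupby(["state_treated", "eligible", "post"])["y"].mean()

def dd(state):
    return (cell[state, 1, 1] - cell[state, 1, 0]) - (cell[state, 0, 1] - cell[state, 0, 0])

print("triple-difference estimate:", dd(1) - dd(0))   # close to the true -0.03
```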
arXiv link: http://arxiv.org/abs/2502.19788v3
Time-Varying Identification of Structural Vector Autoregressions
vector autoregression with data-driven time-varying identification. The model
selects among alternative patterns of exclusion restrictions to identify
structural shocks within the Markov process regimes. We implement the selection
through a multinomial prior distribution over these patterns, which is a
spike-and-slab prior for individual parameters. By combining a Markov-switching
structural matrix with heteroskedastic structural shocks following a stochastic
volatility process, the model enables shock identification through time-varying
volatility within a regime. As a result, the exclusion restrictions become
over-identifying, and their selection is driven by the signal from the data.
Our empirical application shows that data support time variation in the US
monetary policy shock identification. We also verify that time-varying
volatility identifies the monetary policy shock within the regimes.
arXiv link: http://arxiv.org/abs/2502.19659v1
Triple Difference Designs with Heterogeneous Treatment Effects
economics. The advantage of a triple difference design is that, within a
treatment group, it allows for another subgroup of the population --
potentially less impacted by the treatment -- to serve as a control for the
subgroup of interest. While literature on difference-in-differences has
discussed heterogeneity in treatment effects between treated and control groups
or over time, little attention has been given to the implications of
heterogeneity in treatment effects between subgroups. In this paper, I show
that the parameter identified under the usual triple difference assumptions
does not allow for causal interpretation of differences between subgroups when
subgroups may differ in their underlying (unobserved) treatment effects. I
propose a new parameter of interest, the causal difference in average treatment
effects on the treated, which makes causal comparisons between subgroups. I
discuss assumptions for identification and derive the semiparametric efficiency
bounds for this parameter. I then propose doubly-robust, efficient estimators
for this parameter. I use a simulation study to highlight the desirable
finite-sample properties of these estimators, as well as to show the difference
between this parameter and the usual triple difference parameter of interest.
An empirical application shows the importance of considering treatment effect
heterogeneity in practical applications.
arXiv link: http://arxiv.org/abs/2502.19620v2
Empirical likelihood approach for high-dimensional moment restrictions with dependent data
projections, and multivariate volatility models -- feature complex dynamic
interactions and spillovers across many time series. These models can be
integrated into a unified framework, with high-dimensional parameters
identified by moment conditions. As the number of parameters and moment
conditions may surpass the sample size, we propose adding a double penalty to
the empirical likelihood criterion to induce sparsity and facilitate dimension
reduction. Notably, we utilize a marginal empirical likelihood approach despite
temporal dependence in the data. Under regularity conditions, we provide
asymptotic guarantees for our method, making it an attractive option for
estimating large-scale multivariate time series models. We demonstrate the
versatility of our procedure through extensive Monte Carlo simulations and
three empirical applications, including analyses of US sectoral inflation
rates, fiscal multipliers, and volatility spillover in China's banking sector.
arXiv link: http://arxiv.org/abs/2502.18970v2
Minimum Distance Estimation of Quantile Panel Data Models
models where unit effects may be correlated with covariates. This
computationally efficient method involves two stages: first, computing quantile
regression within each unit, then applying GMM to the first-stage fitted
values. Our estimators apply to (i) classical panel data, tracking units over
time, and (ii) grouped data, where individual-level data are available, but
treatment varies at the group level. Depending on the exogeneity assumptions,
this approach provides quantile analogs of classic panel data estimators,
including fixed effects, random effects, between, and Hausman-Taylor
estimators. In addition, our method offers improved precision for grouped
(instrumental) quantile regression compared to existing estimators. We
establish asymptotic properties as the number of units and observations per
unit jointly diverge to infinity. Additionally, we introduce an inference
procedure that automatically adapts to the potentially unknown convergence rate
of the estimator. Monte Carlo simulations demonstrate that our estimator and
inference procedure perform well in finite samples, even when the number of
observations per unit is moderate. In an empirical application, we examine the
impact of the food stamp program on birth weights. We find that the program's
introduction increased birth weights predominantly at the lower end of the
distribution, highlighting the ability of our method to capture heterogeneous
effects across the outcome distribution.
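A rough two-stage sketch in the spirit of the estimator above, for the grouped-data case: unit-by-unit quantile regressions in the first stage, then a second stage that projects the unit-level quantile intercepts on a group-level treatment. The paper's GMM weighting is replaced here by simple OLS, and the data-generating process is invented.
```python
# Stage 1: quantile regression within each unit/group.
# Stage 2: regress the estimated tau-quantile intercepts on the group treatment.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
G, n_g, tau = 50, 200, 0.25
treat = rng.binomial(1, 0.5, G).astype(float)

intercepts = []
for g in range(G):
    x = rng.standard_normal(n_g)
    y = 1.0 + 0.8 * treat[g] + 0.5 * x + (1 + 0.5 * treat[g]) * rng.standard_normal(n_g)
    fit = sm.QuantReg(y, sm.add_constant(x)).fit(q=tau)
    intercepts.append(fit.params[0])        # unit-specific tau-quantile intercept

stage2 = sm.OLS(np.array(intercepts), sm.add_constant(treat)).fit()
print(stage2.params[1])   # effect of the group-level treatment on the tau-quantile
```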
arXiv link: http://arxiv.org/abs/2502.18242v1
Certified Decisions
research, yet their connection to subsequent decision-making is often unclear.
We develop a theory of certified decisions that pairs recommended decisions
with inferential guarantees. Specifically, we attach P-certificates -- upper
bounds on loss that hold with probability at least $1-\alpha$ -- to recommended
actions. We show that such certificates allow "safe," risk-controlling adoption
decisions for ambiguity-averse downstream decision-makers. We further prove
that it is without loss to limit attention to P-certificates arising as minimax
decisions over confidence sets, or what Manski (2021) terms "as-if decisions
with a set estimate." A parallel argument applies to E-certified decisions
obtained from e-values in settings with unbounded loss.
arXiv link: http://arxiv.org/abs/2502.17830v1
Optimal Salaries of Researchers with Motivational Emergence
examines the system of nonuniform wage distribution for researchers. A
nonlinear mathematical model of optimal remuneration for scientific workers has
been developed, considering key and additive aspects of scientific activity:
basic qualifications, research productivity, collaborative projects, skill
enhancement, distinctions, and international collaborations. Unlike traditional
linear schemes, the proposed approach is based on exponential and logarithmic
dependencies, allowing for the consideration of saturation effects and
preventing artificial wage growth due to mechanical increases in scientific
productivity indicators.
The study includes detailed calculations of optimal, minimum, and maximum
wages, demonstrating a fair distribution of remuneration on the basis of
researcher productivity. A linear increase in publication activity or grant
funding should not lead to uncontrolled salary growth, thus avoiding
distortions in the motivational system. The results of this study can be used
to reform and modernize the wage system for researchers in Kazakhstan and other
countries, as well as to optimize grant-based science funding mechanisms. The
proposed methodology fosters scientific motivation, long-term productivity, and
the internationalization of research while also promoting self-actualization
and ultimately forming an adequate and authentic reward system for the research
community.
Specifically, in resource-limited scientific systems, science policy should
focus on the qualitative development of individual researchers rather than
quantitative expansion (e.g., increasing the number of scientists). This can be
achieved through the productive progress of their motivation and
self-actualization.
arXiv link: http://arxiv.org/abs/2502.17271v1
Conditional Triple Difference-in-Differences
effects in empirical work. Surveying the literature, we find that most
applications include controls. We show that this standard practice is generally
biased for the target causal estimand when covariate distributions differ
across groups. To address this, we propose identifying a causal estimand by
fixing the covariate distribution to that of one group. We then develop a
double-robust estimator and illustrate its application in a canonical policy
setting.
arXiv link: http://arxiv.org/abs/2502.16126v3
Binary Outcome Models with Extreme Covariates: Estimation and Prediction
extreme events on binary outcomes and subsequently forecast future outcomes.
Our approach, based on Bayes' theorem and regularly varying (RV) functions,
facilitates a Pareto approximation in the tail without imposing parametric
assumptions beyond the tail. We analyze cross-sectional as well as static and
dynamic panel data models, incorporate additional covariates, and accommodate
the unobserved unit-specific tail thickness and RV functions in panel data. We
establish consistency and asymptotic normality of our tail estimator, and show
that our objective function converges to that of a panel Logit regression on
tail observations with the log extreme covariate as a regressor, thereby
simplifying implementation. The empirical application assesses whether small
banks become riskier when local housing prices sharply decline, a crucial
channel in the 2007--2008 financial crisis.
arXiv link: http://arxiv.org/abs/2502.16041v1
Clustered Network Connectedness: A New Measurement Framework with Application to Global Equity Markets
economic contexts. In recent decades, a large literature has developed and
applied flexible methods for measuring network connectedness and its evolution,
based on variance decompositions from vector autoregressions (VARs), as in
Diebold and Yilmaz (2014). Those VARs are, however, typically identified using
full orthogonalization (Sims, 1980), or no orthogonalization (Koop, Pesaran,
and Potter, 1996; Pesaran and Shin, 1998), which, although useful, are special
and extreme cases of a more general framework that we develop in this paper. In
particular, we allow network nodes to be connected in "clusters", such as asset
classes, industries, regions, etc., where shocks are orthogonal across clusters
(Sims style orthogonalized identification) but correlated within clusters
(Koop-Pesaran-Potter-Shin style generalized identification), so that the
ordering of network nodes is relevant across clusters but irrelevant within
clusters. After developing the clustered connectedness framework, we apply it
in a detailed empirical exploration of sixteen country equity markets spanning
three global regions.
arXiv link: http://arxiv.org/abs/2502.15458v1
A Supervised Screening and Regularized Factor-Based Method for Time Series Forecasting
effective machine learning tool for dimension reduction with many applications
in statistics, economics, and finance. This paper introduces a Supervised
Screening and Regularized Factor-based (SSRF) framework that systematically
addresses high-dimensional predictor sets through a structured four-step
procedure integrating both static and dynamic forecasting mechanisms. The
static approach selects predictors via marginal correlation screening and
scales them using univariate predictive slopes, while the dynamic method
screens and scales predictors based on time series regression incorporating
lagged predictors. PCA then extracts latent factors from the scaled predictors,
followed by LASSO regularization to refine predictive accuracy. In the
simulation study, we validate the effectiveness of SSRF and identify its
parameter adjustment strategies in high-dimensional data settings. An empirical
analysis of macroeconomic indices in China demonstrates that the SSRF method
generally outperforms several commonly used forecasting techniques in
out-of-sample predictions.
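An illustrative pipeline loosely following the static SSRF recipe described above: marginal correlation screening, slope scaling, PCA factor extraction, then LASSO. The number of retained predictors, factors, and all other tuning constants are placeholders.
```python
# Screen -> scale -> PCA -> LASSO, on simulated high-dimensional predictors.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
T, N, k = 200, 300, 30
X = rng.standard_normal((T, N))
y = X[:, :5] @ np.array([1.0, 0.8, -0.6, 0.5, 0.4]) + rng.standard_normal(T)

# 1) screen predictors by absolute marginal correlation with the target
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(N)])
keep = np.argsort(corr)[-k:]

# 2) scale each retained predictor by its univariate predictive slope
slopes = np.array([np.cov(X[:, j], y)[0, 1] / np.var(X[:, j]) for j in keep])
Xs = X[:, keep] * slopes

# 3) extract latent factors, 4) refine with LASSO
F = PCA(n_components=5).fit_transform(Xs)
fit = LassoCV(cv=5).fit(F, y)
print(fit.coef_)
```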
arXiv link: http://arxiv.org/abs/2502.15275v1
Policy-Oriented Binary Classification: Improving (KD-)CART Final Splits for Subpopulation Targeting
based on binary outcomes and target subpopulations whose probability of the
binary event exceeds a threshold. We call such problems Latent Probability
Classification (LPC). Practitioners typically employ Classification and
Regression Trees (CART) for LPC. We prove that in the context of LPC, classic
CART and the knowledge distillation method, whose student model is a CART
(referred to as KD-CART), are suboptimal. We propose Maximizing Distance Final
Split (MDFS), which generates split rules that strictly dominate CART/KD-CART
under the unique intersect assumption. MDFS identifies the unique best split
rule, is consistent, and targets more vulnerable subpopulations than
CART/KD-CART. To relax the unique intersect assumption, we additionally propose
Penalized Final Split (PFS) and weighted Empirical risk Final Split (wEFS).
Through extensive simulation studies, we demonstrate that the proposed methods
predominantly outperform CART/KD-CART. When applied to real-world datasets,
MDFS generates policies that target more vulnerable subpopulations than
CART/KD-CART.
arXiv link: http://arxiv.org/abs/2502.15072v2
biastest: Testing parameter equality across different models in Stata
to compare the coefficients of different regression models, enabling
researchers to assess the robustness and consistency of their empirical
findings. This command is particularly valuable for evaluating alternative
modeling approaches, such as ordinary least squares versus robust regression,
robust regression versus median regression, quantile regression across
different quantiles, and fixed effects versus random effects models in panel
data analysis. By providing both variable-specific and joint tests, the biastest
command offers a comprehensive framework for detecting bias or significant
differences in model estimates, ensuring that researchers can make informed
decisions about model selection and interpretation.
arXiv link: http://arxiv.org/abs/2502.15049v2
An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model
known as offline Maximum Entropy-Regularized Inverse Reinforcement Learning
(offline MaxEnt-IRL) in machine learning. The objective is to recover reward or
$Q^*$ functions that govern agent behavior from offline behavior data. In this
paper, we propose a globally convergent gradient-based method for solving these
problems without the restrictive assumption of linearly parameterized rewards.
The novelty of our approach lies in introducing the Empirical Risk Minimization
(ERM) based IRL/DDC framework, which circumvents the need for explicit state
transition probability estimation in the Bellman equation. Furthermore, our
method is compatible with non-parametric estimation techniques such as neural
networks. Therefore, the proposed method has the potential to be scaled to
high-dimensional, infinite state spaces. A key theoretical insight underlying
our approach is that the Bellman residual satisfies the Polyak-Lojasiewicz (PL)
condition -- a property that, while weaker than strong convexity, is sufficient
to ensure fast global convergence guarantees. Through a series of synthetic
experiments, we demonstrate that our approach consistently outperforms
benchmark methods and state-of-the-art alternatives.
arXiv link: http://arxiv.org/abs/2502.14131v5
Locally Robust Policy Learning: Inequality, Inequality of Opportunity and Intergenerational Mobility
individuals. The optimal treatment choice depends on the welfare function that
the policy maker has in mind and it is referred to as the policy learning
problem. I study a general setting for policy learning with semiparametric
Social Welfare Functions (SWFs) that can be estimated by locally
robust/orthogonal moments based on U-statistics. This rich class of SWFs
substantially expands the setting in Athey and Wager (2021) and accommodates a
wider range of distributional preferences. Three main applications of the
general theory motivate the paper: (i) Inequality-aware SWFs, (ii)
Inequality-of-Opportunity-aware SWFs, and (iii) Intergenerational Mobility SWFs. I use the
Panel Study of Income Dynamics (PSID) to assess the effect of attending
preschool on adult earnings and estimate optimal policy rules based on parental
years of education and parental income.
arXiv link: http://arxiv.org/abs/2502.13868v1
The Risk-Neutral Equivalent Pricing of Model-Uncertainty
utility-maximization frameworks and seek theoretical comprehensiveness. We move
toward practice by considering binary model-risks and by emphasizing
'constraints' over 'preference'. This decomposes viable economic asset-pricing
into that of model and non-model risks separately, leading to a unique and
convenient model-risk pricing formula. Its parameter, a dynamically conserved
constant of model-risk inference, allows an integrated representation of
ex-ante risk-pricing and bias such that their ex-post impacts are disentangled
via well-known anomalies, Momentum and Low-Risk, whose risk-reward patterns
acquire a fresh significance: peak-reward reveals ex-ante risk-premia, and
peak-location, bias.
arXiv link: http://arxiv.org/abs/2502.13744v9
Tensor dynamic conditional correlation model: A new way to pursuit "Holy Grail of investing"
correlations, aligning well with the principle of "Holy Grail of investing" in
terms of portfolio selection. The returns of styles naturally form a
tensor-valued time series, which requires new tools for studying the dynamics
of the conditional correlation matrix to facilitate the aforementioned
principle. Towards this goal, we introduce a new tensor dynamic conditional
correlation (TDCC) model, which is based on two novel treatments:
trace-normalization and dimension-normalization. These two normalizations adapt
to the tensor nature of the data, and they are necessary except when the tensor
data reduce to vector data. Moreover, we provide an easy-to-implement
estimation procedure for the TDCC model, and examine its finite sample
performance by simulations. Finally, we assess the usefulness of the TDCC model
in international portfolio selection across ten global markets and in large
portfolio selection for 1800 stocks from the Chinese stock market.
arXiv link: http://arxiv.org/abs/2502.13461v1
Balancing Flexibility and Interpretability: A Conditional Linear Model Estimation via Random Forest
forms, while nonparametric techniques, despite their flexibility, frequently
lack interpretability. This paper proposes a parsimonious alternative by
modeling the outcome $Y$ as a linear function of a vector of variables of
interest $X$, conditional on additional covariates
$Z$. Specifically, the conditional expectation is expressed as
$E[Y|X,Z]=X^{T}\beta(Z)$,
where $\beta(\cdot)$ is an unknown Lipschitz-continuous function.
We introduce an adaptation of the Random Forest (RF) algorithm to estimate this
model, balancing the flexibility of machine learning methods with the
interpretability of traditional linear models. This approach addresses a key
challenge in applied econometrics by accommodating heterogeneity in the
relationship between covariates and outcomes. Furthermore, the heterogeneous
partial effects of $X$ on $Y$ are represented by
$\beta(\cdot)$ and can be directly estimated using our proposed
method. Our framework effectively unifies established parametric and
nonparametric models, including varying-coefficient, switching regression, and
additive models. We provide theoretical guarantees, such as pointwise and
$L^p$-norm rates of convergence for the estimator, and establish a pointwise
central limit theorem through subsampling, aiding inference on the function
$\beta(\cdot)$. We present Monte Carlo simulation results to assess
the finite-sample performance of the method.
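A sketch of the general idea, not the authors' exact algorithm: random-forest leaf co-membership on $Z$ supplies local weights, and a weighted regression of $Y$ on $X$ at a target $z_0$ approximates $\beta(z_0)$ in $E[Y|X,Z]=X^{T}\beta(Z)$. The simulated design is invented.
```python
# Forest-weighted local linear coefficient beta(z0).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
n = 2000
Z = rng.uniform(-2, 2, size=(n, 1))
X = rng.standard_normal((n, 2))
beta = np.column_stack([np.sin(Z[:, 0]), 0.5 * Z[:, 0]])   # true beta(Z)
y = np.sum(X * beta, axis=1) + 0.1 * rng.standard_normal(n)

forest = RandomForestRegressor(n_estimators=300, min_samples_leaf=20,
                               random_state=0).fit(Z, y)

def beta_hat(z0):
    leaves_train = forest.apply(Z)                 # (n, n_trees) leaf ids
    leaves_z0 = forest.apply(np.atleast_2d(z0))[0]
    w = np.zeros(n)
    for t in range(leaves_train.shape[1]):
        same = leaves_train[:, t] == leaves_z0[t]
        w[same] += 1.0 / same.sum()                # co-membership weights
    # weighted least squares of y on X with forest weights
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

print(beta_hat(np.array([1.0])), "vs true", [np.sin(1.0), 0.5])
```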
arXiv link: http://arxiv.org/abs/2502.13438v1
Functional Network Autoregressive Models for Panel Data
analyzing network interactions of functional outcomes in panel data settings.
In this framework, an individual's outcome function is influenced by the
outcomes of others through a simultaneous equation system. To estimate the
functional parameters of interest, we need to address the endogeneity issue
arising from these simultaneous interactions among outcome functions. This
issue is carefully handled by developing a novel functional moment-based
estimator. We establish the consistency, convergence rate, and pointwise
asymptotic normality of the proposed estimator. Additionally, we discuss the
estimation of marginal effects and impulse response analysis. As an empirical
illustration, we analyze the demand for a bike-sharing service in the U.S. The
results reveal statistically significant spatial interactions in bike
availability across stations, with interaction patterns varying over the time
of day.
arXiv link: http://arxiv.org/abs/2502.13431v1
Robust Inference for the Direct Average Treatment Effect with Treatment Assignment Interference
inference settings with random network interference. We study the large-sample
distributional properties of the classical difference-in-means Hajek treatment
effect estimator, and propose a robust inference procedure for the
(conditional) direct average treatment effect. Our framework allows for
cross-unit interference in both the outcome equation and the treatment
assignment mechanism. Drawing from statistical physics, we introduce a novel
Ising model to capture complex dependencies in treatment assignment, and derive
three results. First, we establish a Berry-Esseen-type distributional
approximation that holds pointwise in the degree of interference induced by the
Ising model. This approximation recovers existing results in the absence of
treatment interference, and highlights the fragility of inference procedures
that do not account for the presence of interference in treatment assignment.
Second, we establish a uniform distributional approximation for the Hajek
estimator and use it to develop robust inference procedures that remain valid
uniformly over all interference regimes allowed by the model. Third, we propose
a novel resampling method to implement the robust inference procedure and
validate its performance through Monte Carlo simulations. A key technical
innovation is the introduction of a conditional i.i.d. Gaussianization that may
have broader applications. We also discuss extensions and generalizations of
our results.
arXiv link: http://arxiv.org/abs/2502.13238v2
Imputation Strategies for Rightcensored Wages in Longitudinal Datasets
reported wages are typically top-coded for confidentiality reasons. In
administrative databases the information is often collected only up to a
pre-specified threshold, for example, the contribution limit for the social
security system. While directly accounting for the censoring is possible for
some analyses, the most flexible solution is to impute the values above the
censoring point. This strategy offers the advantage that future users of the
data no longer need to implement possibly complicated censoring estimators.
However, standard cross-sectional imputation routines relying on the classical
Tobit model to impute right-censored data have a high risk of introducing bias
from uncongeniality (Meng, 1994) as future analyses to be conducted on the
imputed data are unknown to the imputer. Furthermore, as we show using a
large-scale administrative database from the German Federal Employment agency,
the classical Tobit model offers a poor fit to the data. In this paper, we
present some strategies to address these problems. Specifically, we use
leave-one-out means as suggested by Card et al. (2013) to avoid biases from
uncongeniality and rely on quantile regression or left censoring to improve the
model fit. We illustrate the benefits of these modeling adjustments using the
German Structure of Earnings Survey, which is (almost) unaffected by censoring
and can thus serve as a testbed to evaluate the imputation procedures.
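For concreteness, the snippet below implements the classical single-imputation Tobit baseline that the paper argues is too restrictive: a Gaussian Tobit is fit by maximum likelihood and censored wages are imputed by draws from the fitted truncated normal above the threshold. The covariate, threshold, and data are simulated placeholders.
```python
# Classical Tobit imputation of right-censored (log) wages above a threshold c.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(6)
n, c = 5000, 2.0
x = rng.standard_normal(n)
y_star = 1.0 + 0.5 * x + 0.8 * rng.standard_normal(n)   # latent log wage
y = np.minimum(y_star, c)                                # observed, censored at c
cens = y_star >= c

def negloglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)
    mu = b0 + b1 * x
    ll_unc = stats.norm.logpdf(y[~cens], mu[~cens], s)   # uncensored contribution
    ll_cen = stats.norm.logsf(c, mu[cens], s)            # P(y* >= c | x)
    return -(ll_unc.sum() + ll_cen.sum())

b0, b1, log_s = optimize.minimize(negloglik, x0=[0.0, 0.0, 0.0]).x
s = np.exp(log_s)
mu_cens = b0 + b1 * x[cens]
a = (c - mu_cens) / s                                    # standardized lower bound
y_imp = y.copy()
y_imp[cens] = stats.truncnorm.rvs(a, np.inf, loc=mu_cens, scale=s, random_state=0)
print(y_imp[cens][:5])
```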
arXiv link: http://arxiv.org/abs/2502.12967v1
Assortative Marriage and Geographic Sorting
sorting and educational homogamy, with college graduates increasingly
concentrating in high-skill cities and marrying similarly educated spouses. We
develop and estimate a spatial equilibrium model with local labor, housing, and
marriage markets, incorporating a marriage matching framework with transferable
utility. Using the model, we estimate trends in assortative preferences,
quantify the interplay between marital and geographic sorting, and assess their
combined impact on household inequality. Welfare analyses show that after
accounting for marriage, the college well-being gap grew substantially more
than the college wage gap.
arXiv link: http://arxiv.org/abs/2502.12867v1
Causal Inference for Qualitative Outcomes
discontinuity, and difference-in-differences are widely used to identify and
estimate treatment effects. However, when outcomes are qualitative, their
application poses fundamental challenges. This paper highlights these
challenges and proposes an alternative framework that focuses on well-defined
and interpretable estimands. We show that conventional identification
assumptions suffice for identifying the new estimands and outline simple,
intuitive estimation strategies that remain fully compatible with conventional
econometric methods. We provide an accompanying open-source R package,
$causalQual$, which is publicly available on CRAN.
arXiv link: http://arxiv.org/abs/2502.11691v2
Maximal Inequalities for Separately Exchangeable Empirical Processes
associated with separately exchangeable random arrays. For fixed index
dimension $K\ge 1$, we establish a global maximal inequality bounding the
$q$-th moment ($q\in[1,\infty)$) of the supremum of these processes. We also
obtain a refined local maximal inequality controlling the first absolute moment
of the supremum. Both results are proved for a general pointwise measurable
function class. Our approach uses a new technique partitioning the index set
into transversal groups, decoupling dependencies and enabling more
sophisticated higher moment bounds.
arXiv link: http://arxiv.org/abs/2502.11432v2
Regression Modeling of the Count Relational Data with Exchangeable Dependencies
common in social science. Most existing methods either assume the count edges
are derived from continuous random variables or model the edge dependency by
parametric distributions. In this paper, we develop a latent multiplicative
Poisson model for relational data with count edges. Our approach directly
models the edge dependency of count data by the pairwise dependence of latent
errors, which are assumed to be weakly exchangeable. This assumption not only
covers a variety of common network effects, but also leads to a concise
representation of the error covariance. In addition, the identification and
inference of the mean structure, as well as the regression coefficients, depend
on the errors only through their covariance. Such a formulation provides
substantial flexibility for our model. Based on this, we propose a
pseudo-likelihood based estimator for the regression coefficients,
demonstrating its consistency and asymptotic normality. The newly suggested
method is applied to a food-sharing network, revealing interesting network
effects in gift exchange behaviors.
arXiv link: http://arxiv.org/abs/2502.11255v1
Policy Learning with Confidence
expected welfare under estimation uncertainty. The proposed method explicitly
balances the size of the estimated welfare against the uncertainty inherent in
its estimation, ensuring that chosen policies meet a reporting guarantee,
namely, that actual welfare is guaranteed not to fall below the reported
estimate with a pre-specified confidence level. We produce the efficient
decision frontier, describing policies that offer maximum estimated welfare for
a given acceptable level of estimation risk. We apply this approach to a
variety of settings, including the selection of policy rules that allocate
individuals to treatments and the allocation of limited budgets among competing
social programs.
arXiv link: http://arxiv.org/abs/2502.10653v2
Residualised Treatment Intensity and the Estimation of Average Partial Effects
(APE) of a continuous treatment variable on an outcome variable in the presence
of non-linear and non-additively separable confounding of unknown form.
Identification of the APE is achieved by generalising Stein's Lemma (Stein,
1981), leveraging an exogenous error component in the treatment along with a
flexible functional relationship between the treatment and the confounders. The
identification results for R-OLS are used to characterize the properties of
Double/Debiased Machine Learning (Chernozhukov et al., 2018), specifying the
conditions under which the APE is estimated consistently. A novel decomposition
of the ordinary least squares estimand provides intuition for these results.
Monte Carlo simulations demonstrate that the proposed estimator outperforms
existing methods, delivering accurate estimates of the true APE and exhibiting
robustness to moderate violations of its underlying assumptions. The
methodology is further illustrated through an empirical application to Fetzer
(2019).
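A minimal sketch of the residualised-treatment idea behind R-OLS (illustrative, with an invented learner and data-generating process): purge the treatment of the confounders with a flexible, cross-fitted learner, then regress the outcome on the treatment residual.
```python
# Residualize the treatment on confounders, then OLS of the outcome on the residual.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(7)
n = 4000
Z = rng.standard_normal((n, 3))
d = np.sin(Z[:, 0]) + Z[:, 1] * Z[:, 2] + rng.standard_normal(n)   # treatment
y = 1.5 * d + np.exp(Z[:, 0]) + rng.standard_normal(n)             # true APE = 1.5

d_res = d - cross_val_predict(GradientBoostingRegressor(), Z, d, cv=5)
ape_hat = np.sum(d_res * y) / np.sum(d_res ** 2)                   # OLS of y on d_res
print(ape_hat)
```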
arXiv link: http://arxiv.org/abs/2502.10301v1
Self-Normalized Inference in (Quantile, Expected Shortfall) Regressions for Time Series
time series expected shortfall regressions and, as a corollary, also in
quantile regressions. Extant methods for such time series regressions, based on
a bootstrap or direct estimation of the long-run variance, are computationally
more involved, require the choice of tuning parameters and have serious size
distortions when the regression errors are strongly serially dependent. In
contrast, our inference tools only require estimates of the (quantile, expected
shortfall) regression parameters that are computed on an expanding window, and
are correctly sized as we show in simulations. Two empirical applications to
stock return predictability and to Growth-at-Risk demonstrate the practical
usefulness of the developed inference tools.
arXiv link: http://arxiv.org/abs/2502.10065v2
Prioritized Ranking Experimental Design Using Recommender Systems in Two-Sided Platforms
estimating causal effects in experimental settings. We propose a novel
experimental design to mitigate the interference bias in estimating the total
average treatment effect (TATE) of item-side interventions in online two-sided
marketplaces. Our Two-Sided Prioritized Ranking (TSPR) design uses the
recommender system as an instrument for experimentation. TSPR strategically
prioritizes items based on their treatment status in the listings displayed to
users. We designed TSPR to provide users with a coherent platform experience by
ensuring access to all items and a consistent realization of their treatment by
all users. We evaluate our experimental design through simulations using a
search impression dataset from an online travel agency. Our methodology closely
estimates the true simulated TATE, while a baseline item-side estimator
significantly overestimates TATE.
arXiv link: http://arxiv.org/abs/2502.09806v1
High-dimensional censored MIDAS logistic regression for corporate survival forecasting
problem marked by three key statistical hurdles: (i) right censoring, (ii)
high-dimensional predictors, and (iii) mixed-frequency data. To overcome these
complexities, we introduce a novel high-dimensional censored MIDAS (Mixed Data
Sampling) logistic regression. Our approach handles censoring through inverse
probability weighting and achieves accurate estimation with numerous
mixed-frequency predictors by employing a sparse-group penalty. We establish
finite-sample bounds for the estimation error, accounting for censoring, the
MIDAS approximation error, and heavy tails. The superior performance of the
method is demonstrated through Monte Carlo simulations. Finally, we present an
extensive application of our methodology to predict the financial distress of
Chinese-listed firms. Our novel procedure is implemented in the R package
'Survivalml'.
arXiv link: http://arxiv.org/abs/2502.09740v1
On (in)consistency of M-estimators under contamination
that commonly used robust estimators such as the median and the Huber estimator
are inconsistent under asymmetric contamination, while the Tukey estimator is
consistent. In order to make nuisance-parameter-free inference based on the
Tukey estimator, a consistent scale estimator is required. However, standard
robust scale estimators such as the interquartile range and the median absolute
deviation are inconsistent under contamination.
arXiv link: http://arxiv.org/abs/2502.09145v1
Difference-in-Differences and Changes-in-Changes with Sample Selection
affects whether certain units are observed. It is a common pitfall in
longitudinal studies, particularly in settings where treatment assignment is
confounded. In this paper, I highlight the drawbacks of one of the most popular
identification strategies in such settings: Difference-in-Differences (DiD).
Specifically, I employ principal stratification analysis to show that the
conventional ATT estimand may not be well defined, and the DiD estimand cannot
be interpreted causally without additional assumptions. To address these
issues, I develop an identification strategy to partially identify causal
effects on the subset of units with well-defined and observed outcomes under
both treatment regimes. I adapt Lee bounds to the Changes-in-Changes (CiC)
setting (Athey & Imbens, 2006), leveraging the time dimension of the data to
relax the unconfoundedness assumption in the original trimming strategy of Lee
(2009). This setting has the DiD identification strategy as a particular case,
which I also implement in the paper. Additionally, I explore how to leverage
multiple sources of sample selection to relax the monotonicity assumption in
Lee (2009), which may be of independent interest. Alongside the identification
strategy, I present estimators and inference results. I illustrate the
relevance of the proposed methodology by analyzing a job training program in
Colombia.
arXiv link: http://arxiv.org/abs/2502.08614v1
Scenario Analysis with Multivariate Bayesian Machine Learning Models
such as variants of conditional forecasts and generalized impulse responses,
for use with dynamic nonparametric models. The proposed algorithms are based on
predictive simulation and sequential Monte Carlo methods. Their utility is
demonstrated with three applications: (1) conditional forecasts based on stress
test scenarios, measuring (2) macroeconomic risk under varying financial
stress, and estimating the (3) asymmetric effects of financial shocks in the US
and their international spillovers. Our empirical results indicate the
importance of nonlinearities and asymmetries in relationships between
macroeconomic and financial variables.
arXiv link: http://arxiv.org/abs/2502.08440v3
Inference in dynamic models for panel data using the moving block bootstrap
effects when (some of) the regressors are not strictly exogenous. Under
asymptotics where the number of cross-sectional observations and time periods
grow at the same rate, the within-group estimator is consistent but its limit
distribution features a bias term. In this paper we show that a panel version
of the moving block bootstrap, where blocks of adjacent cross-sections are
resampled with replacement, replicates the limit distribution of the
within-group estimator. Confidence ellipsoids and hypothesis tests based on the
reverse-percentile bootstrap are thus asymptotically valid without the need to
take the presence of bias into account.
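The resampling scheme itself is easy to picture; here is a toy sketch of a panel moving block bootstrap in which blocks of adjacent time periods, each containing the whole cross-section, are resampled with replacement. The block length and the statistic are illustrative, and the reverse-percentile step is omitted.
```python
# Resample blocks of adjacent time periods, keeping the full cross-section.
import numpy as np

rng = np.random.default_rng(8)
N, T, ell = 50, 40, 5                 # units, periods, block length
panel = rng.standard_normal((N, T))   # stand-in for within-group residual data

def block_bootstrap_panel(data, block_len, rng):
    T = data.shape[1]
    n_blocks = int(np.ceil(T / block_len))
    starts = rng.integers(0, T - block_len + 1, size=n_blocks)
    cols = np.concatenate([np.arange(s, s + block_len) for s in starts])[:T]
    return data[:, cols]              # same N, resampled time dimension

boot_means = np.array([block_bootstrap_panel(panel, ell, rng).mean()
                       for _ in range(999)])
print(np.percentile(boot_means, [2.5, 97.5]))
```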
arXiv link: http://arxiv.org/abs/2502.08311v1
Are Princelings Truly Busted? Evaluating Transaction Discounts in China's Land Market
Journal of Economics 134(1): 185-226). Inspecting the data reveals that
nearly one-third of the transactions (388,903 out of 1,208,621) are perfect
duplicates of other rows, excluding the transaction number. Replicating the
analysis on the data sans-duplicates yields a slightly smaller but still
statistically significant princeling effect, robust across the regression
results. Further analysis also reveals that coefficients interpreted as the
effect of the logarithm of area actually reflect the effect of scaled values of
area; this paper also reinterprets and contextualizes these results in light of
the true scaled values.
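The duplicate check described above amounts to a one-line pandas operation; the sketch below is hypothetical (file and column names are made up) but shows the kind of exact-duplicate screen the replication performs.
```python
# Count rows that duplicate another row once the transaction identifier is dropped.
import pandas as pd

df = pd.read_csv("land_transactions.csv")            # placeholder file name
cols = [c for c in df.columns if c != "transaction_id"]   # hypothetical id column
dup_mask = df.duplicated(subset=cols, keep="first")
print(f"{dup_mask.sum()} of {len(df)} rows are duplicates; "
      f"re-estimate on df[~dup_mask]")
```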
arXiv link: http://arxiv.org/abs/2502.07692v1
Comment on "Generic machine learning inference on heterogeneous treatment effects in randomized experiments."
Chernozhukov, Demirer, Duflo, and Fernandez-Val (CDDF) for quantifying
uncertainty in heterogeneous treatment effect estimation. While SSRI
effectively accounts for randomness in data splitting, its computational cost
can be prohibitive when combined with complex machine learning (ML) models. We
present an alternative randomization inference (RI) approach that maintains
SSRI's generality without requiring repeated data splitting. By leveraging
cross-fitting and design-based inference, RI achieves valid confidence
intervals while significantly reducing computational burden. We compare the two
methods through simulation, demonstrating that RI retains statistical
efficiency while being more practical for large-scale applications.
arXiv link: http://arxiv.org/abs/2502.06758v1
Grouped fixed effects regularization for binary choice models
(Bonhomme et al., ECMTA 90(2):625-643, 2022) to binary choice models for
network and panel data. This approach discretizes unobserved heterogeneity via
k-means clustering and performs maximum likelihood estimation, reducing the
number of fixed effects in finite samples. This regularization helps analyze
small/sparse networks and rare events by mitigating complete separation, which
can lead to data loss. We focus on dynamic models with few state transitions
and network formation models for sparse networks. The effectiveness of this
method is demonstrated through simulations and real data applications.
arXiv link: http://arxiv.org/abs/2502.06446v1
Dynamic Pricing with Adversarially-Censored Demands
time period $t=1,2,\ldots, T$ is stochastic and dependent on the price.
However, a perishable inventory is imposed at the beginning of each time $t$,
censoring the potential demand if it exceeds the inventory level. To address
this problem, we introduce a pricing algorithm based on the optimistic
estimates of derivatives. We show that our algorithm achieves
$\tilde{O}(\sqrt{T})$ optimal regret even with adversarial inventory series.
Our findings advance the state-of-the-art in online decision-making problems
with censored feedback, offering a theoretically optimal solution against
adversarial observations.
arXiv link: http://arxiv.org/abs/2502.06168v1
Global Ease of Living Index: a machine learning framework for longitudinal analysis of major economies
disruptions such as the COVID-19 pandemic have impacted the cost of living and
quality of life. It is important to understand the long-term nature of the cost
of living and quality of life in major economies. A transparent and
comprehensive living index must include multiple dimensions of living
conditions. In this study, we present an approach to quantifying the quality of
life through the Global Ease of Living Index that combines various
socio-economic and infrastructural factors into a single composite score. Our
index utilises economic indicators that define living standards, which could
help in targeted interventions to improve specific areas. We present a machine
learning framework for addressing the problem of missing data for some of the
economic indicators for specific countries. We then curate and update the data
and use a dimensionality reduction approach (principal component analysis) to
create the Ease of Living Index for major economies since 1970. Our work
significantly adds to the literature by offering a practical tool for
policymakers to identify areas needing improvement, such as healthcare systems,
employment opportunities, and public safety. Our approach with open data and
code can be easily reproduced and applied to various contexts. This
transparency and accessibility make our work a valuable resource for ongoing
research and policy development in quality-of-life assessment.
arXiv link: http://arxiv.org/abs/2502.06866v2
Point-Identifying Semiparametric Sample Selection Models with No Excluded Variable
develops semiparametric selection models that achieve point identification
without relying on exclusion restrictions, an assumption long believed
necessary for identification in semiparametric selection models. Our
identification conditions require at least one continuously distributed
covariate and certain nonlinearity in the selection process. We propose a
two-step plug-in estimator that is root-n-consistent, asymptotically normal,
and computationally straightforward (readily available in statistical
software), allowing for heteroskedasticity. Our approach provides a middle
ground between Lee (2009)'s nonparametric bounds and Honor\'e and Hu (2020)'s
linear selection bounds, while ensuring point identification. Simulation
evidence confirms its excellent finite-sample performance. We apply our method
to estimate the racial and gender wage disparity using data from the US Current
Population Survey. Our estimates tend to lie outside the Honor\'e and Hu
bounds.
arXiv link: http://arxiv.org/abs/2502.05353v1
Estimating Parameters of Structural Models Using Neural Networks
provide the parameter estimate of a given (structural) econometric model, for
example, discrete choice or consumer search. Training examples consist of
datasets generated by the econometric model under a range of parameter values.
The neural net takes the moments of a dataset as input and tries to recognize
the parameter value underlying that dataset. Besides the point estimate, the
neural net can also output statistical accuracy. This neural net estimator
(NNE) tends to the limited-information Bayesian posterior as the number of training
datasets increases. We apply NNE to a consumer search model. It gives more
accurate estimates at lighter computational costs than the prevailing approach.
NNE is also robust to redundant moment inputs. In general, NNE offers the most
benefits in applications where other estimation approaches require very heavy
simulation costs. We provide code at: https://nnehome.github.io.
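A stylized sketch of the idea, using a toy location-scale model rather than the consumer search model and the authors' code: simulate datasets over a range of parameter values, summarize each by moments, and train a network to map moments back to the parameters.
```python
# Train an MLP to recognize the parameter value underlying a dataset's moments.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(9)

def simulate_moments(theta, n=500):
    data = theta[0] + theta[1] * rng.standard_normal(n)
    return np.array([data.mean(), data.std(), np.mean(data ** 3)])

thetas = np.column_stack([rng.uniform(-2, 2, 5000), rng.uniform(0.5, 3, 5000)])
moments = np.array([simulate_moments(t) for t in thetas])

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(moments, thetas)

theta_true = np.array([1.0, 2.0])
print(net.predict(simulate_moments(theta_true).reshape(1, -1)))  # point estimate
```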
arXiv link: http://arxiv.org/abs/2502.04945v1
A sliced Wasserstein and diffusion approach to random coefficient models
models. This estimator integrates the recently advanced sliced Wasserstein
distance with nearest neighbor methods, both of which enhance computational
efficiency. We demonstrate that the proposed method is consistent in
approximating the true distribution. Moreover, our formulation naturally leads
to a diffusion process-based algorithm and is closely connected to treatment
effect distribution estimation -- both of which are of independent interest and
hold promise for broader applications.
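For reference, the distance the estimator builds on is cheap to compute; the following is a minimal sliced Wasserstein distance between two samples (the distance only, not the paper's estimation procedure), using random projections and the sorted-sample form of the one-dimensional W1.
```python
# Monte Carlo sliced Wasserstein-1 distance between two equal-size samples.
import numpy as np

def sliced_wasserstein(X, Y, n_proj=200, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)                       # random direction on the sphere
        px, py = np.sort(X @ v), np.sort(Y @ v)      # 1-D projections
        total += np.mean(np.abs(px - py))            # 1-D W1 via sorted samples
    return total / n_proj

rng = np.random.default_rng(10)
X = rng.standard_normal((1000, 3))
Y = rng.standard_normal((1000, 3)) + 0.5
print(sliced_wasserstein(X, Y))
```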
arXiv link: http://arxiv.org/abs/2502.04654v2
Estimation of large approximate dynamic matrix factor models based on the EM algorithm and Kalman filtering
for the time series nature of the data by explicitly modelling the time
evolution of the factors. We study estimation of the model parameters based on
the Expectation Maximization (EM) algorithm, implemented jointly with the
Kalman smoother which gives estimates of the factors. We establish the
consistency of the estimated loadings and factor matrices as the sample size
$T$ and the matrix dimensions $p_1$ and $p_2$ diverge to infinity. We then
illustrate two immediate extensions of this approach to: (a) the case of
arbitrary patterns of missing data and (b) the presence of common stochastic
trends. The finite sample properties of the estimators are assessed through a
large simulation study and two applications on: (i) a financial dataset of
volatility proxies and (ii) a macroeconomic dataset covering the main euro area
countries.
arXiv link: http://arxiv.org/abs/2502.04112v2
Combining Clusters for the Approximate Randomization Test
randomization test proposed by Canay, Romano, and Shaikh (2017). Their test can
be used to conduct inference with a small number of clusters and imposes weak
requirements on the correlation structure. However, their test requires the
target parameter to be identified within each cluster. A leading example where
this requirement fails to hold is when a variable has no variation within
clusters. For instance, this happens in difference-in-differences designs
because the treatment variable equals zero in the control clusters. Under this
scenario, combining control and treated clusters can solve the identification
problem, and the test remains valid. However, there is an arbitrariness in how
the clusters are combined. In this paper, I develop computationally efficient
procedures to combine clusters when this identification requirement does not
hold. Clusters are combined to maximize local asymptotic power. The simulation
study and empirical application show that the procedures to combine clusters
perform well in various settings.
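As background for the test being extended, here is a bare-bones sign-flip randomization test on cluster-level estimates in the spirit of Canay, Romano, and Shaikh (2017); the cluster-combination step studied in this paper is not shown, and the numbers are made up.
```python
# Approximate randomization test with few clusters: recenter cluster estimates
# at the null and enumerate all sign flips.
import itertools
import numpy as np

theta_hat = np.array([0.9, 1.4, 0.7, 1.1, 1.3, 0.8])   # estimates from 6 clusters
theta_0 = 0.0                                           # null value

def t_stat(v):
    return np.abs(np.sqrt(len(v)) * v.mean() / v.std(ddof=1))

observed = t_stat(theta_hat - theta_0)
flips = np.array(list(itertools.product([-1.0, 1.0], repeat=len(theta_hat))))
randomized = np.array([t_stat(s * (theta_hat - theta_0)) for s in flips])
print("randomization p-value:", np.mean(randomized >= observed))
```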
arXiv link: http://arxiv.org/abs/2502.03865v1
Misspecification-Robust Shrinkage and Selection for VAR Forecasts and IRFs
dimensionality. The posterior means define a class of shrinkage estimators,
indexed by hyperparameters that determine the relative weight on maximum
likelihood estimates and prior means. In a Bayesian setting, it is natural to
choose these hyperparameters by maximizing the marginal data density. However,
this is undesirable if the VAR is misspecified. In this paper, we derive
asymptotically unbiased estimates of the multi-step forecasting risk and the
impulse response estimation risk to determine hyperparameters in settings where
the VAR is (potentially) misspecified. The proposed criteria can be used to
jointly select the optimal shrinkage hyperparameter, VAR lag length, and to
choose among different types of multi-step-ahead predictors; or among IRF
estimates based on VARs and local projections. The selection approach is
illustrated in a Monte Carlo study and an empirical application.
arXiv link: http://arxiv.org/abs/2502.03693v1
Type 2 Tobit Sample Selection Models with Bayesian Additive Regression Trees
(TOBART-2). BART can produce accurate individual-specific treatment effect
estimates. However, in practice estimates are often biased by sample selection.
We extend the Type 2 Tobit sample selection model to account for nonlinearities
and model uncertainty by including sums of trees in both the selection and
outcome equations. A Dirichlet Process Mixture distribution for the error terms
allows for departure from the assumption of bivariate normally distributed
errors. Soft trees and a Dirichlet prior on splitting probabilities improve
modeling of smooth and sparse data generating processes. We include a
simulation study and an application to the RAND Health Insurance Experiment
data set.
arXiv link: http://arxiv.org/abs/2502.03600v1
Wald inference on varying coefficients
nonparametric inference theory for linear restrictions on varying coefficients
in a range of regression models allowing for cross-sectional or spatial
dependence. We provide a general central limit theorem that covers a broad
range of error spatial dependence structures, allows for a degree of
misspecification robustness via nonparametric spatial weights and permits
inference on both varying regression and spatial dependence parameters. Using
our method, we first uncover evidence of constant returns to scale in the
Chinese nonmetal mineral industry's production function, and then show that
Boston house prices respond nonlinearly to proximity to employment centers. A
simulation study confirms that our tests perform very well in finite samples.
arXiv link: http://arxiv.org/abs/2502.03084v2
Panel Data Estimation and Inference: Homogeneity versus Heterogeneity
for different magnitudes of cross-sectional dependence, along with time series
autocorrelation. This is achieved via high-dimensional moving average processes
of infinite order (HDMA($\infty$)). Our setup and investigation integrate and
enhance homogeneous and heterogeneous panel data estimation and testing in a
unified way. To study HDMA($\infty$), we extend the Beveridge-Nelson
decomposition to a high-dimensional time series setting, and derive a complete
toolkit. We examine homogeneity versus heterogeneity using Gaussian
approximation, a prevalent technique for establishing uniform inference. For
post-testing inference, we derive central limit theorems through Edgeworth
expansions for both homogeneous and heterogeneous settings. Additionally, we
showcase the practical relevance of the established asymptotic theory by (1)
connecting our results with the literature on grouping structure analysis, (2)
examining a nonstationary panel data generating process, and (3) revisiting
the common correlated effects (CCE) estimators. Finally, we verify our
theoretical findings via extensive numerical studies using both simulated and
real datasets.
arXiv link: http://arxiv.org/abs/2502.03019v2
Kotlarski's lemma for dyadic models
two-way dyadic model $y_{ij}=c+\alpha_i+\eta_j+\varepsilon_{ij}$. To this end,
we extend the lemma of Kotlarski (1967), mimicking the arguments of Evdokimov
and White (2012). We allow the characteristic functions of the error components
to have real zeros, as long as they do not overlap with zeros of their first
derivatives.
arXiv link: http://arxiv.org/abs/2502.02734v1
Improving volatility forecasts of the Nikkei 225 stock index using a realized EGARCH model with realized and realized range-based volatilities
conditional heteroskedasticity (REGARCH) model to analyze the Nikkei 225 index
from 2010 to 2017, utilizing realized variance (RV) and realized range-based
volatility (RRV) as high-frequency measures of volatility. The findings show
that REGARCH models outperform standard GARCH family models in both in-sample
fitting and out-of-sample forecasting, driven by the dynamic information
embedded in high-frequency realized measures. Incorporating multiple realized
measures within a joint REGARCH framework further enhances model performance.
Notably, RRV demonstrates superior predictive power compared to RV, as
evidenced by improvements in forecast accuracy metrics. Moreover, the
forecasting results remain robust under both rolling-window and recursive
evaluation schemes.
arXiv link: http://arxiv.org/abs/2502.02695v2
Loss Functions for Inventory Control
function, the complementary loss function and the second-order loss function
for several probability distributions. These loss functions are important
functions in inventory optimization and other quantitative fields. For several
reasons, which become apparent throughout this paper, these loss functions are
best implemented via analytic expressions that use only standard probability
functions. However, complete and consistent references for such analytic
expressions are lacking in the literature. This paper aims to close that gap
and can serve as a reference for researchers, software engineers, and
practitioners concerned with the optimization of quantitative systems. It
should make it straightforward to use different probability distributions in
quantitative models, which is at the core of optimization. The paper also
serves as a broad introduction to loss functions and their use in inventory
control.
arXiv link: http://arxiv.org/abs/2502.05212v1
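As a brief, hedged illustration of the kind of analytic expression catalogued in the paper above, the following Python sketch computes the first-order and complementary loss functions of a normal distribution using only standard probability functions; this particular formula is textbook-standard, and the paper's coverage of other distributions and of second-order loss functions is not reproduced here.

from scipy.stats import norm

def normal_first_order_loss(x, mu, sigma):
    """E[max(X - x, 0)] for X ~ N(mu, sigma^2): sigma * (phi(z) - z * (1 - Phi(z)))."""
    z = (x - mu) / sigma
    return sigma * (norm.pdf(z) - z * (1.0 - norm.cdf(z)))

def normal_complementary_loss(x, mu, sigma):
    """E[max(x - X, 0)], which equals the first-order loss plus (x - mu)."""
    return normal_first_order_loss(x, mu, sigma) + (x - mu)

# Expected shortage and expected leftover stock at an order-up-to level of 120
# when demand is N(100, 20^2); the numbers are purely illustrative.
print(normal_first_order_loss(120, mu=100, sigma=20))
print(normal_complementary_loss(120, mu=100, sigma=20))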
Estimating Network Models using Neural Networks
network formation but pose difficult estimation challenges due to their
intractable normalizing constant. Existing methods, such as MCMC-MLE, rely on
sequential simulation at every optimization step. We propose a neural network
approach that trains on a single, large set of parameter-simulation pairs to
learn the mapping from parameters to average network statistics. Once trained,
this map can be inverted, yielding a fast and parallelizable estimation method.
The procedure also accommodates extra network statistics to mitigate model
misspecification. Some simple illustrative examples show that the method
performs well in practice.
arXiv link: http://arxiv.org/abs/2502.01810v1
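The following Python sketch illustrates the simulate-train-invert idea described in the entry above on a toy two-parameter random-graph model; it is not the authors' implementation, and the graph model, network statistics, and tuning choices are all invented for illustration.

import numpy as np
from sklearn.neural_network import MLPRegressor
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_nodes = 30

def simulate_stats(theta, n_rep=5):
    """Average (edge count, triangle count) for a toy covariate-homophily graph."""
    base, homophily = theta
    out = np.zeros(2)
    for _ in range(n_rep):
        x = rng.normal(size=n_nodes)                          # node covariate
        logits = base - homophily * np.abs(x[:, None] - x[None, :])
        p = 1.0 / (1.0 + np.exp(-logits))
        a = (rng.uniform(size=(n_nodes, n_nodes)) < p).astype(float)
        a = np.triu(a, 1)
        a = a + a.T                                           # undirected adjacency matrix
        out += np.array([a.sum() / 2, np.trace(a @ a @ a) / 6]) / n_rep
    return out

# Step 1: one large set of (parameter, simulated statistics) training pairs.
thetas = rng.uniform([-3.0, 0.0], [0.0, 2.0], size=(2000, 2))
stats = np.array([simulate_stats(t) for t in thetas])

# Step 2: learn the map from parameters to average network statistics.
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(thetas, stats)

# Step 3: invert the fitted map at the observed statistics (here, simulated).
observed = simulate_stats(np.array([-1.5, 1.0]))
loss = lambda t: np.sum((net.predict(t.reshape(1, -1))[0] - observed) ** 2)
theta_hat = minimize(loss, x0=np.array([-1.0, 0.5]), method="Nelder-Mead").x
print("estimated parameters:", theta_hat)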
Comment on "Sequential validation of treatment heterogeneity" and "Comment on generic machine learning inference on heterogeneous treatment effects in randomized experiments"
gracious and insightful comments. We are particularly encouraged that both
pieces recognize the importance of the research agenda the lecture laid out,
which we see as critical for applied researchers. It is also great to see that
both underscore the potential of the basic approach we propose - targeting
summary features of the CATE after proxy estimation with sample splitting. We
are also happy that both papers push us (and the reader) to continue thinking
about the inference problem associated with sample splitting. We recognize that
our current paper is only scratching the surface of this interesting agenda.
Our proposal is certainly not the only option, and it is exciting that both
papers provide and assess alternatives. Hopefully, this will generate even more
work in this area.
arXiv link: http://arxiv.org/abs/2502.01548v2
Can We Validate Counterfactual Estimations in the Presence of General Network Interference?
influence outcomes of other units, challenging both causal effect estimation
and its validation. Classic validation approaches fail as outcomes are only
observable under one treatment scenario and exhibit complex correlation
patterns due to interference. To address these challenges, we introduce a new
framework enabling cross-validation for counterfactual estimation. At its core
is our distribution-preserving network bootstrap method -- a
theoretically-grounded approach inspired by approximate message passing. This
method creates multiple subpopulations while preserving the underlying
distribution of network effects. We extend recent causal message-passing
developments by incorporating heterogeneous unit-level characteristics and
varying local interactions, ensuring reliable finite-sample performance through
non-asymptotic analysis. We also develop and publicly release a comprehensive
benchmark toolbox with diverse experimental environments, from networks of
interacting AI agents to opinion formation in real-world communities and
ride-sharing applications. These environments provide known ground truth values
while maintaining realistic complexities, enabling systematic examination of
causal inference methods. Extensive evaluation across these environments
demonstrates our method's robustness to diverse forms of network interference.
Our work provides researchers with both a practical estimation framework and a
standardized platform for testing future methodological developments.
arXiv link: http://arxiv.org/abs/2502.01106v1
Online Generalized Method of Moments for Time Series
to analyse large-scale streaming data, which can be collected in perpetuity and
are serially dependent. This motivates us to develop the online generalized method
of moments (OGMM), an explicitly updated estimation and inference framework in
the time series setting. The OGMM inherits many properties of offline GMM, such
as its broad applicability to many problems in econometrics and statistics,
natural accommodation for over-identification, and achievement of
semiparametric efficiency under temporal dependence. As an online method, the
key gain relative to offline GMM is the vast improvement in time complexity and
memory requirement.
Building on the OGMM framework, we propose improved versions of online
Sargan--Hansen and structural stability tests following recent work in
econometrics and statistics. Through Monte Carlo simulations, we observe
encouraging finite-sample performance in online instrumental variables
regression, online over-identifying restrictions test, online quantile
regression, and online anomaly detection. Interesting applications of OGMM to
stochastic volatility modelling and inertial sensor calibration are presented
to demonstrate the effectiveness of OGMM.
arXiv link: http://arxiv.org/abs/2502.00751v1
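A deliberately simplified streaming sketch, not the OGMM procedure above: for a just-identified linear IV moment condition \(E[z_t(y_t - x_t'\theta)] = 0\), the estimator solves (sum of \(z_t x_t'\)) times theta equals (sum of \(z_t y_t\)), and both sums can be accumulated online with O(1) memory as observations arrive. The data-generating process below is invented for illustration.

import numpy as np

rng = np.random.default_rng(1)
d = 2
A = np.zeros((d, d))                 # running sum of z_t x_t'
b = np.zeros(d)                      # running sum of z_t y_t
theta_true = np.array([1.0, -0.5])

for t in range(10_000):              # pretend observations arrive one at a time
    z = rng.normal(size=d)           # instruments
    x = z + 0.3 * rng.normal(size=d) # regressors correlated with the instruments
    y = x @ theta_true + rng.normal()
    A += np.outer(z, x)
    b += z * y
    if (t + 1) % 2500 == 0:          # current online estimate; no raw data stored
        print(t + 1, np.linalg.solve(A, b))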
Serial-Dependence and Persistence Robust Inference in Predictive Regressions
of estimated parameters in predictive regressions. The approach features a new
family of test statistics that are robust to the degree of persistence of the
predictors. Importantly, the method accounts for serial correlation and
conditional heteroskedasticity without requiring any corrections or
adjustments. This is achieved through a mechanism embedded within the test
statistics that effectively decouples serial dependence present in the data.
The limiting null distributions of these test statistics are shown to follow a
chi-square distribution, and their asymptotic power under local alternatives is
derived. A comprehensive set of simulation experiments illustrates their finite
sample size and power properties.
arXiv link: http://arxiv.org/abs/2502.00475v1
Confidence intervals for intentionally biased estimators
estimator that is intentionally biased to reduce mean squared error. The first
CI simply uses an unbiased estimator's standard error; compared to centering at
the unbiased estimator, this CI has higher coverage probability for confidence
levels above 91.7%, even if the biased and unbiased estimators have equal mean
squared error. The second CI trades some of this "excess" coverage for shorter
length. The third CI is centered at a convex combination of the two estimators
to further reduce length. Practically, these CIs apply broadly and are simple
to compute.
arXiv link: http://arxiv.org/abs/2502.00450v1
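A tiny numeric illustration of the first construction above, as we read it: center the interval at the intentionally biased estimator (here a shrinkage estimator, chosen for illustration) but use the unbiased estimator's standard error.

from scipy.stats import norm

theta_unbiased, se_unbiased = 2.0, 0.5       # unbiased estimate and its standard error (made up)
theta_biased = 0.8 * theta_unbiased          # an intentionally biased (shrinkage) estimate

level = 0.95
z = norm.ppf(0.5 + level / 2)
lower, upper = theta_biased - z * se_unbiased, theta_biased + z * se_unbiased
print(f"{level:.0%} CI centered at the biased estimator: ({lower:.3f}, {upper:.3f})")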
Fixed-Population Causal Inference for Models of Equilibrium
interference in unit-specific (endogenous) outcomes do not usually produce a
reduced-form representation where outcomes depend on other units' treatment
status only at a short network distance, or only through a known exposure
mapping. This remains true even if the structural mechanism depends on outcomes of
peers only at a short network distance, or through a known exposure mapping. In
this paper, we first define causal estimands that are identified and estimable
from a single experiment on the network under minimal assumptions on the
structure of interference, and which represent average partial causal responses
that generally vary with other global features of the realized assignment.
Under a fixed-population, design-based approach, we show unbiasedness and
consistency for inverse-probability weighting (IPW) estimators for those causal
parameters from a randomized experiment on a single network. We also analyze
more closely the case of marginal interventions in a model of equilibrium with
smooth response functions where we can recover LATE-type weighted averages of
derivatives of those response functions. Under additional structural
assumptions, these "agnostic" causal estimands can be combined to recover
model parameters, but also retain their less restrictive causal interpretation.
arXiv link: http://arxiv.org/abs/2501.19394v4
PUATE: Efficient Average Treatment Effect Estimation from Treated (Positive) and Unlabeled Units
in expected outcomes between treatment and control groups, is a central topic
in causal inference. This study develops semiparametric efficient estimators
for ATE in a setting where only a treatment group and an unlabeled group,
consisting of units whose treatment status is unknown, are observed. This
scenario constitutes a variant of learning from positive and unlabeled data (PU
learning) and can be viewed as a special case of ATE estimation with missing
data. For this setting, we derive the semiparametric efficiency bounds, which
characterize the lowest achievable asymptotic variance for regular estimators.
We then construct semiparametric efficient ATE estimators that attain these
bounds. Our results contribute to the literature on causal inference with
missing data and weakly supervised learning.
arXiv link: http://arxiv.org/abs/2501.19345v2
Untestability of Average Slutsky Symmetry
conditions for the rationality of demand functions. While the empirical
implications of Slutsky negative semidefiniteness in repeated cross-sectional
demand data are well understood, the empirical content of Slutsky symmetry
remains largely unexplored. This paper takes an important first step toward
addressing this gap. We demonstrate that the average Slutsky matrix is not
identified and that its identified set always contains a symmetric matrix. A
key implication of our findings is that the symmetry of the average Slutsky
matrix is untestable, and consequently, individual Slutsky symmetry cannot be
tested using the average Slutsky matrix.
arXiv link: http://arxiv.org/abs/2501.18923v1
Model-Adaptive Approach to Dynamic Discrete Choice Models with Large State Spaces
with large state spaces pose computational difficulties. This paper develops a
novel model-adaptive approach to solve the linear system of fixed point
equations of the policy valuation operator. We propose a model-adaptive sieve
space, constructed by iteratively augmenting the space with the residual from
the previous iteration. We show both theoretically and numerically that
model-adaptive sieves dramatically improve performance. In particular, the
approximation error decays at a superlinear rate in the sieve dimension, unlike
a linear rate achieved using conventional methods. Our method works for both
conditional choice probability estimators and full-solution estimators with
policy iteration. We apply the method to analyze consumer demand for laundry
detergent using Kantar's Worldpanel Take Home data. On average, our method is
51.5% faster than conventional methods in solving the dynamic programming
problem, making the Bayesian MCMC estimator computationally feasible.
arXiv link: http://arxiv.org/abs/2501.18746v2
IV Estimation of Heterogeneous Spatial Dynamic Panel Models with Interactive Effects
spatial dynamic panel data models with interactive effects, under large N and T
asymptotics. Unlike existing approaches that typically impose slope-parameter
homogeneity, MGIV accommodates cross-sectional heterogeneity in slope
coefficients. The proposed estimator is linear, making it computationally
efficient and robust. Furthermore, it avoids the incidental parameters problem,
enabling asymptotically valid inferences without requiring bias correction. The
Monte Carlo experiments indicate strong finite-sample performance of the MGIV
estimator across various sample sizes and parameter configurations. The
practical utility of the estimator is illustrated through an application to
regional economic growth in Europe. By explicitly incorporating heterogeneity,
our approach provides fresh insights into the determinants of regional growth,
underscoring the critical roles of spatial and temporal dependencies.
arXiv link: http://arxiv.org/abs/2501.18467v1
Universal Inference for Incomplete Discrete Choice Models
paper develops a tractable inference method with finite-sample validity for
such models. The proposed procedure uses a robust version of the universal
inference framework by Wasserman et al. (2020) and avoids using moment
selection tuning parameters, resampling, or simulations. The method is designed
for constructing confidence intervals for counterfactual objects and other
functionals of the underlying parameter. It can be used in applications that
involve model incompleteness, discrete and continuous covariates, and
parameters containing nuisance components.
arXiv link: http://arxiv.org/abs/2501.17973v1
Uniform Confidence Band for Marginal Treatment Effect Function
the marginal treatment effect (MTE) function. The shape of the MTE function
offers insight into how the unobserved propensity to receive treatment is
related to the treatment effect. Our approach visualizes the statistical
uncertainty of an estimated function, facilitating inferences about the
function's shape. The proposed method is computationally inexpensive and
requires only minimal information: sample size, standard errors, kernel
function, and bandwidth. This minimal data requirement enables applications to
both new analyses and published results without access to original data. We
derive a Gaussian approximation for a local quadratic estimator and consider
the approximation of the distribution of its supremum in polynomial order.
Monte Carlo simulations demonstrate that our bands provide the desired coverage
and are less conservative than those based on the Gumbel approximation. An
empirical illustration regarding the returns to education is included.
arXiv link: http://arxiv.org/abs/2501.17455v2
Demand Analysis under Price Rigidity and Endogenous Assortment: An Application to China's Tobacco Industry
monopolistic seller responds by adjusting product assortments, which remain
unobserved by the analyst. We develop and estimate a logit demand model that
incorporates assortment discrimination and nominal price rigidity. We find that
consumers are significantly more responsive to price changes than conventional
models predict. Simulated tax increases reveal that neglecting the role of
endogenous assortments results in underestimations of the decline in
higher-tier product sales, incorrect directional predictions of lower-tier
product sales, and overestimation of tax revenue by more than 50%. Finally, we
extend our methodology to settings with competition and random coefficient
models.
arXiv link: http://arxiv.org/abs/2501.17251v1
Why is the estimation of metaorder impact with public market data so challenging?
is a very important topic in finance. However, models of price and trade
based on public market data provide average price trajectories that are
qualitatively different from what is observed during real metaorder executions:
the price increases linearly, rather than in a concave way, during the
execution and the amount of reversion after its end is very limited. We claim
that this is a generic phenomenon due to the fact that even sophisticated
statistical models are unable to correctly describe the origin of the
autocorrelation of the order flow. We propose a modified Transient Impact Model
which provides more realistic trajectories by assuming that only a fraction of
the metaorder trading triggers market order flow. Interestingly, in our model
there is a critical condition on the kernels of the price and order flow
equations in which market impact becomes permanent.
arXiv link: http://arxiv.org/abs/2501.17096v1
Bayesian Analyses of Structural Vector Autoregressions with Sign, Zero, and Narrative Restrictions Using the R Package bsvarSIGNs
Bayesian analysis of Structural Vector Autoregressions identified by sign,
zero, and narrative restrictions. It offers fast and efficient estimation
thanks to the deployment of frontier econometric and numerical techniques and
algorithms written in C++. The core model is based on a flexible Vector
Autoregression with estimated hyper-parameters of the Minnesota prior and the
dummy observation priors. The structural model can be identified by sign, zero,
and narrative restrictions, including a novel solution, making it possible to
use the three types of restrictions at once. The package facilitates predictive
and structural analyses using impulse responses, forecast error variance and
historical decompositions, forecasting and conditional forecasting, as well as
analyses of structural shocks and fitted values. All this is complemented by
colourful plots, user-friendly summary functions, and comprehensive
documentation. The package was granted the Di Cook Open-Source Statistical
Software Award by the Statistical Society of Australia in 2024.
arXiv link: http://arxiv.org/abs/2501.16711v1
Copyright and Competition: Estimating Supply and Demand with Unstructured Data
industries in the face of cost-reducing technologies such as generative
artificial intelligence. Creative products often feature unstructured
attributes (e.g., images and text) that are complex and high-dimensional. To
address this challenge, we study a stylized design product -- fonts -- using
data from the world's largest font marketplace. We construct neural network
embeddings to quantify unstructured attributes and measure visual similarity in
a manner consistent with human perception. Spatial regression and event-study
analyses demonstrate that competition is local in the visual characteristics
space. Building on this evidence, we develop a structural model of supply and
demand that incorporates embeddings and captures product positioning under
copyright-based similarity constraints. Our estimates reveal consumers'
heterogeneous design preferences and producers' cost-effective mimicry
advantages. Counterfactual analyses show that copyright protection can raise
consumer welfare by encouraging product relocation, and that the optimal policy
depends on the interaction between copyright and cost-reducing technologies.
arXiv link: http://arxiv.org/abs/2501.16120v2
Advancing Portfolio Optimization: Adaptive Minimum-Variance Portfolios and Minimum Risk Rate Frameworks
and the Adaptive Minimum-Risk Rate (AMRR) metric, innovative tools designed to
optimize portfolios dynamically in volatile and nonstationary financial
markets. Unlike traditional minimum-variance approaches, the AMVP framework
incorporates real-time adaptability through advanced econometric models,
including ARFIMA-FIGARCH processes and non-Gaussian innovations. Empirical
applications on cryptocurrency and equity markets demonstrate the proposed
framework's superior performance in risk reduction and portfolio stability,
particularly during periods of structural market breaks and heightened
volatility. The findings highlight the practical implications of using the AMVP
and AMRR methodologies to address modern investment challenges, offering
actionable insights for portfolio managers navigating uncertain and rapidly
changing market conditions.
arXiv link: http://arxiv.org/abs/2501.15793v1
Universal Factor Models
loadings that are robust to weak factors in a large $N$ and large $T$ setting.
Our framework, by simultaneously considering all quantile levels of the outcome
variable, induces standard mean and quantile factor models, but the factors can
have an arbitrarily weak influence on the outcome's mean or quantile at most
quantile levels. Our method estimates the factor space at the $\sqrt{N}$-rate
without requiring knowledge of the weak factors' presence or strength, and
achieves $\sqrt{N}$- and $\sqrt{T}$-asymptotic normality for the factors and
loadings based on a novel sample splitting approach that handles incidental
nuisance parameters. We also develop a weak-factor-robust estimator of the
number of factors and consistent selectors of factors of any tolerated level of
influence on the outcome's mean or quantiles. Monte Carlo simulations
demonstrate the effectiveness of our method.
arXiv link: http://arxiv.org/abs/2501.15761v2
Scale-Insensitive Neural Network Significance Tests
significance testing, substantially generalizing existing approaches through
three key innovations. First, we replace metric entropy calculations with
Rademacher complexity bounds, enabling the analysis of neural networks without
requiring bounded weights or specific architectural constraints. Second, we
weaken the regularity conditions on the target function to require only Sobolev
space membership $H^s([-1,1]^d)$ with $s > d/2$, significantly relaxing
previous smoothness assumptions while maintaining optimal approximation rates.
Third, we introduce a modified sieve space construction based on moment bounds
rather than weight constraints, providing a more natural theoretical framework
for modern deep learning practices. Our approach achieves these generalizations
while preserving optimal convergence rates and establishing valid asymptotic
distributions for test statistics. The technical foundation combines
localization theory, sharp concentration inequalities, and scale-insensitive
complexity measures to handle unbounded weights and general Lipschitz
activation functions. This framework better aligns theoretical guarantees with
contemporary deep learning practice while maintaining mathematical rigor.
arXiv link: http://arxiv.org/abs/2501.15753v3
Simple Inference on a Simplex-Valued Weight
weight which is identified as a solution to an optimization problem. Examples
include synthetic control methods with group-level weights and various methods
of model averaging and forecast combination. The simplex constraint on the
weight poses a challenge in statistical inference due to the constraint
potentially binding. In this paper, we propose a simple method of constructing
a confidence set for the weight and prove that the method is asymptotically
uniformly valid. The procedure does not require tuning parameters or
simulations to compute critical values. The confidence set accommodates both
the cases of point-identification or set-identification of the weight. We
illustrate the method with an empirical example.
arXiv link: http://arxiv.org/abs/2501.15692v1
Philip G. Wright, directed acyclic graphs, and instrumental variables
of this book, Philip Wright made several fundamental contributions to causal
inference. He introduced a structural equation model of supply and demand,
established the identification of supply and demand elasticities via the method
of moments and directed acyclic graphs, developed empirical methods for
estimating demand elasticities using weather conditions as instruments, and
proposed methods for counterfactual analysis of the welfare effect of imposing
tariffs and taxes. Moreover, he took all of these methods to data. These ideas
were far ahead of, and much more profound than, any contemporary theoretical and
empirical developments on causal inference in statistics or econometrics. This
editorial aims to present P. Wright's work in a more modern framework, in a
lecture note format that can be useful for teaching and linking to contemporary
research.
arXiv link: http://arxiv.org/abs/2501.16395v2
A General Approach to Relaxing Unconfoundedness
assumption. This class includes several previous approaches as special cases,
including the marginal sensitivity model of Tan (2006). This class therefore
allows us to precisely compare and contrast these previously disparate
relaxations. We use this class to derive a variety of new identification
results which can be used to assess sensitivity to unconfoundedness. In
particular, the prior literature focuses on average parameters, like the
average treatment effect (ATE). We move beyond averages by providing sharp
bounds for a large class of parameters, including both the quantile treatment
effect (QTE) and the distribution of treatment effects (DTE), results which
were previously unknown even for the marginal sensitivity model.
arXiv link: http://arxiv.org/abs/2501.15400v1
Influence Function: Local Robustness and Efficiency
concept of functional derivatives. The relative simplicity of our direct method
is demonstrated through well-known examples. Using influence functions as a key
device, we examine the connection and difference between local robustness and
efficiency in both joint and sequential identification/estimation procedures.
We show that the joint procedure is associated with efficiency, while the
sequential procedure is linked to local robustness. Furthermore, we provide
conditions that are theoretically verifiable and empirically testable on when
efficient and locally robust estimation for the parameter of interest in a
semiparametric model can be achieved simultaneously. In addition, we present
straightforward conditions for an adaptive procedure in the presence of
nuisance parameters.
arXiv link: http://arxiv.org/abs/2501.15307v1
Multiscale risk spillovers and external driving factors: Evidence from the global futures and spot markets of staple foods
international staple food markets are increasingly exposed to complex risks,
including intensified risk contagion and escalating external uncertainties.
This paper systematically investigates risk spillovers in global staple food
markets and explores the key determinants of these spillover effects, combining
innovative decomposition-reconstruction techniques, risk connectedness
analysis, and random forest models. The findings reveal that short-term
components exhibit the highest volatility, with futures components generally
more volatile than spot components. Further analysis identifies two main risk
transmission patterns, namely cross-grain and cross-timescale transmission, and
clarifies the distinct roles of each component in various net risk spillover
networks. Additionally, price drivers, external uncertainties, and core
supply-demand indicators significantly influence these spillover effects, with
heterogeneous importance of varying factors in explaining different risk
spillovers. This study provides valuable insights into the risk dynamics of
staple food markets, offers evidence-based guidance for policymakers and market
participants to enhance risk warning and mitigation efforts, and supports the
stabilization of international food markets and the safeguarding of global food
security.
arXiv link: http://arxiv.org/abs/2501.15173v1
Quantitative Theory of Money or Prices? A Historical, Theoretical, and Econometric Analysis
implications analyzing quarterly data from the United States (1959-2022), Canada
(1961-2022), the United Kingdom (1986-2022), and Brazil (1996-2022). The
historical, logical, and econometric consistency of the logical core of the two
main theories of money is analyzed using objective Bayesian and frequentist
machine learning models, Bayesian regularized artificial neural networks, and
ensemble learning. It is concluded that money is not neutral at any time
horizon and that, although money is ultimately subordinated to prices, there is
a reciprocal influence over time between money and prices, which together constitute a
complex system. Non-neutrality is transmitted through aggregate demand and is
based on the exchange value of money as a monetary unit.
arXiv link: http://arxiv.org/abs/2501.14623v1
Triple Instrumented Difference-in-Differences
(DID-IV). In this design, a triple Wald-DID estimand, which divides the
difference-in-difference-in-differences (DDD) estimand of the outcome by the
DDD estimand of the treatment, captures the local average treatment effect on
the treated. The identifying assumptions mainly comprise a monotonicity
assumption, and the common acceleration assumptions in the treatment and the
outcome. We extend the canonical triple DID-IV design to staggered instrument
cases. We also describe the estimation and inference in this design in
practice.
arXiv link: http://arxiv.org/abs/2501.14405v1
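A worked toy computation of the ratio described above, as we read the estimand: the triple Wald-DID divides the DDD of the outcome by the DDD of the treatment. The cell means below are invented purely to show the arithmetic; indices are (instrumented group g, subgroup q, period t).

def ddd(m):
    # difference-in-difference-in-differences of cell means m[g][q][t]
    did = lambda g: (m[g][1][1] - m[g][1][0]) - (m[g][0][1] - m[g][0][0])
    return did(1) - did(0)

outcome_means = [[[2.0, 2.1], [2.2, 2.4]], [[2.1, 2.3], [2.3, 3.1]]]
treatment_means = [[[0.1, 0.1], [0.1, 0.2]], [[0.1, 0.2], [0.1, 0.5]]]

triple_wald_did = ddd(outcome_means) / ddd(treatment_means)
print("triple Wald-DID estimate:", triple_wald_did)   # 0.5 / 0.2 = 2.5 here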
Detecting Sparse Cointegration
settings, focusing on sparse relationships. First, we use the adaptive LASSO to
identify the small subset of integrated covariates driving the equilibrium
relationship with a target series, ensuring model-selection consistency.
Second, we adopt an information-theoretic model choice criterion to distinguish
between stationarity and nonstationarity in the resulting residuals, avoiding
dependence on asymptotic distributional assumptions. Monte Carlo experiments
confirm robust finite-sample performance, even under endogeneity and serial
correlation.
arXiv link: http://arxiv.org/abs/2501.13839v1
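A minimal Python sketch of the two-step idea above, with simplifications: step 1 runs an adaptive LASSO (weights from an initial OLS fit) to select the integrated covariates; step 2 here checks residual stationarity with an ADF test purely for illustration, whereas the paper instead uses an information-theoretic model choice criterion. The data-generating process and tuning values are invented.

import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
T, p = 400, 10
X = np.cumsum(rng.normal(size=(T, p)), axis=0)            # p independent random walks
y = 1.5 * X[:, 0] - 0.8 * X[:, 3] + rng.normal(size=T)    # sparse cointegrating relation

beta_init = LinearRegression().fit(X, y).coef_            # first-stage estimates
w = 1.0 / (np.abs(beta_init) + 1e-6)                      # adaptive penalty weights
lasso = Lasso(alpha=0.1, max_iter=50_000).fit(X / w, y)   # reweighted-design trick
beta_alasso = lasso.coef_ / w
selected = np.flatnonzero(np.abs(beta_alasso) > 1e-8)

resid = y - X @ beta_alasso
print("selected covariates:", selected)
print("ADF p-value for the residual:", adfuller(resid)[1])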
A Non-Parametric Approach to Heterogeneity Analysis
in consumer choices. By repeatedly sampling individual observations and
partitioning agents into groups consistent with the Generalized Axiom of
Revealed Preferences (GARP), we construct a similarity matrix capturing latent
preference structures. Under mild assumptions, this matrix consistently and
asymptotically normally estimates the probability that any pair of agents share
a common utility function. Leveraging this, we develop hypothesis tests to
assess whether demographic characteristics systematically explain unobserved
heterogeneity. Simulations confirm the test's validity, and we apply the method
to a standard grocery expenditure dataset.
arXiv link: http://arxiv.org/abs/2501.13721v4
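The sketch below implements only the building block used repeatedly in the procedure above: checking whether a set of observed (price, quantity) choices is consistent with GARP. The resampling, partitioning, and similarity-matrix construction described in the entry are not reproduced, and the two-observation example is invented.

import numpy as np

def satisfies_garp(prices, quantities):
    """prices, quantities: arrays of shape (n_obs, n_goods)."""
    exp_own = np.einsum("ij,ij->i", prices, quantities)    # p_i . x_i
    exp_cross = prices @ quantities.T                      # p_i . x_j
    R = exp_own[:, None] >= exp_cross                      # x_i directly weakly revealed preferred to x_j
    for k in range(len(prices)):                           # transitive closure (Floyd-Warshall style)
        R = R | (R[:, [k]] & R[[k], :])
    strict = exp_own[:, None] > exp_cross                  # x_i strictly directly revealed preferred to x_j
    # violation: x_i revealed preferred to x_j while x_j is strictly revealed preferred to x_i
    return not np.any(R & strict.T)

prices = np.array([[1.0, 2.0], [2.0, 1.0]])
quantities = np.array([[2.0, 1.0], [1.0, 2.0]])
print("GARP satisfied:", satisfies_garp(prices, quantities))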
Generalizability with ignorance in mind: learning what we do (not) know for archetypes discovery
identifying for whom the program has the largest effects (heterogeneity) and
ii) determining whether those patterns of treatment effects have predictive
power across environments (generalizability). We develop a framework to learn
when and how to partition observations into groups of individual and
environmental characteristics within which treatment effects are predictively
stable, and when instead extrapolation is unwarranted and further evidence is
needed. Our procedure determines in which contexts effects are generalizable
and when, instead, researchers should admit ignorance and collect more data. We
provide a decision-theoretic foundation, derive finite-sample regret
guarantees, and establish asymptotic inference results. We illustrate the
benefits of our approach by reanalyzing a multifaceted anti-poverty program
across six countries.
arXiv link: http://arxiv.org/abs/2501.13355v2
Continuity of the Distribution Function of the argmax of a Gaussian Process
distribution is non-Gaussian, yet characterizable as the argmax of a Gaussian
process. This paper presents high-level sufficient conditions under which such
asymptotic distributions admit a continuous distribution function. The
plausibility of the sufficient conditions is demonstrated by verifying them in
three prominent examples, namely maximum score estimation, empirical risk
minimization, and threshold regression estimation. In turn, the continuity
result buttresses several recently proposed inference procedures whose validity
seems to require a result of the kind established herein. A notable feature of
the high-level assumptions is that one of them is designed to enable us to
employ the celebrated Cameron-Martin theorem. In a leading special case, the
assumption in question is demonstrably weak and appears to be close to minimal.
arXiv link: http://arxiv.org/abs/2501.13265v1
An Adaptive Moving Average for Macroeconomic Monitoring
particularly for tracking noisy series such as inflation. The choice of the
look-back window is crucial. Too long a moving average is not timely enough
when economic conditions evolve rapidly. Too short a window yields noisy
averages, limiting signal extraction capabilities. As is well known, this is a
bias-variance trade-off. However, it is a time-varying one: the optimal size of
the look-back window depends on current macroeconomic conditions. In this
paper, we introduce a simple adaptive moving average estimator based on a
Random Forest using as sole predictor a time trend. Then, we compare the
narratives inferred from the new estimator to those derived from common
alternatives across series such as headline inflation, core inflation, and real
activity indicators. Notably, we find that this simple tool provides a
different account of the post-pandemic inflation acceleration and subsequent
deceleration.
arXiv link: http://arxiv.org/abs/2501.13222v1
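A minimal sketch of the idea as we read it from the entry above: fit a Random Forest with a time trend as the sole predictor, so that its piecewise-constant fit behaves like a moving average whose effective window adapts to the data. The simulated series, tuning values, and the fixed 24-period comparison are all illustrative choices, not the paper's.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
T = 300
trend = np.arange(T).reshape(-1, 1)                               # sole predictor: a time trend
level = np.concatenate([np.full(200, 2.0), np.full(100, 6.0)])    # series with a level shift
y = level + rng.normal(scale=1.5, size=T)                         # noisy observed series

rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=15, random_state=0)
adaptive_ma = rf.fit(trend, y).predict(trend)                     # adaptive "moving average"
fixed_ma = np.convolve(y, np.ones(24) / 24, mode="same")          # fixed 24-period moving average

print("adaptive MA around the break:", adaptive_ma[195:206].round(2))
print("fixed MA around the break:   ", fixed_ma[195:206].round(2))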
Bias Analysis of Experiments for Multi-Item Multi-Period Inventory Control Policies
interventions but are underutilized in the area of inventory management. This
study addresses this gap by analyzing A/B testing strategies in multi-item,
multi-period inventory systems with lost sales and capacity constraints. We
examine switchback experiments, item-level randomization, pairwise
randomization, and staggered rollouts, analyzing their biases theoretically and
comparing them through numerical experiments. Our findings provide actionable
guidance for selecting experimental designs across various contexts in
inventory management.
arXiv link: http://arxiv.org/abs/2501.11996v1
Estimation of Linear Models from Coarsened Observations: A Method of Moments Approach
interest is not exactly observed but only known to be in a specific ordinal
category has become important. In Psychometrics such variables are analysed
under the heading of item response models (IRM). In Econometrics, subjective
well-being (SWB) and self-assessed health (SAH) studies, and in marketing
research, Ordered Probit, Ordered Logit, and Interval Regression models are
common research platforms. To emphasize that the problem is not specific to a
specific discipline we will use the neutral term coarsened observation. For
single-equation models estimation of the latent linear model by Maximum
Likelihood (ML) is routine. But for higher-dimensional multivariate models it
is computationally cumbersome as estimation requires the evaluation of
multivariate normal distribution functions on a large scale. Our proposed
alternative estimation method, based on the Generalized Method of Moments
(GMM), circumvents this multivariate integration problem. The method is based
on the assumed zero correlations between explanatory variables and generalized
residuals. This is more general than ML but coincides with ML if the error
distribution is multivariate normal. It can be implemented by repeated
application of standard techniques. GMM provides a simpler and faster approach
than the usual ML approach. It is applicable to multiple-equation models with
multidimensional error correlation matrices and multiple response categories per
equation. It also yields a simple method to estimate polyserial and polychoric
correlations. Comparison of our method with the outcomes of the Stata ML
procedure cmp yields estimates that are not statistically different, while
estimation by our method requires only a fraction of the computing time.
arXiv link: http://arxiv.org/abs/2501.10726v1
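The sketch below illustrates the moment idea in the simplest single-equation case, not the paper's multivariate GMM setting: for an ordered-probit ("coarsened") outcome, the generalized residual is the conditional mean of the latent error given the observed category, and the estimating equations set its sample correlation with the regressors to zero. For simplicity the cutpoints are treated as known, and the simulated design is invented.

import numpy as np
from scipy.stats import norm
from scipy.optimize import fsolve

rng = np.random.default_rng(4)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.2, 0.8])
cuts = np.array([-np.inf, -0.5, 0.5, np.inf])        # known category boundaries
ystar = X @ beta_true + rng.normal(size=n)           # latent linear model
y = np.digitize(ystar, cuts[1:-1])                   # observed coarsened category 0/1/2

def generalized_residual(beta):
    lo = cuts[y] - X @ beta                          # lower truncation point
    hi = cuts[y + 1] - X @ beta                      # upper truncation point
    num = norm.pdf(lo) - norm.pdf(hi)                # the pdf is 0 at +/- infinity
    den = norm.cdf(hi) - norm.cdf(lo)
    return num / den                                 # E[error | observed category, x]

moments = lambda beta: X.T @ generalized_residual(beta) / n   # zero-correlation conditions
beta_gmm = fsolve(moments, x0=np.zeros(2))
print("estimated coefficients:", beta_gmm)           # should be close to (0.2, 0.8)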
Recovering Unobserved Network Links from Aggregated Relational Data: Discussions on Bayesian Latent Surface Modeling and Penalized Regression
and computer science. Aggregated Relational Data (ARD) provides a way to
capture network structures using partial data. This article compares two main
frameworks for recovering network links from ARD: Bayesian Latent Surface
Modeling (BLSM) and Frequentist Penalized Regression (FPR). Using simulation
studies and real-world applications, we evaluate their theoretical properties,
computational efficiency, and practical utility in domains like financial risk
assessment and epidemiology. Key findings emphasize the importance of trait
design, privacy considerations, and hybrid modeling approaches to improve
scalability and robustness.
arXiv link: http://arxiv.org/abs/2501.10675v2
Prediction Sets and Conformal Inference with Interval Outcomes
miscoverage level $\alpha$ is a set of values for $Y$ that contains a randomly
drawn $Y$ with probability $1 - \alpha$, where $\alpha \in (0,1)$. Among all
prediction sets that satisfy this coverage property, the oracle prediction set
is the one with the smallest volume. This paper provides estimation methods of
such prediction sets given observed conditioning covariates when $Y$ is
censored or measured in intervals. We first characterise the
oracle prediction set under interval censoring and develop a consistent
estimator for the shortest prediction interval that satisfies this
coverage property. These consistency results are extended to accommodate cases
where the prediction set consists of multiple disjoint intervals. We use
conformal inference to construct a prediction set that achieves finite-sample
validity under censoring and maintains consistency as sample size increases,
using a conformity score function designed for interval data. The procedure
accommodates the prediction uncertainty that is irreducible (due to the
stochastic nature of outcomes), the modelling uncertainty due to partial
identification, and the sampling uncertainty that diminishes as the sample grows.
We conduct a set of Monte Carlo simulations and an application to data
from the Current Population Survey. The results highlight the robustness and
efficiency of the proposed methods.
arXiv link: http://arxiv.org/abs/2501.10117v3
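A hedged split-conformal sketch for interval-valued outcomes. The conformity score used here (the distance from a point prediction to the observed interval) is a simple stand-in chosen for illustration; the paper designs its own score function and also characterises the oracle prediction set, which this sketch does not attempt.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 2000
x = rng.normal(size=(n, 1))
y = 2.0 * x[:, 0] + rng.normal(size=n)
lo, hi = np.floor(y), np.floor(y) + 1.0                # only the interval [lo, hi] is observed

train, calib = np.arange(0, n // 2), np.arange(n // 2, n)
model = LinearRegression().fit(x[train], (lo[train] + hi[train]) / 2)

pred = model.predict(x[calib])
score = np.maximum.reduce([lo[calib] - pred,           # score is 0 when the prediction lies inside the interval
                           pred - hi[calib],
                           np.zeros(len(calib))])
alpha = 0.1
k = int(np.ceil((1 - alpha) * (len(calib) + 1)))       # finite-sample conformal rank
q = np.sort(score)[k - 1]

x_new = np.array([[0.5]])
center = model.predict(x_new)[0]
print(f"90% conformal prediction interval: [{center - q:.2f}, {center + q:.2f}]")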
Convergence Rates of GMM Estimators with Nonsmooth Moments under Misspecification
underlying moment condition model is correctly specified. Hong and Li (2023,
Econometric Theory) showed that GMM estimators with nonsmooth
(non-directionally differentiable) moment functions are at best
$n^{1/3}$-consistent under misspecification. Through simulations, we verify the
slower convergence rate of GMM estimators in such cases. For the two-step GMM
estimator with an estimated weight matrix, our results align with theory.
However, for the one-step GMM estimator with the identity weight matrix, the
convergence rate remains $\sqrt{n}$, even under severe misspecification.
arXiv link: http://arxiv.org/abs/2501.09540v1
Recovering latent linkage structures and spillover effects with structural breaks in panel data models
in panel data. We consider panel models where a unit's outcome depends not only
on its own characteristics (private effects) but also on the characteristics of
other units (spillover effects). The linkage of units is allowed to be latent
and may shift at an unknown breakpoint. We propose a novel procedure to
estimate the breakpoint, linkage structure, spillover and private effects. We
address the high-dimensionality of spillover effect parameters using penalized
estimation, and estimate the breakpoint with refinement. We establish the
super-consistency of the breakpoint estimator, ensuring that inferences about
other parameters can proceed as if the breakpoint were known. The private
effect parameters are estimated using a double machine learning method. The
proposed method is applied to estimate the cross-country R&D spillovers, and we
find that the R&D spillovers become sparser after the financial crisis.
arXiv link: http://arxiv.org/abs/2501.09517v1
Semiparametrics via parametrics and contiguity
task. If one approximates the infinite dimensional part of the semiparametric
model by a parametric function, one obtains a parametric model that is in some
sense close to the semiparametric model and inference may proceed by the method
of maximum likelihood. Under regularity conditions, the ensuing maximum
likelihood estimator is asymptotically normal and efficient in the
approximating parametric model. Thus one obtains a sequence of asymptotically
normal and efficient estimators in a sequence of growing parametric models that
approximate the semiparametric model and, intuitively, the limiting
'semiparametric' estimator should be asymptotically normal and efficient as
well. In this paper we make this intuition rigorous: we move much of the
semiparametric analysis back into classical parametric terrain, and then
translate our parametric results back to the semiparametric world by way of
contiguity. Our approach departs from the conventional sieve literature by
being more specific about the approximating parametric models, by working not
only with but also under these when treating the parametric models, and by
taking full advantage of the mutual contiguity that we require between the
parametric and semiparametric models. We illustrate our theory with two
canonical examples of semiparametric models, namely the partially linear
regression model and the Cox regression model. An upshot of our theory is a
new, relatively simple, and rather parametric proof of the efficiency of the
Cox partial likelihood estimator.
arXiv link: http://arxiv.org/abs/2501.09483v2
The Impact of Digitalisation and Sustainability on Inclusiveness: Inclusive Growth Determinants
military conflicts. This study investigates the main determinants of
inclusiveness at the European level. A multi-method approach is used, with
Principal Component Analysis (PCA) applied to create the Inclusiveness Index
and Generalised Method of Moments (GMM) analysis used to investigate the
determinants of inclusiveness. The data cover a 22-year period, from
2000 to 2021, for 32 European countries. The determinants of inclusiveness and
their effects were identified. First, economic growth, industrial upgrading,
electricity consumption, digitalisation, and the quantitative aspect of
governance, all have a positive impact on inclusive growth in Europe. Second,
the level of CO2 emissions and inflation have a negative impact on
inclusiveness. Tomorrow's inclusive and sustainable growth must include
investments in renewable energy, digital infrastructure, inequality policies,
sustainable governance, human capital, and inflation management. These findings
can help decision makers design inclusive growth policies.
arXiv link: http://arxiv.org/abs/2501.07880v1
Bridging Root-$n$ and Non-standard Asymptotics: Adaptive Inference in M-Estimation
the solution of population-level optimization, commonly referred to as
M-estimation. Statistical inference for M-estimation poses significant
challenges due to the non-standard limiting behaviors of the corresponding
estimator, which arise in settings with increasing dimension of parameters,
non-smooth objectives, or constraints. We propose a simple and unified method
that guarantees validity in both regular and irregular cases. Moreover, we
provide a comprehensive width analysis of the proposed confidence set, showing
that the convergence rate of the diameter is adaptive to the unknown degree of
instance-specific regularity. We apply the proposed method to several
high-dimensional and irregular statistical problems.
arXiv link: http://arxiv.org/abs/2501.07772v3
disco: Distributional Synthetic Controls
of policy changes in settings with observational data. Often, researchers aim
to estimate the causal impact of policy interventions on a treated unit at an
aggregate level while also possessing data at a finer granularity. In this
article, we introduce the new disco command, which implements the
Distributional Synthetic Controls method introduced in Gunsilius (2023). This
command allows researchers to construct entire synthetic distributions for the
treated unit based on an optimally weighted average of the distributions of the
control units. Several aggregation schemes are provided to facilitate clear
reporting of the distributional effects of the treatment. The package offers
both quantile-based and CDF-based approaches, comprehensive inference
procedures via bootstrap and permutation methods, and visualization
capabilities. We empirically illustrate the use of the package by replicating
the results in Van Dijcke et al. (2024).
arXiv link: http://arxiv.org/abs/2501.07550v2
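The Python lines below sketch only the weighting idea behind distributional synthetic controls, not the disco command itself (which is a Stata package): choose simplex weights so that the weighted average of the control units' quantile functions is close to the treated unit's pre-treatment quantile function. The data and quantile grid are invented.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
grid = np.linspace(0.05, 0.95, 19)                               # quantile levels
controls = [rng.normal(loc=m, scale=s, size=500)                 # pre-period control outcomes
            for m, s in [(0.0, 1.0), (1.0, 1.2), (2.0, 0.8), (0.5, 1.5)]]
treated = rng.normal(loc=0.8, scale=1.1, size=500)               # pre-period treated outcomes

Qc = np.column_stack([np.quantile(c, grid) for c in controls])   # control quantile functions
Qt = np.quantile(treated, grid)                                  # treated quantile function

objective = lambda w: np.sum((Qc @ w - Qt) ** 2)
constraints = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)  # weights sum to one
result = minimize(objective, x0=np.full(4, 0.25), bounds=[(0.0, 1.0)] * 4,
                  constraints=constraints, method="SLSQP")
print("synthetic-control weights:", result.x.round(3))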
Estimating Sequential Search Models Based on a Partial Ranking Representation
increasingly available, opening up new possibilities for empirical research.
Sequential search models offer a structured approach for analyzing such data,
but their estimation remains difficult. This is because consumers make optimal
decisions based on private information revealed in search, which is not
observed in typical data. As a result, the model's likelihood function involves
high-dimensional integrals that require intensive simulation. This paper
introduces a new representation that shows a consumer's optimal search
decision-making can be recast as a partial ranking over all actions available
throughout the consumer's search process. This reformulation yields the same
choice probabilities as the original model but leads to a simpler likelihood
function that relies less on simulation. Based on this insight, we provide
identification arguments and propose a modified GHK-style simulator that
improves both estimation performance and ease of implementation. The proposed
approach also generalizes to a wide range of model variants, including those
with incomplete search data and structural extensions such as search with
product discovery. It enables a tractable and unified estimation strategy
across different settings in sequential search models, offering both a new
perspective on understanding sequential search and a practical tool for its
application.
arXiv link: http://arxiv.org/abs/2501.07514v3
Forecasting for monetary policy
highlighted in the Bernanke (2024) review: the challenges in economic
forecasting, the conditional nature of central bank forecasts, and the
importance of forecast evaluation. In addition, a formal evaluation of the Bank
of England's inflation forecasts indicates that, despite the large forecast
errors in recent years, they were still accurate relative to common benchmarks.
arXiv link: http://arxiv.org/abs/2501.07386v1
Social and Genetic Ties Drive Skewed Cross-Border Media Coverage of Disasters
worldwide. Media coverage of these events may be vital to generate empathy and
mobilize global populations to address the common threat posed by climate
change. Using a dataset of 466 news sources from 123 countries, covering 135
million news articles since 2016, we apply an event study framework to measure
cross-border media activity following natural disasters. Our results show that
while media attention rises after disasters, it is heavily skewed towards
certain events, notably earthquakes, accidents, and wildfires. In contrast,
climatologically salient events such as floods, droughts, or extreme
temperatures receive less coverage. This cross-border disaster reporting is
strongly related to the number of deaths associated with the event, especially
when the affected populations share strong social ties or genetic similarities
with those in the reporting country. Achieving more balanced media coverage
across different types of natural disasters may be essential to counteract
skewed perceptions. Further, fostering closer social connections between
countries may enhance empathy and mobilize the resources necessary to confront
the global threat of climate change.
arXiv link: http://arxiv.org/abs/2501.07615v1
Doubly Robust Inference on Causal Derivative Effects for Continuous Treatments
focus on estimating the mean potential outcome function, commonly known as the
dose-response curve. However, it is often not the dose-response curve but its
derivative function that signals the treatment effect. In this paper, we
investigate nonparametric inference on the derivative of the dose-response
curve with and without the positivity condition. Under the positivity and other
regularity conditions, we propose a doubly robust (DR) inference method for
estimating the derivative of the dose-response curve using kernel smoothing.
When the positivity condition is violated, we demonstrate the inconsistency of
conventional inverse probability weighting (IPW) and DR estimators, and
introduce novel bias-corrected IPW and DR estimators. In all settings, our DR
estimator achieves asymptotic normality at the standard nonparametric rate of
convergence with nonparametric efficiency guarantees. Additionally, our
approach reveals an interesting connection to nonparametric support and level
set estimation problems. Finally, we demonstrate the applicability of our
proposed estimators through simulations and a case study of evaluating a job
training program.
arXiv link: http://arxiv.org/abs/2501.06969v2
Identification and Estimation of Simultaneous Equation Models Using Higher-Order Cumulant Restrictions
longstanding challenge. Recent work exploits information in higher-order
moments of non-Gaussian data. In this literature, the structural errors are
typically assumed to be uncorrelated so that, after standardizing the
covariance matrix of the observables (whitening), the structural parameter
matrix becomes orthogonal -- a device that underpins many identification proofs
but can be restrictive in econometric applications. We show that neither zero
covariance nor whitening is necessary. For any order $h>2$, a simple
diagonality condition on the $h$th-order cumulants alone identifies the
structural parameter matrix -- up to unknown scaling and permutation -- as the
solution to an eigenvector problem; no restrictions on cumulants of other
orders are required. This general, single-order result enlarges the class of
models covered by our framework and yields a sample-analogue estimator that is
$\sqrt{n}$-consistent, asymptotically normal, and easy to compute. Furthermore,
when uncorrelatedness is intrinsic -- as in vector autoregressive (VAR) models
-- our framework provides a transparent overidentification test. Monte Carlo
experiments show favorable finite-sample performance, and two applications --
"Returns to Schooling" and "Uncertainty and the Business Cycle" -- demonstrate
its practical value.
arXiv link: http://arxiv.org/abs/2501.06777v2
The Causal Impact of Dean's List Recognition on Academic Performance: Evidence from a Regression Discontinuity Design
positive education incentive, on future student performance using a regression
discontinuity design. The results suggest that for students with low prior
academic performance and who are native English speakers, there is a positive
impact of being on the Dean's List on the probability of getting onto the
Dean's List in the following year. However, being on the Dean's List does not
appear to have a statistically significant effect on subsequent GPA, total
credits taken, dropout rates, or the probability of graduating within four
years. These findings suggest that a place on the Dean's List may not be a
strong motivator for students to improve their academic performance and achieve
better outcomes.
arXiv link: http://arxiv.org/abs/2501.09763v1
Optimizing Financial Data Analysis: A Comparative Study of Preprocessing Techniques for Regression Modeling of Apple Inc.'s Net Income and Stock Prices
datasets of Apple Inc., encompassing quarterly income and daily stock prices,
spanning from March 31, 2009, to December 31, 2023. Leveraging 60 observations
for quarterly income and 3774 observations for daily stock prices, sourced from
Macrotrends and Yahoo Finance respectively, the study outlines five distinct
datasets crafted through varied preprocessing techniques. Through detailed
explanations of aggregation, interpolation (linear, polynomial, and cubic
spline) and lagged variables methods, the study elucidates the steps taken to
transform raw data into analytically rich datasets. Subsequently, the article
delves into regression analysis, aiming to decipher which of the five data
processing methods best suits capital market analysis, by employing both linear
and polynomial regression models on each preprocessed dataset and evaluating
their performance using a range of metrics, including cross-validation score,
MSE, MAE, RMSE, R-squared, and Adjusted R-squared. The research findings reveal
that linear interpolation with polynomial regression emerges as the
top-performing method, boasting the lowest validation MSE and MAE values,
alongside the highest R-squared and Adjusted R-squared values.
arXiv link: http://arxiv.org/abs/2501.06587v1
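For readers who want to see the mechanics, the sketch below mimics the reportedly best-performing combination (linear interpolation of the quarterly series followed by polynomial regression) on synthetic numbers rather than the Apple data; the dates, frequencies, and degree-2 choice are illustrative.

import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
quarters = pd.date_range("2009-01-01", periods=60, freq="QS")
income = pd.Series(np.linspace(5, 30, 60) + rng.normal(scale=2, size=60), index=quarters)

days = pd.date_range(quarters[0], quarters[-1], freq="D")
income_daily = income.reindex(days).interpolate(method="time")            # linear-in-time interpolation

stock = 2.0 * income_daily.values + rng.normal(scale=5, size=len(days))   # synthetic "stock price"
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
scores = cross_val_score(model, income_daily.values.reshape(-1, 1), stock,
                         cv=5, scoring="neg_mean_squared_error")
print("cross-validated MSE:", -scores.mean())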
A Hybrid Framework for Reinsurance Optimization: Integrating Generative Models and Reinforcement Learning
ensure financial stability, and maintain solvency. Traditional approaches often
struggle with dynamic claim distributions, high-dimensional constraints, and
evolving market conditions. This paper introduces a novel hybrid framework that
integrates {Generative Models}, specifically Variational Autoencoders (VAEs),
with {Reinforcement Learning (RL)} using Proximal Policy Optimization (PPO).
The framework enables dynamic and scalable optimization of reinsurance
strategies by combining the generative modeling of complex claim distributions
with the adaptive decision-making capabilities of reinforcement learning.
The VAE component generates synthetic claims, including rare and catastrophic
events, addressing data scarcity and variability, while the PPO algorithm
dynamically adjusts reinsurance parameters to maximize surplus and minimize
ruin probability. The framework's performance is validated through extensive
experiments, including out-of-sample testing, stress-testing scenarios (e.g.,
pandemic impacts, catastrophic events), and scalability analysis across
portfolio sizes. Results demonstrate its superior adaptability, scalability,
and robustness compared to traditional optimization techniques, achieving
higher final surpluses and computational efficiency.
Key contributions include the development of a hybrid approach for
high-dimensional optimization, dynamic reinsurance parameterization, and
validation against stochastic claim distributions. The proposed framework
offers a transformative solution for modern reinsurance challenges, with
potential applications in multi-line insurance operations, catastrophe
modeling, and risk-sharing strategy design.
arXiv link: http://arxiv.org/abs/2501.06404v1
Sectorial Exclusion Criteria in the Marxist Analysis of the Average Rate of Profit: The United States Case (1960-2020)
adhere to a theoretically grounded standard regarding which economic activities
should or should not be included for such purposes, which is relevant because
methodological non-uniformity can be a significant source of overestimation or
underestimation, generating a less accurate reflection of the capital
accumulation dynamics. This research aims to provide a standard Marxist
decision criterion regarding the inclusion and exclusion of economic activities
for the calculation of the Marxist average profit rate for the case of United
States economic sectors from 1960 to 2020, based on the Marxist definition of
productive labor, its location in the circuit of capital, and its relationship
with the production of surplus value. Using wavelet-transformed Daubechies
filters with increased symmetry, empirical mode decomposition, a Hodrick-Prescott
filter embedded in an unobserved components model, and a wide variety of unit root
tests, the internal theoretical consistency of the presented criteria is
evaluated. The objective consistency of the theory is also evaluated using a
dynamic factor auto-regressive model, Principal Component Analysis, Singular
Value Decomposition and Backward Elimination with Linear and Generalized Linear
Models. The results are consistent both theoretically and econometrically with
the logic of Marx's political economy.
arXiv link: http://arxiv.org/abs/2501.06270v1
Comparing latent inequality with ordinal data
data are available and without imposing parametric assumptions on the
underlying continuous distributions. First, we contribute identification
results. We show how certain ordinal conditions provide evidence of
between-group inequality, quantified by particular quantiles being higher in
one latent distribution than in the other. We also show how other ordinal
conditions provide evidence of higher within-group inequality in one
distribution than in the other, quantified by particular interquantile ranges
being wider in one latent distribution than in the other. Second, we propose an
"inner" confidence set for the quantiles that are higher for the first latent
distribution. We also describe frequentist and Bayesian inference on features
of the ordinal distributions relevant to our identification results. Our
contributions are illustrated by empirical examples with mental health and
general health.
arXiv link: http://arxiv.org/abs/2501.05338v1
Time-Varying Bidirectional Causal Relationships Between Transaction Fees and Economic Activity of Subsystems Utilizing the Ethereum Blockchain Network
smart-contract execution through levies of transaction fees, commonly known as
gas fees. This framework mediates economic participation via a market-based
mechanism for gas fees, permitting users to offer higher gas fees to expedite
processing. Historically, the ensuing gas fee volatility led to critical
disequilibria between supply and demand for block space, presenting challenges
for stakeholders. This study examines the dynamic causal interplay between
transaction fees and economic subsystems leveraging the network. By utilizing
data related to unique active wallets and transaction volume of each subsystem
and applying time-varying Granger causality analysis, we reveal temporal
heterogeneity in causal relationships between economic activity and transaction
fees across all subsystems. This includes (a) a bidirectional causal feedback
loop between cross-blockchain bridge user activity and transaction fees, which
diminishes over time, potentially signaling user migration; (b) a bidirectional
relationship between centralized cryptocurrency exchange deposit and withdrawal
transaction volume and fees, indicative of increased competition for block
space; (c) decentralized exchange volumes causally influence fees, while fees
causally influence user activity, although this relationship is weakening,
potentially due to the diminished significance of decentralized finance; (d)
intermittent causal relationships with maximal extractable value bots; (e)
fees causally influence non-fungible token transaction volumes; and (f) a
highly significant and growing causal influence of transaction fees on
stablecoin activity and transaction volumes, highlighting its prominence.
arXiv link: http://arxiv.org/abs/2501.05299v1
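A minimal sketch of the kind of analysis described above: bivariate
Granger-causality tests between a subsystem's activity and gas fees, computed
on rolling windows to trace how the relationship changes over time. The column
names, window length, and simulated series are illustrative stand-ins, and the
paper's time-varying procedure may differ in detail.
```python
# Rolling-window, bivariate Granger-causality sketch (simulated data).
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

def rolling_granger_pvalues(df, cause, effect, window=180, maxlag=7):
    """Smallest p-value across lags for 'cause -> effect' on each window."""
    pvals = {}
    for end in range(window, len(df) + 1):
        chunk = df.iloc[end - window:end][[effect, cause]].values
        res = grangercausalitytests(chunk, maxlag=maxlag, verbose=False)
        pvals[df.index[end - 1]] = min(r[0]["ssr_ftest"][1] for r in res.values())
    return pd.Series(pvals)

# Simulated stand-ins for daily on-chain series.
rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", periods=400, freq="D")
activity = pd.Series(rng.normal(size=400), index=idx).cumsum()
fees = 0.3 * activity.shift(1).fillna(0) + pd.Series(rng.normal(size=400), index=idx)
data = pd.DataFrame({"dex_active_wallets": activity.diff(),
                     "gas_fees": fees.diff()}).dropna()

print(rolling_granger_pvalues(data, "dex_active_wallets", "gas_fees").tail())
print(rolling_granger_pvalues(data, "gas_fees", "dex_active_wallets").tail())
```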
RUM-NN: A Neural Network Model Compatible with Random Utility Maximisation for Discrete Choice Setups
probabilities in neural networks, derived from and fully consistent with the
Random Utility Maximization (RUM) theory, referred to as RUM-NN. Neural network
models show remarkable performance compared with statistical models; however,
they are often criticized for their lack of transparency and interpretability.
The proposed RUM-NN is introduced in both linear and nonlinear structures. The
linear RUM-NN retains the interpretability and identifiability of traditional
econometric discrete choice models while using neural network-based estimation
techniques. The nonlinear RUM-NN extends the model's flexibility and predictive
capabilities to capture nonlinear relationships between variables within
utility functions. Additionally, the RUM-NN allows for the implementation of
various parametric distributions for unobserved error components in the utility
function and captures correlations among error terms. The performance of RUM-NN
in parameter recovery and prediction accuracy is rigorously evaluated using
synthetic datasets through Monte Carlo experiments. Additionally, RUM-NN is
evaluated on the Swissmetro and the London Passenger Mode Choice (LPMC)
datasets with different sets of distribution assumptions for the error
component. The results demonstrate that RUM-NN under a linear utility structure
and IID Gumbel error terms can replicate the performance of the Multinomial
Logit (MNL) model, but relaxing those constraints leads to superior performance
for both Swissmetro and LPMC datasets. By introducing a novel estimation
approach aligned with statistical theories, this study empowers econometricians
to harness the advantages of neural network models.
arXiv link: http://arxiv.org/abs/2501.05221v1
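A minimal sketch of the linear case described above, assuming
linear-in-parameters utilities with IID Gumbel errors so that choice
probabilities reduce to a softmax (multinomial logit). PyTorch is used here
for illustration; the attributes, data, and training settings are hypothetical
rather than the authors' implementation.
```python
# Linear RUM sketch in PyTorch: softmax over linear utilities equals MNL.
import torch
import torch.nn as nn

class LinearRUM(nn.Module):
    def __init__(self, n_attrs):
        super().__init__()
        # one taste parameter per attribute, shared across alternatives
        self.beta = nn.Linear(n_attrs, 1, bias=False)

    def forward(self, x):                # x: (batch, n_alternatives, n_attrs)
        return self.beta(x).squeeze(-1)  # systematic utilities (logits)

# Simulated choices: 3 alternatives, 2 attributes (e.g., cost and time).
torch.manual_seed(0)
x = torch.randn(5000, 3, 2)
true_beta = torch.tensor([-1.0, -0.5])
gumbel = -torch.log(-torch.log(torch.rand(5000, 3)))  # IID Gumbel errors
y = (x @ true_beta + gumbel).argmax(dim=1)            # utility-maximizing choice

model = LinearRUM(n_attrs=2)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()          # softmax + negative log-likelihood
for _ in range(300):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print("estimated taste parameters:", model.beta.weight.data)  # close to [-1.0, -0.5]
```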
DisSim-FinBERT: Text Simplification for Core Message Extraction in Complex Financial Texts
Discourse Simplification (DisSim) with Aspect-Based Sentiment Analysis (ABSA)
to enhance sentiment prediction in complex financial texts. By simplifying
intricate documents such as Federal Open Market Committee (FOMC) minutes,
DisSim improves the precision of aspect identification, resulting in sentiment
predictions that align more closely with economic events. The model preserves
the original informational content and captures the inherent volatility of
financial language, offering a more nuanced and accurate interpretation of
long-form financial communications. This approach provides a practical tool for
policymakers and analysts aiming to extract actionable insights from central
bank narratives and other detailed economic documents.
arXiv link: http://arxiv.org/abs/2501.04959v2
Identification of dynamic treatment effects when treatment histories are partially observed
identifying path-dependent treatment effects when treatment histories are
partially observed. We introduce a novel robust estimator that adjusts for
missing histories using a combination of outcome, propensity score, and missing
treatment models. We show that this approach identifies the target parameter as
long as any two of the three models are correctly specified. The
method delivers improved robustness against competing alternatives under the
same set of identifying assumptions. Theoretical results and numerical
experiments demonstrate how the proposed method yields more accurate inference
compared to conventional and doubly robust estimators, particularly under
nontrivial missingness and misspecification scenarios. Two applications
demonstrate that the robust method can produce substantively different
estimates of path-dependent treatment effects relative to conventional
approaches.
arXiv link: http://arxiv.org/abs/2501.04853v2
Monthly GDP Growth Estimates for the U.S. States
produce nowcasts and historical estimates of monthly real state-level GDP for
the 50 U.S. states, plus Washington DC, from 1964 through the present day. The
MF-VAR model incorporates state and U.S. data at the monthly, quarterly, and
annual frequencies. Temporal and cross-sectional constraints are imposed to
ensure that the monthly state-level estimates are consistent with official
estimates of quarterly GDP at the U.S. and state levels. We illustrate the
utility of the historical estimates in better understanding state business
cycles and cross-state dependencies. We show how the model produces accurate
nowcasts of state GDP three months ahead of the BEA's quarterly estimates,
after conditioning on the latest estimates of U.S. GDP.
arXiv link: http://arxiv.org/abs/2501.04607v1
Sequential Monte Carlo for Noncausal Processes
estimation of mixed causal and noncausal models. Unlike previous Bayesian
estimation methods developed for these models, Sequential Monte Carlo offers
extensive parallelization opportunities, significantly reducing estimation time
and mitigating the risk of becoming trapped in local minima, a common issue in
noncausal processes. Simulation studies demonstrate the strong ability of the
algorithm to produce accurate estimates and correctly identify the process. In
particular, we propose a novel identification methodology that leverages the
Marginal Data Density and the Bayesian Information Criterion. Unlike previous
studies, this methodology determines not only the causal and noncausal
polynomial orders but also the error term distribution that best fits the data.
Finally, Sequential Monte Carlo is applied to a bivariate process containing
S&P Europe 350 ESG Index and Brent crude oil prices.
arXiv link: http://arxiv.org/abs/2501.03945v1
High-frequency Density Nowcasts of U.S. State-Level Carbon Dioxide Emissions
for shaping climate policies and meeting global decarbonization targets.
However, energy consumption and emissions data are released annually and with
substantial publication lags, hindering timely decision-making. This paper
introduces a panel nowcasting framework to produce higher-frequency predictions
of the state-level growth rate of per-capita energy consumption and CO2
emissions in the United States (U.S.). Our approach employs a panel mixed-data
sampling (MIDAS) model to predict per-capita energy consumption growth,
considering quarterly personal income, monthly electricity consumption, and a
weekly economic conditions index as predictors. A bridge equation linking
per-capita CO2 emissions growth with the nowcasts of energy consumption is
estimated using panel quantile regression methods. A pseudo out-of-sample study
(2009-2018), simulating the real-time data release calendar, confirms the
improved accuracy of our nowcasts with respect to a historical benchmark. Our
results suggest that by leveraging the availability of higher-frequency
indicators, we not only enhance predictive accuracy for per-capita energy
consumption growth but also provide more reliable estimates of the distribution
of CO2 emissions growth.
arXiv link: http://arxiv.org/abs/2501.03380v1
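As a simplified illustration of the frequency-mixing step, the sketch below
fits an unrestricted MIDAS (U-MIDAS) regression of an annual target on the
twelve monthly observations of a single indicator. The paper's panel MIDAS
specification, additional predictors, and quantile-regression bridge equation
are not reproduced, and all series are simulated.
```python
# U-MIDAS sketch: regress an annual target on its year's 12 monthly values.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_years = 40
monthly = rng.normal(size=(n_years, 12))         # e.g., monthly electricity growth
true_w = np.linspace(0.3, 0.0, 12)               # true mixing weights across months
annual_y = monthly @ true_w + 0.2 * rng.normal(size=n_years)  # energy growth

X = sm.add_constant(monthly)                     # one regressor per monthly lag
umidas = sm.OLS(annual_y, X).fit()
print(umidas.params[1:].round(2))                # recovered monthly-lag weights

# Nowcast for a new year once its monthly observations are in hand.
new_months = rng.normal(size=12)
print("nowcast:", umidas.predict(np.r_[1.0, new_months].reshape(1, -1)))
```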
A data-driven merit order: Learning a fundamental electricity price model
models. Data-driven models learn from historical patterns, while fundamental
models simulate electricity markets. Traditionally, fundamental models have
been too computationally demanding to allow for intrinsic parameter estimation
or frequent updates, which are essential for short-term forecasting. In this
paper, we propose a novel data-driven fundamental model that combines the
strengths of both approaches. We estimate the parameters of a fully fundamental
merit order model using historical data, similar to how data-driven models
work. This removes the need for fixed technical parameters or expert
assumptions, allowing most parameters to be calibrated directly to
observations. The model is efficient enough for quick parameter estimation and
forecast generation. We apply it to forecast German day-ahead electricity
prices and demonstrate that it outperforms both classical fundamental and
purely data-driven models. The hybrid model effectively captures price
volatility and sequential price clusters, which are becoming increasingly
important with the expansion of renewable energy sources. It also provides
valuable insights, such as fuel switches, marginal power plant contributions,
estimated parameters, dispatched plants, and power generation.
arXiv link: http://arxiv.org/abs/2501.02963v1
Identifying the Hidden Nexus between Benford Law Establishment in Stock Market and Market Efficiency: An Empirical Investigation
to numerous studies due to its unique applications in financial fields,
especially accounting and auditing. However, studies that addressed the law's
establishment in the stock markets generally concluded that stock prices do not
comply with the underlying distribution. The present research, emphasizing data
randomness as the underlying assumption of Benford's law, has conducted an
empirical investigation of the Warsaw Stock Exchange. The outcomes demonstrate
that since stock prices are not distributed randomly, the law cannot hold in
the stock market. A Chi-square goodness-of-fit test supports this result.
Moreover, it is argued that the lack of randomness originates from market
inefficiency: the violation of the efficient market hypothesis causes the
non-randomness of the time series and the failure of Benford's law to hold.
arXiv link: http://arxiv.org/abs/2501.02674v1
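A minimal sketch of the kind of check described above: compare the empirical
first-digit distribution with Benford's law using a chi-square goodness-of-fit
test. The simulated prices are placeholders for observed stock prices.
```python
# First-digit frequencies vs. Benford's law with a chi-square test.
import numpy as np
from scipy.stats import chisquare

def first_digits(x):
    x = np.abs(np.asarray(x, dtype=float))
    x = x[x > 0]
    return (x / 10 ** np.floor(np.log10(x))).astype(int)  # leading digit 1-9

rng = np.random.default_rng(42)
prices = rng.lognormal(mean=3.0, sigma=1.2, size=5000)     # placeholder data

digits = first_digits(prices)
observed = np.bincount(digits, minlength=10)[1:10]
benford = np.log10(1 + 1 / np.arange(1, 10))               # P(d) = log10(1 + 1/d)
stat, pval = chisquare(observed, f_exp=benford * observed.sum())
print(f"chi-square = {stat:.2f}, p-value = {pval:.4f}")
```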
Re-examining Granger Causality from Causal Bayesian Networks Perspective
critical to understanding these systems. For many, Granger causality (GC)
remains a computational tool of choice to identify causal relations in time
series data. Like other causal discovery tools, GC has limitations and has been
criticized as a non-causal framework. Here, we addressed one of the recurring
criticisms of GC by endowing it with a proper causal interpretation. This was
achieved by analyzing GC through the lenses of Reichenbach's Common Cause
Principle (RCCP) and causal Bayesian networks (CBNs). We show theoretically
and graphically that this reformulation endows GC with a proper causal
interpretation under certain assumptions and achieves satisfactory results in
simulations.
arXiv link: http://arxiv.org/abs/2501.02672v1
Revealed Social Networks
Using choice data and exogenous group variation, we first develop a revealed
preference style test for the linear-in-means model. This test is formulated as
a linear program and can be interpreted as a no money pump condition with an
additional incentive compatibility constraint. We then study the identification
properties of the linear-in-means model. A key takeaway from our analysis is
that there is a close relationship between the dimension of the outcome
variable and the identifiability of the model. Importantly, when the outcome
variable is one-dimensional, failures of identification are generic. On the
other hand, when the outcome variable is multi-dimensional, we provide natural
conditions under which identification is generic.
arXiv link: http://arxiv.org/abs/2501.02609v2
Estimating Discrete Choice Demand Models with Sparse Market-Product Shocks
model for differentiated products when the vector of market-product level
shocks is sparse. Assuming sparsity, we establish nonparametric identification
of the distribution of random coefficients and demand shocks under mild
conditions. Then we develop a Bayesian procedure, which exploits the sparsity
structure using shrinkage priors, to conduct inference about the model
parameters and counterfactual quantities. Compared to the standard BLP (Berry,
Levinsohn, & Pakes, 1995) method, our approach does not require demand
inversion or instrumental variables (IVs), thus providing a compelling
alternative when IVs are not available or their validity is questionable. Monte
Carlo simulations validate our theoretical findings and demonstrate the
effectiveness of our approach, while empirical applications reveal evidence of
sparse demand shocks in well-known datasets.
arXiv link: http://arxiv.org/abs/2501.02381v2
Prediction with Differential Covariate Classification: Illustrated by Racial/Ethnic Classification in Medical Risk Assessment
conditional probabilities P(y|x) obtained from research studies to predict
outcomes y on the basis of observed covariates x. Given this information,
decisions are then based on the predicted outcomes. Researchers commonly assume
that the predictors used in the generation of the evidence are the same as
those used in applying the evidence: i.e., the meaning of x in the two
circumstances is the same. This may not be the case in real-world settings.
Across a wide range of settings, from clinical practice to education policy,
demographic attributes (e.g., age, race, ethnicity) are often
classified differently in research studies than in decision settings. This
paper studies identification in such settings. We propose a formal framework
for prediction with what we term differential covariate classification (DCC).
Using this framework, we analyze partial identification of probabilistic
predictions and assess how various assumptions influence the identification
regions. We apply the findings to a range of settings, focusing mainly on
differential classification of individuals' race and ethnicity in clinical
medicine. We find that bounds on P(y|x) can be wide, and the information
needed to narrow them is available only in special cases. These findings
highlight an
important problem in using evidence in decision making, a problem that has not
yet been fully appreciated in debates on classification in public policy and
medicine.
arXiv link: http://arxiv.org/abs/2501.02318v1
Efficient estimation of average treatment effects with unmeasured confounding and proxies
with unmeasured confounding is proximal causal inference, which assumes the
availability of outcome and treatment confounding proxies. The key identifying
result relies on the existence of a so-called bridge function. A parametric
specification of the bridge function is usually postulated and estimated using
standard techniques. The estimated bridge function is then plugged in to
estimate the average treatment effect. This approach may have two efficiency
losses. First, the bridge function may not be efficiently estimated since it
solves an integral equation. Second, the sequential procedure may fail to
account for the correlation between the two steps. This paper proposes to
approximate the integral equation with increasing moment restrictions and
jointly estimate the bridge function and the average treatment effect. Under
sufficient conditions, we show that the proposed estimator is efficient. To
assist implementation, we propose a data-driven procedure for selecting the
tuning parameter (i.e., number of moment restrictions). Simulation studies
reveal that the proposed method performs well in finite samples, and
application to the right heart catheterization dataset from the SUPPORT study
demonstrates its practical value.
arXiv link: http://arxiv.org/abs/2501.02214v1
Grid-level impacts of renewable energy on thermal generation: efficiency, emissions and flexibility
supply globally. We find that this leads to shifts in the operational dynamics
of thermal power plants. Using fixed effects panel regression across seven
major U.S. balancing authorities, we analyze the impact of renewable generation
on coal, natural gas combined cycle plants, and natural gas combustion
turbines. Wind generation consistently displaces thermal output, while effects
from solar vary significantly by region, achieving substantial displacement in
areas with high solar penetration such as the California Independent System
Operator but limited impacts in coal reliant grids such as the Midcontinent
Independent System Operator. Renewable energy sources effectively reduce carbon
dioxide emissions in regions with flexible thermal plants, achieving
displacement effectiveness as high as one hundred and two percent in the
California Independent System Operator and the Electric Reliability Council of
Texas. However, in coal heavy areas such as the Midcontinent Independent System
Operator and the Pennsylvania New Jersey Maryland Interconnection,
inefficiencies from ramping and cycling reduce carbon dioxide displacement to
as low as seventeen percent and often lead to elevated nitrogen oxides and
sulfur dioxide emissions. These findings underscore the critical role of grid
design, fuel mix, and operational flexibility in shaping the emissions benefits
of renewables. Targeted interventions, including retrofitting high emitting
plants and deploying energy storage, are essential to maximize emissions
reductions and support the decarbonization of electricity systems.
arXiv link: http://arxiv.org/abs/2501.01954v2
Quantifying A Firm's AI Engagement: Constructing Objective, Data-Driven, AI Stock Indices Using 10-K Filings
reveal the selection criteria for determining which stocks qualify as
AI-related are often opaque and rely on vague phrases and subjective judgments.
This paper proposes a new, objective, data-driven approach using natural
language processing (NLP) techniques to classify AI stocks by analyzing annual
10-K filings from 3,395 NASDAQ-listed firms between 2011 and 2023. This
analysis quantifies each company's engagement with AI through binary indicators
and weighted AI scores based on the frequency and context of AI-related terms.
Using these metrics, we construct four AI stock indices: the Equally Weighted
AI Index (AII), the Size-Weighted AI Index (SAII), and two Time-Discounted AI
Indices (TAII05 and TAII5X), each offering a different perspective on AI
investment.
We validate our methodology through an event study on the launch of OpenAI's
ChatGPT, demonstrating that companies with higher AI engagement saw
significantly greater positive abnormal returns, with analyses supporting the
predictive power of our AI measures. Our indices perform on par with or surpass
14 existing AI-themed ETFs and the Nasdaq Composite Index in risk-return
profiles, market responsiveness, and overall performance, achieving higher
average daily returns and risk-adjusted metrics without increased volatility.
These results suggest our NLP-based approach offers a reliable,
market-responsive, and cost-effective alternative to existing AI-related ETF
products. Our innovative methodology can also guide investors, asset managers,
and policymakers in using corporate data to construct other thematic
portfolios, contributing to a more transparent, data-driven, and competitive
approach.
arXiv link: http://arxiv.org/abs/2501.01763v1
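A minimal sketch of the scoring idea: count AI-related terms in a filing's
text to produce a binary indicator and a frequency-based score. The term list
and normalization are illustrative; the paper's dictionary and context-aware
weighting are richer.
```python
# Term-frequency AI engagement score for a 10-K filing (illustrative terms).
import re

AI_TERMS = [
    "artificial intelligence", "machine learning", "deep learning",
    "neural network", "natural language processing", "computer vision",
]

def ai_engagement(text):
    """Return (binary indicator, weighted score per 1,000 words)."""
    text = text.lower()
    n_words = max(len(re.findall(r"\w+", text)), 1)
    hits = sum(len(re.findall(re.escape(term), text)) for term in AI_TERMS)
    return int(hits > 0), 1000.0 * hits / n_words

sample = ("We continue to invest in artificial intelligence and machine "
          "learning to improve our recommendation systems.")
print(ai_engagement(sample))   # -> (1, score per 1,000 words)
```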
Instrumental Variables with Time-Varying Exposure: New Estimates of Revascularization Effects on Quality of Life
an invasive treatment strategy centered on revascularization with a control
group assigned non-invasive medical therapy. As is common in such "strategy
trials," many participants assigned to treatment remained untreated while many
assigned to control crossed over into treatment. Intention-to-treat (ITT)
analyses of strategy trials preserve randomization-based comparisons, but ITT
effects are diluted by non-compliance. Conventional per-protocol analyses that
condition on treatment received are likely biased by discarding random
assignment. In trials where compliance choices are made shortly after
assignment, instrumental variables (IV) methods solve both problems --
recovering an undiluted average causal effect of treatment for treated subjects
who comply with trial protocol. In ISCHEMIA, however, some controls were
revascularized as long as five years after random assignment. This paper
extends the IV framework for strategy trials, allowing for such dynamic
non-random compliance behavior. IV estimates of long-run revascularization
effects on quality of life are markedly larger than previously reported ITT and
per-protocol estimates. We also show how to estimate complier characteristics
in a dynamic-treatment setting. These estimates reveal increasing selection
bias in naive time-varying per-protocol estimates of revascularization effects.
Compliers have baseline health similar to that of the study population, while
control-group crossovers are far sicker.
arXiv link: http://arxiv.org/abs/2501.01623v1
HMM-LSTM Fusion Model for Economic Forecasting
Short-Term Memory (LSTM) neural networks for economic forecasting, focusing on
predicting CPI inflation rates. The study explores a new approach that
integrates HMM-derived hidden states and means as additional features for LSTM
modeling, aiming to enhance the interpretability and predictive performance of
the models. The research begins with data collection and preprocessing,
followed by the implementation of the HMM to identify hidden states
representing distinct economic conditions. Subsequently, LSTM models are
trained using the original and augmented data sets, allowing for comparative
analysis and evaluation. The results demonstrate that incorporating HMM-derived
data improves the predictive accuracy of LSTM models, particularly in capturing
complex temporal patterns and mitigating the impact of volatile economic
conditions. Additionally, the paper discusses the implementation of Integrated
Gradients for model interpretability and provides insights into the economic
dynamics reflected in the forecasting outcomes.
arXiv link: http://arxiv.org/abs/2501.02002v1
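A minimal sketch of the fusion idea, assuming a Gaussian HMM (hmmlearn) to
extract regime labels and regime means that are appended as features to short
windows fed to a small LSTM (PyTorch). The data, window length, and number of
regimes are placeholders, not the paper's configuration.
```python
# Augment an inflation series with HMM-derived features, then fit an LSTM.
import numpy as np
import torch
import torch.nn as nn
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
cpi = np.cumsum(rng.normal(0.2, 1.0, size=600))           # placeholder CPI level
infl = np.diff(cpi).reshape(-1, 1)                         # monthly inflation

hmm = GaussianHMM(n_components=3, covariance_type="diag", n_iter=200, random_state=0)
hmm.fit(infl)
states = hmm.predict(infl)                                 # regime label per month
state_means = hmm.means_[states, 0].reshape(-1, 1)         # regime mean as feature
features = np.hstack([infl, state_means]).astype(np.float32)

def make_windows(x, y, lookback=12):
    xs = np.stack([x[t - lookback:t] for t in range(lookback, len(x))])
    return torch.tensor(xs), torch.tensor(y[lookback:])

X, Y = make_windows(features, features[:, 0])              # predict next inflation

class LSTMReg(nn.Module):
    def __init__(self, n_feat, hidden=16):
        super().__init__()
        self.lstm = nn.LSTM(n_feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1]).squeeze(-1)

model = LSTMReg(n_feat=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), Y)
    loss.backward()
    opt.step()
print("in-sample MSE:", float(loss))
```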
The Impact of Socio-Economic Challenges and Technological Progress on Economic Inequality: An Estimation with the Perelman Model and Ricci Flow Methods
on economic inequality, using the Perelman model and Ricci flow mathematical
methods. The study aims to conduct a deep analysis of the impact of
socio-economic challenges and technological progress on the dynamics of the
Gini coefficient. The article examines the following parameters: income
distribution, productivity (GDP per hour), unemployment rate, investment rate,
inflation rate, migration (net negative), education level, social mobility,
trade infrastructure, capital flows, innovative activities, access to
healthcare, fiscal policy (budget deficit), international trade (turnover
relative to GDP), social protection programs, and technological access. The
results of the study confirm that technological innovations and social
protection programs have a positive impact on reducing inequality. Productivity
growth, improving the quality of education, and strengthening R&D investments
increase the possibility of inclusive development. Sensitivity analysis shows
that social mobility and infrastructure are important factors that affect
economic stability. The accuracy of the model is confirmed by high R^2 values
(80-90%) and the statistical reliability of the Z-statistic (<0.05). The study
uses Ricci flow methods, which allow for a geometric analysis of the
transformation of economic parameters in time and space. Recommendations
include the strategic introduction of technological progress, the expansion of
social protection programs, improving the quality of education, and encouraging
international trade, which will contribute to economic sustainability and
reduce inequality. The article highlights multifaceted approaches that combine
technological innovation and responses to socio-economic challenges to ensure
sustainable and inclusive economic development.
arXiv link: http://arxiv.org/abs/2501.00800v1
Copula Central Asymmetry of Equity Portfolios
dependence between asset returns, causing asymmetry between the lower and upper
tail of return distribution. The detection of asymmetric dependence is now
understood to be essential for market supervision, risk management, and
portfolio allocation. I propose a non-parametric test procedure for the
hypothesis of copula central symmetry based on the Cramér-von Mises distance
of the empirical copula and its survival counterpart, deriving the asymptotic
properties of the test under standard assumptions for stationary time series. I
use the powerful tie-break bootstrap which, as the included simulation study
suggests, allows me to detect asymmetries with up to 25 series and a number of
observations corresponding to one year of daily returns. Applying the procedure
to US portfolio returns separately for each year shows that the amount of
copula central asymmetry is time-varying and less present in the recent past.
Asymmetry is more critical in portfolios based on size and less in portfolios
based on book-to-market and momentum. In portfolios based on industry
classification, asymmetry is present during market downturns, coherently with
the financial contagion narrative.
arXiv link: http://arxiv.org/abs/2501.00634v2
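A minimal sketch of the test statistic described above: a Cramér-von
Mises-type distance between the empirical copula and its survival counterpart,
evaluated at the pseudo-observations. Critical values would come from the
paper's tie-break bootstrap, which is omitted here.
```python
# Cramer-von Mises-type distance between empirical and survival copulas.
import numpy as np

def pseudo_obs(x):
    """Componentwise ranks scaled to (0, 1); x has shape (n, d)."""
    n = x.shape[0]
    return (np.argsort(np.argsort(x, axis=0), axis=0) + 1) / (n + 1)

def cvm_central_asymmetry(x):
    u = pseudo_obs(x)
    n = u.shape[0]
    # empirical copula and empirical survival copula evaluated at each u_i
    c = (u[None, :, :] <= u[:, None, :]).all(axis=2).mean(axis=1)
    c_surv = (u[None, :, :] >= 1 - u[:, None, :]).all(axis=2).mean(axis=1)
    return n * np.mean((c - c_surv) ** 2)

rng = np.random.default_rng(0)
z = rng.standard_normal((500, 2))
symmetric = z @ np.linalg.cholesky([[1.0, 0.6], [0.6, 1.0]]).T  # elliptical copula
asymmetric = np.column_stack([z[:, 0], np.where(z[:, 0] < 0, z[:, 0], z[:, 1])])
print(cvm_central_asymmetry(symmetric), cvm_central_asymmetry(asymmetric))
```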
Panel Estimation of Taxable Income Elasticities with Heterogeneity and Endogenous Budget Sets
elasticities of taxable income (ETI), addressing key econometric challenges
posed by nonlinear budget sets. Building on an isoelastic utility framework, we
derive a linear-in-logs taxable income specification that incorporates the
entire budget set while allowing for individual-specific ETI and productivity
growth. To account for endogenous budget sets, we employ panel data and
estimate individual-specific ridge regressions, constructing a debiased average
of ridge coefficients to obtain the average ETI.
arXiv link: http://arxiv.org/abs/2501.00633v1
Regression discontinuity aggregation, with an application to the union effects on inequality
unit's treatment status is an average or aggregate across multiple
discontinuity events. Such situations arise in many studies where the outcome
is measured at a higher level of spatial or temporal aggregation (e.g., by
state with district-level discontinuities) or when spillovers from
discontinuity events are of interest. We propose two novel estimation
procedures - one at the level at which the outcome is measured and the other in
the sample of discontinuities - and show that both identify a local average
causal effect under continuity assumptions similar to those of standard RD
designs. We apply these ideas to study the effect of unionization on inequality
in the United States. Using credible variation from close unionization
elections at the establishment level, we show that a higher rate of newly
unionized workers in a state-by-industry cell reduces wage inequality within
the cell.
arXiv link: http://arxiv.org/abs/2501.00428v1
Causal Hangover Effects
affected partly by what takes place off the court. We can't observe what
happens between games directly. Instead, we proxy for the possibility of
athletes partying by looking at play following games in party cities. We are
interested to see if teams exhibit a decline in performance the day following a
game in a city with active nightlife; we call this a "hangover effect". Part of
the question is determining a reasonable way to measure levels of nightlife,
and correspondingly which cities are notorious for it; we colloquially refer to
such cities as "party cities". To carry out this study, we exploit data on
bookmaker spreads: the expected score differential between two teams after
conditioning on observable performance in past games and expectations about the
upcoming game. We expect a team to meet the spread half the time, since this is
one of the easiest ways for bookmakers to guarantee a profit. We construct a
model which attempts to estimate the causal effect of visiting a "party city"
on subsequent day performance as measured by the odds of beating the spread. In
particular, we only consider the hangover effect on games played back-to-back
within 24 hours of each other. To the extent that the odds of beating the
spread against the next-day opponent are uncorrelated with playing in a party
city the day before, which should be the case under an efficient betting
market, we have identification of our variable of interest. We find that
visiting a city with
active nightlife the day prior to a game does have a statistically significant
negative effect on a team's likelihood of meeting bookmakers' expectations for
both NBA and MLB.
arXiv link: http://arxiv.org/abs/2412.21181v1
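A minimal sketch of one way to operationalize the estimate: a logistic
regression of an indicator for beating the spread in a back-to-back game on an
indicator for having played in a party city the previous day. The variables
and simulated data are placeholders for the bookmaker-spread data used in the
paper.
```python
# Logistic regression of beating the spread on a party-city indicator.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000                                             # back-to-back games only
party_city_yesterday = rng.integers(0, 2, size=n)    # prior game in a party city?
true_effect = -0.25                                  # hypothetical hangover effect
p_beat = 1 / (1 + np.exp(-true_effect * party_city_yesterday))
beat_spread = rng.binomial(1, p_beat)                # 1 if the team beat the spread

X = sm.add_constant(party_city_yesterday.astype(float))
logit = sm.Logit(beat_spread, X).fit(disp=0)
print(logit.params)     # intercept near 0 (spread met half the time), negative slope
print(logit.conf_int())
```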
Analyzing Country-Level Vaccination Rates and Determinants of Practical Capacity to Administer COVID-19 Vaccines
administration proved an extreme logistics operation of global magnitude.
Global vaccination levels, however, remain a key concern in preventing the
emergence of new strains and minimizing the impact of the pandemic's disruption
of daily life. In this paper, country-level vaccination rates are analyzed
through a queuing framework to extract service rates that represent the
practical capacity of a country to administer vaccines. These rates are further
characterized through regression and interpretable machine learning methods
with country-level demographic, governmental, and socio-economic variates.
Model results show that participation in multi-governmental collaborations such
as COVAX may improve the ability to vaccinate. Similarly, improved
transportation and accessibility variates such as roads per area for low-income
countries and rail lines per area for high-income countries can improve rates.
It was also found that for low-income countries specifically, improvements in
basic and health infrastructure (as measured through spending on healthcare,
number of doctors and hospital beds per 100k, population percent with access to
electricity, life expectancy, and vehicles per 1000 people) resulted in higher
vaccination rates. Of the high-income countries, those with larger 65-plus
populations struggled to vaccinate at high rates, indicating potential
accessibility issues for the elderly. This study finds that improving basic and
health infrastructure, focusing on accessibility in the last mile, particularly
for the elderly, and fostering global partnerships can improve logistical
operations of such a scale. Such structural impediments and inequities in
global health care must be addressed in preparation for future global public
health crises.
arXiv link: http://arxiv.org/abs/2501.01447v2
Econometric Analysis of Pandemic Disruption and Recovery Trajectory in the U.S. Rail Freight Industry
disruptions of the 2007-09 Great Recession and the COVID-19 pandemic, this
paper uses time series analysis with the AutoRegressive Integrated Moving
Average (ARIMA) family of models and covariates to model intermodal and
commodity-specific rail freight volumes based on pre-disruption data. A
framework to construct scenarios and select parameters and variables is
demonstrated. By comparing actual freight volumes during the disruptions
against three counterfactual scenarios, Trend Continuation, Covariate-adapted
Trend Continuation, and Full Covariate-adapted Prediction, the characteristics
and differences in magnitude and timing between the two disruptions and their
effects across nine freight components are examined.
Results show the disruption impacts differ from measurement by simple
comparison with pre-disruption levels or year-on-year comparison depending on
the structural trend and seasonal pattern. Recovery Pace Plots are introduced
to support comparison in recovery speeds across freight components. Accounting
for economic variables helps improve model fitness. It also enables evaluation
of the change in association between freight volumes and covariates, where
intermodal freight was found to respond more slowly during the pandemic,
potentially due to supply constraints.
arXiv link: http://arxiv.org/abs/2412.20669v1
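A minimal sketch of the Trend Continuation idea: fit a seasonal ARIMA on
pre-disruption data and compare actual volumes with the projected
counterfactual; covariate-adapted scenarios would add exogenous regressors
through the exog argument. Orders, dates, and data are illustrative.
```python
# Seasonal ARIMA counterfactual for a disruption window (simulated data).
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(7)
idx = pd.date_range("2010-01-01", periods=132, freq="MS")
trend = np.linspace(100, 130, 132)
season = 8 * np.sin(2 * np.pi * np.arange(132) / 12)
volume = pd.Series(trend + season + rng.normal(0, 3, 132), index=idx)

pre = volume[:"2019-12-01"]                       # pre-disruption estimation sample
model = SARIMAX(pre, order=(1, 1, 1), seasonal_order=(1, 0, 1, 12)).fit(disp=0)
counterfactual = model.get_forecast(steps=12).predicted_mean
impact = volume["2020-01-01":] - counterfactual   # actual minus counterfactual
print(impact.round(1))
```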
Automated Demand Forecasting in small to medium-sized enterprises
research proposes a generalized automated sales forecasting pipeline tailored
for small- to medium-sized enterprises (SMEs). Unlike large corporations with
dedicated data scientists for sales forecasting, SMEs often lack such
resources. To address this, we developed a comprehensive forecasting pipeline
that automates time series sales forecasting, encompassing data preparation,
model training, and selection based on validation results.
The development included two main components: model preselection and the
forecasting pipeline. In the first phase, state-of-the-art methods were
evaluated on a showcase dataset, leading to the selection of ARIMA, SARIMAX,
Holt-Winters Exponential Smoothing, Regression Tree, Dilated Convolutional
Neural Networks, and Generalized Additive Models. An ensemble prediction of
these models was also included. Long Short-Term Memory (LSTM) networks were
excluded due to suboptimal prediction accuracy, and Facebook Prophet was
omitted for compatibility reasons.
In the second phase, the proposed forecasting pipeline was tested with SMEs
in the food and electric industries, revealing variable model performance
across different companies. While one project-based company derived no benefit,
others achieved superior forecasts compared to naive estimators.
Our findings suggest that no single model is universally superior. Instead, a
diverse set of models, when integrated within an automated validation
framework, can significantly enhance forecasting accuracy for SMEs. These
results emphasize the importance of model diversity and automated validation in
addressing the unique needs of each business. This research contributes to the
field by providing SMEs access to state-of-the-art sales forecasting tools,
enabling data-driven decision-making and improving operational efficiency.
arXiv link: http://arxiv.org/abs/2412.20420v1
Fitting Dynamically Misspecified Models: An Optimal Transportation Approach
potentially dynamically misspecified state-space models. When dynamics are
misspecified, filtered values of state variables often do not satisfy model
restrictions, making them hard to interpret, and parameter estimates may fail
to characterize the dynamics of filtered variables. To address this, a
sequential optimal transportation approach is used to generate a
model-consistent sample by mapping observations from a flexible reduced-form to
the structural conditional distribution iteratively. Filtered series from the
generated sample are model-consistent. Specializing to linear processes, a
closed-form Optimal Transport Filtering algorithm is derived. Minimizing the
discrepancy between generated and actual observations defines an Optimal
Transport Estimator. Its large sample properties are derived. A specification
test determines if the model can reproduce the sample path, or if the
discrepancy is statistically significant. Empirical applications to trend-cycle
decomposition, DSGE models, and affine term structure models illustrate the
methodology and the results.
arXiv link: http://arxiv.org/abs/2412.20204v2
Debiased Nonparametric Regression for Statistical Inference and Distributionally Robustness
While machine learning techniques such as random forests and neural networks
have demonstrated strong predictive performance, their theoretical properties
remain relatively underexplored. In particular, many modern algorithms lack
guarantees of pointwise and uniform risk convergence, as well as asymptotic
normality. These properties are essential for statistical inference and robust
estimation and have been well-established for classical methods such as
Nadaraya-Watson regression. To ensure these properties for various
nonparametric regression estimators, we introduce a model-free debiasing
method. By incorporating a correction term that estimates the conditional
expected residual of the original estimator, or equivalently, its estimation
error, into the initial nonparametric regression estimator, we obtain a
debiased estimator that satisfies pointwise and uniform risk convergence, along
with asymptotic normality, under mild smoothness conditions. These properties
facilitate statistical inference and enhance robustness to covariate shift,
making the method broadly applicable to a wide range of nonparametric
regression problems.
arXiv link: http://arxiv.org/abs/2412.20173v3
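A minimal sketch of the debiasing idea under assumptions of my own choosing:
fit an initial nonparametric learner, estimate its conditional expected
residual with a second smoother on held-out data, and add the correction back.
The learners and the sample split are illustrative, not the paper's exact
construction.
```python
# Model-free debiasing sketch: base learner plus estimated conditional residual.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
n = 4000
X = rng.uniform(-2, 2, size=(n, 1))
y = np.sin(2 * X[:, 0]) + 0.3 * rng.standard_normal(n)

half = n // 2
base = RandomForestRegressor(n_estimators=200, random_state=0)
base.fit(X[:half], y[:half])                        # initial estimator

resid = y[half:] - base.predict(X[half:])           # held-out residuals
corr = KNeighborsRegressor(n_neighbors=50)          # smoother for E[resid | X]
corr.fit(X[half:], resid)

def debiased_predict(x_new):
    return base.predict(x_new) + corr.predict(x_new)

grid = np.linspace(-2, 2, 5).reshape(-1, 1)
print(np.c_[np.sin(2 * grid[:, 0]), base.predict(grid), debiased_predict(grid)])
```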
Assets Forecasting with Feature Engineering and Transformation Methods for LightGBM
consumer markets, impacting millions of individuals. Hence, accurately
forecasting it is essential for mitigating risks, including those associated
with inactivity. Although research shows that hybrid models of Deep Learning
(DL) and Machine Learning (ML) yield promising results, their computational
requirements often exceed the capabilities of average personal computers,
rendering them inaccessible to many. In order to address this challenge in this
paper we optimize LightGBM (an efficient implementation of gradient-boosted
decision trees (GBDT)) for maximum performance, while maintaining low
computational requirements. We introduce novel feature engineering techniques
including indicator-price slope ratios and differences of close and open prices
divided by the corresponding 14-period Exponential Moving Average (EMA),
designed to capture market dynamics and enhance predictive accuracy.
Additionally, we test seven different feature and target variable
transformation methods, including returns, logarithmic returns, EMA ratios and
their standardized counterparts as well as EMA difference ratios, so as to
identify the most effective ones, weighing both efficiency and accuracy. The
results demonstrate Log Returns, Returns and EMA Difference Ratio constitute
the best target variable transformation methods, with EMA ratios having a lower
percentage of correct directional forecasts, and standardized versions of
target variable transformations requiring significantly more training time.
Moreover, the introduced features demonstrate high feature importance in
predictive performance across all target variable transformation methods. This
study highlights an accessible, computationally efficient approach to stock
market forecasting using LightGBM, making advanced forecasting techniques more
widely attainable.
arXiv link: http://arxiv.org/abs/2501.07580v1
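A minimal sketch of the two feature ideas named above (an indicator-price
slope ratio and (close - open) divided by the 14-period EMA) with a log-return
target fed to LightGBM. The indicator, column names, and simulated prices are
placeholders.
```python
# LightGBM with the indicator-slope-ratio and (close - open)/EMA14 features.
import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(0)
n = 1500
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, n))))
openp = close.shift(1).fillna(close.iloc[0])
rsi = 50 + pd.Series(rng.normal(0, 10, n)).rolling(14).mean()   # stand-in indicator

df = pd.DataFrame({"close": close, "open": openp, "rsi": rsi})
ema14 = df["close"].ewm(span=14, adjust=False).mean()
df["slope_ratio"] = df["rsi"].diff() / df["close"].diff()       # indicator/price slope ratio
df["co_over_ema"] = (df["close"] - df["open"]) / ema14          # (close - open) / EMA14
df["target"] = np.log(df["close"].shift(-1) / df["close"])      # next-period log return
df = df.replace([np.inf, -np.inf], np.nan).dropna()

X, y = df[["slope_ratio", "co_over_ema", "rsi"]], df["target"]
split = int(0.8 * len(df))
model = lgb.LGBMRegressor(n_estimators=400, learning_rate=0.05)
model.fit(X.iloc[:split], y.iloc[:split])
pred = model.predict(X.iloc[split:])
hit_rate = np.mean(np.sign(pred) == np.sign(y.iloc[split:]))
print("directional hit rate:", round(hit_rate, 3))
```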
Asymptotic Properties of the Maximum Likelihood Estimator for Markov-switching Observation-driven Models
$((S_t,Y_t))_{t \in Z}$ where (i) $(S_t)_{t \in Z}$ is an
unobserved Markov process taking values in a finite set and (ii) $(Y_t)_{t \in
Z}$ is an observed process such that the conditional distribution of
$Y_t$ given all past $Y$'s and the current and all past $S$'s depends only on
all past $Y$'s and $S_t$. In this paper, we prove the consistency and
asymptotic normality of the maximum likelihood estimator for such a model. As a
special case hereof, we give conditions under which the maximum likelihood
estimator for the widely applied Markov-switching generalised autoregressive
conditional heteroscedasticity model introduced by Haas et al. (2004b) is
consistent and asymptotically normal.
arXiv link: http://arxiv.org/abs/2412.19555v2
Sentiment trading with large language models
analysis of U.S. financial news and their potential in predicting stock market
returns. We analyze a dataset comprising 965,375 news articles that span from
January 1, 2010, to June 30, 2023; we focus on the performance of various LLMs,
including BERT, OPT, FINBERT, and the traditional Loughran-McDonald dictionary
model, which has been a dominant methodology in the finance literature. The
study documents a significant association between LLM scores and subsequent
daily stock returns. Specifically, OPT, which is a GPT-3 based LLM, shows the
highest accuracy in sentiment prediction with an accuracy of 74.4%, slightly
ahead of BERT (72.5%) and FINBERT (72.2%). In contrast, the Loughran-McDonald
dictionary model demonstrates considerably lower effectiveness with only 50.1%
accuracy. Regression analyses highlight a robust positive impact of OPT model
scores on next-day stock returns, with coefficients of 0.274 and 0.254 in
different model specifications. BERT and FINBERT also exhibit predictive
relevance, though to a lesser extent. Notably, we do not observe a significant
relationship between the Loughran-McDonald dictionary model scores and stock
returns, challenging the efficacy of this traditional method in the current
financial context. In portfolio performance, the long-short OPT strategy excels
with a Sharpe ratio of 3.05, compared to 2.11 for BERT and 2.07 for FINBERT
long-short strategies. Strategies based on the Loughran-McDonald dictionary
yield the lowest Sharpe ratio of 1.23. Our findings emphasize the superior
performance of advanced LLMs, especially OPT, in financial market prediction
and portfolio management, marking a significant shift in the landscape of
financial analysis tools, with implications for financial regulation and policy
analysis.
arXiv link: http://arxiv.org/abs/2412.19245v1
Using Ordinal Voting to Compare the Utilitarian Welfare of a Status Quo and A Proposed Policy: A Simple Nonparametric Analysis
utilitarian welfare has long been discussed. I consider choice between a status
quo and a proposed policy when persons have interpersonally comparable cardinal
utilities taking values in a bounded interval, voting is compulsory, and each
person votes for a policy that maximizes utility. I show that knowledge of the
attained status quo welfare and the voting outcome yields an informative bound
on welfare with the proposed policy. The bound contains the value of status quo
welfare, so the better utilitarian policy is not known. The minimax-regret
decision and certain Bayes decisions choose the proposed policy if its vote
share exceeds the known value of status quo welfare. This procedure differs
from majority rule, which chooses the proposed policy if its vote share exceeds
1/2.
arXiv link: http://arxiv.org/abs/2412.18714v1
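A worked illustration of the decision rule stated above: given the proposed
policy's vote share and the known status quo welfare, the minimax-regret (and
certain Bayes) decision selects the proposal whenever its vote share exceeds
status quo welfare. The numbers below are hypothetical.
```python
# The vote-share versus status-quo-welfare decision rule described above.
def choose_policy(status_quo_welfare, proposal_vote_share):
    """Return the policy selected by the rule discussed above."""
    return "proposal" if proposal_vote_share > status_quo_welfare else "status quo"

print(choose_policy(status_quo_welfare=0.55, proposal_vote_share=0.62))  # proposal
print(choose_policy(status_quo_welfare=0.55, proposal_vote_share=0.48))  # status quo
```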
Conditional Influence Functions
conditional distribution. One important example is an average treatment effect
conditional on a subset of covariates. Many of these objects have a conditional
influence function that generalizes the classical influence function of a
functional of an (unconditional) distribution. Conditional influence functions
have important uses analogous to those of the classical influence function.
They can be used to construct Neyman orthogonal estimating equations for
conditional objects of interest that depend on high dimensional regressions.
They can be used to formulate local policy effects and describe the effect of
local misspecification on conditional objects of interest. We derive
conditional influence functions for functionals of conditional means and other
features of the conditional distribution of an outcome variable. We show how
these can be used for locally linear estimation of conditional objects of
interest. We give rate conditions for first step machine learners to have no
effect on asymptotic distributions of locally linear estimators. We also give a
general construction of Neyman orthogonal estimating equations for conditional
objects of interest.
arXiv link: http://arxiv.org/abs/2412.18080v1
Minimax Optimal Simple Regret in Two-Armed Best-Arm Identification
two-armed fixed-budget best-arm identification (BAI) problem. Given two
treatment arms, the objective is to identify the arm with the highest expected
outcome through an adaptive experiment. We focus on the Neyman allocation,
where treatment arms are allocated following the ratio of their outcome
standard deviations. Our primary contribution is to prove the minimax
optimality of the Neyman allocation for the simple regret, defined as the
difference between the expected outcomes of the true best arm and the estimated
best arm. Specifically, we first derive a minimax lower bound for the expected
simple regret, which characterizes the worst-case performance achievable under
the location-shift distributions, including Gaussian distributions. We then
show that the simple regret of the Neyman allocation asymptotically matches
this lower bound, including the constant term, not just the rate in terms of
the sample size, under the worst-case distribution. Notably, our optimality
result holds without imposing locality restrictions on the distribution, such
as local asymptotic normality. Furthermore, we demonstrate that the Neyman
allocation reduces to the uniform allocation, i.e., the standard randomized
controlled trial, under Bernoulli distributions.
arXiv link: http://arxiv.org/abs/2412.17753v2
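A minimal simulation sketch of the Neyman allocation discussed above: the
budget is split in proportion to the arms' outcome standard deviations
(treated as known here for simplicity) and the arm with the larger sample mean
is recommended; simple regret is the gap between the best arm's and the chosen
arm's true means.
```python
# Neyman allocation in a two-armed fixed-budget experiment (simulated).
import numpy as np

rng = np.random.default_rng(0)
budget = 1000
mu = np.array([0.00, 0.10])      # true mean outcomes (arm 1 is best)
sigma = np.array([1.0, 2.0])     # outcome standard deviations, assumed known

n_arm0 = int(round(budget * sigma[0] / sigma.sum()))  # n_k proportional to sigma_k
arm0 = rng.normal(mu[0], sigma[0], n_arm0)
arm1 = rng.normal(mu[1], sigma[1], budget - n_arm0)
best_arm = int(arm1.mean() > arm0.mean())
simple_regret = mu.max() - mu[best_arm]
print("allocation:", (n_arm0, budget - n_arm0),
      "chosen arm:", best_arm, "simple regret:", simple_regret)
```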
A large non-Gaussian structural VAR with application to Monetary Policy
without the need to impose economically motivated restrictions. The model
scales well to higher dimensions, allowing the inclusion of a larger number of
variables. We develop an efficient Gibbs sampler to estimate the model. We also
present an estimator of the deviance information criterion to facilitate model
comparison. Finally, we discuss how economically motivated restrictions can be
added to the model. Experiments with artificial data show that the model
possesses good estimation properties. Using real data we highlight the benefits
of including more variables in the structural analysis. Specifically, we
identify a monetary policy shock and provide empirical evidence that prices and
economic output respond with a large delay to the monetary policy shock.
arXiv link: http://arxiv.org/abs/2412.17598v1
A Necessary and Sufficient Condition for Size Controllability of Heteroskedasticity Robust Test Statistics
(2025) concerning heteroskedasticity robust test statistics in regression
models. For the special, but important, case of testing a single restriction
(e.g., a zero restriction on a single coefficient), we provide a necessary and
sufficient condition for size controllability, whereas the condition in
Pötscher and Preinerstorfer (2025) is, in general, only sufficient (even in
the case of testing a single restriction).
arXiv link: http://arxiv.org/abs/2412.17470v2
Advanced Models for Hourly Marginal CO2 Emission Factor Estimation: A Synergy between Fundamental and Statistical Approaches
particularly carbon dioxide (CO2). A metric used to quantify the change in CO2
emissions is the marginal emission factor, defined as the marginal change in
CO2 emissions resulting from a marginal change in electricity demand over a
specified period. This paper aims to present two methodologies to estimate the
marginal emission factor in a decarbonized electricity system with high
temporal resolution. First, we present an energy systems model that
incrementally calculates the marginal emission factors. Second, we examine a
Markov Switching Dynamic Regression model, a statistical model designed to
estimate marginal emission factors faster and use an incremental marginal
emission factor as a benchmark to assess its precision. For the German
electricity market, we estimate the marginal emission factor time series
historically (2019, 2020) using data from Agora Energiewende and for the
future (2025, 2030, and 2040) using estimated energy system data. The results
indicate that
the Markov Switching Dynamic Regression model is more accurate in estimating
marginal emission factors than the Dynamic Linear Regression models, which are
frequently used in the literature. Hence, the Markov Switching Dynamic
Regression model is a simpler alternative to the computationally intensive
incremental marginal emissions factor, especially when short-term marginal
emissions factor estimation is needed. The results of the marginal emission
factor estimation are applied to an exemplary low-emission vehicle charging
scenario to estimate CO2 savings by shifting the charge hours to those
corresponding to the lower marginal emissions factor. By implementing this
emission-minimized charging approach, an average reduction of 31% in the
marginal emission factor was achieved over the 5 years.
arXiv link: http://arxiv.org/abs/2412.17379v1
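A minimal sketch of the statistical route described above, using statsmodels'
MarkovRegression: a two-regime switching regression of the change in CO2
emissions on the change in electricity demand, whose regime-specific slopes
play the role of marginal emission factors. The data are simulated
placeholders, not Agora Energiewende or energy-system-model output.
```python
# Two-regime Markov switching regression as a marginal-emission-factor sketch.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T = 600
d_demand = rng.normal(0.0, 1.0, T)                    # change in electricity demand
regime = (np.sin(np.arange(T) / 40) > 0).astype(int)  # slow-moving latent regime
true_mef = np.where(regime == 0, 0.3, 0.8)            # tCO2 per MWh in each regime
d_co2 = true_mef * d_demand + rng.normal(0.0, 0.1, T)

mod = sm.tsa.MarkovRegression(d_co2, k_regimes=2, exog=d_demand,
                              switching_variance=False)
res = mod.fit()
print(res.summary())          # the regime-specific slopes are the estimated MEFs
probs = res.smoothed_marginal_probabilities   # regime probabilities over time
```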
Bayesian penalized empirical likelihood and Markov Chain Monte Carlo sampling
Penalized Empirical Likelihood (BPEL), designed to address the computational
challenges inherent in empirical likelihood (EL) approaches. Our approach has
two primary objectives: (i) to enhance the inherent flexibility of EL in
accommodating diverse model conditions, and (ii) to facilitate the use of
well-established Markov Chain Monte Carlo (MCMC) sampling schemes as a
convenient alternative to the complex optimization typically required for
statistical inference using EL. To achieve the first objective, we propose a
penalized approach that regularizes the Lagrange multipliers, significantly
reducing the dimensionality of the problem while accommodating a comprehensive
set of model conditions. For the second objective, our study designs and
thoroughly investigates two popular sampling schemes within the BPEL context.
We demonstrate that the BPEL framework is highly flexible and efficient,
enhancing the adaptability and practicality of EL methods. Our study highlights
the practical advantages of using sampling techniques over traditional
optimization methods for EL problems, showing rapid convergence to the global
optima of posterior distributions and ensuring the effective resolution of
complex statistical inference challenges.
arXiv link: http://arxiv.org/abs/2412.17354v3
Gaussian and Bootstrap Approximation for Matching-based Average Treatment Effect Estimators
rank-matching-based Average Treatment Effect (ATE) estimators. By analyzing
these estimators through the lens of stabilization theory, we employ the
Malliavin-Stein method to derive our results. Our bounds precisely quantify the
impact of key problem parameters, including the number of matches and treatment
balance, on the accuracy of the Gaussian approximation. Additionally, we
develop multiplier bootstrap procedures to estimate the limiting distribution
in a fully data-driven manner, and we leverage the derived Gaussian
approximation results to further obtain bootstrap approximation bounds. Our
work not only introduces a novel theoretical framework for commonly used ATE
estimators, but also provides data-driven methods for constructing
non-asymptotically valid confidence intervals.
arXiv link: http://arxiv.org/abs/2412.17181v1
Competitive Facility Location with Market Expansion and Customer-centric Objective
modeled and predicted using a discrete choice random utility model. The goal is
to strategically place new facilities to maximize the overall captured customer
demand in a competitive marketplace. In this work, we introduce two novel
considerations. First, the total customer demand in the market is not fixed but
is modeled as an increasing function of the customers' total utilities. Second,
we incorporate a new term into the objective function, aiming to balance the
firm's benefits and customer satisfaction. Our new formulation exhibits a
highly nonlinear structure and cannot be solved directly by existing approaches.
To address this, we first demonstrate that, under a concave market expansion
function, the objective function is concave and submodular, allowing for a
$(1-1/e)$ approximation solution by a simple polynomial-time greedy algorithm.
We then develop a new method, called Inner-approximation, which enables us to
approximate the mixed-integer nonlinear problem (MINLP), with arbitrary
precision, by an MILP without introducing additional integer variables. We
further demonstrate that our inner-approximation method consistently yields
lower approximations than the outer-approximation methods typically used in the
literature. Moreover, we extend our settings by considering a general
(non-concave) market-expansion function and show that the Inner-approximation
mechanism enables us to approximate the resulting MINLP, with arbitrary
precision, by an MILP. To further enhance this MILP, we show how to
significantly reduce the number of additional binary variables by leveraging
concave areas of the objective function. Extensive experiments demonstrate the
efficiency of our approaches.
arXiv link: http://arxiv.org/abs/2412.17021v1
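A minimal sketch of the greedy (1-1/e) heuristic mentioned above, applied to a
plain MNL-style captured-demand objective. The paper's market-expansion
function, customer-satisfaction term, and MILP inner-approximation are not
reproduced, and all utilities and weights are simulated.
```python
# Greedy selection for a monotone submodular captured-demand objective.
import numpy as np

rng = np.random.default_rng(0)
n_customers, n_candidates, budget = 50, 20, 5
v = rng.uniform(0.1, 2.0, size=(n_customers, n_candidates))  # exp(utility) of new sites
comp = rng.uniform(1.0, 5.0, size=n_customers)               # competitors' total exp(utility)
pop = rng.uniform(10, 100, size=n_customers)                 # demand weight per customer zone

def captured_demand(S):
    """Expected demand captured by the opened set S under MNL choice."""
    attract = v[:, sorted(S)].sum(axis=1) if S else np.zeros(n_customers)
    return float(np.sum(pop * attract / (comp + attract)))

selected = set()
for _ in range(budget):                    # greedy: add the best marginal gain
    gains = {j: captured_demand(selected | {j}) - captured_demand(selected)
             for j in range(n_candidates) if j not in selected}
    selected.add(max(gains, key=gains.get))

print(sorted(selected), round(captured_demand(selected), 2))
```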
Sharp Results for Hypothesis Testing with Risk-Sensitive Agents
parties, each with their own incentives, private information, and ability to
influence the distributional properties of the data. We study a game-theoretic
version of hypothesis testing in which a statistician, also known as a
principal, interacts with strategic agents that can generate data. The
statistician seeks to design a testing protocol with controlled error, while
the data-generating agents, guided by their utility and prior information,
choose whether or not to opt in based on expected utility maximization. This
strategic behavior affects the data observed by the statistician and,
consequently, the associated testing error. We analyze this problem for general
concave and monotonic utility functions and prove an upper bound on the Bayes
false discovery rate (FDR). Underlying this bound is a form of prior
elicitation: we show how an agent's choice to opt in implies a certain upper
bound on their prior null probability. Our FDR bound is unimprovable in a
strong sense, achieving equality at a single point for an individual agent and
at any countable number of points for a population of agents. We also
demonstrate that our testing protocols exhibit a desirable maximin property
when the principal's utility is considered. To illustrate the qualitative
predictions of our theory, we examine the effects of risk aversion, reward
stochasticity, and signal-to-noise ratio, as well as the implications for the
Food and Drug Administration's testing protocols.
arXiv link: http://arxiv.org/abs/2412.16452v1
Counting Defiers: A Design-Based Model of an Experiment Can Reveal Evidence Beyond the Average Effect
an experiment with a binary intervention and outcome can reveal evidence beyond
the average effect without additional data. Our proposed statistical decision
rule yields a design-based maximum likelihood estimate (MLE) of the joint
distribution of potential outcomes in intervention and control, specified by
the numbers of always takers, compliers, defiers, and never takers in the
sample. With a visualization, we explain why the likelihood varies with the
number of defiers within the Frechet bounds determined by the estimated
marginal distributions. We illustrate how the MLE varies with all possible data
in samples of 50 and 200: when the estimated average effect is positive, the
MLE includes defiers if takeup is below half in control and above half in
intervention, unless takeup is zero in control or full in intervention. Under
optimality conditions, for increasing sample sizes in which exhaustive grid
search is possible, our rule's performance increases relative to a rule that
places equal probability on all numbers of defiers within the estimated Frechet
bounds. We offer insights into effect heterogeneity in two published
experiments with positive, statistically significant average effects on takeup
of desired health behaviors and plausible defiers. Our 95% smallest credible
sets for defiers include zero and the estimated upper Frechet bound,
demonstrating that evidence is weak. Yet, our rule yields no defiers in one
experiment. In the other, our rule yields the estimated upper Frechet bound on
defiers -- a count representing over 18% of the sample.
arXiv link: http://arxiv.org/abs/2412.16352v3
Testing linearity of spatial interaction functions à la Ramsey
spatial interaction function. Such functions arise commonly, either as
practitioner imposed specifications or due to optimizing behaviour by agents.
Our conditional heteroskedasticity robust test is nonparametric, but based on
the Lagrange Multiplier principle and reminiscent of the Ramsey RESET approach.
This entails estimation only under the null hypothesis, which yields an easy-to-estimate linear spatial autoregressive model. Monte Carlo simulations show
excellent size control and power. An empirical study with Finnish data
illustrates the test's practical usefulness, shedding light on debates on the
presence of tax competition among neighbouring municipalities.
arXiv link: http://arxiv.org/abs/2412.14778v2
Good Controls Gone Bad: Difference-in-Differences with Covariates
which is necessary to get an unbiased estimate of the ATT when using
time-varying covariates in existing Difference-in-Differences methods. The
two-way CCC assumption implies that the effect of the covariates remains the
same between groups and across time periods. This assumption has been implied
in previous literature, but has not been explicitly addressed. Through
theoretical proofs and a Monte Carlo simulation study, we show that the
standard TWFE and the CS-DID estimators are biased when the two-way CCC
assumption is violated. We propose a new estimator called the Intersection
Difference-in-differences (DID-INT) which can provide an unbiased estimate of
the ATT under two-way CCC violations. DID-INT can also identify the ATT under
heterogeneous treatment effects and with staggered treatment rollout. The
estimator relies on parallel trends of the residuals of the outcome variable,
after appropriately adjusting for covariates. This covariate residualization
can recover parallel trends that are hidden with conventional estimators.
arXiv link: http://arxiv.org/abs/2412.14447v2
An Analysis of the Relationship Between the Characteristics of Innovative Consumers and the Degree of Serious Leisure in User Innovation
and user innovation. We adopted the characteristics of innovative consumers
identified by Luthje (2004), namely product use experience, information exchange, and speed of new product adoption, to analyze their correlation with serious leisure
engagement. The analysis utilized consumer behavior survey data from the
"Marketing Analysis Contest 2023" sponsored by Nomura Research Institute,
examining the relationship between innovative consumer characteristics and the
degree of serious leisure (Serious Leisure Inventory and Measure: SLIM). Since
the contest data did not directly measure innovative consumer characteristics
or serious leisure engagement, we established alternative variables for
quantitative analysis. The results showed that the SLIM alternative variable
had positive correlations with diverse product experiences and early adoption
of new products. However, no clear relationship was found with information
exchange among consumers. These findings suggest that serious leisure practice
may serve as a potential antecedent to user innovation. The leisure career
perspective of the serious leisure concept may capture the motivations of user
innovators that Okada and Nishikawa (2019) identified.
arXiv link: http://arxiv.org/abs/2412.13556v2
Dual Interpretation of Machine Learning Forecasts
contributions of predictors. Yet, each out-of-sample prediction can also be
expressed as a linear combination of in-sample values of the predicted
variable, with weights corresponding to pairwise proximity scores between
current and past economic events. While this dual route leads nowhere in some
contexts (e.g., large cross-sectional datasets), it provides sparser
interpretations in settings with many regressors and little training data, such as
macroeconomic forecasting. In this case, the sequence of contributions can be
visualized as a time series, allowing analysts to explain predictions as
quantifiable combinations of historical analogies. Moreover, the weights can be
viewed as those of a data portfolio, inspiring new diagnostic measures such as
forecast concentration, short position, and turnover. We show how weights can
be retrieved seamlessly for (kernel) ridge regression, random forest, boosted
trees, and neural networks. Then, we apply these tools to analyze post-pandemic
forecasts of inflation, GDP growth, and recession probabilities. In all cases,
the approach opens the black box from a new angle and demonstrates how machine
learning models leverage history partly repeating itself.
arXiv link: http://arxiv.org/abs/2412.13076v1
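The dual route described above can be made concrete for (kernel) ridge regression, where each prediction is exactly a weighted sum of in-sample values of the target and the weights come from proximity scores. The sketch below is a minimal illustration on synthetic data; the Gaussian kernel, bandwidth, and penalty are assumptions, not the paper's choices.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic "many regressors, little training data" setting (illustrative).
    T, p = 80, 50
    X = rng.standard_normal((T, p))
    y = X[:, 0] - 0.5 * X[:, 1] + rng.standard_normal(T)

    def gaussian_kernel(A, B, bandwidth=10.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))

    lam = 1.0                              # ridge penalty (assumed)
    K = gaussian_kernel(X, X)              # pairwise proximity of past events
    x_new = rng.standard_normal((1, p))    # the current economic event
    k_new = gaussian_kernel(x_new, X)      # proximity of the current event to history

    # Dual form: the forecast is a linear combination of in-sample y values.
    weights = (k_new @ np.linalg.inv(K + lam * np.eye(T))).ravel()
    forecast = weights @ y

    # "Data portfolio" style diagnostics on the weights.
    concentration = (weights ** 2).sum() / weights.sum() ** 2   # Herfindahl-type index
    short_position = -weights[weights < 0].sum()                # total negative weight
    print(forecast, concentration, short_position)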
Moderating the Mediation Bootstrap for Causal Inference
effects and causal mechanisms. Confidence intervals for indirect effects play a
central role in conducting inference. The problem is non-standard leading to
coverage rates that deviate considerably from their nominal level. The default
inference method in the mediation model is the paired bootstrap, which
resamples directly from the observed data. However, a residual bootstrap that
explicitly exploits the assumed causal structure (X->M->Y) could also be
applied. There is also a debate whether the bias-corrected (BC) bootstrap
method is superior to the percentile method, with the former showing liberal
behavior (actual coverage too low) in certain circumstances. Moreover,
bootstrap methods tend to be very conservative (coverage higher than required)
when mediation effects are small. Finally, iterated bootstrap methods like the
double bootstrap have not been considered due to their high computational
demands. We investigate the issues mentioned in the simple mediation model by a
large-scale simulation. Results are explained using graphical methods and the
newly derived finite-sample distribution. The main findings are: (i)
conservative behavior of the bootstrap is caused by extreme dependence of the
bootstrap distribution's shape on the estimated coefficients; (ii) this dependence leads to a counterproductive correction by the double bootstrap.
The added randomness of the BC method inflates the coverage in the absence of
mediation, but still leads to (invalid) liberal inference when the mediation
effect is small.
arXiv link: http://arxiv.org/abs/2412.11285v1
VAR models with an index structure: A survey with new results
autoregressive index model [MAI], originally proposed by Reinsel (1983), and
their applications to economic and financial time series. MAI has recently
gained momentum because it can be seen as a link between two popular but
distinct multivariate time series approaches: vector autoregressive modeling
[VAR] and the dynamic factor model [DFM]. Indeed, on the one hand, the MAI is a
VAR model with a peculiar reduced-rank structure; on the other hand, it allows
for identification of common components and common shocks in a similar way as
the DFM. The focus is on recent developments of the MAI, which include
extending the original model with individual autoregressive structures,
stochastic volatility, time-varying parameters, high-dimensionality, and
cointegration. In addition, new insights on previous contributions and a novel
model are provided.
arXiv link: http://arxiv.org/abs/2412.11278v2
Treatment Evaluation at the Intensive and Extensive Margins
selective samples when neither instruments nor parametric assumptions are
available. We provide sharp bounds for average treatment effects under a
conditional monotonicity assumption for all principal strata, i.e. units
characterizing the complete intensive and extensive margins. Most importantly,
we allow for a large share of units whose selection is indifferent to
treatment, e.g. due to non-compliance. The existence of such a population is
crucially tied to the regularity of sharp population bounds and thus
conventional asymptotic inference for methods such as Lee bounds can be
misleading. This problem can be addressed using smoothed outer identification regions for
inference. We provide semiparametrically efficient debiased machine learning
estimators for both regular and smooth bounds that can accommodate
high-dimensional covariates and flexible functional forms. Our study of active
labor market policy reveals the empirical prevalence of the aforementioned
indifference population and supports results from previous impact analysis
under much weaker assumptions.
arXiv link: http://arxiv.org/abs/2412.11179v1
Forecasting realized covariances using HAR-type models
matrices applied to a set of 30 assets that were included in the DJ30 index at
some point, including two novel methods that use existing (univariate) models for the log of realized variance that account for attenuation bias and time-varying
parameters. We consider the implications of some modeling choices within the
class of heterogeneous autoregressive models. The following are our key
findings. First, modeling the logs of the marginal volatilities is strongly
preferred over direct modeling of marginal volatility. Thus, our proposed model
that accounts for attenuation bias (for the log-response) provides superior
one-step-ahead forecasts over existing multivariate realized covariance
approaches. Second, accounting for measurement errors in marginal realized
variances generally improves multivariate forecasting performance, but to a
lesser degree than previously found in the literature. Third, time-varying
parameter models based on state-space models perform almost equally well.
Fourth, statistical and economic criteria for comparing the forecasting
performance lead to some differences in the models' rankings, which can
partially be explained by the turbulent post-pandemic data in our out-of-sample
validation dataset using sub-sample analyses.
arXiv link: http://arxiv.org/abs/2412.10791v1
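For readers less familiar with the HAR structure referenced above, the sketch below fits a univariate HAR regression to the log of a simulated realized-variance series, using the lagged daily value and weekly (5-day) and monthly (22-day) trailing averages as predictors. It is a stylized stand-in, not the paper's multivariate specification or its attenuation-bias correction.

    import numpy as np

    rng = np.random.default_rng(1)

    # Simulated log realized variance with persistence (illustrative only).
    T = 1000
    log_rv = np.zeros(T)
    for t in range(1, T):
        log_rv[t] = 0.9 * log_rv[t - 1] + 0.3 * rng.standard_normal()

    def trailing_mean(x, window):
        # Mean of the previous `window` observations; entry i corresponds to time window + i.
        return np.array([x[t - window:t].mean() for t in range(window, len(x))])

    # HAR predictors: lagged daily value, weekly and monthly trailing averages.
    start = 22
    y = log_rv[start:]
    daily = log_rv[start - 1:-1]
    weekly = trailing_mean(log_rv, 5)[start - 5:]
    monthly = trailing_mean(log_rv, 22)

    X = np.column_stack([np.ones_like(y), daily, weekly, monthly])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    # One-step-ahead forecast built from the most recent daily, weekly, and monthly averages.
    x_next = np.array([1.0, log_rv[-1], log_rv[-5:].mean(), log_rv[-22:].mean()])
    print("HAR coefficients:", beta, "next-day forecast of log RV:", x_next @ beta)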
Do LLMs Act as Repositories of Causal Knowledge?
of tasks that previously have not been possible to automate, including some in
science. There is considerable interest in whether LLMs can automate the
process of causal inference by providing the information about causal links
necessary to build a structural model. We use the case of confounding in the
Coronary Drug Project (CDP), for which there are several studies listing
expert-selected confounders that can serve as a ground truth. LLMs exhibit
mediocre performance in identifying confounders in this setting, even though
text about the ground truth is in their training data. Variables that experts
identify as confounders are only slightly more likely to be labeled as
confounders by LLMs compared to variables that experts consider
non-confounders. Further, LLM judgment on confounder status is highly
inconsistent across models, prompts, and irrelevant concerns like
multiple-choice option ordering. LLMs do not yet have the ability to automate
the reporting of causal links.
arXiv link: http://arxiv.org/abs/2412.10635v1
An overview of meta-analytic methods for economic research
individual studies, providing an estimate of the overall effect size for a
specific outcome of interest. The direction and magnitude of this estimate,
along with its confidence interval, offer valuable insights into the underlying
phenomenon or relationship. As an extension of standard meta-analysis,
meta-regression analysis incorporates multiple moderators -- capturing key
study characteristics -- into the model to explain heterogeneity in true effect
sizes across studies. This study provides a comprehensive overview of
meta-analytic procedures tailored to economic research, addressing key
challenges such as between-study heterogeneity, publication bias, and effect
size dependence. It equips researchers with essential tools and insights to
conduct rigorous and informative meta-analyses in economics and related fields.
arXiv link: http://arxiv.org/abs/2412.10608v2
A Neyman-Orthogonalization Approach to the Incidental Parameter Problem
of nuisance parameters is to construct estimating equations that are orthogonal
to the nuisance parameters, in the sense that their expected first derivative
is zero. Such first-order orthogonalization may, however, not suffice when the
nuisance parameters are very imprecisely estimated. Leading examples where this
is the case are models for panel and network data that feature fixed effects.
In this paper, we show how, in the conditional-likelihood setting, estimating
equations can be constructed that are orthogonal to any chosen order. Combining
these equations with sample splitting yields higher-order bias-corrected
estimators of target parameters. In an empirical application we apply our
method to a fixed-effect model of team production and obtain estimates of
complementarity in production and impacts of counterfactual re-allocations.
arXiv link: http://arxiv.org/abs/2412.10304v2
Geometric Deep Learning for Realized Covariance Matrix Forecasting
the inherent Riemannian manifold structure of symmetric positive definite
matrices, treating them as elements of Euclidean space, which can lead to
suboptimal predictive performance. Moreover, they often struggle to handle
high-dimensional matrices. In this paper, we propose a novel approach for
forecasting realized covariance matrices of asset returns using a
Riemannian-geometry-aware deep learning framework. In this way, we account for
the geometric properties of the covariance matrices, including possible
non-linear dynamics and efficient handling of high-dimensionality. Moreover,
building upon a Fr\'echet sample mean of realized covariance matrices, we are
able to extend the HAR model to the matrix-variate setting. We demonstrate the efficacy
of our approach using daily realized covariance matrices for the 50 most
capitalized companies in the S&P 500 index, showing that our method outperforms
traditional approaches in terms of predictive accuracy.
arXiv link: http://arxiv.org/abs/2412.09517v1
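One geometric ingredient mentioned above, a Fréchet sample mean of covariance matrices, has a closed form under the log-Euclidean metric: the matrix exponential of the average matrix logarithm. The sketch below computes it for simulated matrices and contrasts it with the ordinary arithmetic average; the metric choice and the data are illustrative assumptions, not necessarily those of the paper.

    import numpy as np

    rng = np.random.default_rng(2)

    def sym_logm(S):
        # Matrix logarithm of a symmetric positive definite matrix via eigendecomposition.
        w, V = np.linalg.eigh(S)
        return (V * np.log(w)) @ V.T

    def sym_expm(S):
        w, V = np.linalg.eigh(S)
        return (V * np.exp(w)) @ V.T

    def random_spd(d):
        A = rng.standard_normal((d, d))
        return A @ A.T + d * np.eye(d)      # stand-in for a daily realized covariance matrix

    covs = [random_spd(5) for _ in range(30)]

    # Log-Euclidean Frechet mean: exponential of the average log-matrix.
    frechet_mean = sym_expm(sum(sym_logm(S) for S in covs) / len(covs))
    euclidean_mean = sum(covs) / len(covs)

    # The arithmetic average tends to inflate the determinant ("swelling effect").
    print("log-det, Frechet mean:  ", np.linalg.slogdet(frechet_mean)[1])
    print("log-det, Euclidean mean:", np.linalg.slogdet(euclidean_mean)[1])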
A Kernel Score Perspective on Forecast Disagreement and the Linear Pool
consists of two components: The average variance of the component distributions
(`average uncertainty'), and the average squared difference between the
components' means and the pool's mean (`disagreement'). This paper shows that
similar decompositions hold for a class of uncertainty measures that can be
constructed as entropy functions of kernel scores. The latter are a rich family
of scoring rules that covers point and distribution forecasts for univariate
and multivariate, discrete and continuous settings. We further show that the
disagreement term is useful for understanding the ex-post performance of the
linear pool (as compared to the component distributions), and motivates using
the linear pool instead of other forecast combination techniques. From a
practical perspective, the results in this paper suggest principled measures of
forecast disagreement in a wide range of applied settings.
arXiv link: http://arxiv.org/abs/2412.09430v2
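For the most familiar member of this family, the variance of an equally weighted linear pool, the decomposition can be verified directly: pool variance equals the average component variance plus the average squared deviation of component means from the pool mean. The sketch below checks this by simulation with illustrative Gaussian components.

    import numpy as np

    rng = np.random.default_rng(3)

    # Component forecast distributions (illustrative Gaussians).
    means = np.array([0.0, 1.0, 2.5])
    sds = np.array([1.0, 0.5, 2.0])

    # Decomposition: pool variance = average uncertainty + disagreement.
    avg_uncertainty = np.mean(sds ** 2)
    disagreement = np.mean((means - means.mean()) ** 2)

    # Simulate from the equally weighted linear pool and compare.
    draws = 1_000_000
    component = rng.integers(0, len(means), size=draws)
    samples = rng.normal(means[component], sds[component])
    print(samples.var(), avg_uncertainty + disagreement)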
The Global Carbon Budget as a cointegrated system
Earth's global carbon cycle through four annual time series beginning in 1959:
atmospheric CO$_2$ concentrations, anthropogenic CO$_2$ emissions, and CO$_2$
uptake by land and ocean. We analyze these four time series as a multivariate
(cointegrated) system. Statistical tests show that the four time series are
cointegrated with rank three and identify anthropogenic CO$_2$ emissions as the
single stochastic trend driving the nonstationary dynamics of the system. The
three cointegrated relations correspond to the physical relations that the
sinks are linearly related to atmospheric concentrations and that the change in
concentrations equals emissions minus the combined uptake by land and ocean.
Furthermore, likelihood ratio tests show that a parametrically restricted
error-correction model that embodies these physical relations and accounts for
the El Ni\~no/Southern Oscillation cannot be rejected by the data. The model
can be used for both in-sample and out-of-sample analysis. In an application of
the latter, we demonstrate that projections based on this model, using Shared
Socioeconomic Pathways scenarios, yield results consistent with established
climate science.
arXiv link: http://arxiv.org/abs/2412.09226v3
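To make the rank analysis concrete, the sketch below runs a Johansen trace test (via statsmodels) on simulated data in which a single stochastic trend drives four observed series, mirroring the rank-three structure described above; it uses synthetic data and a generic VAR specification, not the paper's restricted error-correction model.

    import numpy as np
    from statsmodels.tsa.vector_ar.vecm import coint_johansen

    rng = np.random.default_rng(4)

    # One common stochastic trend driving four observed series (illustrative).
    T = 200
    trend = np.cumsum(rng.standard_normal(T))           # a single I(1) driver
    loadings = np.array([1.0, 0.8, 0.4, 0.3])
    data = trend[:, None] * loadings[None, :] + 0.5 * rng.standard_normal((T, 4))

    # Johansen trace statistics against 5% critical values for ranks r = 0, 1, 2, 3.
    res = coint_johansen(data, det_order=0, k_ar_diff=1)
    for r, (stat, cv) in enumerate(zip(res.lr1, res.cvt[:, 1])):
        print(f"H0: rank <= {r}: trace stat = {stat:.1f}, 5% critical value = {cv:.1f}")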
Panel Stochastic Frontier Models with Latent Group Structures
due to their unique feature of including a distinct inefficiency term alongside
the usual error term. To effectively separate these two components, strong
distributional assumptions are often necessary. To overcome this limitation,
numerous studies have sought to relax or generalize these models for more
robust estimation. In line with these efforts, we introduce a latent group
structure that accommodates heterogeneity across firms, addressing not only the
stochastic frontiers but also the distribution of the inefficiency term. This
framework accounts for the distinctive features of stochastic frontier models,
and we propose a practical estimation procedure to implement it. Simulation
studies demonstrate the strong performance of our proposed method, which is
further illustrated through an application to study the cost efficiency of the
U.S. commercial banking sector.
arXiv link: http://arxiv.org/abs/2412.08831v2
Machine Learning the Macroeconomic Effects of Financial Shocks
shocks using neural networks, and apply it to uncover the effects of US
financial shocks. The results reveal substantial asymmetries with respect to
the sign of the shock. Adverse financial shocks have powerful effects on the US
economy, while benign shocks trigger much smaller reactions. By contrast, with
respect to the size of the shocks, we find no discernible asymmetries.
arXiv link: http://arxiv.org/abs/2412.07649v1
Inference after discretizing time-varying unobserved heterogeneity
become increasingly popular in economics. Yet, provably valid post-clustering
inference for target parameters in models that do not impose an exact group
structure is still lacking. This paper fills this gap in the leading case of a
linear panel data model with nonseparable two-way unobserved heterogeneity.
Building on insights from the double machine learning literature, we propose a
simple inference procedure based on a bias-reducing moment. Asymptotic theory
and simulations suggest excellent performance. In the application on fiscal
policy we revisit, the novel approach yields conclusions in line with economic
theory.
arXiv link: http://arxiv.org/abs/2412.07352v3
Automatic Doubly Robust Forests
algorithm for estimating the conditional expectation of a moment functional in
the presence of high-dimensional nuisance functions. DRRF extends the automatic
debiasing framework based on the Riesz representer to the conditional setting
and enables nonparametric, forest-based estimation (Athey et al., 2019; Oprescu
et al., 2019). In contrast to existing methods, DRRF does not require prior
knowledge of the form of the debiasing term or impose restrictive parametric or
semi-parametric assumptions on the target quantity. Additionally, it is
computationally efficient in making predictions at multiple query points. We
establish consistency and asymptotic normality results for the DRRF estimator
under general assumptions, allowing for the construction of valid confidence
intervals. Through extensive simulations in heterogeneous treatment effect
(HTE) estimation, we demonstrate the superior performance of DRRF over
benchmark approaches in terms of estimation accuracy, robustness, and
computational efficiency.
arXiv link: http://arxiv.org/abs/2412.07184v2
Large Language Models: An Applied Econometric Framework
empirical research? And how can we do so while accounting for their
limitations, which are themselves only poorly understood? We develop an
econometric framework to answer this question that distinguishes between two
types of empirical tasks. Using LLMs for prediction problems (including
hypothesis generation) is valid under one condition: no “leakage” between the
LLM's training dataset and the researcher's sample. No leakage can be ensured
by using open-source LLMs with documented training data and published weights.
Using LLM outputs for estimation problems to automate the measurement of some
economic concept (expressed either by some text or from human subjects)
requires the researcher to collect at least some validation data: without such
data, the errors of the LLM's automation cannot be assessed and accounted for.
As long as these steps are taken, LLM outputs can be used in empirical research
with the familiar econometric guarantees we desire. Using two illustrative
applications to finance and political economy, we find that these requirements
are stringent; when they are violated, the limitations of LLMs result in
unreliable empirical estimates. Our results suggest the excitement around the
empirical uses of LLMs is warranted -- they allow researchers to effectively
use even small amounts of language data for both prediction and estimation --
but only with these safeguards in place.
arXiv link: http://arxiv.org/abs/2412.07031v2
Probabilistic Targeted Factor Analysis
Probabilistic Targeted Factor Analysis (PTFA), which can be used to extract
common factors in predictors that are useful to predict a set of predetermined
target variables. Along with the technique, we provide an efficient
expectation-maximization (EM) algorithm to learn the parameters and forecast
the targets of interest. We develop a number of extensions to missing-at-random
data, stochastic volatility, factor dynamics, and mixed-frequency data for
real-time forecasting. In a simulation exercise, we show that PTFA outperforms
PLS at recovering the common underlying factors affecting both features and
target variables, delivering better in-sample fit and providing valid forecasts under contamination such as measurement error or outliers. Finally, we provide three applications in Economics and Finance where PTFA outperforms PLS and Principal Component Analysis (PCA) at out-of-sample forecasting.
arXiv link: http://arxiv.org/abs/2412.06688v3
Density forecast transformations
individual predictions do not contain information on cross-horizon dependence.
However, this dependence is needed if the forecaster has to construct, based on
$direct$ density forecasts, predictive objects that are functions of several
horizons ($e.g.$ when constructing annual-average growth rates from
quarter-on-quarter growth rates). To address this issue we propose to use
copulas to combine the individual $h$-step-ahead predictive distributions into
a joint predictive distribution. Our method is particularly appealing to
practitioners for whom changing the $direct$ forecasting specification is too
costly. In a Monte Carlo study, we demonstrate that our approach leads to a
better approximation of the true density than an approach that ignores the
potential dependence. We show the superior performance of our method in several
empirical examples, where we construct (i) quarterly forecasts using
month-on-month $direct$ forecasts, (ii) annual-average forecasts using monthly
year-on-year $direct$ forecasts, and (iii) annual-average forecasts using
quarter-on-quarter $direct$ forecasts.
arXiv link: http://arxiv.org/abs/2412.06092v1
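A minimal sketch of the copula idea, under strong simplifying assumptions: Gaussian marginal predictive distributions at each horizon, a Gaussian copula with an assumed cross-horizon correlation, and a four-quarter average as the multi-horizon object of interest. The paper's marginals and copula choice may differ.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)

    # Direct h-step-ahead predictive distributions (illustrative Gaussian marginals).
    horizons = 4
    means = np.array([0.5, 0.6, 0.4, 0.3])    # e.g., quarter-on-quarter growth forecasts
    sds = np.array([0.8, 0.9, 1.0, 1.1])

    # Gaussian copula with an assumed AR(1)-type cross-horizon correlation of 0.5.
    rho = 0.5
    corr = rho ** np.abs(np.subtract.outer(np.arange(horizons), np.arange(horizons)))
    z = rng.multivariate_normal(np.zeros(horizons), corr, size=100_000)
    u = stats.norm.cdf(z)                                  # copula draws on the unit cube
    joint_draws = stats.norm.ppf(u, loc=means, scale=sds)  # push through the marginals

    # Predictive distribution of the annual-average growth rate.
    annual_avg = joint_draws.mean(axis=1)
    naive_sd = np.sqrt((sds ** 2).sum()) / horizons        # what ignoring dependence implies
    print(annual_avg.mean(), annual_avg.std(), naive_sd)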
Estimating Spillover Effects in the Presence of Isolated Nodes
often use linear regression with either the number or fraction of treated
neighbors as regressors. An often overlooked fact is that the latter is
undefined for units without neighbors ("isolated nodes"). The common practice
is to impute this fraction as zero for isolated nodes. This paper shows that
such practice introduces bias through theoretical derivations and simulations.
Causal interpretations of the commonly used spillover regression coefficients
are also provided.
arXiv link: http://arxiv.org/abs/2412.05919v1
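The sketch below illustrates the issue in a stylized way: it simulates a network in which some units have no neighbors, so the zero imputation happens mechanically, and contrasts the usual specification with one that adds an isolated-node indicator. The data-generating process is an assumption for illustration, not the paper's derivation.

    import numpy as np

    rng = np.random.default_rng(6)

    n = 2000
    treated = rng.integers(0, 2, size=n).astype(float)
    n_neighbors = rng.poisson(2.0, size=n)                  # some units have no neighbors
    treated_neighbors = rng.binomial(n_neighbors, 0.5).astype(float)
    isolated = (n_neighbors == 0).astype(float)

    # Fraction of treated neighbors; isolated nodes get the usual zero imputation.
    frac = treated_neighbors / np.maximum(n_neighbors, 1)

    # Illustrative outcome: direct effect 1.0, spillover 0.5, level shift for isolated units.
    y = 1.0 * treated + 0.5 * frac + 0.8 * isolated + rng.standard_normal(n)

    def ols(X, y):
        return np.linalg.lstsq(X, y, rcond=None)[0]

    X_naive = np.column_stack([np.ones(n), treated, frac])            # common practice
    X_flag = np.column_stack([np.ones(n), treated, frac, isolated])   # isolation controlled for
    print("spillover coefficient, zero imputation only:", ols(X_naive, y)[2])
    print("spillover coefficient, with isolated-node indicator:", ols(X_flag, y)[2])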
Bundle Choice Model with Endogenous Regressors: An Application to Soda Tax
estimate joint consumption as well as the substitutability and complementarity
of multiple goods in the presence of endogenous regressors. The model extends
the two primary treatments of endogeneity in existing bundle choice models: (1)
endogenous market-level prices and (2) time-invariant unobserved individual
heterogeneity. A Bayesian sparse factor approach is employed to capture
high-dimensional error correlations that induce taste correlation and
endogeneity. Time-varying factor loadings allow for more general
individual-level and time-varying heterogeneity and endogeneity, while the
sparsity induced by the shrinkage prior on loadings balances flexibility with
parsimony. Applied to a soda tax in the context of complementarities, the new
approach captures broader effects of the tax that were previously overlooked.
Results suggest that a soda tax could yield additional health benefits by
marginally decreasing the consumption of salty snacks along with sugary drinks,
extending the health benefits beyond the reduction in sugar consumption alone.
arXiv link: http://arxiv.org/abs/2412.05794v1
Convolution Mode Regression
often fail to capture the central tendencies in the data. Despite being a
viable alternative, estimating the conditional mode given certain covariates
(or mode regression) presents significant challenges. Nonparametric approaches
suffer from the "curse of dimensionality", while semiparametric strategies
often lead to non-convex optimization problems. In order to avoid these issues,
we propose a novel mode regression estimator that relies on an intermediate
step of inverting the conditional quantile density. In contrast to existing
approaches, we employ a convolution-type smoothed variant of the quantile
regression. Our estimator converges uniformly over the design points of the
covariates and, unlike previous quantile-based mode regressions, is uniform
with respect to the smoothing bandwidth. Additionally, the Convolution Mode
Regression is dimension-free, carries no issues regarding optimization, and
preliminary simulations suggest the estimator is normally distributed in finite
samples.
arXiv link: http://arxiv.org/abs/2412.05736v1
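The underlying idea of recovering the mode from the quantile function can be illustrated in the simplest, unconditional case: the density equals the reciprocal of the quantile density, so the mode sits at the quantile level where the quantile density is smallest. The sketch below applies this to simulated skewed data; it is only a rough illustration, not the paper's conditional, convolution-smoothed estimator.

    import numpy as np

    rng = np.random.default_rng(7)

    # Skewed data whose mode differs from its mean (illustrative, unconditional case).
    y = rng.gamma(shape=2.0, scale=1.0, size=200_000)       # true mode = 1, mean = 2

    # Quantile function on a grid and its numerical derivative (the quantile density).
    taus = np.linspace(0.01, 0.99, 197)
    q = np.quantile(y, taus)
    quantile_density = np.gradient(q, taus)                 # approximately 1 / f(q(tau))

    # The mode is the quantile evaluated where the quantile density is smallest.
    mode_estimate = q[np.argmin(quantile_density)]
    print("mode estimate:", round(mode_estimate, 3), "sample mean:", round(y.mean(), 3))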
Property of Inverse Covariance Matrix-based Financial Adjacency Matrix for Detecting Local Groups
that are modeled by a multi-level factor model. When detecting unknown local
group memberships under such a model, employing a covariance matrix as an
adjacency matrix for local group memberships is inadequate due to the
predominant effect of global factors. Thus, to detect a local group structure
more effectively, this study introduces an inverse covariance matrix-based
financial adjacency matrix (IFAM) that utilizes negative values of the inverse
covariance matrix. We show that IFAM ensures that the edge density between
different groups vanishes, while that within the same group remains
non-vanishing. This reduces falsely detected connections and helps identify
local group membership accurately. To estimate IFAM under the multi-level
factor model, we introduce a factor-adjusted GLASSO estimator to address the
prevalent global factor effect in the inverse covariance matrix. An empirical
study using returns from international stocks across 20 financial markets
demonstrates that incorporating IFAM effectively detects latent local groups,
which helps improve the minimum variance portfolio allocation performance.
arXiv link: http://arxiv.org/abs/2412.05664v1
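A rough sketch of the construction described above, under simplifying assumptions: strip an estimated global factor with a one-component PCA, fit a graphical lasso to the residuals, and use the negative off-diagonal entries of the estimated precision matrix as edge weights. The paper's factor adjustment, tuning, and theory are more involved.

    import numpy as np
    from sklearn.covariance import GraphicalLasso
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(8)

    # Simulated returns: one global factor plus two local groups (illustrative).
    T, p = 500, 10
    global_factor = rng.standard_normal(T)
    local_factors = rng.standard_normal((T, 2))
    group = np.repeat([0, 1], p // 2)
    returns = (global_factor[:, None]
               + local_factors[:, group]
               + 0.5 * rng.standard_normal((T, p)))

    # Step 1: remove the dominant global component with a one-factor PCA approximation.
    pca = PCA(n_components=1)
    residuals = returns - pca.inverse_transform(pca.fit_transform(returns))

    # Step 2: graphical lasso on residuals; keep negative precision entries as edge weights.
    precision = GraphicalLasso(alpha=0.05).fit(residuals).precision_
    adjacency = np.clip(-precision, 0.0, None)
    np.fill_diagonal(adjacency, 0.0)

    within = adjacency[np.ix_(group == 0, group == 0)].mean()
    between = adjacency[np.ix_(group == 0, group == 1)].mean()
    print("average within-group edge weight:", within, "between-group edge weight:", between)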
Minimum Sliced Distance Estimation in a Class of Nonregular Econometric Models
econometric models with possibly parameter-dependent supports. In contrast to
likelihood-based estimation, we show that under mild regularity conditions, the
minimum sliced distance estimator is asymptotically normally distributed, leading to simple inference regardless of the presence or absence of parameter-dependent supports. We illustrate the performance of our estimator on an
auction model.
arXiv link: http://arxiv.org/abs/2412.05621v1
Optimizing Returns from Experimentation Programs
making. Specifically, the goal of many experiments is to optimize a metric of
interest. Null hypothesis statistical testing can be ill-suited to this task,
as it is indifferent to the magnitude of effect sizes and opportunity costs.
Given access to a pool of related past experiments, we discuss how
experimentation practice should change when the goal is optimization. We survey
the literature on empirical Bayes analyses of A/B test portfolios, and single
out the A/B Testing Problem (Azevedo et al., 2020) as a starting point, which
treats experimentation as a constrained optimization problem. We show that the
framework can be solved with dynamic programming and implemented by
appropriately tuning $p$-value thresholds. Furthermore, we develop several
extensions of the A/B Testing Problem and discuss the implications of these
results on experimentation programs in industry. For example, under no-cost
assumptions, firms should be testing many more ideas, reducing test allocation
sizes, and relaxing $p$-value thresholds away from $p = 0.05$.
arXiv link: http://arxiv.org/abs/2412.05508v1
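The last implication can be illustrated with a stylized Monte Carlo rather than the constrained optimization of Azevedo et al. (2020): when true effects are drawn from a prior centered at zero and launches are costless, the average realized gain per tested idea rises as the launch threshold is relaxed. The prior and noise scales below are assumptions chosen for illustration.

    import numpy as np

    rng = np.random.default_rng(9)

    # Stylized setting: true lifts drawn from a prior, noisy estimates, launch if z > c.
    n_ideas = 200_000
    tau = 0.02        # prior standard deviation of true lifts (assumed)
    sigma = 0.05      # standard error of the experimental estimate (assumed)
    effects = rng.normal(0.0, tau, n_ideas)
    estimates = effects + rng.normal(0.0, sigma, n_ideas)

    for c in [2.58, 1.96, 1.64, 1.28, 0.67, 0.0]:
        launched = estimates / sigma > c
        gain = effects[launched].sum() / n_ideas    # average realized lift per tested idea
        print(f"z-threshold {c:4.2f}: launch rate {launched.mean():.2%}, avg gain {gain:.5f}")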
Linear Regressions with Combined Data
and some of the covariates are observed in two different datasets that cannot
be matched. Traditional approaches obtain point identification by relying,
often implicitly, on exclusion restrictions. We show that without such
restrictions, coefficients of interest can still be partially identified and we
derive a constructive characterization of the sharp identified set. We then
build on this characterization to develop computationally simple and
asymptotically normal estimators of the corresponding bounds. We show that
these estimators exhibit good finite-sample performance.
arXiv link: http://arxiv.org/abs/2412.04816v1
Semiparametric Bayesian Difference-in-Differences
treatment effect on the treated (ATT) within the difference-in-differences
(DiD) research design. We propose two new Bayesian methods with frequentist
validity. The first one places a standard Gaussian process prior on the
conditional mean function of the control group. The second method is a doubly
robust Bayesian procedure that adjusts the prior distribution of the
conditional mean function and subsequently corrects the posterior distribution
of the resulting ATT. We prove new semiparametric Bernstein-von Mises (BvM)
theorems for both proposals. Monte Carlo simulations and an empirical
application demonstrate that the proposed Bayesian DiD methods exhibit strong
finite-sample performance compared to existing frequentist methods. We also
present extensions of the canonical DiD approach, incorporating both the
staggered design and the repeated cross-sectional design.
arXiv link: http://arxiv.org/abs/2412.04605v3
Large Volatility Matrix Prediction using Tensor Factor Structure
developed based on high-dimensional factor-based It\^o processes. These methods
often impose restrictions to reduce the model complexity, such as constant
eigenvectors or factor loadings over time. However, several studies indicate
that eigenvector processes are also time-varying. To address this feature, this
paper generalizes the factor structure by representing the integrated
volatility matrix process as a cubic (order-3 tensor) form, which is decomposed
into low-rank tensor and idiosyncratic tensor components. To predict
conditional expected large volatility matrices, we propose the Projected Tensor
Principal Orthogonal componEnt Thresholding (PT-POET) procedure and establish
its asymptotic properties. The advantages of PT-POET are validated through a
simulation study and demonstrated in an application to minimum variance
portfolio allocation using high-frequency trading data.
arXiv link: http://arxiv.org/abs/2412.04293v2
On Extrapolation of Treatment Effects in Multiple-Cutoff Regression Discontinuity Designs
multiple-cutoff regression discontinuity designs. Using a microeconomic model,
we demonstrate that the parallel-trend type assumption proposed in the
literature is justified when cutoff positions are assigned as if randomly and
the running variable is non-manipulable (e.g., parental income). However, when
the running variable is partially manipulable (e.g., test scores),
extrapolations based on that assumption can be biased. As a complementary
strategy, we propose a novel partial identification approach based on
empirically motivated assumptions. We also develop a uniform inference
procedure and provide two empirical illustrations.
arXiv link: http://arxiv.org/abs/2412.04265v3
Endogenous Heteroskedasticity in Linear Models
effects. This paper studies a framework that involves two common issues:
endogeneity of the regressors and heteroskedasticity that depends on endogenous
regressors, i.e., endogenous heteroskedasticity. We show that the presence of
endogenous heteroskedasticity in the structural regression renders the
two-stage least squares estimator inconsistent. To address this issue, we
propose sufficient conditions and a control function approach to identify and
estimate the causal parameters of interest. We establish the limiting
properties of the estimator, namely consistency and asymptotic normality, and
propose inference procedures. Monte Carlo simulations provide evidence on the
finite-sample performance of the proposed methods and evaluate different
implementation strategies. We revisit an empirical application on job training
to illustrate the methods.
arXiv link: http://arxiv.org/abs/2412.02767v3
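A minimal two-step control-function sketch in the spirit of the approach described above: regress the endogenous regressor on the instrument, then include the first-stage residual as an additional control in the outcome equation. The data-generating process, in which the error is endogenous through the first-stage shock and heteroskedastic through the regressor, is an illustrative assumption, and the sketch omits the paper's identification conditions and inference corrections.

    import numpy as np

    rng = np.random.default_rng(10)

    n = 5000
    z = rng.standard_normal(n)                       # instrument
    v = rng.standard_normal(n)                       # first-stage error
    x = 1.0 + 0.8 * z + v                            # endogenous regressor
    # Error: endogenous through v, heteroskedastic through x (illustrative choice).
    eps = 0.7 * v + (0.5 + 0.5 * np.abs(x)) * rng.standard_normal(n)
    y = 2.0 + 1.5 * x + eps                          # true slope is 1.5

    def ols(X, y):
        return np.linalg.lstsq(X, y, rcond=None)[0]

    # Naive OLS is biased because x is correlated with the error.
    b_ols = ols(np.column_stack([np.ones(n), x]), y)

    # Control function: include the first-stage residual in the outcome equation.
    first_stage = ols(np.column_stack([np.ones(n), z]), x)
    vhat = x - np.column_stack([np.ones(n), z]) @ first_stage
    b_cf = ols(np.column_stack([np.ones(n), x, vhat]), y)
    print("OLS slope:", b_ols[1], "control-function slope:", b_cf[1])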
A Markowitz Approach to Managing a Dynamic Basket of Moving-Band Statistical Arbitrages
arbitrages (MBSAs), inspired by the Markowitz optimization framework. We show
how to manage a dynamic basket of MBSAs, and illustrate the method on recent
historical data, showing that it can perform very well in terms of
risk-adjusted return, essentially uncorrelated with the market.
arXiv link: http://arxiv.org/abs/2412.02660v1
Simple and Effective Portfolio Construction with Crypto Assets
financial assets with crypto assets. We show that despite the documented
attributes of crypto assets, such as high volatility, heavy tails, excess
kurtosis, and skewness, a simple extension of traditional risk allocation
provides robust solutions for integrating these emerging assets into broader
investment strategies. Examination of the risk allocation holdings suggests an
even simpler method, analogous to the traditional 60/40 stocks/bonds
allocation, involving a fixed allocation to crypto and traditional assets,
dynamically diluted with cash to achieve a target risk level.
arXiv link: http://arxiv.org/abs/2412.02654v1
Use of surrogate endpoints in health technology assessment: a review of selected NICE technology appraisals in oncology
clinical outcomes, are increasingly being used to support submissions to health
technology assessment agencies. The increase in use of surrogate endpoints has
been accompanied by literature describing frameworks and statistical methods to
ensure their robust validation. The aim of this review was to assess how
surrogate endpoints have recently been used in oncology technology appraisals
by the National Institute for Health and Care Excellence (NICE) in England and
Wales.
Methods: We identified technology appraisals in oncology published by NICE between February 2022 and May 2023. Data were extracted on methods for the use and validation of surrogate endpoints.
Results: Of the 47 technology appraisals in oncology available for review, 18
(38 percent) utilised surrogate endpoints, with 37 separate surrogate endpoints
being discussed. However, the evidence supporting the validity of the surrogate
relationship varied significantly across putative surrogate relationships, with
11 providing RCT evidence, 7 providing evidence from observational studies, 12
based on clinical opinion and 7 providing no evidence for the use of surrogate
endpoints.
Conclusions: This review supports the assertion that surrogate endpoints are
frequently used in oncology technology appraisals in England and Wales. Despite
increasing availability of statistical methods and guidance on appropriate
validation of surrogate endpoints, this review highlights that use and
validation of surrogate endpoints can vary between technology appraisals which
can lead to uncertainty in decision-making.
arXiv link: http://arxiv.org/abs/2412.02380v2
Selective Reviews of Bandit Problems in AI via a Statistical View
intelligence that focuses on teaching agents decision-making through
interactions with their environment. A key subset includes stochastic
multi-armed bandit (MAB) and stochastic continuum-armed bandit (SCAB) problems, which
model sequential decision-making under uncertainty. This review outlines the
foundational models and assumptions of bandit problems, explores non-asymptotic
theoretical tools like concentration inequalities and minimax regret bounds,
and compares frequentist and Bayesian algorithms for managing
exploration-exploitation trade-offs. Additionally, we explore K-armed
contextual bandits and SCAB, focusing on their methodologies and regret
analyses. We also examine the connections between SCAB problems and functional
data analysis. Finally, we highlight recent advances and ongoing challenges in
the field.
arXiv link: http://arxiv.org/abs/2412.02251v3
Endogenous Interference in Randomized Experiments
in randomized controlled trials with social interactions. Two key network
features characterize the setting and introduce endogeneity: (1) latent
variables may affect both network formation and outcomes, and (2) the
intervention may alter network structure, mediating treatment effects. I make
three contributions. First, I define parameters within a post-treatment network
framework, distinguishing direct effects of treatment from indirect effects
mediated through changes in network structure. I provide a causal
interpretation of the coefficients in a linear outcome model. For estimation
and inference, I focus on a specific form of peer effects, represented by the
fraction of treated friends. Second, in the absence of endogeneity, I establish
the consistency and asymptotic normality of ordinary least squares estimators.
Third, if endogeneity is present, I propose addressing it through shift-share
instrumental variables, demonstrating the consistency and asymptotic normality
of instrumental variable estimators in relatively sparse networks. For denser
networks, I propose a denoised estimator based on eigendecomposition to restore
consistency. Finally, I revisit Prina (2015) as an empirical illustration,
demonstrating that treatment can influence outcomes both directly and through
network structure changes.
arXiv link: http://arxiv.org/abs/2412.02183v1
A Dimension-Agnostic Bootstrap Anderson-Rubin Test For Instrumental Variable Regressions
are typically developed separately depending on whether the number of IVs is
treated as fixed or increasing with the sample size, forcing researchers to
take a stance on the asymptotic behavior, which is often ambiguous in practice.
This paper proposes a bootstrap-based, dimension-agnostic Anderson-Rubin (AR)
test that achieves correct asymptotic size regardless of whether the number of
IVs is fixed or diverging, and even accommodates cases where the number of IVs
exceeds the sample size. By incorporating ridge regularization, our approach
reduces the effective rank of the projection matrix and yields regimes where
the limiting distribution of the AR statistic can be a weighted chi-squared, a
normal, or a mixture of the two. Strong approximation results ensure that the
bootstrap procedure remains uniformly valid across all regimes, while also
delivering substantial power gains over existing methods by exploiting rank
reduction.
arXiv link: http://arxiv.org/abs/2412.01603v2
From rotational to scalar invariance: Enhancing identifiability in score-driven factor models
used inverse square-root of the conditional Fisher Information, score-driven
factor models are identifiable up to a multiplicative scalar constant under
very mild restrictions. This result has no analogue in parameter-driven models,
as it exploits the different structure of the score-driven factor dynamics.
Consequently, score-driven models offer a clear advantage in terms of economic
interpretability compared to parameter-driven factor models, which are
identifiable only up to orthogonal transformations. Our restrictions are
order-invariant and can be generalized to score-driven factor models with dynamic loadings and nonlinear factor models. We extensively test the
identification strategy using simulated and real data. The empirical analysis
on financial and macroeconomic data reveals a substantial increase of
log-likelihood ratios and significantly improved out-of-sample forecast
performance when switching from the classical restrictions adopted in the
literature to our more flexible specifications.
arXiv link: http://arxiv.org/abs/2412.01367v1
Locally robust semiparametric estimation of sample selection models without exclusion restrictions
selection models rely heavily on exclusion restrictions. However, it is
difficult in practice to find a credible excluded variable that has a
correlation with selection but no correlation with the outcome. In this paper,
we establish a new identification result for a semiparametric sample selection
model without the exclusion restriction. The key identifying assumptions are
nonlinearity in the selection equation and linearity in the outcome equation.
The difference in the functional form plays the role of an excluded variable
and provides identification power. According to the identification result, we
propose to estimate the model by a partially linear regression with a
nonparametrically generated regressor. To accommodate modern machine learning
methods in generating the regressor, we construct an orthogonalized moment by
adding the first-step influence function and develop a locally robust estimator
by solving the cross-fitted orthogonalized moment condition. We prove
root-n-consistency and asymptotic normality of the proposed estimator under
mild regularity conditions. A Monte Carlo simulation shows the satisfactory
performance of the estimator in finite samples, and an application to wage
regression illustrates its usefulness in the absence of exclusion restrictions.
arXiv link: http://arxiv.org/abs/2412.01208v1
Iterative Distributed Multinomial Regression
multinomial logistic regression model with large choice sets. Compared to the
maximum likelihood estimator, the proposed iterative distributed estimator
achieves significantly faster computation and, when initialized with a
consistent estimator, attains asymptotic efficiency under a weak dominance
condition. Additionally, we propose a parametric bootstrap inference procedure
based on the iterative distributed estimator and establish its consistency.
Extensive simulation studies validate the effectiveness of the proposed methods
and highlight the computational efficiency of the iterative distributed
estimator.
arXiv link: http://arxiv.org/abs/2412.01030v1
Optimization of Delivery Routes for Fresh E-commerce in Pre-warehouse Mode
rapid growth. One of the core competitive advantages of fresh food e-commerce
platforms lies in selecting an appropriate logistics distribution model. This
study focuses on the front warehouse model, aiming to minimize distribution
costs. Considering the perishable nature and short shelf life of fresh food, a
distribution route optimization model is constructed, and the saving mileage method is applied to determine the optimal distribution scheme. The results
indicate that under certain conditions, different distribution schemes
significantly impact the performance of fresh food e-commerce platforms. Based
on a review of domestic and international research, this paper takes Dingdong
Maicai as an example to systematically introduce the basic concepts of
distribution route optimization in fresh food e-commerce platforms under the
front warehouse model, analyze the advantages of logistics distribution, and
thoroughly examine the importance of distribution routes for fresh products.
arXiv link: http://arxiv.org/abs/2412.00634v1
Peer Effects and Herd Behavior: An Empirical Study Based on the "Double 11" Shopping Festival
effects and herd behavior among consumers during the "Double 11" shopping
festival, using data collected through a questionnaire survey. The results
demonstrate that peer effects significantly influence consumer decision-making,
with the probability of participation in the shopping event increasing notably
when roommates are involved. Additionally, factors such as gender, online
shopping experience, and fashion consciousness significantly impact consumers'
herd behavior. This research not only enhances the understanding of online
shopping behavior among college students but also provides empirical evidence
for e-commerce platforms to formulate targeted marketing strategies. Finally,
the study discusses the fragility of online consumption activities, the need
for adjustments in corporate marketing strategies, and the importance of
promoting a healthy online culture.
arXiv link: http://arxiv.org/abs/2412.00233v1
Canonical correlation analysis of stochastic trends via functional approximation
number $s$ of common trends and their loading matrix $\psi$ in $I(1)/I(0)$
systems. It combines functional approximation of limits of random walks and
canonical correlation analysis, performed between the $p$ observed time series
of length $T$ and the first $K$ discretized elements of an $L^2$ basis. Tests
and selection criteria on $s$, and estimators and tests on $\psi$ are proposed;
their properties are discussed as $T$ and $K$ diverge sequentially for fixed
$p$ and $s$. It is found that tests on $s$ are asymptotically pivotal,
selection criteria of $s$ are consistent, estimators of $\psi$ are
$T$-consistent, mixed-Gaussian and efficient, so that Wald tests on $\psi$ are
asymptotically Normal or $\chi^2$. The paper also discusses asymptotically
pivotal misspecification tests for checking model assumptions. The approach can
be coherently applied to subsets or aggregations of variables in a given panel.
Monte Carlo simulations show that these tools have reasonable performance for
$T\geq 10 p$ and $p\leq 300$. An empirical analysis of 20 exchange rates
illustrates the methods.
arXiv link: http://arxiv.org/abs/2411.19572v2
Warfare Ignited Price Contagion Dynamics in Early Modern Europe
dynamics during periods of warfare and global stress, but there is a lack of
model-based evidence on these phenomena. This paper uses an econometric
contagion model, the Diebold-Yilmaz framework, to examine the dynamics of
economic shocks across European markets in the early modern period. Our
findings suggest that key periods of violent conflicts significantly increased
food price spillover across cities, causing widespread disruptions across
Europe. We also demonstrate the ability of this framework to capture relevant
historical dynamics between the main trade centers of the period.
arXiv link: http://arxiv.org/abs/2411.18978v3
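For readers unfamiliar with the Diebold-Yilmaz framework, the sketch below computes a total spillover index from the forecast error variance decomposition of a small VAR fitted to simulated series; the VAR(1) specification, Cholesky identification, and data are illustrative assumptions, not the paper's historical price panel or its exact implementation.

    import numpy as np

    rng = np.random.default_rng(11)

    # Simulated "price" series for three markets with some cross-dependence (illustrative).
    T, k = 500, 3
    A = np.array([[0.5, 0.2, 0.0],
                  [0.1, 0.4, 0.2],
                  [0.0, 0.1, 0.5]])
    Sigma = np.array([[1.0, 0.3, 0.1],
                      [0.3, 1.0, 0.2],
                      [0.1, 0.2, 1.0]])
    shocks = rng.multivariate_normal(np.zeros(k), Sigma, size=T)
    Y = np.zeros((T, k))
    for t in range(1, T):
        Y[t] = A @ Y[t - 1] + shocks[t]

    # Fit a VAR(1) by OLS.
    X = np.column_stack([np.ones(T - 1), Y[:-1]])
    B = np.linalg.lstsq(X, Y[1:], rcond=None)[0]
    A_hat = B[1:].T
    resid = Y[1:] - X @ B
    Sigma_hat = resid.T @ resid / (T - 1 - X.shape[1])

    # H-step forecast error variance decomposition with a Cholesky factorization.
    H = 10
    P = np.linalg.cholesky(Sigma_hat)
    contrib = np.zeros((k, k))
    Phi = np.eye(k)
    for _ in range(H):
        contrib += (Phi @ P) ** 2
        Phi = A_hat @ Phi
    shares = contrib / contrib.sum(axis=1, keepdims=True)

    # Total spillover index: average share of forecast error variance due to other markets.
    spillover_index = 100 * (shares.sum() - np.trace(shares)) / k
    print("total spillover index (%):", round(spillover_index, 1))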
Contrasting the optimal resource allocation to cybersecurity and cyber insurance using prospect theory versus expected utility theory
done by investing in cybersecurity controls and purchasing cyber insurance.
However, these are interlinked since insurance premiums could be reduced by
investing more in cybersecurity controls. Expected utility theory and prospect theory are two alternative theories explaining decision-making under
risk and uncertainty, which can inform strategies for optimizing resource
allocation. While the former is considered a rational approach, research has
shown that most people make decisions consistent with the latter, including on
insurance uptake. We compare and contrast these two approaches to provide
important insights into how the two approaches could lead to different optimal
allocations resulting in differing risk exposure as well as financial costs. We
introduce the concept of a risk curve and show that identifying the nature of
the risk curve is a key step in deriving the optimal resource allocation.
arXiv link: http://arxiv.org/abs/2411.18838v1
Difference-in-differences Design with Outcomes Missing Not at Random
political scientists working with difference-in-differences (DID) design:
missingness in panel data. A common practice for handling missing data, known
as complete case analysis, is to drop cases with any missing values over time.
A more principled approach involves using nonparametric bounds on causal
effects or applying inverse probability weighting based on baseline covariates.
Yet, these methods are general remedies that often under-utilize the
assumptions already imposed on the panel structure for causal identification. In
this paper, I outline the pitfalls of complete case analysis and propose an
alternative identification strategy based on principal strata. To be specific,
I impose a parallel trends assumption within each latent group that shares the
same missingness pattern (e.g., always-respondents, if-treated-respondents) and
leverage missingness rates over time to estimate the proportions of these
groups. Building on this, I tailor Lee bounds, a well-known nonparametric bounding approach under selection bias, to partially identify the causal effect within the
DID design. Unlike complete case analysis, the proposed method does not require
independence between treatment selection and missingness patterns, nor does it
assume homogeneous effects across these patterns.
arXiv link: http://arxiv.org/abs/2411.18772v1
Autoencoder Enhanced Realised GARCH on Volatility Forecasting
forecasting due to its ability to capture intraday price fluctuations. With a
growing variety of realised volatility estimators, each with unique advantages
and limitations, selecting an optimal estimator may introduce challenges. In
this thesis, aiming to synthesise the impact of various realised volatility
measures on volatility forecasting, we propose an extension of the Realised
GARCH model that incorporates an autoencoder-generated synthetic realised
measure, combining the information from multiple realised measures in a
nonlinear manner. Our proposed model extends existing linear methods, such as
Principal Component Analysis and Independent Component Analysis, to reduce the
dimensionality of realised measures. The empirical evaluation, conducted across
four major stock markets from January 2000 to June 2022 and including the
period of COVID-19, demonstrates both the feasibility of applying an
autoencoder to synthesise volatility measures and the superior effectiveness of
the proposed model in one-step-ahead rolling volatility forecasting. The model
exhibits enhanced flexibility in parameter estimations across each rolling
window, outperforming traditional linear approaches. These findings indicate
that nonlinear dimension reduction offers further adaptability and flexibility
in improving the synthetic realised measure, with promising implications for
future volatility forecasting applications.
arXiv link: http://arxiv.org/abs/2411.17136v1
Normal Approximation for U-Statistics with Cross-Sectional Dependence
theorems for both non-degenerate and degenerate U-statistics with
cross-sectionally dependent samples using Stein's method. For the
non-degenerate case, our results extend recent studies on the asymptotic
properties of sums of cross-sectionally dependent random variables. The
degenerate case is more challenging due to the additional dependence induced by
the nonlinearity of the U-statistic kernel. Through a specific implementation
of Stein's method, we derive convergence rates under conditions on the mixing
rate, the sparsity of the cross-sectional dependence structure, and the moments
of the U-statistic kernel. Finally, we demonstrate the application of our
theoretical results with a nonparametric specification test for data with
cross-sectional dependence.
arXiv link: http://arxiv.org/abs/2411.16978v2
Anomaly Detection in California Electricity Price Forecasting: Enhancing Accuracy and Reliability Using Principal Component Analysis
implications for grid management, renewable energy integration, power system
planning, and price volatility management. This study focuses on enhancing
electricity price forecasting in California's grid, addressing challenges from
complex generation data and heteroskedasticity. Utilizing principal component
analysis (PCA), we analyze CAISO's hourly electricity prices and demand from
2016-2021 to improve day-ahead forecasting accuracy. Initially, we apply
traditional outlier analysis with the interquartile range method, followed by
robust PCA (RPCA) for more effective outlier elimination. This approach
improves data symmetry and reduces skewness. We then construct multiple linear
regression models using both raw and PCA-transformed features. The model with
transformed features, refined through traditional and SAS Sparse Matrix outlier
removal methods, shows superior forecasting performance. The SAS Sparse Matrix
method, in particular, significantly enhances model accuracy. Our findings
demonstrate that PCA-based methods are key in advancing electricity price
forecasting, supporting renewable integration and grid management in day-ahead
markets.
Keywords: Electricity price forecasting, principal component analysis (PCA),
power system planning, heteroskedasticity, renewable energy integration.
arXiv link: http://arxiv.org/abs/2412.07787v1
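A stripped-down sketch of the general pipeline described above, using synthetic data rather than CAISO prices: flag price outliers with the interquartile-range rule, project the demand and generation features onto principal components, and fit a linear regression on the transformed features. The variable choices and thresholds are illustrative assumptions, and robust PCA and the SAS-based steps are omitted.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(12)

    # Synthetic hourly features (e.g., demand, renewables) and prices (illustrative).
    n, p = 2000, 6
    X = rng.standard_normal((n, p))
    price = 30 + 5 * X[:, 0] - 3 * X[:, 1] + rng.standard_normal(n)
    price[rng.choice(n, 20, replace=False)] += 200            # inject price spikes

    # Interquartile-range rule to flag price outliers.
    q1, q3 = np.percentile(price, [25, 75])
    keep = (price > q1 - 1.5 * (q3 - q1)) & (price < q3 + 1.5 * (q3 - q1))

    # PCA-transformed features, then a linear regression on the cleaned sample.
    pca = PCA(n_components=3)
    Z = pca.fit_transform(X[keep])
    model = LinearRegression().fit(Z, price[keep])
    print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 2))
    print("in-sample R^2 on cleaned data:", round(model.score(Z, price[keep]), 3))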
A Binary IV Model for Persuasion: Profiling Persuasion Types among Compliers
instrument to encourage individuals to consume information and take some
action. We show that, with a binary Imbens-Angrist instrumental variable model
and the monotone treatment response assumption, it is possible to identify the
joint distribution of potential outcomes among compliers. This is necessary to
identify the percentage of mobilised voters and their statistical
characteristic defined by the moments of the joint distribution of treatment
and covariates. Specifically, we develop a method that enables researchers to
identify the statistical characteristic of persuasion types: always-voters,
never-voters, and mobilised voters among compliers. These findings extend the
kappa weighting results in Abadie (2003). We also provide a sharp test for the
two sets of identification assumptions. The test boils down to testing whether
there exists a nonnegative solution to a possibly under-determined system of
linear equations with known coefficients. An application based on Green et al.
(2003) is provided.
arXiv link: http://arxiv.org/abs/2411.16906v2
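The feasibility question at the heart of the test, whether a linear system with known coefficients admits a nonnegative solution, can be checked numerically with nonnegative least squares, as in the sketch below. The matrices are placeholders rather than the paper's actual moment conditions.

    import numpy as np
    from scipy.optimize import nnls

    # Placeholder system A @ x = b with known coefficients (not the paper's conditions).
    A = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])
    b_feasible = np.array([1.0, 1.0])      # admits the nonnegative solution x = (0, 1, 0)
    b_infeasible = np.array([-1.0, 1.0])   # the first equation cannot hold with x >= 0

    for b in (b_feasible, b_infeasible):
        x, residual = nnls(A, b)           # minimizes ||A x - b|| subject to x >= 0
        print("nonnegative solution exists:", bool(np.isclose(residual, 0.0)), "x =", x)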
A Supervised Machine Learning Approach for Assessing Grant Peer Review Reports
of peer review reports are rarely analyzed. In this work, we develop a
thoroughly tested pipeline to analyze the texts of grant peer review reports
using methods from applied Natural Language Processing (NLP) and machine
learning. We start by developing twelve categories reflecting content of grant
peer review reports that are of interest to research funders. This is followed
by multiple human annotators' iterative annotation of these categories in a
novel text corpus of grant peer review reports submitted to the Swiss National
Science Foundation. After validating the human annotation, we use the annotated
texts to fine-tune pre-trained transformer models to classify these categories
at scale, while conducting several robustness and validation checks. Our
results show that many categories can be reliably identified by human
annotators and machine learning approaches. However, the choice of text
classification approach considerably influences the classification performance.
We also find a high correspondence between out-of-sample classification
performance and human annotators' perceived difficulty in identifying
categories. Our results and publicly available fine-tuned transformer models
will allow researchers, research funders, and anybody interested in peer
review to examine and report on the contents of these reports in a structured
manner. Ultimately, we hope our approach can contribute to ensuring the quality
and trustworthiness of grant peer review.
arXiv link: http://arxiv.org/abs/2411.16662v2
When Is Heterogeneity Actionable for Personalization?
the uniform policy that assigns the best performing treatment in an A/B test to
everyone. Personalization relies on the presence of heterogeneity of treatment
effects, yet, as we show in this paper, heterogeneity alone is not sufficient
for personalization to be successful. We develop a statistical model to
quantify "actionable heterogeneity," or the conditions under which personalization is
likely to outperform the best uniform policy. We show that actionable
heterogeneity can be visualized as crossover interactions in outcomes across
treatments and depends on three population-level parameters: within-treatment
heterogeneity, cross-treatment correlation, and the variation in average
responses. Our model can be used to predict the expected gain from
personalization prior to running an experiment and also allows for sensitivity
analysis, providing guidance on how changing treatments can affect the
personalization gain. To validate our model, we apply five common
personalization approaches to two large-scale field experiments with many
interventions that encouraged flu vaccination. We find an 18% gain from
personalization in one and a more modest 4% gain in the other, which is
consistent with our model. Counterfactual analysis shows that this difference
in the gains from personalization is driven by a drastic difference in
within-treatment heterogeneity. However, reducing cross-treatment correlation
holds a larger potential to further increase personalization gains. Our
findings provide a framework for assessing the potential from personalization
and offer practical recommendations for improving gains from targeting in
multi-intervention settings.
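A rough way to see how the three population-level parameters interact is a small simulation in which each unit's expected outcomes under two treatments are drawn from a bivariate normal whose means, standard deviations, and correlation stand in for the average responses, within-treatment heterogeneity, and cross-treatment correlation; this is only an illustrative sketch, not the paper's model or estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def personalization_gain(mu, sigma, rho, n=200_000):
    """Expected gain of assigning each unit its better treatment
    versus giving everyone the best treatment on average."""
    cov = np.array([[sigma[0]**2, rho * sigma[0] * sigma[1]],
                    [rho * sigma[0] * sigma[1], sigma[1]**2]])
    y = rng.multivariate_normal(mu, cov, size=n)   # unit-level expected outcomes
    best_uniform = max(mu)                         # best single treatment for all
    oracle = y.max(axis=1).mean()                  # everyone gets their better arm
    return oracle - best_uniform

# Same average responses; the gain shrinks as cross-treatment correlation rises.
for rho in (-0.5, 0.0, 0.5, 0.9):
    g = personalization_gain(mu=[1.0, 1.0], sigma=[1.0, 1.0], rho=rho)
    print(f"rho={rho:+.1f}  gain={g:.3f}")
```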
arXiv link: http://arxiv.org/abs/2411.16552v1
What events matter for exchange rate volatility?
method to select the macroeconomic events most likely to impact volatility. The
paper identifies and quantifies the effects of macroeconomic events across
multiple countries on exchange rate volatility using high-frequency currency
returns, while accounting for persistent stochastic volatility effects and
seasonal components capturing time-of-day patterns. Given the hundreds of
macroeconomic announcements and their lags, we rely on sparsity-based methods
to select relevant events for the model. We contribute to the exchange rate
literature in four ways: First, we identify the macroeconomic events that drive
currency volatility, estimate their effects and connect them to macroeconomic
fundamentals. Second, we find a link between intraday seasonality, trading
volume, and the opening hours of major markets across the globe. We provide a
simple labor-based explanation for this observed pattern. Third, we show that
including macroeconomic events and seasonal components is crucial for
forecasting exchange rate volatility. Fourth, our proposed model yields the
lowest volatility and highest Sharpe ratio in portfolio allocations when
compared to standard SV and GARCH models.
arXiv link: http://arxiv.org/abs/2411.16244v1
Ranking probabilistic forecasting models with different loss functions
on the pinball loss and the empirical coverage, for the ranking of
probabilistic forecasting models. We tested the ability of the proposed metrics
to determine the top-performing forecasting model and investigated which
metric corresponds to the highest average per-trade profit in the
out-of-sample period. Our findings show that for the considered trading
strategy, ranking the forecasting models according to the coverage of quantile
forecasts used in the trading hours exhibits a superior economic performance.
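For reference, the two ingredients used for the rankings above, the pinball (quantile) loss and the empirical coverage of a quantile band, can be computed as in the following sketch; the numbers are made-up stand-ins for actual price forecasts.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Average pinball loss of quantile forecasts q_pred at level tau."""
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

def empirical_coverage(y, lower, upper):
    """Share of observations falling inside the [lower, upper] interval."""
    return np.mean((y >= lower) & (y <= upper))

# Toy example: realized prices plus forecasts of the 10% and 90% quantiles.
y = np.array([42.0, 55.0, 38.0, 61.0])
q10 = np.array([35.0, 48.0, 30.0, 50.0])
q90 = np.array([50.0, 60.0, 45.0, 70.0])

print("pinball(0.1):", pinball_loss(y, q10, 0.1))
print("pinball(0.9):", pinball_loss(y, q90, 0.9))
print("coverage of 10-90% band:", empirical_coverage(y, q10, q90))
```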
arXiv link: http://arxiv.org/abs/2411.17743v1
Homeopathic Modernization and the Middle Science Trap: conceptual context of ergonomics, econometrics and logic of some national scientific case
development of scientific systems in transition economies, such as Kazakhstan.
The main focus is on the concept of the "middle science trap," which is
characterized by steady growth in quantitative indicators (publications,
grants) but a lack of qualitative advancement. Excessive bureaucracy, weak
integration into the international scientific community, and ineffective
science management are key factors limiting development. This paper proposes an
approach of "homeopathic modernization," which focuses on minimal yet
strategically significant changes aimed at reducing bureaucratic barriers and
enhancing the effectiveness of the scientific ecosystem. A comparative analysis
of international experience (China, India, and the European Union) is provided,
demonstrating how targeted reforms in the scientific sector can lead to
significant results. Social and cultural aspects, including the influence of
mentality and institutional structure, are also examined, and practical
recommendations for reforming the scientific system in Kazakhstan and Central
Asia are offered. The conclusions of the article could be useful for developing
national science modernization programs, particularly in countries with high
levels of bureaucracy and conservatism.
arXiv link: http://arxiv.org/abs/2411.15996v1
Utilization and Profitability of Tractor Services for Maize Farming in Ejura-Sekyedumase Municipality, Ghana
Unfortunately, farmers usually do not obtain the expected returns on their
investment due to reliance on rudimentary, labor-intensive, and inefficient
methods of production. Using cross-sectional data from 359 maize farmers, this
study investigates the profitability and determinants of the use of tractor
services for maize production in Ejura-Sekyedumase, Ashanti Region of Ghana.
Results from descriptive and profitability analyses reveal that tractor
services such as ploughing and shelling are widely used, but their
profitability varies significantly among farmers. Key factors influencing
profitability include farm size, fertilizer quantity applied, and farmer
experience. Results from a multivariate probit analysis also showed that
farming experience, fertilizer quantity, and profit per acre have a positive
influence on tractor service use for shelling, while household size, farm size,
and FBO have a negative influence. Farming experience, fertilizer quantity, and
profit per acre positively influence tractor service use for ploughing, while
farm size has a negative influence. A t-test result reveals a statistically
significant difference in profit between farmers who use tractor services and
those who do not. Specifically, farmers who utilized tractor services on their
maize farms had a return to cost 9 percent higher than those who did not
(p-value < 0.05). Kendall's test showed moderate agreement among the maize
farmers in ranking the issues affecting their ability to access and utilize
tractor services on their farms, with financial issues ranked first.
arXiv link: http://arxiv.org/abs/2411.15797v2
Canonical Correlation Analysis: review
have been studied across various fields, with contributions dating back to
Jordan [1875] and Hotelling [1936]. This text surveys the evolution of
canonical correlation analysis, a fundamental statistical tool, beginning with
its foundational theorems and progressing to recent developments and open
research problems. Along the way we introduce and review methods, notions, and
fundamental concepts from linear algebra, random matrix theory, and
high-dimensional statistics, placing particular emphasis on rigorous
mathematical treatment.
The survey is intended for technically proficient graduate students and other
researchers with an interest in this area. The content is organized into five
chapters, supplemented by six sets of exercises found in Chapter 6. These
exercises introduce additional material, reinforce key concepts, and serve to
bridge ideas across chapters. We recommend the following sequence: first, solve
Problem Set 0, then proceed with Chapter 1, solve Problem Set 1, and so on
through the text.
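As a concrete anchor for the surveyed material, the following minimal sketch computes sample canonical correlations as the singular values of the whitened cross-covariance matrix on simulated data; regularization and the high-dimensional refinements discussed in the survey are omitted.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Sample canonical correlations between the columns of X and Y."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1)
    Syy = Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)
    # Whiten with inverse Cholesky factors and take singular values.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    return np.linalg.svd(M, compute_uv=False)

rng = np.random.default_rng(1)
Z = rng.standard_normal((500, 2))                                 # shared latent factors
X = Z @ rng.standard_normal((2, 4)) + 0.5 * rng.standard_normal((500, 4))
Y = Z @ rng.standard_normal((2, 3)) + 0.5 * rng.standard_normal((500, 3))
print(canonical_correlations(X, Y))   # first two correlations should be large
```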
arXiv link: http://arxiv.org/abs/2411.15625v1
From Replications to Revelations: Heteroskedasticity-Robust Inference
leading economic journals, we find that, among the 40,571 regressions
specifying heteroskedasticity-robust standard errors, 98.1% adhere to Stata's
default HC1 specification. We then compare several heteroskedasticity-robust
inference methods with a large-scale Monte Carlo study based on regressions
from 155 reproduction packages. Our results show that t-tests based on HC1 or
HC2 with default degrees of freedom exhibit substantial over-rejection.
Inference methods with customized degrees of freedom, as proposed by Bell and
McCaffrey (2002), Hansen (2024), and a novel approach based on partial
leverages, perform best. Additionally, we provide deeper insights into the role
of leverages and partial leverages across different inference methods.
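For concreteness, HC1 and HC2 differ only in how squared residuals are scaled, globally versus by leverage, before entering the sandwich formula; a bare-bones sketch with simulated data is given below (degrees-of-freedom refinements such as Bell-McCaffrey are not implemented here).

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(n) * (1 + X[:, 1]**2)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
u = y - X @ beta                        # OLS residuals
h = np.sum((X @ XtX_inv) * X, axis=1)   # leverages, diagonal of the hat matrix

def sandwich(weights):
    meat = X.T @ (X * weights[:, None])
    return XtX_inv @ meat @ XtX_inv

V_hc1 = n / (n - k) * sandwich(u**2)    # HC1: global small-sample scaling
V_hc2 = sandwich(u**2 / (1 - h))        # HC2: leverage-adjusted squared residuals

print("HC1 SEs:", np.sqrt(np.diag(V_hc1)))
print("HC2 SEs:", np.sqrt(np.diag(V_hc2)))
```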
arXiv link: http://arxiv.org/abs/2411.14763v2
Dynamic Spatial Interaction Models for a Resource Allocator's Decisions and Local Agents' Multiple Activities
decision-making processes of a resource allocator and local agents, with
central and local governments serving as empirical representations. The model
captures two key features: (i) resource allocations from the allocator to local
agents and the resulting strategic interactions, and (ii) local agents'
multiple activities and their interactions. We develop a network game for the
micro-foundations of these processes. In this game, local agents engage in
multiple activities, while the allocator distributes resources by monitoring
the externalities arising from their interactions. The game's unique Nash
equilibrium establishes our econometric framework. To estimate the agent payoff
parameters, we employ the quasi-maximum likelihood (QML) estimation method and
examine the asymptotic properties of the QML estimator to ensure robust
statistical inference. Empirically, we study interactions among U.S. states in
public welfare and housing and community development expenditures, focusing on
how federal grants influence these expenditures and the interdependencies among
state governments. Our findings reveal significant spillovers across the
states' two expenditures. Additionally, we detect positive effects of federal
grants on both types of expenditures, indicating a responsive grant scheme based
on states' decisions. Lastly, we compare state expenditures and social welfare
through counterfactual simulations under two scenarios: (i) responsive
intervention by monitoring states' decisions and (ii) autonomous transfers. We
find that responsive intervention enhances social welfare by leading to an
increase in the states' two expenditures. However, due to the heavy reliance on
autonomous transfers, the magnitude of these improvements remains relatively
small compared to the share of federal grants in total state revenues.
arXiv link: http://arxiv.org/abs/2411.13810v2
Clustering with Potential Multidimensionality: Inference and Practice
justified in M-estimation when there is sampling or assignment uncertainty.
Since existing procedures for variance estimation are either conservative or
invalid, we propose a variance estimator that refines a conservative procedure
and remains valid. We then interpret environments where clustering is
frequently employed in empirical work from our design-based perspective and
provide insights on their estimands and inference procedures.
arXiv link: http://arxiv.org/abs/2411.13372v1
Revealed Information
actions, but not the frequency conditional on payoff-relevant states. We ask
when the analyst can rationalize the DM's choices as if the DM first learns
something about the state before acting. We provide a support-function
characterization of the triples of utility functions, prior beliefs, and
(marginal) distributions over actions such that the DM's action distribution is
consistent with information given the DM's prior and utility function.
Assumptions on the cardinality of the state space and the utility function
allow us to refine this characterization, obtaining a sharp system of finitely
many inequalities the utility function, prior, and action distribution must
satisfy. We apply our characterization to study comparative statics and to
identify conditions under which a single information structure rationalizes
choices across multiple decision problems. We characterize the set of
distributions over posterior beliefs that are consistent with the DM's choices.
We extend our results to settings with a continuum of actions and states
assuming the first-order approach applies, and to simple multi-agent settings.
arXiv link: http://arxiv.org/abs/2411.13293v2
Prediction-Guided Active Experiments
Prediction-Guided Active Experiment (PGAE), which leverages predictions from an
existing machine learning model to guide sampling and experimentation.
Specifically, at each time step, an experimental unit is sampled according to a
designated sampling distribution, and the actual outcome is observed with an
experimental probability; otherwise, only a prediction for the outcome is
available. We begin by analyzing the non-adaptive case, where full information
on the joint distribution of the predictor and the actual outcome is assumed.
For this scenario, we derive an optimal experimentation strategy by minimizing
the semi-parametric efficiency bound for the class of regular estimators. We
then introduce an estimator that meets this efficiency bound, achieving
asymptotic optimality. Next, we move to the adaptive case, where the predictor
is continuously updated with newly sampled data. We show that the adaptive
version of the estimator remains efficient and attains the same semi-parametric
bound under certain regularity assumptions. Finally, we validate PGAE's
performance through simulations and a semi-synthetic experiment using data from
the US Census Bureau. The results underscore the PGAE framework's effectiveness
and superiority compared to other existing methods.
arXiv link: http://arxiv.org/abs/2411.12036v2
Debiased Regression for Root-N-Consistent Conditional Mean Estimation
high-dimensional and nonparametric regression estimators. For example,
nonparametric regression methods allow for the estimation of regression
functions in a data-driven manner with minimal assumptions; however, these
methods typically fail to achieve $\sqrt{n}$-consistency in their convergence
rates, and many, including those in machine learning, lack guarantees that
their estimators asymptotically follow a normal distribution. To address these
challenges, we propose a debiasing technique for nonparametric estimators by
adding a bias-correction term to the original estimators, extending the
conventional one-step estimator used in semiparametric analysis. Specifically,
for each data point, we estimate the conditional expected residual of the
original nonparametric estimator, which can, for instance, be computed using
kernel (Nadaraya-Watson) regression, and incorporate it as a bias-reduction
term. Our theoretical analysis demonstrates that the proposed estimator
achieves $\sqrt{n}$-consistency and asymptotic normality under a mild
convergence rate condition for both the original nonparametric estimator and
the conditional expected residual estimator. Notably, this approach remains
model-free as long as the original estimator and the conditional expected
residual estimator satisfy the convergence rate condition. The proposed method
offers several advantages, including improved estimation accuracy and
simplified construction of confidence intervals.
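A bare-bones version of the bias-correction idea can be sketched as follows: fit any first-stage regression, estimate the conditional expected residual with a Nadaraya-Watson smoother, and add it back. The random-forest first stage, the Gaussian kernel, the fixed bandwidth, and the absence of sample splitting are illustrative simplifications, not the paper's recommended configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 1000
X = rng.uniform(-2, 2, size=(n, 1))
y = np.sin(3 * X[:, 0]) + 0.3 * rng.standard_normal(n)

# Step 1: any nonparametric first-stage estimator of E[Y | X].
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

def debiased_prediction(x_eval, bandwidth=0.3):
    """First-stage fit plus a Nadaraya-Watson estimate of the conditional residual."""
    resid = y - rf.predict(X)
    d2 = ((X[None, :, :] - x_eval[:, None, :]) ** 2).sum(axis=2)
    K = np.exp(-0.5 * d2 / bandwidth**2)                  # Gaussian kernel weights
    correction = (K * resid[None, :]).sum(axis=1) / K.sum(axis=1)
    return rf.predict(x_eval) + correction

x_grid = np.linspace(-2, 2, 5).reshape(-1, 1)
print(debiased_prediction(x_grid))
```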
arXiv link: http://arxiv.org/abs/2411.11748v3
Treatment Effect Estimators as Weighted Outcomes
tradition. Their outcome weights are widely used in established procedures,
such as checking covariate balance, characterizing target populations, or
detecting and managing extreme weights. This paper introduces a general
framework for deriving such outcome weights. It establishes when and how
numerical equivalence between an estimator's original representation as a moment
condition and a unique weighted representation can be obtained. The framework
is applied to derive novel outcome weights for the six seminal instances of
double machine learning and generalized random forests, while recovering
existing results for other estimators as special cases. The analysis highlights
that implementation choices determine (i) the availability of outcome weights
and (ii) their properties. Notably, standard implementations of partially
linear regression-based estimators, like causal forests, employ outcome weights
that do not sum to (minus) one in the (un)treated group, not fulfilling a
property often considered desirable.
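To make the notion of outcome weights concrete, the sketch below computes, for a plain OLS regression of the outcome on a treatment dummy and covariates, the weight vector w such that the estimated coefficient equals w'y, and then checks how the weights sum within the treated and untreated groups; this is only the textbook OLS case, not the double machine learning or generalized random forest weights derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
X = rng.standard_normal((n, 2))
D = (rng.uniform(size=n) < 0.4).astype(float)
y = 1.0 + 2.0 * D + X @ np.array([0.5, -0.3]) + rng.standard_normal(n)

Z = np.column_stack([np.ones(n), D, X])    # regressors: intercept, treatment, covariates
e_D = np.zeros(Z.shape[1]); e_D[1] = 1.0   # selects the treatment coefficient

# tau_hat = e_D' (Z'Z)^{-1} Z' y = w' y, so the outcome weights are:
w = Z @ np.linalg.solve(Z.T @ Z, e_D)

tau_ols = np.linalg.lstsq(Z, y, rcond=None)[0][1]
print("OLS coefficient:", tau_ols)
print("weighted outcome:", w @ y)                   # numerically identical
print("weights sum, treated:  ", w[D == 1].sum())   # equals 1 for this OLS case
print("weights sum, untreated:", w[D == 0].sum())   # equals -1 for this OLS case
```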
arXiv link: http://arxiv.org/abs/2411.11559v2
Econometrics and Formalism of Psychological Archetypes of Scientific Workers with Introverted Thinking Type
individuals are examined. The anomalous nature of psychological activity in
individuals involved in scientific work is highlighted. Certain aspects of the
introverted thinking type in scientific activities are analyzed. For the first
time, psychological archetypes of scientists with pronounced introversion are
postulated in the context of twelve hypotheses about the specifics of
professional attributes of introverted scientific activities.
A linear regression model and a Bayesian equation are proposed for
quantitatively assessing the econometric degree of introversion in scientific
employees, considering a wide range of characteristics inherent to introverts
in the scientific process. Specifically, expressions for a comprehensive assessment
of introversion in a linear model and the posterior probability of the
econometric (scientometric) degree of introversion in a Bayesian model are
formulated.
The models are based on several econometric (scientometric) hypotheses
regarding various aspects of professional activities of introverted scientists,
such as a preference for solo publications, low social activity, narrow
specialization, high research depth, and so forth. Empirical data and multiple
linear regression methods can be used to calibrate the equations. The model can
be applied to gain a deeper understanding of the psychological characteristics
of scientific employees, which is particularly useful in ergonomics and the
management of scientific teams and projects. The proposed method also provides
scientists with pronounced introversion the opportunity to develop their
careers, focusing on individual preferences and features.
arXiv link: http://arxiv.org/abs/2411.11058v1
Program Evaluation with Remotely Sensed Outcomes
sensed variables (RSVs), e.g. satellite images or mobile phone activity, in
place of directly measured economic outcomes. A common practice is to use an
observational sample to train a predictor of the economic outcome from the RSV,
and then to use its predictions as the outcomes in the experiment. We show that
this method is biased whenever the RSV is post-outcome, i.e. if variation in
the economic outcome causes variation in the RSV. In program evaluation,
changes in poverty or environmental quality cause changes in satellite images,
but not vice versa. As our main result, we nonparametrically identify the
treatment effect by formalizing the intuition that underlies common practice:
the conditional distribution of the RSV given the outcome and treatment is
stable across the samples. Based on our identifying formula, we find that the
efficient representation of RSVs for causal inference requires three
predictions rather than one. Valid inference does not require any rate
conditions on RSV predictions, justifying the use of complex deep learning
algorithms with unknown statistical properties. We re-analyze the effect of an
anti-poverty program in India using satellite images.
arXiv link: http://arxiv.org/abs/2411.10959v2
Building Interpretable Climate Emulators for Economics
emulators (CEs) for economic models of climate change. The paper makes two main
contributions. First, we propose a general framework for constructing
carbon-cycle emulators (CCEs) for macroeconomic models. The framework is
implemented as a generalized linear multi-reservoir (box) model that conserves
key physical quantities and can be customized for specific applications. We
consider three versions of the CCE, which we evaluate within a simple
representative agent economic model: (i) a three-box setting comparable to
DICE-2016, (ii) a four-box extension, and (iii) a four-box version that
explicitly captures land-use change. While the three-box model reproduces
benchmark results well and the fourth reservoir adds little, incorporating the
impact of land-use change on the carbon storage capacity of the terrestrial
biosphere substantially alters atmospheric carbon stocks, temperature
trajectories, and the optimal mitigation path. Second, we investigate
pattern-scaling techniques that transform global-mean temperature projections
from CEs into spatially heterogeneous warming fields. We show how regional
baseline climates, non-uniform warming, and the associated uncertainties
propagate into economic damages.
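The carbon-cycle component of such an emulator can be sketched as a linear, mass-conserving box model in which a transfer matrix moves carbon between reservoirs each period and emissions enter the atmosphere; the three reservoirs and transfer coefficients below are illustrative placeholders, not the calibrated values of the paper or of DICE-2016.

```python
import numpy as np

# Reservoirs: atmosphere, upper ocean/biosphere, deep ocean (GtC).
M0 = np.array([850.0, 460.0, 1740.0])

# Illustrative per-period transfer matrix; columns sum to one so that
# total carbon is conserved in the absence of emissions.
A = np.array([[0.88, 0.04, 0.00],
              [0.12, 0.94, 0.01],
              [0.00, 0.02, 0.99]])
assert np.allclose(A.sum(axis=0), 1.0)

def simulate(emissions, M=M0, A=A):
    """Propagate reservoir stocks forward given a path of emissions (GtC/period)."""
    path = [M.copy()]
    for e in emissions:
        M = A @ M + np.array([e, 0.0, 0.0])   # emissions enter the atmosphere box
        path.append(M.copy())
    return np.array(path)

path = simulate(emissions=[10.0] * 20)
print("atmospheric carbon after 20 periods:", path[-1, 0])
```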
arXiv link: http://arxiv.org/abs/2411.10768v2
Feature Importance of Climate Vulnerability Indicators with Gradient Boosting across Five Global Cities
climate hazards and the social vulnerabilities that interact with these
hazards, but the science of validating hazard vulnerability indicators is still
in its infancy. Progress is needed to improve: 1) the selection of variables
that are used as proxies to represent hazard vulnerability; 2) the
applicability and scale for which these indicators are intended, including
their transnational applicability. We administered an international urban
survey in Buenos Aires, Argentina; Johannesburg, South Africa; London, United
Kingdom; New York City, United States; and Seoul, South Korea in order to
collect data on exposure to various types of extreme weather events,
socioeconomic characteristics commonly used as proxies for vulnerability (i.e.,
income, education level, gender, and age), and additional characteristics not
often included in existing composite indices (i.e., queer identity, disability
identity, non-dominant primary language, and self-perceptions of both
discrimination and vulnerability to flood risk). We then use feature importance
analysis with gradient-boosted decision trees to measure the importance that
these variables have in predicting exposure to various types of extreme weather
events. Our results show that non-traditional variables were more relevant to
self-reported exposure to extreme weather events than traditionally employed
variables such as income or age. Furthermore, differences in variable relevance
across different types of hazards and across urban contexts suggest that
vulnerability indicators need to be fit to context and should not be used in a
one-size-fits-all fashion.
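The modeling step described above can be sketched with scikit-learn's gradient boosting classifier and its impurity-based feature importances; the synthetic survey-like variables and data below are placeholders for the actual survey fields, and permutation importance would be a common alternative.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n = 2000
X = pd.DataFrame({
    "income":            rng.lognormal(10, 1, n),
    "age":               rng.integers(18, 90, n),
    "disability":        rng.integers(0, 2, n),
    "non_dominant_lang": rng.integers(0, 2, n),
    "perceived_risk":    rng.uniform(0, 1, n),
})
# Synthetic exposure outcome driven mostly by the non-traditional variables.
logit = -1 + 1.5 * X["perceived_risk"] + 0.8 * X["disability"] - 1e-5 * X["income"]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
gb = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

for name, imp in sorted(zip(X.columns, gb.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:18s} {imp:.3f}")
print("held-out accuracy:", gb.score(X_te, y_te))
```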
arXiv link: http://arxiv.org/abs/2411.10628v1
Monetary Incentives, Landowner Preferences: Estimating Cross-Elasticities in Farmland Conversion to Renewable Energy
farmland to renewable energy generation, specifically solar and wind, in the
context of expanding U.S. energy production. We propose a new econometric
method that accounts for the diverse circumstances of landowners, including
their unordered alternative land use options, non-monetary benefits from
farming, and the influence of local regulations. We demonstrate that
identifying the cross elasticity of landowners' farming income in relation to
the conversion of farmland to renewable energy requires an understanding of
their preferences. By utilizing county legislation that we assume to be shaped
by land-use preferences, we estimate the cross-elasticities of farming income.
Our findings indicate that monetary incentives may only influence landowners'
decisions in areas with potential for future residential development,
underscoring the importance of considering both preferences and regulatory
contexts.
arXiv link: http://arxiv.org/abs/2411.10600v1
Dynamic Causal Effects in a Nonlinear World: the Good, the Bad, and the Ugly
by linear models. We study whether the estimands of such procedures have a
causal interpretation when the true data generating process is in fact
nonlinear. We show that vector autoregressions and linear local projections
onto observed shocks or proxies identify weighted averages of causal effects
regardless of the extent of nonlinearities. By contrast, identification
approaches that exploit heteroskedasticity or non-Gaussianity of latent shocks
are highly sensitive to departures from linearity. Our analysis is based on new
results on the identification of marginal treatment effects through weighted
regressions, which may also be of interest to researchers outside
macroeconomics.
arXiv link: http://arxiv.org/abs/2411.10415v4
Semiparametric inference for impulse response functions using double/debiased machine learning
impulse response function (IRF) in settings where a time series of interest is
subjected to multiple discrete treatments, assigned over time, which can have a
causal effect on future outcomes. The proposed estimator can rely on fully
nonparametric relations between treatment and outcome variables, opening up the
possibility to use flexible machine learning approaches to estimate IRFs. To
this end, we extend the theory of DML from an i.i.d. to a time series setting
and show that the proposed DML estimator for the IRF is consistent and
asymptotically normally distributed at the parametric rate, allowing for
semiparametric inference for dynamic effects in a time series setting. The
properties of the estimator are validated numerically in finite samples by
applying it to learn the IRF in the presence of serial dependence in both the
confounder and observation innovation processes. We also illustrate the
methodology empirically by applying it to the estimation of the effects of
macroeconomic shocks.
arXiv link: http://arxiv.org/abs/2411.10009v1
Sharp Testable Implications of Encouragement Designs
outcome, a discrete multi-valued treatment, and a discrete multi-valued
instrument. We derive sharp, closed-form testable implications for a class of
restrictions on potential treatments where each value of the instrument
encourages towards at most one unique treatment choice; such restrictions serve
as the key identifying assumption in several prominent recent empirical papers.
Borrowing the terminology used in randomized experiments, we call such a
setting an encouragement design. The testable implications are inequalities in
terms of the conditional distributions of choices and the outcome given the
instrument. Through a novel constructive argument, we show these inequalities
are sharp in the sense that any distribution of the observed data that
satisfies these inequalities is compatible with this class of restrictions on
potential treatments. Based on these inequalities, we propose tests of the
restrictions. In an empirical application, we show some of these restrictions
are violated and pinpoint the substitution pattern that leads to the violation.
arXiv link: http://arxiv.org/abs/2411.09808v3
Bayesian estimation of finite mixtures of Tobit models
models. The method consists of an MCMC approach that combines Gibbs sampling
with data augmentation and is simple to implement. I show through simulations
that the flexibility provided by this method is especially helpful when
censoring is not negligible. In addition, I demonstrate the broad utility of
this methodology with applications to a job training program, labor supply, and
demand for medical care. I find that this approach allows for non-trivial
additional flexibility that can alter results considerably and beyond improving
model fit.
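To give a sense of the Gibbs-with-data-augmentation step, here is a minimal sampler for a single-component Tobit model left-censored at zero with weakly informative priors; the finite-mixture extension in the paper adds component labels and mixture weights, which are omitted, and the priors and simulated data are purely illustrative.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(6)

# Simulate censored data: y = max(0, x'beta + eps).
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true, sigma_true = np.array([0.5, 1.0]), 1.0
y = np.maximum(X @ beta_true + sigma_true * rng.standard_normal(n), 0.0)
censored = y == 0.0

XtX = X.T @ X
beta, sigma2 = np.zeros(2), 1.0
draws = []
for it in range(2000):
    # 1) Data augmentation: impute latent y* for censored observations from a
    #    normal truncated above at the censoring point (zero).
    mu_c, sd = X[censored] @ beta, np.sqrt(sigma2)
    y_lat = y.copy()
    y_lat[censored] = truncnorm.rvs(a=-np.inf, b=(0.0 - mu_c) / sd,
                                    loc=mu_c, scale=sd, random_state=rng)
    # 2) beta | sigma2, y* ~ Normal(beta_hat, sigma2 (X'X)^{-1}) under a flat prior.
    beta_hat = np.linalg.solve(XtX, X.T @ y_lat)
    beta = beta_hat + np.linalg.cholesky(sigma2 * np.linalg.inv(XtX)) @ rng.standard_normal(2)
    # 3) sigma2 | beta, y* ~ Inverse-Gamma with weak prior IG(0.01, 0.01).
    resid = y_lat - X @ beta
    sigma2 = 1.0 / rng.gamma(0.01 + n / 2, 1.0 / (0.01 + 0.5 * resid @ resid))
    if it >= 500:
        draws.append(np.r_[beta, np.sqrt(sigma2)])

print("posterior means [b0, b1, sigma]:", np.mean(draws, axis=0))
```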
arXiv link: http://arxiv.org/abs/2411.09771v1
Sparse Interval-valued Time Series Modeling with Machine Learning
learning regressions for high-dimensional interval-valued time series. With
LASSO or adaptive LASSO techniques, we develop a penalized minimum distance
estimation, which covers point-based estimators as special cases. We establish
the consistency and oracle properties of the proposed penalized estimator,
regardless of whether the number of predictors is diverging with the sample
size. Monte Carlo simulations demonstrate the favorable finite sample
properties of the proposed estimation. Empirical applications to
interval-valued crude oil price forecasting and sparse index-tracking portfolio
construction illustrate the robustness and effectiveness of our method against
competing approaches, including random forest and multilayer perceptron for
interval-valued data. Our findings highlight the potential of machine learning
techniques in interval-valued time series analysis, offering new insights for
financial forecasting and portfolio management.
arXiv link: http://arxiv.org/abs/2411.09452v1
On Asymptotic Optimality of Least Squares Model Averaging When True Model Is Included
to technical difficulties, existing studies rely on restricted weight sets or
the assumption that there is no true model with fixed dimensions in the
candidate set. The focus of this paper is to overcome these difficulties.
Surprisingly, we discover that when the penalty factor in the weight selection
criterion diverges with a certain order and the true model dimension is fixed,
asymptotic loss optimality does not hold, but asymptotic risk optimality does.
This result differs from the corresponding result of Fang et al. (2023,
Econometric Theory 39, 412-441) and reveals that using the discrete weight set
of Hansen (2007, Econometrica 75, 1175-1189) can yield opposite asymptotic
properties compared to using the usual weight set. Simulation studies
illustrate the theoretical findings in a variety of settings.
arXiv link: http://arxiv.org/abs/2411.09258v1
Difference-in-Differences with Sample Selection
within the difference-in-differences (DiD) framework in the presence of
endogenous sample selection. First, we establish that the usual DiD estimand
fails to recover meaningful treatment effects, even if selection and treatment
assignment are independent. Next, we partially identify the ATT for individuals
who are always observed post-treatment regardless of their treatment status,
and derive bounds on this parameter under different sets of assumptions about
the relationship between sample selection and treatment assignment. Extensions
to the repeated cross-section and two-by-two comparisons in the staggered
adoption case are explored. Furthermore, we provide identification results for
the ATT of three additional empirically relevant latent groups by incorporating
outcome mean dominance assumptions which have intuitive appeal in applications.
Finally, two empirical illustrations demonstrate the approach's usefulness by
revisiting (i) the effect of a job training program on earnings (Calonico &
Smith, 2017) and (ii) the effect of a working-from-home policy on employee
performance (Bloom, Liang, Roberts, & Ying, 2015).
arXiv link: http://arxiv.org/abs/2411.09221v2
On the (Mis)Use of Machine Learning with Panel Data
of machine learning on panel data. Our organizing framework clarifies why
neglecting the cross-sectional and longitudinal structure of these data leads
to hard-to-detect data leakage, inflated out-of-sample performance, and an
inadvertent overestimation of the real-world usefulness and applicability of
machine learning models. We then offer empirical guidelines for practitioners
to ensure the correct implementation of supervised machine learning in panel
data environments. An empirical application, using data from over 3,000 U.S.
counties spanning 2000-2019 and focused on income prediction, illustrates the
practical relevance of these points across nearly 500 models for both
classification and regression tasks.
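One concrete instance of such guidance is the choice of resampling scheme: naive K-fold cross-validation lets observations from the same unit leak across folds, while grouping by unit keeps each unit entirely in the training or test fold. A schematic comparison with scikit-learn on a simulated county-year panel (not the paper's data) is shown below.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(7)
counties, years = 300, 15
df = pd.DataFrame({
    "county": np.repeat(np.arange(counties), years),
    "year":   np.tile(np.arange(2000, 2000 + years), counties),
})
county_effect = rng.standard_normal(counties)[df["county"]]
df["x"] = rng.standard_normal(len(df))
df["income"] = county_effect + 0.5 * df["x"] + 0.3 * rng.standard_normal(len(df))

# County identifier included as a feature, standing in for unit-level information.
X, y = df[["county", "x", "year"]], df["income"]
model = RandomForestRegressor(n_estimators=100, random_state=0)

# Naive K-fold mixes observations of the same county across folds (leakage).
naive = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0))
# Grouped K-fold keeps each county entirely in either the train or test fold.
grouped = cross_val_score(model, X, y, cv=GroupKFold(5), groups=df["county"])

print("naive CV R^2:  ", naive.mean())    # optimistic due to leakage
print("grouped CV R^2:", grouped.mean())  # closer to genuine out-of-sample performance
```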
arXiv link: http://arxiv.org/abs/2411.09218v2
Covariate Adjustment in Randomized Experiments Motivated by Higher-Order Influence Functions
the past twenty years, is a fundamental theoretical device for constructing
rate-optimal causal-effect estimators from observational studies. However, the
value of HOIF for analyzing well-conducted randomized controlled trials (RCTs)
has not been explicitly explored. In the recent U.S. Food and Drug
Administration (FDA) and European Medicines Agency (EMA) guidelines on the
practice of covariate adjustment in analyzing RCTs, in addition to the simple,
unadjusted difference-in-mean estimator, it was also recommended to report the
estimator adjusting for baseline covariates via a simple parametric working
model, such as a linear model. In this paper, we show that a HOIF-motivated
estimator for the treatment-specific mean has significantly improved
statistical properties compared to popular adjusted estimators in practice when
the number of baseline covariates $p$ is relatively large compared to the
sample size $n$. We also characterize the conditions under which the
HOIF-motivated estimator improves upon the unadjusted one. Furthermore, we
demonstrate that a novel debiased adjusted estimator proposed recently by Lu et
al. is, in fact, another HOIF-motivated estimator in disguise. Numerical and
empirical studies are conducted to corroborate our theoretical findings.
arXiv link: http://arxiv.org/abs/2411.08491v2
MSTest: An R-Package for Testing Markov Switching Models
procedures to identify the number of regimes in Markov switching models. These
models have wide-ranging applications in economics, finance, and numerous other
fields. The MSTest package includes the Monte Carlo likelihood ratio test
procedures proposed by Rodriguez-Rondon and Dufour (2024), the moment-based
tests of Dufour and Luger (2017), the parameter stability tests of Carrasco,
Hu, and Ploberger (2014), and the likelihood ratio test of Hansen (1992).
Additionally, the package enables users to simulate and estimate univariate and
multivariate Markov switching and hidden Markov processes, using the
expectation-maximization (EM) algorithm or maximum likelihood estimation (MLE).
We demonstrate the functionality of the MSTest package through both simulation
experiments and an application to U.S. GNP growth data.
arXiv link: http://arxiv.org/abs/2411.08188v1
A Note on Doubly Robust Estimator in Regression Discontinuity Designs
discontinuity (RD) designs. RD designs provide a quasi-experimental framework
for estimating treatment effects, where treatment assignment depends on whether
a running variable surpasses a predefined cutoff. A common approach in RD
estimation is the use of nonparametric regression methods, such as local linear
regression. However, the validity of these methods still relies on the
consistency of the nonparametric estimators. In this study, we propose the
DR-RD estimator, which combines two distinct estimators for the conditional
expected outcomes. The primary advantage of the DR-RD estimator lies in its
ability to ensure the consistency of the treatment effect estimation as long as
at least one of the two estimators is consistent. Consequently, our DR-RD
estimator enhances the robustness of treatment effect estimation in RD designs.
arXiv link: http://arxiv.org/abs/2411.07978v4
Matching $\leq$ Hybrid $\leq$ Difference in Differences
estimating treatment effects using pre- and post-intervention data. Scholars
have traditionally used experimental benchmarks to evaluate the accuracy of
alternative econometric methods, including Matching, Difference-in-Differences
(DID), and their hybrid forms (e.g., Heckman et al., 1998b; Dehejia and Wahba,
2002; Smith and Todd, 2005). We revisit these methodologies in the evaluation
of job training and educational programs using four datasets (LaLonde, 1986;
Heckman et al., 1998a; Smith and Todd, 2005; Chetty et al., 2014a; Athey et
al., 2020), and show that the inequality relationship, Matching $\leq$ Hybrid
$\leq$ DID, appears as a consistent norm, rather than a mere coincidence. We
provide a formal theoretical justification for this puzzling phenomenon under
plausible conditions such as negative selection, by generalizing the classical
bracketing (Angrist and Pischke, 2009, Section 5). Consequently, when
treatments are expected to be non-negative, DID tends to provide optimistic
estimates, while Matching offers more conservative ones.
arXiv link: http://arxiv.org/abs/2411.07952v2
Impact of R&D and AI Investments on Economic Growth and Credit Rating
innovation and aligns with long-term strategies in both public and private
sectors. This study addresses two primary research questions: (1) assessing the
relationship between R&D investments and GDP through regression analysis, and
(2) estimating the economic value added (EVA) that Georgia must generate to
progress from a BB to a BBB credit rating. Using World Bank data from
2014-2022, this analysis found that increasing R&D, with an emphasis on AI, by
30-35% has a measurable impact on GDP. Regression results reveal a coefficient
of 7.02%, indicating that a 10% increase in R&D leads to a 0.70% rise in GDP, with an
81.1% determination coefficient and a strong 90.1% correlation.
Georgia's EVA model was calculated to determine the additional value needed
for a BBB rating, comparing indicators from Greece, Hungary, India, and
Kazakhstan as benchmarks. Key economic indicators considered were nominal GDP,
GDP per capita, real GDP growth, and fiscal indicators (government balance/GDP,
debt/GDP). The EVA model projects that to achieve a BBB rating within nine
years, Georgia requires $61.7 billion in investments. Utilizing EVA and
comprehensive economic indicators will support informed decision-making and
enhance the analysis of Georgia's economic trajectory.
arXiv link: http://arxiv.org/abs/2411.07817v1
Spatial Competition on Psychological Pricing Strategies -- Preliminary Evidence from an Online Marketplace
psychological-pricing choices on Austria's C2C marketplace willhaben. Two
web-scraped snapshots of 826 Woom bike listings (a standardised product sold
on the platform) reveal that sellers located near direct competitors are more
likely to adopt 9-, 90-, or 99-ending prices, and that they do so
unconditionally on product characteristics and underlying spatiotemporal
differences. Such a strategy is associated with an average premium of
approximately 3.4%, ceteris paribus. Information asymmetry persists: buyer
trust hinges on signals such as the "Trusted Seller" badge and on missing data
for the "PayLivery" feature. The lack of final transaction prices limits inference.
arXiv link: http://arxiv.org/abs/2411.07808v3
Dynamic Evolutionary Game Analysis of How Fintech in Banking Mitigates Risks in Agricultural Supply Chain Finance
in the agricultural supply chain, focusing on the secondary allocation of
commercial credit. The study constructs a three-player evolutionary game model
involving banks, core enterprises, and SMEs to analyze how fintech innovations,
such as big data credit assessment, blockchain, and AI-driven risk evaluation,
influence financial risks and access to credit. The findings reveal that
banking fintech reduces financing costs and mitigates financial risks by
improving transaction reliability, enhancing risk identification, and
minimizing information asymmetry. By optimizing cooperation between banks, core
enterprises, and SMEs, fintech solutions enhance the stability of the
agricultural supply chain, contributing to rural revitalization goals and
sustainable agricultural development. The study provides new theoretical
insights and practical recommendations for improving agricultural finance
systems and reducing financial risks.
Keywords: banking fintech, agricultural supply chain, financial risk,
commercial credit, SMEs, evolutionary game model, big data, blockchain,
AI-driven risk evaluation.
arXiv link: http://arxiv.org/abs/2411.07604v1
Evaluating the Accuracy of Chatbots in Financial Literature
versions), and Gemini Advanced, in providing references on financial literature
and employing novel methodologies. Alongside the conventional binary approach
commonly used in the literature, we developed a nonbinary approach and a
recency measure to assess how hallucination rates vary with how recent a topic
is. After analyzing 150 citations, ChatGPT-4o had a hallucination rate of 20.0%
(95% CI, 13.6%-26.4%), while the o1-preview had a hallucination rate of 21.3%
(95% CI, 14.8%-27.9%). In contrast, Gemini Advanced exhibited higher
hallucination rates: 76.7% (95% CI, 69.9%-83.4%). While hallucination rates
increased for more recent topics, this trend was not statistically significant
for Gemini Advanced. These findings emphasize the importance of verifying
chatbot-provided references, particularly in rapidly evolving fields.
arXiv link: http://arxiv.org/abs/2411.07031v1
Return and Volatility Forecasting Using On-Chain Flows in Cryptocurrency Markets
of on-chain flow data for Bitcoin (BTC), Ethereum (ETH), and Tether (USDT). We
find ETH net inflows to strongly predict ETH returns and volatility in the
2017-2023 period, at intraday frequencies of 1-6 hours. Differing significantly
from the forecasting patterns for BTC, ETH net inflows negatively predict ETH
returns and volatility. First, we find that USDT flowing out of investors'
wallets and into cryptocurrency exchanges, namely, USDT net
inflows into the exchanges, positively predicts BTC and ETH returns at multiple
intervals and negatively predicts ETH volatility at various intervals and BTC
volatility at the 6-hour interval. Second, we find that ETH net inflows
negatively predict ETH returns and volatility for all intraday intervals.
Third, BTC net inflows generally lack predictive power for BTC returns (except
at 4 hours) but are negatively associated with volatility across all intraday
intervals. We illustrate our findings on return forecasting via case studies.
Moreover, we develop option strategies to assess profits and losses on ETH
investments based on ETH net inflows. Our findings contribute to the growing
literature on on-chain activity and its asset pricing implications, offering
economically relevant insights for intraday portfolio management in
cryptocurrency markets.
arXiv link: http://arxiv.org/abs/2411.06327v2
On the limiting variance of matching estimators
estimators for average treatment effects with a fixed number of matches. We
present, for the first time, a closed-form expression for this limit. Here the
key is the establishment of the limiting second moment of the catchment area's
volume, which resolves a question of Abadie and Imbens. At the core of our
approach is a new universality theorem on the measures of high-order Voronoi
cells, extending a result by Devroye, Gy\"orfi, Lugosi, and Walk.
arXiv link: http://arxiv.org/abs/2411.05758v1
Firm Heterogeneity and Macroeconomic Fluctuations: a Functional VAR model
explicitly incorporate firm-level heterogeneity observed in more than one
dimension and study its interaction with aggregate macroeconomic fluctuations.
Our methodology employs dimensionality reduction techniques for tensor data
objects to approximate the joint distribution of firm-level characteristics.
More broadly, our framework can be used for assessing predictions from
structural models that account for micro-level heterogeneity observed on
multiple dimensions. Leveraging firm-level data from the Compustat database, we
use the FunVAR model to analyze the propagation of total factor productivity
(TFP) shocks, examining their impact on both macroeconomic aggregates and the
cross-sectional distribution of capital and labor across firms.
arXiv link: http://arxiv.org/abs/2411.05695v1
Nowcasting distributions: a functional MIDAS model
for forecasting and nowcasting distributions observed at a lower frequency. We
approximate the low-frequency distribution using Functional Principal Component
Analysis and consider a group lasso spike-and-slab prior to identify the
relevant predictors in the finite-dimensional SUR-MIDAS approximation of the
functional MIDAS model. In our application, we use the model to nowcast the
U.S. households' income distribution. Our findings indicate that the model
enhances forecast accuracy for the entire target distribution and for key
features of the distribution that signal changes in inequality.
arXiv link: http://arxiv.org/abs/2411.05629v1
Detecting Cointegrating Relations in Non-stationary Matrix-Valued Time Series
relations in matrix-valued time series. We hereby allow separate cointegrating
relations along the rows and columns of the matrix-valued time series and use
information criteria to select the cointegration ranks. Through Monte Carlo
simulations and a macroeconomic application, we demonstrate that our approach
provides a reliable estimation of the number of cointegrating relationships.
arXiv link: http://arxiv.org/abs/2411.05601v2
Inference for Treatment Effects Conditional on Generalized Principal Strata using Instrumental Variables
effect parameters in a setting of a discrete valued treatment and instrument
with a general outcome variable. The class of parameters considered are those
that can be expressed as the expectation of a function of the response type
conditional on a generalized principal stratum. Here, the response type refers
to the vector of potential outcomes and potential treatments, and a generalized
principal stratum is a set of possible values for the response type. In
addition to instrument exogeneity, the main substantive restriction imposed
rules out certain values for the response types in the sense that they are
assumed to occur with probability zero. It is shown through a series of
examples that this framework includes a wide variety of parameters and
assumptions that have been considered in the previous literature. A key result
in our analysis is a characterization of the identified set for such parameters
under these assumptions in terms of existence of a non-negative solution to
linear systems of equations with a special structure. We propose methods for
inference exploiting this special structure and recent results in Fang et al.
(2023).
arXiv link: http://arxiv.org/abs/2411.05220v2
The role of expansion strategies and operational attributes on hotel performance: a compositional approach
attributes of hotel establishments on the performance of international hotel
chains, focusing on four key performance indicators: RevPAR, efficiency,
occupancy, and asset turnover. Data were collected from 255 hotels across
various international hotel chains, providing a comprehensive assessment of how
different expansion strategies and hotel attributes influence performance. The
research employs compositional data analysis (CoDA) to address the
methodological limitations of traditional financial ratios in statistical
analysis. The findings indicate that ownership-based expansion strategies
result in higher operational performance, as measured by revenue per available
room, but yield lower economic performance due to the high capital investment
required. Non-ownership strategies, such as management contracts and
franchising, show superior economic efficiency, offering more flexibility and
reduced financial risk. This study contributes to the hospitality management
literature by applying CoDA, a novel methodological approach in this field, to
examine the performance of different hotel expansion strategies with a sound
and more appropriate method. The insights provided can guide hotel managers and
investors in making informed decisions to optimize both operational and
economic performance.
arXiv link: http://arxiv.org/abs/2411.04640v1
Partial Identification of Distributional Treatment Effects in Panel Data using Copula Equality Assumptions
(DTEs) that depend on the unknown joint distribution of treated and untreated
potential outcomes. We construct the DTE bounds using panel data and allow
individuals to switch between the treated and untreated states more than once
over time. Individuals are grouped based on their past treatment history, and
DTEs are allowed to be heterogeneous across different groups. We provide two
alternative group-wise copula equality assumptions to bound the unknown joint
and the DTEs, both of which leverage information from the past observations.
The testability of these two assumptions is also discussed, and test results are
presented. We apply this method to study the treatment effect heterogeneity of
exercising on adults' body weight. These results demonstrate that our
method improves the identification power of the DTE bounds compared to the
existing methods.
arXiv link: http://arxiv.org/abs/2411.04450v1
Identification of Long-Term Treatment Effects via Temporal Links, Observational, and Experimental Data
observational data to provide credible alternatives to conventional
observational studies for identification of long-term average treatment effects
(LTEs). I show that experimental data have an auxiliary role in this context.
They bring no identifying power without additional modeling assumptions. When
modeling assumptions are imposed, experimental data serve to amplify their
identifying power. If the assumptions fail, adding experimental data may only
yield results that are farther from the truth. Motivated by this, I introduce
two assumptions on treatment response that may be defensible based on economic
theory or intuition. To utilize them, I develop a novel two-step identification
approach that centers on bounding temporal link functions -- the relationship
between short-term and mean long-term potential outcomes. The approach provides
sharp bounds on LTEs for a general class of assumptions, and allows for
imperfect experimental compliance -- extending existing results.
arXiv link: http://arxiv.org/abs/2411.04380v1
Lee Bounds with a Continuous Treatment in Sample Selection
multivalued treatment affects both outcomes and their observability (e.g.,
employment or survey responses). We generalize the widely used Lee (2009)
bounds for binary treatment effects. Our key innovation is a sufficient
treatment values assumption that imposes weak restrictions on selection
heterogeneity and is implicit in separable threshold-crossing models, including
monotone effects on selection. Our double debiased machine learning estimator
enables nonparametric and high-dimensional methods, using covariates to tighten
the bounds and capture heterogeneity. Applications to Job Corps and CCC program
evaluations reinforce prior findings under weaker assumptions.
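For orientation, the binary-treatment construction that the paper generalizes, Lee's (2009) trimming bounds, can be sketched in a few lines: compute the excess selection share in the treated arm and trim that share from the top or bottom of the treated, selected outcome distribution. The data below are simulated, and the sketch ignores covariates, the continuous-treatment extension, and the debiased machine learning estimator developed in the paper.

```python
import numpy as np

def lee_bounds(y, d, s):
    """Bounds on E[Y(1)-Y(0)] for always-selected units; binary treatment d,
    selection indicator s (y is observed only when s == 1); assumes treatment
    weakly increases selection."""
    p1 = s[d == 1].mean()
    p0 = s[d == 0].mean()
    q = (p1 - p0) / p1                      # share of treated, selected units to trim
    y1 = y[(d == 1) & (s == 1)]
    y0_mean = y[(d == 0) & (s == 1)].mean()
    lower = y1[y1 <= np.quantile(y1, 1 - q)].mean() - y0_mean
    upper = y1[y1 >= np.quantile(y1, q)].mean() - y0_mean
    return lower, upper

rng = np.random.default_rng(8)
n = 20_000
d = rng.integers(0, 2, n)
s = (rng.uniform(size=n) < np.where(d == 1, 0.8, 0.6)).astype(int)  # treatment raises selection
y = 1.0 + 0.5 * d + rng.standard_normal(n)
y = np.where(s == 1, y, np.nan)                                     # outcome missing if not selected

print("Lee bounds:", lee_bounds(y, d, s))
```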
arXiv link: http://arxiv.org/abs/2411.04312v4
Bounded Rationality in Central Bank Communication
focusing on cognitive differences between experts and non-experts. Using
sentiment analysis of FOMC minutes, we integrate these insights into a bounded
rationality model to examine the impact on inflation expectations. Results show
that experts form more conservative expectations, anticipating FOMC
stabilization actions, while non-experts react more directly to inflation
concerns. A lead-lag analysis indicates that institutions adjust faster, though
the gap with individual investors narrows in the short term. These findings
highlight the need for tailored communication strategies to better align public
expectations with policy goals.
arXiv link: http://arxiv.org/abs/2411.04286v1
An Adversarial Approach to Identification
and counterfactual parameters in econometric models. By reformulating the
identification problem as a set membership question, we leverage the separating
hyperplane theorem in the space of observed probability measures to
characterize the identified set through the zeros of a discrepancy function
with an adversarial game interpretation. The set can be a singleton, resulting
in point identification. A feature of many econometric models, with or without
distributional assumptions on the error terms, is that the probability measure
of observed variables can be expressed as a linear transformation of the
probability measure of latent variables. This structure provides a unifying
framework and facilitates computation and inference via linear programming. We
demonstrate the versatility of our approach by applying it to nonlinear panel
models with fixed effects, with parametric and nonparametric error
distributions, and across various exogeneity restrictions, including strict and
sequential.
arXiv link: http://arxiv.org/abs/2411.04239v2
Identification and Inference in General Bunching Designs
and inference of a structural parameter in general bunching designs. We present
point and partial identification results, which generalize previous approaches
in the literature. The key assumption for point identification is the
analyticity of the counterfactual density, which defines a broader class of
distributions than many commonly used parametric families. In the partial
identification approach, the analyticity condition is relaxed and various
inequality restrictions can be incorporated. Both of our identification
approaches allow for observed covariates in the model, which has previously
been permitted only in limited ways. These covariates allow us to account for
observable factors that influence decisions regarding the running variable. We
provide a suite of counterfactual estimation and inference methods, termed the
generalized polynomial strategy. Our method restores the merits of the original
polynomial strategy proposed by Chetty et al. (2011) while addressing several
weaknesses in the widespread practice. The efficacy of the proposed method is
demonstrated compared to the polynomial estimator in a series of Monte Carlo
studies within the augmented isoelastic model. We revisit the data used in Saez
(2010) and find substantially different results relative to those from the
polynomial strategy.
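For background, the original polynomial strategy of Chetty et al. (2011) that the paper builds on can be sketched as follows: bin the running variable, fit a polynomial to the bin counts excluding a window around the kink, and read off the excess mass as the gap between observed and predicted counts inside the window. The simulated data, bin width, window, and polynomial degree below are arbitrary illustrative choices, not the generalized strategy proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(10)

# Simulated running variable with extra mass piled at a kink at 10
# (illustrative; this sketch ignores where the bunchers come from).
z = np.concatenate([rng.normal(10.0, 3.0, 50_000), np.full(2_000, 10.0)])

bin_width = 0.25
edges = np.arange(0.0, 20.0 + bin_width, bin_width)
counts, edges = np.histogram(z, bins=edges)
centers = 0.5 * (edges[:-1] + edges[1:])

# Fit the counterfactual bin counts with a polynomial, excluding a window
# around the kink, then compare observed and predicted counts inside it.
window = (centers >= 9.5) & (centers <= 10.75)
coef = np.polyfit(centers[~window], counts[~window], deg=5)
counterfactual = np.polyval(coef, centers)

excess_mass = counts[window].sum() - counterfactual[window].sum()
print("excess mass in the bunching window:", excess_mass)
```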
arXiv link: http://arxiv.org/abs/2411.03625v3
Improving precision of A/B experiments using trigger intensity
is a standard approach to measure the impact of a causal change. These
experiments typically involve small treatment effects so as to limit the potential blast radius.
As a result, these experiments often lack statistical significance due to low
signal-to-noise ratio. A standard approach for improving the precision (or
reducing the standard error) focuses only on the trigger observations, where
the outputs of the treatment and the control models differ. Although
evaluation with full information about trigger observations (full knowledge)
improves the precision, detecting all such trigger observations is a costly
affair. In this paper, we propose a sampling based evaluation method (partial
knowledge) to reduce this cost. The randomness of sampling introduces bias in
the estimated outcome. We theoretically analyze this bias and show that the
bias is inversely proportional to the number of observations used for sampling.
We also compare the proposed evaluation methods using simulation and empirical
data. In simulation, bias in evaluation with partial knowledge effectively
reduces to zero when a limited number of observations (<= 0.1%) are sampled for
trigger estimation. In the empirical setup, evaluation with partial knowledge
reduces the standard error by 36.48%.
arXiv link: http://arxiv.org/abs/2411.03530v2
Randomly Assigned First Differences?
of an outcome evolution $\Delta Y$ on a treatment evolution $\Delta D$. Under a
causal model in levels with a time-varying effect, the regression residual is a
function of the period-one treatment $D_{1}$. Then, researchers should test if
$\Delta D$ and $D_{1}$ are correlated: if they are, the regression may suffer
from an omitted variable bias. To solve it, researchers may control
nonparametrically for $E(\Delta D|D_{1})$. We use our results to revisit
first-difference regressions estimated on the data of
Acemoglu et al. (2016), who study the effect of imports from China on US
employment. $\Delta D$ and $D_{1}$ are strongly correlated, thus implying that
first-difference regressions may be biased if the effect of Chinese imports
changes over time. The coefficient on $\Delta D$ is no longer significant when
controlling for $E(\Delta D|D_{1})$.
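One simple way to implement this recommendation is to check the correlation between $\Delta D$ and $D_{1}$ and, if needed, add a flexible control for $E(\Delta D|D_{1})$, for instance decile-of-$D_{1}$ indicators; the sketch below uses simulated data and is only one crude implementation, not a re-analysis of the application discussed above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 5000
D1 = rng.exponential(1.0, n)
dD = 0.5 * D1 + rng.standard_normal(n)               # treatment change correlated with D1
dY = 1.0 * dD + 0.8 * D1 + rng.standard_normal(n)    # time-varying-effect channel through D1

df = pd.DataFrame({"dY": dY, "dD": dD, "D1": D1})
df["D1_decile"] = pd.qcut(df["D1"], 10, labels=False)

# Diagnostic: are dD and D1 correlated?
print("corr(dD, D1):", df["dD"].corr(df["D1"]))

# Naive first-difference regression vs. adding decile-of-D1 indicators
# as a crude nonparametric control for E(dD | D1).
naive = smf.ols("dY ~ dD", data=df).fit()
flex = smf.ols("dY ~ dD + C(D1_decile)", data=df).fit()
print("naive coefficient on dD:  ", naive.params["dD"])
print("with D1 controls, coef dD:", flex.params["dD"])
```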
arXiv link: http://arxiv.org/abs/2411.03208v7
Robust Market Interventions
i.e., with high probability -- accounting for uncertainty due to imprecise
information about economic primitives? In a setting with many strategic firms,
each possessing some market power, we present conditions for such interventions
to exist. The key condition, recoverable structure, requires large-scale
complementarities among families of products. The analysis works by decomposing
the incidence of interventions in terms of principal components of a Slutsky
matrix. Under recoverable structure, a noisy signal of this matrix reveals
enough about these principal components to design robust interventions. Our
results demonstrate the usefulness of spectral methods for analyzing
imperfectly observed strategic interactions with many agents.
arXiv link: http://arxiv.org/abs/2411.03026v2
Beyond the Traditional VIX: A Novel Approach to Identifying Uncertainty Shocks in Financial Markets
macroeconomic volatility in financial markets. The Chicago Board Options
Exchange Volatility Index (VIX) measures market expectations of future
volatility, but traditional methods based on second-moment shocks and
time-varying volatility of the VIX often fail to capture the non-Gaussian,
heavy-tailed nature of asset returns. To address this, we construct a revised
VIX by fitting a double-subordinated Normal Inverse Gaussian Levy process to
S&P 500 option prices, providing a more comprehensive measure of volatility
that reflects the extreme movements and heavy tails observed in financial data.
Using an axiomatic approach, we introduce a general family of risk-reward
ratios, computed with our revised VIX and fitted over a fractional time series
to more accurately identify uncertainty shocks in financial markets.
arXiv link: http://arxiv.org/abs/2411.02804v1
Does Regression Produce Representative Causal Rankings?
estimated effects when using linear regression or its popular
double-machine-learning variant, the Partially Linear Model (PLM), in the
presence of treatment effect heterogeneity. We demonstrate by example that
overlap-weighting performed by linear models like the PLM can produce Weighted
Average Treatment Effects (WATE) whose rankings are inconsistent with the
rankings of the underlying Average Treatment Effects (ATE). We define these as
ranking reversals and derive a necessary and sufficient condition for ranking
reversals under the PLM. We conclude with several simulation studies
illustrating the conditions under which ranking reversals occur.
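To make the ranking-reversal phenomenon concrete, the following minimal Python sketch (not from the paper; the two hypothetical programs, their effect functions tau_A and tau_B, and the propensities e_A and e_B are illustrative assumptions) contrasts ATEs with the overlap-weighted estimand recovered by a partialling-out, PLM-style regression: program A has the larger ATE, yet its large effects occur where treatment variance is small, so the PLM estimate ranks it below program B.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    X = rng.integers(0, 2, n)                      # binary covariate, P(X = 1) = 0.5

    def plm_estimate(tau, e):
        """Partialling-out (PLM-style) estimate: regress the X-residualized
        outcome on the X-residualized treatment."""
        D = rng.binomial(1, e[X])                  # treatment with propensity e(X)
        Y = tau[X] * D + X + rng.normal(size=n)    # heterogeneous effect tau(X)
        rD = D - np.array([D[X == x].mean() for x in (0, 1)])[X]
        rY = Y - np.array([Y[X == x].mean() for x in (0, 1)])[X]
        return rY @ rD / (rD @ rD)

    # hypothetical program A: large effects exactly where overlap is poor
    tau_A, e_A = np.array([1.0, 5.0]), np.array([0.5, 0.05])
    # hypothetical program B: constant effect with good overlap everywhere
    tau_B, e_B = np.array([2.0, 2.0]), np.array([0.5, 0.5])

    print("ATE A, B:", tau_A.mean(), tau_B.mean())    # 3.0 > 2.0
    print("PLM A, B:", plm_estimate(tau_A, e_A), plm_estimate(tau_B, e_B))
    # the PLM weights tau(X) by Var(D|X) = e(X)(1 - e(X)); here A falls below B,
    # reversing the ATE ranking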
arXiv link: http://arxiv.org/abs/2411.02675v1
Comment on 'Sparse Bayesian Factor Analysis when the Number of Factors is Unknown' by S. Frühwirth-Schnatter, D. Hosszejni, and H. Freitas Lopes
sparsity and factor selection and have enormous potential beyond standard
factor analysis applications. We show how these techniques can be applied to
Latent Space (LS) models for network data. These models suffer from well-known
identification issues of the latent factors due to likelihood invariance to
factor translation, reflection, and rotation (see Hoff et al., 2002). A set of
observables can be instrumental in identifying the latent factors via auxiliary
equations (see Liu et al., 2021). These, in turn, share many analogies with the
equations used in factor modeling, and we argue that the factor loading
restrictions may be beneficial for achieving identification.
arXiv link: http://arxiv.org/abs/2411.02531v3
Identifying Economic Factors Affecting Unemployment Rates in the United States
inflation, Unemployment Insurance, and S&P 500 index; as well as microeconomic
factors such as health, race, and educational attainment impacted the
unemployment rate for about 20 years in the United States. Our research
question is to identify which factor(s) contributed the most to the
unemployment rate surge using linear regression. Results from our studies
showed that GDP (negative), inflation (positive), Unemployment Insurance
(contrary to popular opinion; negative), and S&P 500 index (negative) were all
significant factors, with inflation being the most important one. As for health
issue factors, our model produced resultant correlation scores for occurrences
of Cardiovascular Disease, Neurological Disease, and Interpersonal Violence
with unemployment. Race as a factor showed huge discrepancies in the
unemployment rate for Black Americans compared to their counterparts.
Asians had the lowest unemployment rate throughout the years. As for educational
attainment, results showed that higher educational attainment
significantly reduced one's chance of unemployment. People with higher degrees
had the lowest unemployment rate. Results of this study will be beneficial for
policymakers and researchers in understanding the unemployment rate during the
pandemic.
arXiv link: http://arxiv.org/abs/2411.02374v1
On the Asymptotic Properties of Debiased Machine Learning Estimators
estimators under a novel asymptotic framework, offering insights for improving
the performance of these estimators in applications. DML is an estimation
method suited to economic models where the parameter of interest depends on
unknown nuisance functions that must be estimated. It requires weaker
conditions than previous methods while still ensuring standard asymptotic
properties. Existing theoretical results do not distinguish between two
alternative versions of DML estimators, DML1 and DML2. Under a new asymptotic
framework, this paper demonstrates that DML2 asymptotically dominates DML1 in
terms of bias and mean squared error, formalizing a previous conjecture based
on simulation results regarding their relative performance. Additionally, this
paper provides guidance for improving the performance of DML2 in applications.
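For readers unfamiliar with the distinction, the following minimal Python sketch (illustrative only; the data-generating process and the random-forest nuisance learners are assumptions, and the paper's new asymptotic framework is not reproduced) shows DML1 and DML2 in the partially linear model: DML1 solves the moment condition within each cross-fitting fold and averages the fold estimates, while DML2 pools the cross-fitted residuals and solves a single moment condition.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(1)
    n, theta = 2000, 1.0
    X = rng.normal(size=(n, 5))
    D = np.sin(X[:, 0]) + rng.normal(size=n)             # treatment depends on X
    Y = theta * D + np.cos(X[:, 1]) + rng.normal(size=n)

    # cross-fitting: learn E[Y|X] and E[D|X] on the complement of each fold
    rY, rD, folds = np.zeros(n), np.zeros(n), []
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        mY = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], Y[train])
        mD = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], D[train])
        rY[test] = Y[test] - mY.predict(X[test])
        rD[test] = D[test] - mD.predict(X[test])
        folds.append(test)

    # DML1: solve the moment condition fold by fold, then average the estimates
    theta_dml1 = np.mean([rY[f] @ rD[f] / (rD[f] @ rD[f]) for f in folds])
    # DML2: pool all cross-fitted residuals and solve one moment condition
    theta_dml2 = rY @ rD / (rD @ rD)
    print(theta_dml1, theta_dml2)                        # both near 1.0 in this design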
arXiv link: http://arxiv.org/abs/2411.01864v1
Estimating Nonseparable Selection Models: A Functional Contraction Approach
show that, given the selection rule and the observed selected outcome
distribution, the potential outcome distribution can be characterized as the
fixed point of an operator, which we prove to be a functional contraction. We
propose a two-step semiparametric maximum likelihood estimator to estimate the
selection model and the potential outcome distribution. The consistency and
asymptotic normality of the estimator are established. Our approach performs
well in Monte Carlo simulations and is applicable in a variety of empirical
settings where only a selected sample of outcomes is observed. Examples include
consumer demand models with only transaction prices, auctions with incomplete
bid data, and Roy models with data on accepted wages.
arXiv link: http://arxiv.org/abs/2411.01799v2
Understanding the decision-making process of choice modellers
choice behaviour across various disciplines. Building a choice model is a semi
structured research process that involves a combination of a priori
assumptions, behavioural theories, and statistical methods. This complex set of
decisions, coupled with diverse workflows, can lead to substantial variability
in model outcomes. To better understand these dynamics, we developed the
Serious Choice Modelling Game, which simulates the real world modelling process
and tracks modellers' decisions in real time using a stated preference dataset.
Participants were asked to develop choice models to estimate Willingness to Pay
values to inform policymakers about strategies for reducing noise pollution.
The game recorded actions across multiple phases, including descriptive
analysis, model specification, and outcome interpretation, allowing us to
analyse both individual decisions and differences in modelling approaches.
While our findings reveal a strong preference for using data visualisation
tools in descriptive analysis, they also identify gaps in the handling of
missing values before model specification. We also found significant variation in the
modelling approach, even when modellers were working with the same choice
dataset. Despite the availability of more complex models, simpler models such
as Multinomial Logit were often preferred, suggesting that modellers tend to
avoid complexity when time and resources are limited. Participants who engaged
in more comprehensive data exploration and iterative model comparison tended to
achieve better model fit and parsimony, which demonstrates that the
methodological choices made throughout the workflow have significant
implications, particularly when modelling outcomes are used for policy
formulation.
arXiv link: http://arxiv.org/abs/2411.01704v2
Changes-In-Changes For Discrete Treatment
treatments with more than two categories, extending the binary case of Athey
and Imbens (2006). While the original CIC model is well-suited for binary
treatments, it cannot accommodate multi-category discrete treatments often
found in economic and policy settings. Although recent work has extended CIC to
continuous treatments, there remains a gap for multi-category discrete
treatments. I introduce a generalized CIC model that adapts the rank invariance
assumption to multiple treatment levels, allowing for robust modeling while
capturing the distinct effects of varying treatment intensities.
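As background, the binary Athey-Imbens changes-in-changes building block that the paper generalizes can be sketched in a few lines of Python (illustrative only; cic_att and the toy normal-shift design are assumptions made for this example): the treated group's period-0 outcomes are pushed through the control group's period-0 CDF and period-1 quantile function to form the counterfactual.

    import numpy as np

    def cic_att(y00, y01, y10, y11):
        """Binary changes-in-changes (Athey & Imbens, 2006): the counterfactual
        period-1 outcome of the treated group is obtained by mapping its period-0
        outcomes through the control group's change in distribution."""
        # empirical CDF of the control group in period 0, evaluated at y10
        ranks = np.searchsorted(np.sort(y00), y10, side="right") / len(y00)
        # empirical quantile function of the control group in period 1
        counterfactual = np.quantile(y01, np.clip(ranks, 0, 1))
        return y11.mean() - counterfactual.mean()

    # toy example: control outcomes shift by +1, treated outcomes shift by +3
    rng = np.random.default_rng(2)
    y00 = rng.normal(0, 1, 5000); y01 = y00 + 1.0
    y10 = rng.normal(0.5, 1, 5000); y11 = y10 + 3.0
    print(cic_att(y00, y01, y10, y11))   # roughly 2.0 = 3.0 - 1.0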
arXiv link: http://arxiv.org/abs/2411.01617v1
Educational Effects in Mathematics: Conditional Average Treatment Effect depending on the Number of Treatments
Kogakuin University. Following the initial assessment, it was suggested that
group bias had led to an underestimation of the Center's true impact. To
address this issue, the authors applied the theory of causal inference. By
using T-learner, the conditional average treatment effect (CATE) of the
Center's face-to-face (F2F) personal assistance program was evaluated.
Extending T-learner, the authors produced a new CATE function that depends on
the number of treatments (F2F sessions) and used the estimated function to
predict the CATE performance of F2F assistance.
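A minimal Python sketch of the general idea, assuming a T-learner whose treated-arm outcome model also takes the number of sessions as an input (the simulated covariates, session counts, and gradient-boosting learners are illustrative assumptions, not the authors' implementation):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(3)
    n = 3000
    X = rng.normal(size=(n, 3))                    # student covariates
    sessions = rng.poisson(2, n)                   # number of F2F sessions if treated
    treated = rng.binomial(1, 0.5, n)
    y = X[:, 0] + treated * 0.3 * sessions + rng.normal(scale=0.5, size=n)

    # T-learner: separate outcome models for control and treated units;
    # the treated model additionally uses the number of sessions
    mu0 = GradientBoostingRegressor().fit(X[treated == 0], y[treated == 0])
    mu1 = GradientBoostingRegressor().fit(
        np.column_stack([X[treated == 1], sessions[treated == 1]]), y[treated == 1])

    def cate(x, k):
        """Predicted effect of receiving k face-to-face sessions at covariates x."""
        x = np.atleast_2d(x)
        return mu1.predict(np.column_stack([x, np.full(len(x), k)])) - mu0.predict(x)

    print([cate(np.zeros(3), k)[0] for k in (1, 2, 4)])   # roughly 0.3, 0.6, 1.2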
arXiv link: http://arxiv.org/abs/2411.01498v1
Empirical Welfare Analysis with Hedonic Budget Constraints
product attributes subject to a nonlinear budget constraint. We develop
nonparametric methods for welfare-analysis of interventions that change the
constraint. Two new findings are Roy's identity for smooth, nonlinear budgets,
which yields a Partial Differential Equation system, and a Slutsky-like
symmetry condition for demand. Under scalar unobserved heterogeneity and
single-crossing preferences, the coefficient functions in the PDEs are
nonparametrically identified, and under symmetry, lead to path-independent,
money-metric welfare. We illustrate our methods with welfare evaluation of a
hypothetical change in the relationship between property rent and neighborhood
school-quality using British microdata.
arXiv link: http://arxiv.org/abs/2411.01064v1
Higher-Order Causal Message Passing for Experimentation with Complex Interference
across various scientific fields. This task, however, becomes challenging in
areas like social sciences and online marketplaces, where treating one
experimental unit can influence outcomes for others through direct or indirect
interactions. Such interference can lead to biased treatment effect estimates,
particularly when the structure of these interactions is unknown. We address
this challenge by introducing a new class of estimators based on causal
message-passing, specifically designed for settings with pervasive, unknown
interference. Our estimator draws on information from the sample mean and
variance of unit outcomes and treatments over time, enabling efficient use of
observed data to estimate the evolution of the system state. Concretely, we
construct non-linear features from the moments of unit outcomes and treatments
and then learn a function that maps these features to future mean and variance
of unit outcomes. This allows for the estimation of the treatment effect over
time. Extensive simulations across multiple domains, using synthetic and real
network data, demonstrate the efficacy of our approach in estimating total
treatment effect dynamics, even in cases where interference exhibits
non-monotonic behavior in the probability of treatment.
arXiv link: http://arxiv.org/abs/2411.00945v2
Calibrated quantile prediction for Growth-at-Risk
distributions is an essential task for a wide range of applicative fields,
including economic policymaking and the financial industry. Such estimates are
particularly critical in calculating risk measures, such as Growth-at-Risk
(GaR). % and Value-at-Risk (VaR). This work proposes a conformal framework to
estimate calibrated quantiles, and presents an extensive simulation study and a
real-world analysis of GaR to examine its benefits with respect to the state of
the art. Our findings show that CP methods consistently improve the calibration
and robustness of quantile estimates at all levels. The calibration gains are
appreciated especially at extremal quantiles, which are critical for risk
assessment and where traditional methods tend to fall short. In addition, we
introduce a novel property that guarantees coverage under the exchangeability
assumption, providing a valuable tool for managing risks by quantifying and
controlling the likelihood of future extreme observations.
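A generic one-sided split-conformal quantile sketch in Python conveys the mechanics (illustrative only; the heavy-tailed data-generating process, the gradient-boosting quantile learner, and the 5% level are assumptions, and the paper's specific CP procedures and coverage property are not reproduced):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(4)
    n, alpha = 2000, 0.05                      # 5% Growth-at-Risk style quantile
    X = rng.normal(size=(n, 4))                # e.g. financial-conditions indicators
    y = 2.0 + X[:, 0] - np.abs(X[:, 1]) * rng.standard_t(df=4, size=n)  # heavy tails

    train, cal, test = np.split(rng.permutation(n), [1000, 1500])

    # Step 1: fit a conditional 5% quantile model on the training split
    qr = GradientBoostingRegressor(loss="quantile", alpha=alpha)
    qr.fit(X[train], y[train])

    # Step 2: conformalize on the calibration split (one-sided correction)
    scores = qr.predict(X[cal]) - y[cal]              # positive when y undershoots
    k = int(np.ceil((1 - alpha) * (len(cal) + 1))) - 1
    c = np.sort(scores)[min(k, len(cal) - 1)]

    # Step 3: calibrated GaR prediction; coverage holds under exchangeability
    gar = qr.predict(X[test]) - c
    print("empirical exceedance rate:", np.mean(y[test] < gar))   # close to alpha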
arXiv link: http://arxiv.org/abs/2411.00520v1
Inference in a Stationary/Nonstationary Autoregressive Time-Varying-Parameter Model
autoregressive (AR(1)) models with deterministically time-varying parameters. A
key feature of the proposed approach is to allow for time-varying stationarity
in some time periods, time-varying nonstationarity (i.e., unit root or
local-to-unit root behavior) in other periods, and smooth transitions between
the two. The estimation of the AR parameter at any time point is based on a
local least squares regression method, where the relevant initial condition is
endogenous. We obtain limit distributions for the AR parameter estimator and
t-statistic at a given point $\tau$ in time when the parameter exhibits unit
root, local-to-unity, or stationary/stationary-like behavior at time $\tau$.
These results are used to construct confidence intervals and median-unbiased
interval estimators for the AR parameter at any specified point in time. The
confidence intervals have correct asymptotic coverage probabilities with the
coverage holding uniformly over stationary and nonstationary behavior of the
observations.
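The point estimate underlying the procedure can be sketched as a kernel-weighted local least squares regression (illustrative Python only; the Gaussian kernel, the bandwidth, and the toy regime-switching series are assumptions, and the nonstandard limit theory and interval constructions that are the substance of the paper are not reproduced here):

    import numpy as np

    def local_ar1(y, tau, bandwidth):
        """Kernel-weighted local least squares estimate of the AR(1) coefficient
        at sample fraction tau (tau in (0, 1))."""
        n = len(y)
        t = np.arange(1, n) / n
        w = np.exp(-0.5 * ((t - tau) / bandwidth) ** 2)   # Gaussian kernel weights
        y_lag, y_cur = y[:-1], y[1:]
        return np.sum(w * y_lag * y_cur) / np.sum(w * y_lag ** 2)

    # toy series: stationary (rho = 0.5) in the first half, near unit root afterwards
    rng = np.random.default_rng(5)
    n = 2000
    rho = np.where(np.arange(n) < n // 2, 0.5, 0.99)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = rho[t] * y[t - 1] + rng.normal()

    print(local_ar1(y, 0.25, 0.05), local_ar1(y, 0.75, 0.05))   # ~0.5 and ~0.99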
arXiv link: http://arxiv.org/abs/2411.00358v1
The ET Interview: Professor Joel L. Horowitz
econometrics and statistics. These include bootstrap methods, semiparametric
and nonparametric estimation, specification testing, nonparametric instrumental
variables estimation, high-dimensional models, functional data analysis, and
shape restrictions, among others. Originally trained as a physicist, Joel made
a pivotal transition to econometrics, greatly benefiting our profession.
Throughout his career, he has collaborated extensively with a diverse range of
coauthors, including students, departmental colleagues, and scholars from
around the globe. Joel was born in 1941 in Pasadena, California. He attended
Stanford for his undergraduate studies and obtained his Ph.D. in physics from
Cornell in 1967. He has been Charles E. and Emma H. Morrison Professor of
Economics at Northwestern University since 2001. Prior to that, he was a
faculty member at the University of Iowa (1982-2001). He has served as a
co-editor of Econometric Theory (1992-2000) and Econometrica (2000-2004). He is
a Fellow of the Econometric Society and of the American Statistical
Association, and an elected member of the International Statistical Institute.
The majority of this interview took place in London during June 2022.
arXiv link: http://arxiv.org/abs/2411.00886v1
Bagging the Network
formation model with nontransferable utilities, incorporating observed
covariates and unobservable individual fixed effects. We address both
theoretical and computational challenges of maximum likelihood estimation in
this complex network model by proposing a new bootstrap aggregating (bagging)
estimator, which is asymptotically normal, unbiased, and efficient. We extend
the approach to estimating average partial effects and analyzing link function
misspecification. Simulations demonstrate strong finite-sample performance. Two
empirical applications to Nyakatoke risk-sharing networks and Indian
microfinance data find insignificant roles of wealth differences in link
formation and the strong influence of caste in Indian villages, respectively.
arXiv link: http://arxiv.org/abs/2410.23852v2
Machine Learning Debiasing with Conditional Moment Restrictions: An Application to LATE
These models involve finite and infinite dimensional parameters. The infinite
dimensional components include conditional expectations, conditional choice
probabilities, or policy functions, which might be flexibly estimated using
Machine Learning tools. This paper presents a characterization of locally
debiased moments for regular models defined by general semiparametric CMRs with
possibly different conditioning variables. These moments are appealing as they
are known to be less affected by first-step bias. Additionally, we study their
existence and relevance. Such results apply to a broad class of smooth
functionals of finite and infinite dimensional parameters that do not
necessarily appear in the CMRs. As a leading application of our theory, we
characterize debiased machine learning for settings of treatment effects with
endogeneity, giving necessary and sufficient conditions. We present a large
class of relevant debiased moments in this context. We then propose the
Compliance Machine Learning Estimator (CML), based on a practically convenient
orthogonal relevant moment. We show that the resulting estimand can be written
as a convex combination of conditional local average treatment effects (LATE).
Altogether, CML enjoys three appealing properties in the LATE framework: (1)
local robustness to first-stage estimation, (2) an estimand that can be
identified under a minimal relevance condition, and (3) a meaningful causal
interpretation. Our numerical experimentation shows satisfactory relative
performance of such an estimator. Finally, we revisit the Oregon Health
Insurance Experiment, analyzed by Finkelstein et al. (2012). We find that the
use of machine learning and CML suggests larger positive effects on health care
utilization than previously determined.
arXiv link: http://arxiv.org/abs/2410.23785v1
Moments by Integrating the Moment-Generating Function
random variable with a well-defined moment-generating function (MGF). We derive
new expressions for fractional moments and fractional absolute moments, both
central and non-central moments. The new moment expressions are relatively
simple integrals that involve the MGF, but do not require its derivatives. We
label the new method CMGF because it uses a complex extension of the MGF and
can be used to obtain complex moments. We illustrate the new method with three
applications where the MGF is available in closed-form, while the corresponding
densities and the derivatives of the MGF are either unavailable or very
difficult to obtain.
arXiv link: http://arxiv.org/abs/2410.23587v3
On the consistency of bootstrap for matching estimators
is inconsistent when applied to nearest neighbor matching estimators of the
average treatment effect with a fixed number of matches. Since then, this
finding has inspired numerous efforts to address the inconsistency issue,
typically by employing alternative bootstrap methods. In contrast, this paper
shows that the naive bootstrap is provably consistent for the original matching
estimator, provided that the number of matches, $M$, diverges. The bootstrap
inconsistency identified by Abadie and Imbens (2008) thus arises solely from
the use of a fixed $M$.
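A minimal Python sketch of the setting (illustrative only; matching_ate, the covariate design, and the rule M = sqrt(n) are assumptions): a nearest-neighbor matching ATE with a number of matches that grows with the sample size, combined with the naive nonparametric bootstrap, which is the regime in which consistency is established.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def matching_ate(X, D, Y, M):
        """Nearest-neighbor covariate matching ATE with M matches per unit."""
        treated, control = np.where(D == 1)[0], np.where(D == 0)[0]
        nn_c = NearestNeighbors(n_neighbors=M).fit(X[control])
        nn_t = NearestNeighbors(n_neighbors=M).fit(X[treated])
        # impute each unit's missing potential outcome by averaging its M matches
        y0_hat, y1_hat = Y.astype(float), Y.astype(float)
        y0_hat[treated] = Y[control][nn_c.kneighbors(X[treated])[1]].mean(axis=1)
        y1_hat[control] = Y[treated][nn_t.kneighbors(X[control])[1]].mean(axis=1)
        return (y1_hat - y0_hat).mean()

    rng = np.random.default_rng(6)
    n = 1000
    X = rng.normal(size=(n, 2))
    D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
    Y = X[:, 0] + 2 * D + rng.normal(size=n)

    M = int(np.sqrt(n))                      # let the number of matches grow with n
    est = matching_ate(X, D, Y, M)
    boot = [matching_ate(X[idx], D[idx], Y[idx], M)          # naive bootstrap
            for idx in (rng.integers(0, n, n) for _ in range(200))]
    print(est, np.std(boot))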
arXiv link: http://arxiv.org/abs/2410.23525v2
Inference in Partially Linear Models under Dependent Data with Deep Neural Networks
$\beta$-mixing data after first stage deep neural network (DNN) estimation.
Using the DNN results of Brown (2024), I show that the estimator for the finite
dimensional parameter, constructed using DNN-estimated nuisance components,
achieves root-$n$ consistency and asymptotic normality. By avoiding sample
splitting, I address one of the key challenges in applying machine learning
techniques to econometric models with dependent data. In a future version of
this work, I plan to extend these results to obtain general conditions for
semiparametric inference after DNN estimation of nuisance components, which
will allow for considerations such as more efficient estimation procedures, and
instrumental variable settings.
arXiv link: http://arxiv.org/abs/2410.22574v1
Forecasting Political Stability in GCC Countries
particularly in geopolitically sensitive regions such as the Gulf Cooperation
Council (GCC) countries: Saudi Arabia, the UAE, Kuwait, Qatar, Oman, and Bahrain. This
study focuses on predicting the political stability index for these six
countries using machine learning techniques. The study uses data from the World
Bank's comprehensive dataset, comprising 266 indicators covering economic,
political, social, and environmental factors. Employing the Edit Distance on
Real Sequence method for feature selection and XGBoost for model training, the
study forecasts political stability trends for the next five years. The model
achieves high accuracy, with mean absolute percentage error values under 10%,
indicating reliable predictions. The forecasts suggest that Oman, the UAE, and
Qatar will experience relatively stable political conditions, while Saudi
Arabia and Bahrain may continue to face negative political stability indices.
The findings underscore the significance of economic factors such as GDP and
foreign investment, along with variables related to military expenditure and
international tourism, as key predictors of political stability. These results
provide valuable insights for policymakers, enabling proactive measures to
enhance governance and mitigate potential risks.
arXiv link: http://arxiv.org/abs/2410.21516v2
Economic Diversification and Social Progress in the GCC Countries: A Study on the Transition from Oil-Dependency to Knowledge-Based Economies
and Saudi Arabia -- holds strategic significance due to its large oil reserves.
However, these nations face considerable challenges in shifting from
oil-dependent economies to more diversified, knowledge-based systems. This
study examines the progress of Gulf Cooperation Council (GCC) countries in
achieving economic diversification and social development, focusing on the
Social Progress Index (SPI), which provides a broader measure of societal
well-being beyond just economic growth. Using data from the World Bank,
covering 2010 to 2023, the study employs the XGBoost machine learning model to
forecast SPI values for the period of 2024 to 2026. Key components of the
methodology include data preprocessing, feature selection, and the simulation
of independent variables through ARIMA modeling. The results highlight
significant improvements in education, healthcare, and women's rights,
contributing to enhanced SPI performance across the GCC countries. However,
notable challenges persist in areas like personal rights and inclusivity. The
study further indicates that despite economic setbacks caused by global
disruptions, including the COVID-19 pandemic and oil price volatility, GCC
nations are expected to see steady improvements in their SPI scores through
2027. These findings underscore the critical importance of economic
diversification, investment in human capital, and ongoing social reforms to
reduce dependence on hydrocarbons and build knowledge-driven economies. This
research offers valuable insights for policymakers aiming to strengthen both
social and economic resilience in the region while advancing long-term
sustainable development goals.
arXiv link: http://arxiv.org/abs/2410.21505v2
Difference-in-Differences with Time-varying Continuous Treatments using Double/Debiased Machine Learning
continuous treatment and multiple time periods. Our framework assesses the
average treatment effect on the treated (ATET) when comparing two non-zero
treatment doses. The identification is based on a conditional parallel trend
assumption imposed on the mean potential outcome under the lower dose, given
observed covariates and past treatment histories. We employ kernel-based ATET
estimators for repeated cross-sections and panel data adopting the
double/debiased machine learning framework to control for covariates and past
treatment histories in a data-adaptive manner. We also demonstrate the
asymptotic normality of our estimation approach under specific regularity
conditions. In a simulation study, we find a compelling finite sample
performance of undersmoothed versions of our estimators in setups with several
thousand observations.
arXiv link: http://arxiv.org/abs/2410.21105v1
On Spatio-Temporal Stochastic Frontier Models
joint consideration of spatial and temporal dimensions was often inadequately
addressed, if not completely neglected. However, from an evolutionary economics
perspective, the production process of the decision-making units constantly
changes over both dimensions: it is not stable over time due to managerial
enhancements and/or internal or external shocks, and is influenced by the
nearest territorial neighbours. This paper proposes an extension of the Fusco
and Vidoli [2013] SEM-like approach, which globally accounts for spatial and
temporal effects in the inefficiency term. In particular, coherently with
the stochastic panel frontier literature, two different versions of the model
are proposed: the time-invariant and the time-varying spatial stochastic
frontier models. In order to evaluate the inferential properties of the
proposed estimators, we first run Monte Carlo experiments and we then present
the results of an application to a set of commonly referenced data,
demonstrating robustness and stability of estimates across all scenarios.
arXiv link: http://arxiv.org/abs/2410.20915v1
A Distributed Lag Approach to the Generalised Dynamic Factor Model (GDFM)
(GDFM) under the assumption that the dynamic common component can be expressed
in terms of a finite number of lags of contemporaneously pervasive factors. The
proposed estimator is simply an OLS regression of the observed variables on
factors extracted via static principal components and therefore avoids
frequency domain techniques entirely.
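The estimation idea described above can be sketched directly in Python (illustrative only; the factor data-generating process and the choice of r(p+1) static principal components are assumptions made for this example):

    import numpy as np

    rng = np.random.default_rng(7)
    T, N, r, p = 300, 50, 2, 1                       # periods, series, factors, lags
    f = rng.normal(size=(T, r))                      # latent factors
    lam0, lam1 = rng.normal(size=(r, N)), rng.normal(size=(r, N))
    X = f @ lam0 + np.vstack([np.zeros((p, r)), f[:-p]]) @ lam1 \
        + 0.5 * rng.normal(size=(T, N))

    # Step 1: extract static principal-component factors from the standardized panel
    Z = (X - X.mean(0)) / X.std(0)
    eigval, eigvec = np.linalg.eigh(Z @ Z.T / (T * N))
    F = np.sqrt(T) * eigvec[:, -(r * (p + 1)):]      # enough PCs to span factors and lags

    # Step 2: OLS of each observed series on the extracted factors (plus a constant)
    # gives the common component; no frequency-domain estimation is needed
    R = np.column_stack([np.ones(T), F])
    common = R @ np.linalg.lstsq(R, X, rcond=None)[0]
    print("share of variance explained:", 1 - ((X - common) ** 2).mean() / X.var())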
arXiv link: http://arxiv.org/abs/2410.20885v1
Robust Network Targeting with Multiple Nash Equilibria
rules to maximize the equilibrium social welfare of interacting agents.
Focusing on large-scale simultaneous decision games with strategic
complementarities, we develop a method to estimate an optimal treatment
allocation rule that is robust to the presence of multiple equilibria. Our
approach remains agnostic about changes in the equilibrium selection mechanism
under counterfactual policies, and we provide a closed-form expression for the
boundary of the set-identified equilibrium outcomes. To address the
incompleteness that arises when an equilibrium selection mechanism is not
specified, we use the maximin welfare criterion to select a policy, and
implement this policy using a greedy algorithm. We establish a performance
guarantee for our method by deriving a welfare regret bound, which accounts for
sampling uncertainty and the use of the greedy algorithm. We demonstrate our
method with an application to the microfinance dataset of Banerjee et al.
(2013).
arXiv link: http://arxiv.org/abs/2410.20860v2
International vulnerability of inflation
responsive to domestic economic activity, while being increasingly determined
by international conditions. Consequently, understanding the international
sources of vulnerability of domestic inflation is becoming fundamental for
policymakers. In this paper, we propose the construction of Inflation-at-Risk
and Deflation-at-Risk measures of vulnerability obtained using factor-augmented
quantile regressions estimated with international factors extracted from a
multi-level Dynamic Factor Model with overlapping blocks of inflations
corresponding to economies grouped either in a given geographical region or
according to their development level. The methodology is implemented to
inflation observed monthly from 1999 to 2022 for over 115 countries. We
conclude that, in a large number of developed countries, international factors
are relevant to explain the right tail of the distribution of inflation, and,
consequently, they are more relevant for the vulnerability related to high
inflation than for average or low inflation. However, while inflation of
developing low-income countries is hardly affected by international conditions,
the results for middle-income countries are mixed. Finally, based on a
rolling-window out-of-sample forecasting exercise, we show that the predictive
power of international factors has increased in the most recent years of high
inflation.
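A stripped-down sketch of the factor-augmented quantile regression step, using plain PCA factors and statsmodels' QuantReg (illustrative only; the paper's multi-level dynamic factor model with overlapping country blocks is not reproduced, and the simulated panel is an assumption):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    T, N = 280, 60                                     # months, countries
    g = rng.normal(size=T).cumsum() / 10               # common international driver
    panel = np.outer(g, rng.uniform(0.5, 1.5, N)) + rng.normal(size=(T, N))

    # Step 1: extract international factors from the inflation panel by PCA
    Z = panel - panel.mean(0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    factors = Z @ Vt[:2].T                             # first two principal components

    # Step 2: factor-augmented quantile regressions for one country's inflation,
    # one month ahead, at the 5% and 95% quantiles (Deflation- / Inflation-at-Risk)
    y = panel[1:, 0]
    Xreg = sm.add_constant(factors[:-1])
    for q in (0.05, 0.95):
        fit = sm.QuantReg(y, Xreg).fit(q=q)
        print(f"quantile {q}: factor coefficients", fit.params[1:].round(2))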
arXiv link: http://arxiv.org/abs/2410.20628v2
Jacobian-free Efficient Pseudo-Likelihood (EPL) Algorithm
(EPL) estimator proposed by Dearing and Blevins (2024) for estimating dynamic
discrete games, without computing Jacobians of equilibrium constraints. The EPL
estimator is efficient, convergent, and computationally fast. However, the
original algorithm requires deriving and coding the Jacobians, which are
cumbersome and prone to coding mistakes especially when considering complicated
models. The current study proposes to avoid the computation of Jacobians by
combining the ideas of numerical derivatives (for computing Jacobian-vector
products) and Krylov methods (for solving linear equations). Numerical
experiments show good computational performance of the proposed method.
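The Jacobian-free idea itself is generic and can be sketched with SciPy (illustrative only; the toy fixed-point system, the step size eps, and newton_krylov_step are assumptions, not the EPL algorithm or a dynamic game): Jacobian-vector products are approximated by finite differences and passed to GMRES through a LinearOperator, so the Jacobian is never derived or coded.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, gmres

    def newton_krylov_step(G, v, eps=1e-7):
        """One Newton step for G(v) = 0 that never forms the Jacobian:
        Jacobian-vector products are approximated by forward differences and
        the Newton system J d = -G(v) is solved with GMRES (a Krylov method)."""
        g = G(v)
        def jvp(d):
            return (G(v + eps * d) - g) / eps            # finite-difference J @ d
        J = LinearOperator((len(v), len(v)), matvec=jvp)
        d, _ = gmres(J, -g)
        return v + d

    # toy "equilibrium constraint": v = 0.5 * tanh(A v) + b, written as G(v) = 0
    rng = np.random.default_rng(9)
    A, b = rng.normal(size=(40, 40)) / 10, rng.normal(size=40)
    G = lambda v: v - 0.5 * np.tanh(A @ v) - b

    v = np.zeros(40)
    for _ in range(10):
        v = newton_krylov_step(G, v)
    print("residual norm:", np.linalg.norm(G(v)))        # close to zero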
arXiv link: http://arxiv.org/abs/2410.20029v1
Testing the effects of an unobservable factor: Do marriage prospects affect college major choice?
major choices, this paper develops a new econometric test for analyzing the
effects of an unobservable factor in a setting where this factor potentially
influences both agents' decisions and a binary outcome variable. Our test is
built upon a flexible copula-based estimation procedure and leverages the
ordered nature of latent utilities of the polychotomous choice model. Using the
proposed method, we demonstrate that marriage prospects significantly influence
the college major choices of college graduates participating in the National
Longitudinal Survey of Youth 1997 (NLSY97). Furthermore, we validate the
robustness of our findings with alternative tests that use stated marriage
expectation measures from our data, thereby demonstrating the applicability and
validity of our testing procedure in real-life scenarios.
arXiv link: http://arxiv.org/abs/2410.19947v1
Unified Causality Analysis Based on the Degrees of Freedom
challenge in accurate modeling is understanding the causal relationships
between subsystems, as well as identifying the presence and influence of
unobserved hidden drivers on the observed dynamics. This paper presents a
unified method capable of identifying fundamental causal relationships between
pairs of systems, whether deterministic or stochastic. Notably, the method also
uncovers hidden common causes beyond the observed variables. By analyzing the
degrees of freedom in the system, our approach provides a more comprehensive
understanding of both causal influence and hidden confounders. This unified
framework is validated through theoretical models and simulations,
demonstrating its robustness and potential for broader application.
arXiv link: http://arxiv.org/abs/2410.19469v1
Robust Time Series Causal Discovery for Agent-Based Model Validation
reliability of simulations, and causal discovery has become a powerful tool in
this context. However, current causal discovery methods often face accuracy and
robustness challenges when applied to complex and noisy time series data, which
is typical in ABM scenarios. This study addresses these issues by proposing a
Robust Cross-Validation (RCV) approach to enhance causal structure learning for
ABM validation. We develop RCV-VarLiNGAM and RCV-PCMCI, novel extensions of two
prominent causal discovery algorithms. These aim to better reduce the impact of
noise and give more reliable causal relations, even with
high-dimensional, time-dependent data. The proposed approach is then integrated
into an enhanced ABM validation framework, which is designed to handle diverse
data and model structures.
The approach is evaluated using synthetic datasets and a complex simulated
fMRI dataset. The results demonstrate greater reliability in causal structure
identification. The study examines how various characteristics of datasets
affect the performance of established causal discovery methods. These
characteristics include linearity, noise distribution, stationarity, and causal
structure density. This analysis is then extended to the RCV method to see how
it compares in these different situations. This examination helps confirm
whether the results are consistent with existing literature and also reveals
the strengths and weaknesses of the novel approaches.
By tackling key methodological challenges, the study aims to enhance ABM
validation by presenting a more resilient validation framework. These
improvements increase the reliability of model-driven decision making processes
in complex systems analysis.
arXiv link: http://arxiv.org/abs/2410.19412v1
Inference on Multiple Winners with Applications to Economic Mobility
setting, a winner is defined abstractly as any population whose rank according
to some random quantity, such as an estimated treatment effect, a measure of
value-added, or benefit (net of cost), falls in a pre-specified range of
values. As such, this framework generalizes the inference on a single winner
setting previously considered in Andrews et al. (2023), in which a winner is
understood to be the single population whose rank according to some random
quantity is highest. We show that this richer setting accommodates a broad
variety of empirically-relevant applications. We develop a two-step method for
inference in the spirit of Romano et al. (2014), which we compare to existing
methods or their natural generalizations to this setting. We first show the
finite-sample validity of this method in a normal location model and then
develop asymptotic counterparts to these results by proving uniform validity
over a large class of distributions satisfying a weak uniform integrability
condition. Importantly, our results permit degeneracy in the covariance matrix
of the limiting distribution, which arises naturally in many applications. In
an application to the literature on economic mobility, we find that it is
difficult to distinguish between high and low mobility census tracts when
correcting for selection. Finally, we demonstrate the practical relevance of
our theoretical results through an extensive set of simulations.
arXiv link: http://arxiv.org/abs/2410.19212v4
Heterogeneous Treatment Effects via Linear Dynamic Panel Data Models
(TE) when potential outcomes depend on past treatments. First, applying a
dynamic panel data model to observed outcomes, we show that an instrumental
variable (IV) version of the estimand in Arellano and Bond (1991) recovers a
non-convex (negatively weighted) aggregate of TE plus non-vanishing trends. We
then provide conditions on sequential exchangeability (SE) of treatment and on
TE heterogeneity that reduce such an IV estimand to a convex (positively
weighted) aggregate of TE. Second, even when SE is generically violated, such
estimands identify causal parameters when potential outcomes are generated by
dynamic panel data models with some homogeneity or mild selection assumptions.
Finally, we motivate SE and compare it with parallel trends (PT) in various
settings with experimental data (when treatments are sequentially randomized)
and observational data (when treatments are dynamic, rational choices under
learning).
arXiv link: http://arxiv.org/abs/2410.19060v2
Inference on High Dimensional Selective Labeling Models
observed binary outcomes are themselves a consequence of the existing choices
of one of the agents in the model. These models are gaining increasing
interest in the computer science and machine learning literatures, where they
refer to the potentially endogenous sample selection as the "selective labels"
problem. Empirical settings for such models arise in fields as diverse as
criminal justice, health care, and insurance. For important recent work in this
area, see for example Lakkaraju et al. (2017), Kleinberg et al. (2018), and
Coston et al. (2021), where the authors focus on judicial bail decisions, and
where one observes the outcome of whether a defendant failed to return for their
court appearance only if the judge in the case decides to release the defendant
on bail. Identifying and estimating such models can be computationally
challenging for two reasons. One is the nonconcavity of the bivariate
likelihood function, and the other is the large number of covariates in each
equation. Despite these challenges, in this paper we propose a novel
distribution free estimation procedure that is computationally friendly in many
covariates settings. The new method combines the semiparametric batched
gradient descent algorithm introduced in Khan et al.(2023) with a novel sorting
algorithms incorporated to control for selection bias. Asymptotic properties of
the new procedure are established under increasing dimension conditions in both
equations, and its finite sample properties are explored through a simulation
study and an application using judicial bail data.
arXiv link: http://arxiv.org/abs/2410.18381v3
Partially Identified Rankings from Pairwise Interactions
merits using data from pairwise interactions. We allow for incomplete
observation of these interactions and study what can be inferred about rankings
in such settings. First, we show that identification of the ranking depends on
a trade-off between the tournament graph and the interaction function: in
parametric models, such as the Bradley-Terry-Luce, rankings are point
identified even with sparse graphs, whereas nonparametric models require dense
graphs. Second, moving beyond point identification, we characterize the
identified set in the nonparametric model under any tournament structure and
represent it through moment inequalities. Finally, we propose a
likelihood-based statistic to test whether a ranking belongs to the identified
set. We study two testing procedures: one is finite-sample valid but
computationally intensive; the other is easy to implement and valid
asymptotically. We illustrate our results using Brazilian employer-employee
data to study how workers rank firms when moving across jobs.
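For context, the parametric Bradley-Terry-Luce benchmark under which rankings are point identified can be estimated by maximum likelihood in a few lines (illustrative Python; btl_rank, the toy tournament, and the normalization are assumptions, and the paper's contribution, partial identification and testing in the nonparametric case, is not reproduced):

    import numpy as np
    from scipy.optimize import minimize

    def btl_rank(wins):
        """Maximum-likelihood Bradley-Terry-Luce scores from a matrix of pairwise
        wins; wins[i, j] = number of times item i beat item j."""
        J = wins.shape[0]
        def neg_loglik(theta):
            theta = np.append(theta, 0.0)                 # normalize last score to 0
            diff = theta[:, None] - theta[None, :]
            return -(wins * (diff - np.logaddexp(0.0, diff))).sum()
        theta = minimize(neg_loglik, np.zeros(J - 1), method="BFGS").x
        theta = np.append(theta, 0.0)
        return np.argsort(-theta)                         # best-ranked item first

    # toy tournament: true quality 3 > 2 > 1 > 0, sparse random comparisons
    rng = np.random.default_rng(10)
    true_theta = np.array([0.0, 0.5, 1.0, 1.5])
    wins = np.zeros((4, 4))
    for _ in range(500):
        i, j = rng.choice(4, size=2, replace=False)
        p_ij = 1 / (1 + np.exp(true_theta[j] - true_theta[i]))
        if rng.random() < p_ij:
            wins[i, j] += 1
        else:
            wins[j, i] += 1
    print(btl_rank(wins))   # should recover the ranking [3, 2, 1, 0]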
arXiv link: http://arxiv.org/abs/2410.18272v2
Detecting Spatial Outliers: the Role of the Local Influence Function
outliers is essential for accurately interpreting geographical phenomena. While
spatial correlation measures, particularly Local Indicators of Spatial
Association (LISA), are widely used to detect spatial patterns, the presence of
abnormal observations frequently distorts the landscape and conceals critical
spatial relationships. These outliers can significantly impact analysis due to
the inherent spatial dependencies present in the data. Traditional influence
function (IF) methodologies, commonly used in statistical analysis to measure
the impact of individual observations, are not directly applicable in the
spatial context because the influence of an observation is determined not only
by its own value but also by its spatial location, its connections with
neighboring regions, and the values of those neighboring observations. In this
paper, we introduce a local version of the influence function (LIF) that
accounts for these spatial dependencies. Through the analysis of both simulated
and real-world datasets, we demonstrate how the LIF provides a more nuanced and
accurate detection of spatial outliers compared to traditional LISA measures
and local impact assessments, improving our understanding of spatial patterns.
arXiv link: http://arxiv.org/abs/2410.18261v1
On the Existence of One-Sided Representations for the Generalised Dynamic Factor Model
(GDFM) can be represented using only current and past observations essentially
whenever it is purely non-deterministic.
arXiv link: http://arxiv.org/abs/2410.18159v3
A Bayesian Perspective on the Maximum Score Problem
threshold-crossing binary choice model that satisfies a median independence
restriction. The key idea is that the model is observationally equivalent to a
probit model with nonparametric heteroskedasticity. Consequently, Gibbs
sampling techniques from Albert and Chib (1993) and Chib and Greenberg (2013)
lead to a computationally attractive Bayesian inference procedure in which a
Gaussian process forms a conditionally conjugate prior for the natural
logarithm of the skedastic function.
arXiv link: http://arxiv.org/abs/2410.17153v1
General Seemingly Unrelated Local Projections
responses using Local Projections (LPs) with instrumental variables. It
accommodates multiple shocks and instruments, accounts for autocorrelation in
multi-step forecasts by jointly modeling all LPs as a seemingly unrelated
system of equations, defines a flexible yet parsimonious joint prior for
impulse responses based on a Gaussian Process, and allows for joint inference
about the entire vector of impulse responses. We show via Monte Carlo
simulations that our approach delivers more accurate point and uncertainty
estimates than standard methods. To address potential misspecification, we
propose an optional robustification step based on power posteriors.
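As a point of reference, the frequentist local projections that the paper models jointly can be sketched as horizon-by-horizon OLS regressions with HAC standard errors (illustrative Python only; the AR(1) data-generating process and the assumption of an observed structural shock are simplifications, and the Bayesian seemingly unrelated system with a Gaussian-process prior is not reproduced):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(11)
    T = 400
    shock = rng.normal(size=T)              # identified structural shock (assumed observed)
    y = np.zeros(T)
    for t in range(1, T):                   # true IRF: 1, 0.8, 0.64, ... (AR(1) propagation)
        y[t] = 0.8 * y[t - 1] + shock[t]

    irf, se = [], []
    for h in range(9):
        # local projection at horizon h: regress y_{t+h} on the shock at t
        yh, xh = y[h:], sm.add_constant(shock[:T - h])
        fit = sm.OLS(yh, xh).fit(cov_type="HAC", cov_kwds={"maxlags": h + 1})
        irf.append(fit.params[1]); se.append(fit.bse[1])
    print(np.round(irf, 2))                 # close to 0.8 ** h
    print(np.round(se, 2))                  # horizon-by-horizon uncertainty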
arXiv link: http://arxiv.org/abs/2410.17105v3
Identifying Conduct Parameters with Separable Demand: A Counterexample to Lau (1982)
established in the foundational work of Lau (1982), which generalizes the
identification theorem of Bresnahan (1982) by relaxing the linearity
assumptions. We identify a separable demand function that still permits
identification and validate this case both theoretically and through numerical
simulations.
arXiv link: http://arxiv.org/abs/2410.16998v1
A Dynamic Spatiotemporal and Network ARCH Model with Common Factors
traditional approaches by incorporating spatial, temporal, and spatiotemporal
spillover effects, along with volatility-specific observed and latent factors.
The model offers a more general network interpretation, making it applicable
for studying various types of network spillovers. The primary innovation lies
in incorporating volatility-specific latent factors into the dynamic
spatiotemporal volatility model. Using Bayesian estimation via the Markov Chain
Monte Carlo (MCMC) method, the model offers a robust framework for analyzing
the spatial, temporal, and spatiotemporal effects of a log-squared outcome
variable on its volatility. We recommend using the deviance information
criterion (DIC) and a regularized Bayesian MCMC method to select the number of
relevant factors in the model. The model's flexibility is demonstrated through
two applications: a spatiotemporal model applied to the U.S. housing market and
another applied to financial stock market networks, both highlighting the
model's ability to capture varying degrees of interconnectedness. In both
applications, we find strong spatial/network interactions with relatively
stronger spillover effects in the stock market.
arXiv link: http://arxiv.org/abs/2410.16526v1
Asymmetries in Financial Spillovers
financial shocks originating in the US. To do so, we develop a flexible
nonlinear multi-country model. Our framework is capable of producing
asymmetries in the responses to financial shocks for shock size and sign, and
over time. We show that international reactions to US-based financial shocks
are asymmetric along these dimensions. Particularly, we find that adverse
shocks trigger stronger declines in output, inflation, and stock markets than
benign shocks. Further, we investigate time variation in the estimated dynamic
effects and characterize the responsiveness of three major central banks to
financial shocks.
arXiv link: http://arxiv.org/abs/2410.16214v1
Dynamic Biases of Static Panel Data Estimators
effects panel estimators that arises when dynamic feedback is ignored in the
estimating equation. Dynamic feedback occurs if past outcomes impact current
outcomes, a feature of many settings ranging from economic growth to
agricultural and labor markets. When estimating equations omit past outcomes,
dynamic bias can lead to significantly inaccurate treatment effect estimates,
even with randomly assigned treatments. This dynamic bias in simulations is
larger than Nickell bias. I show that dynamic bias stems from the estimation of
fixed effects, as their estimation generates confounding in the data. To
recover consistent treatment effects, I develop a flexible estimator that
provides fixed-T bias correction. I apply this approach to study the impact of
temperature shocks on GDP, a canonical example where economic theory points to
an important feedback from past to future outcomes. Accounting for dynamic bias
lowers the estimated effects of higher yearly temperatures on GDP growth by 10%
and GDP levels by 120%.
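The mechanism is easy to reproduce in a small simulation (illustrative Python only; the AR(1) outcome process, the parameter values, and the randomized treatment are assumptions, and the paper's fixed-T bias-correction estimator is not implemented): a static fixed-effects regression that omits the lagged outcome returns an estimate noticeably different from the true contemporaneous effect even though treatment is randomly assigned.

    import numpy as np

    rng = np.random.default_rng(12)
    N, T, rho, beta = 500, 10, 0.9, 1.0
    alpha = rng.normal(size=N)                         # unit fixed effects

    # dynamic panel: treatment is freshly randomized every period, but past
    # outcomes feed back into current outcomes (dynamic feedback)
    D = rng.binomial(1, 0.5, size=(N, T))
    y = np.zeros((N, T))
    for t in range(T):
        y_lag = y[:, t - 1] if t > 0 else np.zeros(N)
        y[:, t] = rho * y_lag + beta * D[:, t] + alpha + rng.normal(size=N)

    # static fixed-effects (within) estimator that omits the lagged outcome
    y_w = y - y.mean(axis=1, keepdims=True)
    d_w = D - D.mean(axis=1, keepdims=True)
    beta_static = (d_w * y_w).sum() / (d_w ** 2).sum()
    print("true effect:", beta, "static FE estimate:", round(beta_static, 2))
    # the static estimate is noticeably biased even though D is randomized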
arXiv link: http://arxiv.org/abs/2410.16112v1
Semiparametric Bayesian Inference for a Conditional Moment Equality Model
economics, yet they are difficult to estimate. These models map a conditional
distribution of data to a structural parameter via the restriction that a
conditional mean equals zero. Using this observation, I introduce a Bayesian
inference framework in which an unknown conditional distribution is replaced
with a nonparametric posterior, and structural parameter inference is then
performed using an implied posterior. The method has the same flexibility as
frequentist semiparametric estimators and does not require converting
conditional moments to unconditional moments. Importantly, I prove a
semiparametric Bernstein-von Mises theorem, providing conditions under which,
in large samples, the posterior for the structural parameter is approximately
normal, centered at an efficient estimator, and has variance equal to the
Chamberlain (1987) semiparametric efficiency bound. As byproducts, I show that
Bayesian uncertainty quantification methods are asymptotically optimal
frequentist confidence sets and derive low-level sufficient conditions for
Gaussian process priors. The latter sheds light on a key prior stability
condition and relates to the numerical aspects of the paper in which these
priors are used to predict the welfare effects of price changes.
arXiv link: http://arxiv.org/abs/2410.16017v1
Quantifying world geography as seen through the lens of Soviet propaganda
geographical locations are unequally portrayed in media, creating a distorted
representation of the world. Identifying and measuring such biases is crucial
to understand both the data and the socio-cultural processes that have produced
them. Here we suggest measuring geographical biases in a large historical news
media corpus by studying the representation of cities. Leveraging ideas of
quantitative urban science, we develop a mixed quantitative-qualitative
procedure, which allows us to get robust quantitative estimates of the biases.
These biases can be further qualitatively interpreted resulting in a
hermeneutic feedback loop. We apply this procedure to a corpus of the Soviet
newsreel series 'Novosti Dnya' (News of the Day) and show that city
representation grows super-linearly with city size, and is further biased by
city specialization and geographical location. This allows us to systematically
identify geographical regions which are explicitly or sneakily emphasized by
Soviet propaganda and quantify their importance.
arXiv link: http://arxiv.org/abs/2410.15938v2
A Kernelization-Based Approach to Nonparametric Binary Choice Models
not impose a parametric structure on either the systematic function of
covariates or the distribution of the error term. A key advantage of our
approach is its computational efficiency. For instance, even when assuming a
normal error distribution as in probit models, commonly used sieves for
approximating an unknown function of covariates can lead to a large-dimensional
optimization problem when the number of covariates is moderate. Our approach,
motivated by kernel methods in machine learning, views certain reproducing
kernel Hilbert spaces as special sieve spaces, coupled with spectral cut-off
regularization for dimension reduction. We establish the consistency of the
proposed estimator for both the systematic function of covariates and the
distribution function of the error term, and asymptotic normality of the
plug-in estimator for weighted average partial derivatives. Simulation studies
show that, compared to parametric estimation methods, the proposed method
effectively improves finite sample performance in cases of misspecification,
and has a rather mild efficiency loss if the model is correctly specified.
Using administrative data on the grant decisions of US asylum applications to
immigration courts, along with nine case-day variables on weather and
pollution, we re-examine the effect of outdoor temperature on court judges'
"mood", and thus, their grant decisions.
arXiv link: http://arxiv.org/abs/2410.15734v1
Distributionally Robust Instrumental Variables Estimation
econometrics and statistics for estimating causal effects in the presence of
unobserved confounding. However, challenges such as untestable model
assumptions and poor finite sample properties have undermined its reliability
in practice. Viewing common issues in IV estimation as distributional
uncertainties, we propose DRIVE, a distributionally robust IV estimation
method. We show that DRIVE minimizes a square root variant of ridge regularized
two stage least squares (TSLS) objective when the ambiguity set is based on a
Wasserstein distance. In addition, we develop a novel asymptotic theory for
this estimator, showing that it achieves consistency without requiring the
regularization parameter to vanish. This novel property ensures that the
estimator is robust to distributional uncertainties that persist in large
samples. We further derive the asymptotic distribution of Wasserstein DRIVE and
propose data-driven procedures to select the regularization parameter based on
theoretical results. Simulation studies demonstrate the superior finite sample
performance of Wasserstein DRIVE in terms of estimation error and out-of-sample
prediction. Due to its regularization and robustness properties, Wasserstein
DRIVE presents an appealing option when the practitioner is uncertain about
model assumptions or distributional shifts in data.
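As a rough illustration only, the sketch below minimizes one possible square-root, ridge-penalized variant of the TSLS objective alongside plain TSLS. The penalty form, its placement, and the value of lam are assumptions made for this example; the actual DRIVE objective, its Wasserstein ambiguity set, and the data-driven choice of the regularization parameter are developed in the paper and not reproduced here.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(13)
    n = 500
    Z = rng.normal(size=(n, 3))                       # instruments
    u = rng.normal(size=n)                            # unobserved confounder
    x = Z @ np.array([0.6, 0.4, 0.2]) + u + rng.normal(size=n)   # endogenous regressor
    y = 1.0 * x + u + rng.normal(size=n)              # true coefficient is 1.0

    X = np.column_stack([np.ones(n), x])
    Zc = np.column_stack([np.ones(n), Z])
    Pz = Zc @ np.linalg.pinv(Zc.T @ Zc) @ Zc.T        # projection onto instrument space

    def objective(beta, lam):
        r = y - X @ beta
        return np.sqrt(r @ Pz @ r / n) + lam * np.linalg.norm(beta[1:])

    beta_tsls = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
    beta_sqrt = minimize(lambda b: objective(b, lam=0.1), beta_tsls).x
    print("TSLS:", beta_tsls[1].round(3), "sqrt-ridge variant:", beta_sqrt[1].round(3))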
arXiv link: http://arxiv.org/abs/2410.15634v2
Conformal Predictive Portfolio Selection
returns. Portfolio selection is a fundamental task in finance, and a variety of
methods have been developed to achieve this goal. For instance, the
mean-variance approach constructs portfolios by balancing the trade-off between
the mean and variance of asset returns, while the quantile-based approach
optimizes portfolios by considering tail risk. These methods often depend on
distributional information estimated from historical data using predictive
models, each of which carries its own uncertainty. To address this, we propose
a framework for predictive portfolio selection via conformal prediction,
called Conformal Predictive Portfolio Selection (CPPS). Our approach
forecasts future portfolio returns, computes the corresponding prediction
intervals, and selects the portfolio of interest based on these intervals. The
framework is flexible and can accommodate a wide range of predictive models,
including autoregressive (AR) models, random forests, and neural networks. We
demonstrate the effectiveness of the CPPS framework by applying it to an AR
model and validate its performance through empirical studies, showing that it
delivers superior returns compared to simpler strategies.
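A minimal split-conformal sketch with an AR(1) predictive model conveys the mechanics (illustrative Python only; the simulated return series, the 90% level, and the suggestion to select portfolios by the interval's lower endpoint are assumptions, and exchangeability is only an approximation for serially dependent returns):

    import numpy as np

    rng = np.random.default_rng(14)
    T = 600
    r = np.zeros(T)
    for t in range(1, T):                              # toy AR(1) portfolio returns
        r[t] = 0.3 * r[t - 1] + rng.normal(scale=0.02)

    y, x = r[1:], r[:-1]
    train, cal = np.arange(0, 400), np.arange(400, T - 1)

    # Step 1: fit a simple AR(1) predictive model on the training window
    phi = np.polyfit(x[train], y[train], 1)
    pred = np.polyval(phi, x)

    # Step 2: split-conformal half-width from calibration residuals
    alpha = 0.1
    scores = np.abs(y[cal] - pred[cal])
    k = int(np.ceil((1 - alpha) * (len(cal) + 1))) - 1
    q = np.sort(scores)[min(k, len(cal) - 1)]

    # Step 3: one-step-ahead prediction interval for next period's portfolio return;
    # a portfolio could then be chosen by, e.g., its interval's lower endpoint
    next_pred = np.polyval(phi, r[-1])
    print("90% predictive interval:", (next_pred - q, next_pred + q))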
arXiv link: http://arxiv.org/abs/2410.16333v2
Predictive Quantile Regression with High-Dimensional Predictors: The Variable Screening Approach
quantile forecasts using high-dimensional predictors. We have refined and
augmented the quantile partial correlation (QPC)-based variable screening
proposed by Ma et al. (2017) to accommodate $\beta$-mixing time-series data.
Our approach is inclusive of i.i.d. scenarios but introduces new convergence
bounds for time-series contexts, suggesting the performance of QPC-based
screening is influenced by the degree of time-series dependence. Through Monte
Carlo simulations, we validate the effectiveness of QPC under weak dependence.
Our empirical assessment of variable selection for growth-at-risk (GaR)
forecasting underscores the method's advantages, revealing that specific labor
market determinants play a pivotal role in forecasting GaR. While prior
empirical research has predominantly considered a limited set of predictors, we
employ the comprehensive Fred-QD dataset, retaining a richer breadth of
information for GaR forecasts.
arXiv link: http://arxiv.org/abs/2410.15097v1
Fast and Efficient Bayesian Analysis of Structural Vector Autoregressions Using the R Package bsvars
macroeconomic and financial analyses using Bayesian Structural Vector
Autoregressions. It uses frontier econometric techniques and C++ code to ensure
fast and efficient estimation of these multivariate dynamic structural models,
possibly with many variables, complex identification strategies, and non-linear
characteristics. The models can be identified using adjustable exclusion
restrictions and heteroskedastic or non-normal shocks. They feature a flexible
three-level equation-specific local-global hierarchical prior distribution for
the estimated level of shrinkage for autoregressive and structural parameters.
Additionally, the package facilitates predictive and structural analyses such
as impulse responses, forecast error variance and historical decompositions,
forecasting, statistical verification of identification and hypotheses on
autoregressive parameters, and analyses of structural shocks, volatilities, and
fitted values. These features differentiate bsvars from existing R packages
that either focus on a specific structural model, do not consider
heteroskedastic shocks, or lack the implementation using compiled code.
arXiv link: http://arxiv.org/abs/2410.15090v2
Switchback Price Experiments with Forward-Looking Demand
single product, with infinite supply. In each period, the seller chooses a
price $p$ from a set of predefined prices that consist of a reference price and
a few discounted price levels. The goal is to estimate the demand gradient at
the reference price point, with the goal of adjusting the reference price to
improve revenue after the experiment. In our model, in each period, a unit mass
of buyers arrives on the market, with values distributed based on a
time-varying process. Crucially, buyers are forward looking with a discounted
utility and will choose to not purchase now if they expect to face a discounted
price in the near future. We show that forward-looking demand introduces bias
in naive estimators of the demand gradient, due to intertemporal interference.
Furthermore, we prove that there is no estimator that uses data from price
experiments with only two price points that can recover the correct demand
gradient, even in the limit of an infinitely long experiment with an
infinitesimal price discount. Moreover, we characterize the form of the bias of
naive estimators. Finally, we show that with a simple three price level
experiment, the seller can remove the bias due to strategic forward-looking
behavior and construct an estimator for the demand gradient that asymptotically
recovers the truth.
arXiv link: http://arxiv.org/abs/2410.14904v1
Learning the Effect of Persuasion via Difference-In-Differences
impact of informational treatments on behavior. We introduce two causal
parameters, the forward and backward average persuasion rates on the treated,
which refine the average treatment effect on the treated. The forward rate
excludes cases of "preaching to the converted," while the backward rate omits
"talking to a brick wall" cases. We propose both regression-based and
semiparametrically efficient estimators. The framework applies to both
two-period and staggered treatment settings, including event studies, and we
demonstrate its usefulness with applications to a British election and a
Chinese curriculum reform.
arXiv link: http://arxiv.org/abs/2410.14871v3
A GARCH model with two volatility components and two driving factors
to better capture the rich, multi-component dynamics often observed in the
volatility of financial assets. This model provides a quasi closed-form
representation of the characteristic function for future log-returns, from
which semi-analytical formulas for option pricing can be derived. A theoretical
analysis is conducted to establish sufficient conditions for strict
stationarity and geometric ergodicity, while also obtaining the continuous-time
diffusion limit of the model. Empirical evaluations, conducted both in-sample
and out-of-sample using S&P500 time series data, show that our model
outperforms widely used single-factor models in predicting returns and option
prices.
arXiv link: http://arxiv.org/abs/2410.14585v1
GARCH option valuation with long-run and short-run volatility components: A novel framework ensuring positive variance
improved Generalized Autoregressive Conditional Heteroskedasticity (GARCH)
model for valuing European options, where the return volatility is comprised of
two distinct components. Empirical studies indicate that the model developed by
CJOW outperforms widely-used single-component GARCH models and provides a
superior fit to options data than models that combine conditional
heteroskedasticity with Poisson-normal jumps. However, a significant limitation
of this model is that it allows the variance process to become negative. Oh and
Park [2023] partially addressed this issue by developing a related model, yet
the positivity of the volatility components is not guaranteed, either
theoretically or empirically. In this paper, we introduce a new GARCH model
that improves upon the models by CJOW and Oh and Park [2023], ensuring the
positivity of the return volatility. In comparison to the two earlier GARCH
approaches, our novel methodology shows comparable in-sample performance on
returns data and superior performance on S&P500 options data.
arXiv link: http://arxiv.org/abs/2410.14513v1
Identification of a Rank-dependent Peer Effect Model
endogenous spillover to be linear in ordered peer outcomes. Unlike the
canonical linear-in-means model, our approach accounts for the distribution of
peer outcomes as well as the size of peer groups. Under a minimal condition,
our model admits a unique equilibrium and is therefore tractable and
identified. Simulations show our estimator has good finite sample performance.
Finally, we apply our model to educational data from Norway, finding that
higher-performing friends disproportionately drive GPA spillovers. Our
framework provides new insights into the structure of peer effects beyond
aggregate measures.
arXiv link: http://arxiv.org/abs/2410.14317v2
The Subtlety of Optimal Paternalism in a Population with Bounded Rationality
power to design a discrete choice set for a heterogeneous population with
bounded rationality. We show that the policy that most effectively constrains
or influences choices depends in a particular multiplicative way on the
preferences of the population and on the choice probabilities conditional on
preferences that measure the suboptimality of behavior. We first consider the
planning problem in abstraction. We then study two settings in which the
planner may mandate an action or decentralize decision making. In one setting,
we suppose that individuals measure utility with additive random error and
maximize mismeasured rather than actual utility. Then optimal planning requires
knowledge of the distribution of measurement errors. In the second setting, we
consider binary treatment choice under uncertainty when the planner can mandate
a treatment conditional on publicly observed personal covariates or can enable
individuals to choose their own treatments conditional on private information.
We focus on situations where bounded rationality takes the form of deviations
between subjective personal beliefs and objective probabilities of uncertain
outcomes. To illustrate, we consider clinical decision making in medicine. In
toto, our analysis is cautionary. It characterizes the subtle nature of optimal
policy, whose determination requires the planner to possess extensive knowledge
that is rarely available. We conclude that studies of policy choice by a
paternalistic utilitarian planner should view not only the population but also
the planner to be boundedly rational.
arXiv link: http://arxiv.org/abs/2410.13658v2
Counterfactual Analysis in Empirical Games
partially identified parameters, and multiple equilibria and/or randomized
strategies, by constructing and analyzing the counterfactual predictive
distribution set (CPDS). This framework accommodates various outcomes of
interest, including behavioral and welfare outcomes. It allows a variety of
changes to the environment to generate the counterfactual, including
modifications of the utility functions, the distribution of utility
determinants, the number of decision makers, and the solution concept. We use a
Bayesian approach to summarize statistical uncertainty. We establish conditions
under which the population CPDS is sharp from the point of view of
identification. We also establish conditions under which the posterior CPDS is
consistent if the posterior distribution for the underlying model parameter is
consistent. Consequently, our results can be employed to conduct counterfactual
analysis after a preliminary step of identifying and estimating the underlying
model parameter based on the existing literature. Our consistency results
involve the development of a new general theory for Bayesian consistency of
posterior distributions for mappings of sets. Although we primarily focus on a
model of a strategic game, our approach is applicable to other structural
models with similar features.
arXiv link: http://arxiv.org/abs/2410.12731v1
A Simple Interactive Fixed Effects Estimator for Short Panels
conventional additive effects (AE) model. For the AE model, the fixed effects
estimator can be obtained by applying least squares to a regression that adds a
linear projection of the fixed effect on the explanatory variables (Mundlak,
1978; Chamberlain, 1984). In this paper, we develop a novel estimator -- the
projection-based IE (PIE) estimator -- for the IE model that is based on a
similar approach. We show that, for the IE model, fixed effects estimators that
have appeared in the literature are not equivalent to our PIE estimator, though
both can be expressed as a generalized within estimator. Unlike the fixed
effects estimators for the IE model, the PIE estimator is consistent for a
fixed number of time periods with no restrictions on serial correlation or
conditional heteroskedasticity in the errors. We also derive a statistic for
testing the consistency of the two-way fixed effects estimator in the possible
presence of interactive effects. Moreover, although the PIE estimator is the
solution to a high-dimensional nonlinear least squares problem, we show that it
can be computed by iterating between two steps, both of which have simple
analytical solutions. The computational simplicity is an important advantage
relative to other strategies that have been proposed for estimating the IE
model for short panels. Finally, we compare the finite sample performance of IE
estimators through simulations.
arXiv link: http://arxiv.org/abs/2410.12709v1
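As a point of reference for the estimators compared above, here is a minimal sketch of a generic Bai (2009)-style alternating least-squares scheme for the interactive effects model, in which each step has a closed-form solution. It is not the paper's PIE estimator, and, as the abstract notes, estimators of this kind need not be consistent when T is fixed; the data and parameter values below are made up.

```python
# Generic Bai (2009)-style alternating least squares for the interactive
# effects model y_it = x_it * beta + lambda_i' f_t + e_it, on simulated data.
# This is NOT the paper's PIE estimator; it only illustrates iterating between
# two steps that each have a closed-form solution.
import numpy as np

rng = np.random.default_rng(0)
N, T, r, beta_true = 200, 10, 1, 1.5

lam = rng.normal(size=(N, r))
f = rng.normal(size=(T, r))
x = rng.normal(size=(N, T)) + lam @ f.T        # regressor correlated with the factor structure
y = beta_true * x + lam @ f.T + rng.normal(size=(N, T))

beta = 0.0
for _ in range(500):
    # Step 1: given beta, extract r factors from the residual matrix by PCA/SVD
    resid = y - beta * x
    _, _, vt = np.linalg.svd(resid, full_matrices=False)
    f_hat = vt[:r].T * np.sqrt(T)              # T x r, normalized so F'F/T = I
    lam_hat = resid @ f_hat / T                # N x r
    # Step 2: given the factor structure, update beta by OLS
    u = y - lam_hat @ f_hat.T
    beta_new = (x * u).sum() / (x * x).sum()
    if abs(beta_new - beta) < 1e-10:
        beta = beta_new
        break
    beta = beta_new

print(f"estimated beta: {beta:.3f} (true value {beta_true})")
```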
Testing Identifying Assumptions in Parametric Separable Models: A Conditional Moment Inequality Approach
in parametric separable models, namely treatment exogeneity, instrument
validity, and/or homoskedasticity. We show that the testable implications can
be written in the intersection bounds framework, which is easy to implement
using the inference method proposed in Chernozhukov, Lee, and Rosen (2013), and
the Stata package of Chernozhukov et al. (2015). Monte Carlo simulations
confirm that our test is consistent and controls size. We use our proposed
method to test the validity of some commonly used instrumental variables, such
as the average price in other markets in Nevo and Rosen (2012) and the Bartik
instrument in Card (2009); the test rejects both instrumental variable
models. When the identifying assumptions are rejected, we discuss solutions
that allow researchers to identify some causal parameters of interest after
relaxing functional form assumptions. We show that the IV model is nontestable
if no functional form assumption is made on the outcome equation, when there
exists a one-to-one mapping between the continuous treatment variable, the
instrument, and the first-stage unobserved heterogeneity.
arXiv link: http://arxiv.org/abs/2410.12098v1
Aggregation Trees
is a key concern for researchers and policymakers. A common approach is to
report average treatment effects across subgroups based on observable
covariates. However, the choice of subgroups is crucial as it poses the risk of
$p$-hacking and requires balancing interpretability with granularity. This
paper proposes a nonparametric approach to construct heterogeneous subgroups.
The approach enables a flexible exploration of the trade-off between
interpretability and the discovery of more granular heterogeneity by
constructing a sequence of nested groupings, each with an optimality property.
By integrating our approach with "honesty" and debiased machine learning, we
provide valid inference about the average treatment effect of each group. We
validate the proposed methodology through an empirical Monte-Carlo study and
apply it to revisit the impact of maternal smoking on birth weight, revealing
systematic heterogeneity driven by parental and birth-related characteristics.
arXiv link: http://arxiv.org/abs/2410.11408v2
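The aggregation-tree recipe above can be sketched as: estimate unit-level CATEs with a flexible learner, then fit a shallow regression tree on those estimates so that successive tree depths give a nested sequence of interpretable subgroups. The sketch below follows that recipe on simulated data; the paper's honesty step and debiased inference are omitted, and the T-learner is only one of many possible CATE estimators.

```python
# Sketch of the aggregation-tree idea: estimate unit-level CATEs with a
# flexible learner, then fit a shallow regression tree on those estimates so
# that each depth defines a coarser or finer interpretable grouping.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 3))
D = rng.integers(0, 2, size=n)                      # randomized treatment
tau = 1.0 + 2.0 * (X[:, 0] > 0)                     # heterogeneous effect
y = X @ np.array([0.5, -0.3, 0.2]) + tau * D + rng.normal(size=n)

# T-learner CATE estimates
m1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[D == 1], y[D == 1])
m0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[D == 0], y[D == 0])
cate_hat = m1.predict(X) - m0.predict(X)

# Shallow tree on the CATE estimates: nested groupings by increasing depth
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=200).fit(X, cate_hat)
print(export_text(tree, feature_names=["x1", "x2", "x3"]))
```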
Closed-form estimation and inference for panels with attrition and refreshment samples
auxiliary (refreshment) sampling restores full identification under additional
assumptions that still allow for nontrivial attrition mechanisms. Such
identification results rely on implausible assumptions about the attrition
process or lead to theoretically and computationally challenging estimation
procedures. We propose an alternative identifying assumption that, despite its
nonparametric nature, suggests a simple estimation algorithm based on a
transformation of the empirical cumulative distribution function of the data.
This estimation procedure requires neither tuning parameters nor optimization
in the first step, i.e., has a closed form. We prove that our estimator is
consistent and asymptotically normal and demonstrate its good performance in
simulations. We provide an empirical illustration with income data from the
Understanding America Study.
arXiv link: http://arxiv.org/abs/2410.11263v2
Statistical Properties of Deep Neural Networks with Dependent Data
estimators under dependent data. Two general results for nonparametric sieve
estimators directly applicable to DNN estimators are given. The first
establishes rates for convergence in probability under nonstationary data. The
second provides non-asymptotic probability bounds on $L^{2}$-errors
under stationary $\beta$-mixing data. I apply these results to DNN estimators
in both regression and classification contexts imposing only a standard
H\"older smoothness assumption. The DNN architectures considered are common in
applications, featuring fully connected feedforward networks with any
continuous piecewise linear activation function, unbounded weights, and a width
and depth that grows with sample size. The framework provided also offers
potential for research into other DNN architectures and time-series
applications.
arXiv link: http://arxiv.org/abs/2410.11113v3
Testing the order of fractional integration in the presence of smooth trends, with an application to UK Great Ratios
stochastic process is fractionally integrated of order $\delta$, where
$|\delta| < 1/2$, when smooth trends are present in the model. We combine the
semi-parametric approach by Iacone, Nielsen & Taylor (2022) to model the short
range dependence with the use of Chebyshev polynomials by Cuestas & Gil-Alana
to describe smooth trends. Our proposed statistics have standard limiting null
distributions and match the asymptotic local power of infeasible tests based on
unobserved errors. We also establish the conditions under which an information
criterion can consistently estimate the order of the Chebyshev polynomial. The
finite sample performance is evaluated using simulations, and an empirical
application is given for the UK Great Ratios.
arXiv link: http://arxiv.org/abs/2410.10749v1
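As an illustration of the deterministic component, the sketch below constructs Chebyshev time polynomials on rescaled time and removes the smooth trend by OLS from a simulated series; the fractional-integration test statistics themselves are not reproduced, and the data are placeholders.

```python
# Minimal sketch: build Chebyshev time polynomials on [-1, 1] and remove a
# smooth deterministic trend by OLS.  The memory tests in the paper would then
# be applied to the detrended series; they are not reproduced here.
import numpy as np

rng = np.random.default_rng(2)
T, m = 500, 3                                   # sample size, polynomial order
t = np.arange(1, T + 1)
u = np.cumsum(rng.normal(size=T)) * 0.05        # persistent noise (placeholder)
trend = 2.0 + 1.5 * np.cos(np.pi * t / T)       # smooth trend
y = trend + u

# Chebyshev regressors of orders 0..m evaluated at rescaled time in [-1, 1]
s = 2.0 * (t - 1) / (T - 1) - 1.0
Z = np.polynomial.chebyshev.chebvander(s, m)    # T x (m+1) design matrix

coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
detrended = y - Z @ coef
print("fitted Chebyshev coefficients:", np.round(coef, 3))
```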
Large Scale Longitudinal Experiments: Estimation and Inference
methods because of computational challenges arising from the presence of
millions of nuisance parameters. We leverage Mundlak's insight that unit
intercepts can be eliminated by using carefully chosen averages of the
regressors to rewrite several common estimators in a form that is amenable to
weighted-least squares estimation with frequency weights. This renders
regressions involving arbitrary strata intercepts tractable with very large
datasets, optionally with the key compression step computed out-of-memory in
SQL. We demonstrate that these methods yield more precise estimates than other
commonly used estimators, and also find that the compression strategy greatly
increases computational efficiency. We provide in-memory (pyfixest) and
out-of-memory (duckreg) python libraries to implement these estimators.
arXiv link: http://arxiv.org/abs/2410.09952v1
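The compression step described above can be illustrated as follows: collapse the microdata to (stratum, treatment) cell means with frequency weights and run weighted least squares on the small compressed table, which reproduces the microdata OLS point estimate because the regressors are constant within cells. The sketch uses simulated data; the pyfixest and duckreg packages cited above implement richer, production versions of this idea.

```python
# Sketch of the compression idea: collapse unit-level data to cell means with
# frequency weights, then run weighted least squares on the compressed table.
# The point estimate matches plain OLS on the microdata because the regressors
# are constant within each (stratum, treatment) cell.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200_000
df = pd.DataFrame({
    "stratum": rng.integers(0, 50, size=n),
    "d": rng.integers(0, 2, size=n),
})
df["y"] = 0.2 * df["stratum"] + 1.0 * df["d"] + rng.normal(size=n)

# Compress: one row per (stratum, treatment) cell
cells = df.groupby(["stratum", "d"], as_index=False).agg(
    y_mean=("y", "mean"), n_obs=("y", "size")
)

X = pd.get_dummies(cells["stratum"], prefix="s", drop_first=True, dtype=float)
X.insert(0, "d", cells["d"].astype(float))
X = sm.add_constant(X)

wls = sm.WLS(cells["y_mean"], X, weights=cells["n_obs"]).fit()
print("treatment effect from compressed WLS:", round(wls.params["d"], 4))
```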
Nickell Meets Stambaugh: A Tale of Two Biases in Panel Predictive Regressions
the Nickell bias and the Stambaugh bias imposes challenges for hypothesis
testing. This paper introduces a new estimator, the IVX-X-Jackknife (IVXJ),
which effectively removes this composite bias and reinstates standard
inferential procedures. The IVXJ estimator is inspired by the IVX technique in
time series. In panel data where the cross section is of the same order as the
time dimension, the bias of the baseline panel IVX estimator can be corrected
via an analytical formula by leveraging an innovative X-Jackknife scheme that
divides the time dimension into the odd and even indices. IVXJ is the first
procedure that achieves unified inference across a wide range of modes of
persistence in panel predictive regressions, whereas such unified inference is
unattainable for the popular within-group estimator. Extended to accommodate
long-horizon predictions with multiple regressions, IVXJ is used to examine the
impact of debt levels on financial crises by panel local projection. Our
empirics provide comparable results across different categories of debt.
arXiv link: http://arxiv.org/abs/2410.09825v1
Variance reduction combining pre-experiment and in-experiment data
decision-making for many companies. Increasing the sensitivity of these
experiments, particularly with a fixed sample size, relies on reducing the
variance of the estimator for the average treatment effect (ATE). Existing
methods like CUPED and CUPAC use pre-experiment data to reduce variance, but
their effectiveness depends on the correlation between the pre-experiment data
and the outcome. In contrast, in-experiment data is often more strongly
correlated with the outcome and thus more informative. In this paper, we
introduce a novel method that combines both pre-experiment and in-experiment
data to achieve greater variance reduction than CUPED and CUPAC, without
introducing bias or additional computational complexity. We also establish
asymptotic theory and provide consistent variance estimators for our method.
Applying this method to multiple online experiments at Etsy, we reach
substantial variance reduction over CUPAC with the inclusion of only a few
in-experiment covariates. These results highlight the potential of our approach
to significantly improve experiment sensitivity and accelerate decision-making.
arXiv link: http://arxiv.org/abs/2410.09027v1
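For reference, the sketch below implements the standard CUPED adjustment with a single pre-experiment covariate on simulated data, i.e. the baseline the paper extends; the paper's combination of pre-experiment and in-experiment covariates is not reproduced here.

```python
# Standard CUPED adjustment with a pre-experiment covariate, shown as the
# baseline the paper builds on.  The paper's extension, which also folds in
# in-experiment covariates without introducing bias, is not reproduced here.
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
x_pre = rng.normal(size=n)                         # pre-experiment metric
d = rng.integers(0, 2, size=n)                     # random assignment
y = 0.1 * d + 0.8 * x_pre + rng.normal(size=n)     # experiment outcome

theta = np.cov(y, x_pre)[0, 1] / np.var(x_pre)
y_cuped = y - theta * (x_pre - x_pre.mean())       # variance-reduced outcome

def ate_and_se(outcome, treat):
    diff = outcome[treat == 1].mean() - outcome[treat == 0].mean()
    se = np.sqrt(outcome[treat == 1].var(ddof=1) / (treat == 1).sum()
                 + outcome[treat == 0].var(ddof=1) / (treat == 0).sum())
    return diff, se

for label, out in [("raw", y), ("CUPED", y_cuped)]:
    ate, se = ate_and_se(out, d)
    print(f"{label:6s} ATE = {ate:.4f}, SE = {se:.4f}")
```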
On the Lower Confidence Band for the Optimal Welfare in Policy Learning
propose reporting a lower confidence band (LCB). A natural approach to
constructing an LCB is to invert a one-sided t-test based on an efficient
estimator for the optimal welfare. However, we show that for an empirically
relevant class of DGPs, such an LCB can be first-order dominated by an LCB
based on a welfare estimate for a suitable suboptimal treatment policy. We show
that such first-order dominance is possible if and only if the optimal
treatment policy is not “well-separated” from the rest, in the sense of the
commonly imposed margin condition. When this condition fails, standard debiased
inference methods are not applicable. We show that uniformly valid and
easy-to-compute LCBs can be constructed analytically by inverting
moment-inequality tests with the maximum and quasi-likelihood-ratio test
statistics. As an empirical illustration, we revisit the National JTPA study
and find that the proposed LCBs achieve reliable coverage and competitive
length.
arXiv link: http://arxiv.org/abs/2410.07443v3
Collusion Detection with Graph Neural Networks
engage in fraudulent practices. This paper presents an innovative methodology
for detecting and predicting collusion patterns in different national markets
using neural networks (NNs) and graph neural networks (GNNs). GNNs are
particularly well suited to this task because they can exploit the inherent
network structures present in collusion and many other economic problems. Our
approach consists of two phases: In Phase I, we develop and train models on
individual market datasets from Japan, the United States, two regions in
Switzerland, Italy, and Brazil, focusing on predicting collusion in single
markets. In Phase II, we extend the models' applicability through zero-shot
learning, employing a transfer learning approach that can detect collusion in
markets in which training data is unavailable. This phase also incorporates
out-of-distribution (OOD) generalization to evaluate the models' performance on
unseen datasets from other countries and regions. In our empirical study, we
show that GNNs outperform NNs in detecting complex collusive patterns. This
research contributes to the ongoing discourse on preventing collusion and
optimizing detection methodologies, providing valuable guidance on the use of
NNs and GNNs in economic applications to enhance market fairness and economic
welfare.
arXiv link: http://arxiv.org/abs/2410.07091v1
Group Shapley Value and Counterfactual Simulations in a Structural Model
interpret counterfactual simulations in structural economic models by
quantifying the importance of different components. Our framework compares two
sets of parameters, partitioned into multiple groups, and applying group
Shapley value decomposition yields unique additive contributions to the changes
between these sets. The relative contributions sum to one, enabling us to
generate an importance table that is as easily interpretable as a regression
table. The group Shapley value can be characterized as the solution to a
constrained weighted least squares problem. Using this property, we develop
robust decomposition methods to address scenarios where inputs for the group
Shapley value are missing. We first apply our methodology to a simple Roy model
and then illustrate its usefulness by revisiting two published papers.
arXiv link: http://arxiv.org/abs/2410.06875v1
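The group Shapley decomposition can be illustrated directly: partition the parameters into groups, evaluate the model outcome for every subset of groups switched from the old to the new values, and average marginal contributions with Shapley weights. The sketch below uses direct enumeration rather than the constrained weighted least squares characterization mentioned above, and the outcome function is a toy placeholder standing in for a counterfactual simulation.

```python
# Group Shapley decomposition of the change in a model outcome when parameter
# groups are moved from "old" to "new" values.  The outcome function is a toy
# placeholder for a counterfactual simulation from a structural model.
from itertools import combinations
from math import factorial

def outcome(params):
    return params["a"] * params["b"] + params["c"] ** 2   # toy model output

old = {"a": 1.0, "b": 2.0, "c": 0.5}
new = {"a": 1.5, "b": 3.0, "c": 1.0}
groups = {"G1": ["a"], "G2": ["b", "c"]}                  # partition of the parameters

def value(switched):
    params = dict(old)
    for g in switched:
        for name in groups[g]:
            params[name] = new[name]
    return outcome(params)

names = list(groups)
shapley = {}
for g in names:
    others = [h for h in names if h != g]
    total, k = 0.0, len(names)
    for r in range(len(others) + 1):
        for subset in combinations(others, r):
            w = factorial(r) * factorial(k - r - 1) / factorial(k)
            total += w * (value(set(subset) | {g}) - value(set(subset)))
    shapley[g] = total

delta = outcome(new) - outcome(old)
print("total change:", delta)
print({g: round(v / delta, 3) for g, v in shapley.items()})   # shares sum to 1
```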
Green bubbles: a four-stage paradigm for detection and propagation
increasing attention worldwide. While green bubbles may be examined through a
social bubble hypothesis, it is essential not to neglect a Climate Minsky
moment triggered by sudden asset price changes. The significant increase in
green investments highlights the urgent need for a comprehensive understanding
of these market dynamics. Therefore, the current paper introduces a novel
paradigm for studying such phenomena. Focusing on the renewable energy sector,
Statistical Process Control (SPC) methodologies are employed to identify green
bubbles within time series data. Furthermore, search volume indexes and social
factors are incorporated into established econometric models to reveal
potential implications for the financial system. Inspired by Joseph
Schumpeter's perspectives on business cycles, this study recognizes green
bubbles as a necessary evil for facilitating a successful transition towards a
more sustainable future.
arXiv link: http://arxiv.org/abs/2410.06564v1
Persistence-Robust Break Detection in Predictive Quantile and CoVaR Regressions
Adrian and Brunnermeier's (2016) CoVaR) is important in economics and finance.
However, past research has shown that predictive relationships may be unstable
over time. Therefore, this paper develops structural break tests in predictive
quantile and CoVaR regressions. These tests can detect changes in the
forecasting power of covariates, and are based on the principle of
self-normalization. We show that our tests are valid irrespective of whether
the predictors are stationary or near-stationary, rendering the tests suitable
for a range of practical applications. Simulations illustrate the good
finite-sample properties of our tests. Two empirical applications concerning
equity premium and systemic risk forecasting models show the usefulness of the
tests.
arXiv link: http://arxiv.org/abs/2410.05861v1
The Transmission of Monetary Policy via Common Cycles in the Euro Area
investigate the role of the euro area's common output and inflation cycles in
the transmission of monetary policy shocks. Our findings indicate that common
cycles explain most of the variation in output and inflation across member
countries. However, Southern European economies exhibit a notable divergence
from these cycles in the aftermath of the financial crisis. Building on this
evidence, we demonstrate that monetary policy is homogeneously propagated to
member countries via the common cycles. In contrast, country-specific
transmission channels lead to heterogeneous country responses to monetary
policy shocks. Consequently, our empirical results suggest that the divergent
effects of ECB monetary policy are attributable to heterogeneous
country-specific exposures to financial markets, rather than to
dis-synchronized economies within the euro area.
arXiv link: http://arxiv.org/abs/2410.05741v3
Identification and estimation for matrix time series CP-factor models
for matrix time series. Unlike the generalized eigenanalysis-based method of
Chang et al. (2023) for which the convergence rates of the associated
estimators may suffer from small eigengaps as the asymptotic theory is based on
some matrix perturbation analysis, the proposed new method enjoys faster
convergence rates which are free from any eigengaps. It achieves this by
turning the problem into a joint diagonalization of several matrices whose
elements are determined by a basis of a linear system, and by choosing the
basis carefully to avoid near co-linearity (see Proposition 5 and Section 4.3).
Furthermore, unlike Chang et al. (2023) which requires the two factor loading
matrices to be full-ranked, the proposed new method can handle rank-deficient
factor loading matrices. Illustration with both simulated and real matrix time
series data shows the advantages of the proposed new method.
arXiv link: http://arxiv.org/abs/2410.05634v3
Navigating Inflation in Ghana: How Can Machine Learning Enhance Economic Stability and Growth Strategies
research investigates the critical role of machine learning (ML) in
understanding and managing inflation in Ghana, emphasizing its significance for
the country's economic stability and growth. Utilizing a comprehensive dataset
spanning from 2010 to 2022, the study aims to employ advanced ML models,
particularly those adept in time series forecasting, to predict future
inflation trends. The methodology is designed to provide accurate and reliable
inflation forecasts, offering valuable insights for policymakers and advocating
for a shift towards data-driven approaches in economic decision-making. This
study aims to significantly advance the academic field of economic analysis by
applying machine learning (ML) and offering practical guidance for integrating
advanced technological tools into economic governance, ultimately demonstrating
ML's potential to enhance Ghana's economic resilience and support sustainable
development through effective inflation management.
arXiv link: http://arxiv.org/abs/2410.05630v1
$\texttt{rdid}$ and $\texttt{rdidstag}$: Stata commands for robust difference-in-differences
difference-in-differences (RDID) method developed in Ban and K\'edagni (2023).
It contains three main commands: $\texttt{rdid}$, $\texttt{rdid\_dy}$, and
$\texttt{rdidstag}$, which we describe in the introduction and the main text.
We illustrate these commands through simulations and empirical examples.
arXiv link: http://arxiv.org/abs/2410.05212v1
Large datasets for the Euro Area and its member countries and the dynamic effects of the common monetary policy
encompasses quarterly and monthly macroeconomic time series for both the Euro
Area (EA) as a whole and its ten primary member countries. The dataset, which
is called EA-MD-QD, includes more than 800 time series and spans the period
from January 2000 to the latest available month. Since January 2024, EA-MD-QD is
updated on a monthly basis and constantly revised, making it an essential
resource for conducting policy analysis related to economic outcomes in the EA.
To illustrate the usefulness of EA-MD-QD, we study the country specific Impulse
Responses of the EA wide monetary policy shock by means of the Common Component
VAR plus either Instrumental Variables or Sign Restrictions identification
schemes. The results reveal asymmetries in the transmission of the monetary
policy shock across countries, particularly between core and peripheral
countries. Additionally, we find comovements across Euro Area countries'
business cycles to be driven mostly by real variables, compared to nominal
ones.
arXiv link: http://arxiv.org/abs/2410.05082v1
Democratizing Strategic Planning in Master-Planned Communities
communities designed specifically to quantify residents' subjective preferences
about large investments in amenities and infrastructure projects. Drawing on
data obtained from brief online surveys, the tool ranks alternative plans by
considering the aggregate anticipated utilization of each proposed amenity and
cost sensitivity to it (or risk sensitivity for infrastructure plans). In
addition, the tool estimates the percentage of households that favor the
preferred plan and predicts whether residents would actually be willing to fund
the project. The mathematical underpinnings of the tool are borrowed from
utility theory, incorporating exponential functions to model diminishing
marginal returns on quality, cost, and risk mitigation.
arXiv link: http://arxiv.org/abs/2410.04676v2
A Structural Approach to Growth-at-Risk
variable to a shock. Our estimation strategy explicitly distinguishes treatment
from control variables, allowing us to model responses of unconditional
quantiles while using controls for identification. Disentangling the effect of
adding control variables on identification versus interpretation brings our
structural quantile impulse responses conceptually closer to structural mean
impulse responses. Applying our methodology to study the impact of financial
shocks on lower quantiles of output growth confirms that financial shocks have
an outsized effect on growth-at-risk, but the magnitude of our estimates is
more extreme than in previous studies.
arXiv link: http://arxiv.org/abs/2410.04431v1
Inference in High-Dimensional Linear Projections: Multi-Horizon Granger Causality and Network Connectedness
high-dimensional sparse Vector Autoregression (VAR) framework. The null
hypothesis focuses on the causal coefficients of interest in a local projection
(LP) at a given horizon. Nevertheless, the post-double-selection method on LP
may not be applicable in this context, as a sparse VAR model does not
necessarily imply a sparse LP for horizon h>1. To validate the proposed test,
we develop two types of de-biased estimators for the causal coefficients of
interest, both relying on first-step machine learning estimators of the VAR
slope parameters. The first estimator is derived from the Least Squares method,
while the second is obtained through a two-stage approach that offers potential
efficiency gains. We further derive heteroskedasticity- and
autocorrelation-consistent (HAC) inference for each estimator. Additionally, we
propose a robust inference method for the two-stage estimator, eliminating the
need to correct for serial correlation in the projection residuals. Monte Carlo
simulations show that the two-stage estimator with robust inference outperforms
the Least Squares method in terms of the Wald test size, particularly for
longer projection horizons. We apply our methodology to analyze the
interconnectedness of policy-related economic uncertainty among a large set of
countries in both the short and long run. Specifically, we construct a causal
network to visualize how economic uncertainty spreads across countries over
time. Our empirical findings reveal, among other insights, that in the short
run (1 and 3 months), the U.S. influences China, while in the long run (9 and
12 months), China influences the U.S. Identifying these connections can help
anticipate a country's potential vulnerabilities and propose proactive
solutions to mitigate the transmission of economic uncertainty.
arXiv link: http://arxiv.org/abs/2410.04330v1
How to Compare Copula Forecasts?
strictly consistent scores. We first establish the negative result that, in
general, copulas fail to be elicitable, implying that copula predictions cannot
sensibly be compared on their own. A notable exception is on Fr\'echet classes,
that is, when the marginal distribution structure is given and fixed, in which
case we give suitable scores for the copula forecast comparison. As a remedy
for the general non-elicitability of copulas, we establish novel
multi-objective scores for copula forecast along with marginal forecasts. They
give rise to two-step tests of equal or superior predictive ability which admit
attribution of the forecast ranking to the accuracy of the copulas or the
marginals. Simulations show that our two-step tests work well in terms of size
and power. We illustrate our new methodology via an empirical example using
copula forecasts for international stock market indices.
arXiv link: http://arxiv.org/abs/2410.04165v1
A Dynamic Approach to Stock Price Prediction: Comparing RNN and Mixture of Experts Models Across Different Volatility Profiles
for stock price prediction by comparing it to a Recurrent Neural Network (RNN)
and a linear regression model. The MoE framework combines an RNN for volatile
stocks and a linear model for stable stocks, dynamically adjusting the weight
of each model through a gating network. Results indicate that the MoE approach
significantly improves predictive accuracy across different volatility
profiles. The RNN effectively captures non-linear patterns for volatile
companies but tends to overfit stable data, whereas the linear model performs
well for predictable trends. The MoE model's adaptability allows it to
outperform each individual model, reducing errors such as Mean Squared Error
(MSE) and Mean Absolute Error (MAE). Future work should focus on enhancing the
gating mechanism and validating the model with real-world datasets to optimize
its practical applicability.
arXiv link: http://arxiv.org/abs/2410.07234v1
A new GARCH model with a deterministic time-varying intercept
unconditional volatility. We propose a new model that captures this type of
nonstationarity in a parsimonious way. The model augments the volatility
equation of a standard GARCH model by a deterministic time-varying intercept.
It captures structural change that slowly affects the amplitude of a time
series while keeping the short-run dynamics constant. We parameterize the
intercept as a linear combination of logistic transition functions. We show
that the model can be derived from a multiplicative decomposition of volatility
and preserves the financial motivation of variance decomposition. We use the
theory of locally stationary processes to show that the quasi maximum
likelihood estimator (QMLE) of the parameters of the model is consistent and
asymptotically normally distributed. We examine the quality of the asymptotic
approximation in a small simulation study. An empirical application to Oracle
Corporation stock returns demonstrates the usefulness of the model. We find
that the persistence implied by the GARCH parameter estimates is reduced by
including a time-varying intercept in the volatility equation.
arXiv link: http://arxiv.org/abs/2410.03239v2
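A minimal simulation of the model described above, assuming a single logistic transition in the intercept and made-up parameter values; QMLE estimation is not reproduced here.

```python
# Simulate a GARCH(1,1) whose intercept is a deterministic logistic function of
# rescaled time t/T, as described in the abstract.  Parameter values are for
# illustration only.
import numpy as np

rng = np.random.default_rng(5)
T = 2000
alpha, beta = 0.08, 0.85                       # short-run GARCH dynamics
delta0, delta1, gamma, c = 0.05, 0.20, 10.0, 0.5

def intercept(u):
    # time-varying intercept: baseline plus one logistic transition at u = c
    return delta0 + delta1 / (1.0 + np.exp(-gamma * (u - c)))

eps = np.zeros(T)
sigma2 = np.zeros(T)
sigma2[0] = intercept(0.0) / (1.0 - alpha - beta)
for t in range(1, T):
    sigma2[t] = intercept(t / T) + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

# The amplitude of the series changes after the transition point while the
# short-run dynamics (alpha, beta) stay constant.
print(f"sd first half : {eps[:T // 2].std():.3f}")
print(f"sd second half: {eps[T // 2:].std():.3f}")
```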
Smaller Confidence Intervals From IPW Estimators via Data-Dependent Coarsening
inference for estimating average treatment effects in observational studies.
Under unconfoundedness, given accurate propensity scores and $n$ samples, the
size of confidence intervals of IPW estimators scales down with $n$, and,
several of their variants improve the rate of scaling. However, neither IPW
estimators nor their variants are robust to inaccuracies: even if a single
covariate has an $\varepsilon>0$ additive error in the propensity score, the
size of confidence intervals of these estimators can increase arbitrarily.
Moreover, even without errors, the rate with which the confidence intervals of
these estimators go to zero with $n$ can be arbitrarily slow in the presence of
extreme propensity scores (those close to 0 or 1).
We introduce a family of Coarse IPW (CIPW) estimators that captures existing
IPW estimators and their variants. Each CIPW estimator is an IPW estimator on a
coarsened covariate space, where certain covariates are merged. Under mild
assumptions, e.g., Lipschitzness in expected outcomes and sparsity of extreme
propensity scores, we give an efficient algorithm to find a robust estimator:
given $\varepsilon$-inaccurate propensity scores and $n$ samples, its
confidence interval size scales with $\varepsilon+1/n$. In contrast,
under the same assumptions, existing estimators' confidence interval sizes are
$\Omega(1)$ irrespective of $\varepsilon$ and $n$. Crucially, our estimator is
data-dependent and we show that no data-independent CIPW estimator can be
robust to inaccuracies.
arXiv link: http://arxiv.org/abs/2410.01658v1
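A toy version of the coarsening idea, assuming a single discrete covariate: strata with extreme estimated propensity scores are merged into a neighboring stratum before a Hajek-style IPW estimate is formed. This is not the paper's data-dependent algorithm; it only shows what "IPW on a coarsened covariate space" means mechanically.

```python
# Toy coarsened IPW: merge strata with extreme estimated propensity scores
# into a neighboring stratum, then apply the usual (Hajek) IPW formula.
import numpy as np

rng = np.random.default_rng(6)
n = 20_000
x = rng.integers(0, 10, size=n)                    # one discrete covariate: 10 strata
true_ps = np.clip(0.05 + 0.1 * x, 0.05, 0.95)      # strata 0 and 9 have extreme scores
d = rng.binomial(1, true_ps)
y = 2.0 * d + 0.3 * x + rng.normal(size=n)         # true ATE = 2

def ipw_ate(strata):
    ps = np.zeros(n)
    for s in np.unique(strata):
        ps[strata == s] = d[strata == s].mean()    # within-stratum propensity estimate
    w1, w0 = d / ps, (1 - d) / (1 - ps)
    return (w1 * y).sum() / w1.sum() - (w0 * y).sum() / w0.sum()

# Coarsen: merge any stratum with estimated propensity outside [0.1, 0.9]
# into its nearest interior neighbor
coarse = x.copy()
for s in np.unique(x):
    ps_s = d[x == s].mean()
    if ps_s < 0.1:
        coarse[x == s] = s + 1
    elif ps_s > 0.9:
        coarse[x == s] = s - 1

print(f"IPW on original strata : {ipw_ate(x):.3f}")
print(f"IPW on coarsened strata: {ipw_ate(coarse):.3f}")
```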
Transformers Handle Endogeneity in In-Context Linear Regression
in-context linear regression. Our main finding is that transformers inherently
possess a mechanism to handle endogeneity effectively using instrumental
variables (IV). First, we demonstrate that the transformer architecture can
emulate a gradient-based bi-level optimization procedure that converges to the
widely used two-stage least squares (2SLS) solution at an
exponential rate. Next, we propose an in-context pretraining scheme and provide
theoretical guarantees showing that the global minimizer of the pre-training
loss achieves a small excess loss. Our extensive experiments validate these
theoretical findings, showing that the trained transformer provides more robust
and reliable in-context predictions and coefficient estimates than the
2SLS method, in the presence of endogeneity.
arXiv link: http://arxiv.org/abs/2410.01265v3
Forecasting short-term inflation in Argentina with Random Forest Models
short-term monthly inflation in Argentina, based on a database of monthly
indicators since 1962. It is found that these models achieve forecast accuracy
that is statistically comparable to the consensus of market analysts'
expectations surveyed by the Central Bank of Argentina (BCRA) and to
traditional econometric models. One advantage of Random Forest models is that,
as they are non-parametric, they allow for the exploration of nonlinear effects
in the predictive power of certain macroeconomic variables on inflation. Among
other findings, the relative importance of the exchange rate gap in forecasting
inflation increases when the gap between the parallel and official exchange
rates exceeds 60%. The predictive power of the exchange rate on inflation rises
when the BCRA's net international reserves are negative or close to zero
(specifically, below USD 2 billion). The relative importance of inflation
inertia and the nominal interest rate in forecasting the following month's
inflation increases when the nominal levels of inflation and/or interest rates
rise.
arXiv link: http://arxiv.org/abs/2410.01175v1
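A generic sketch of the forecasting exercise, assuming synthetic stand-ins for the Argentine indicators: lagged features feed a random forest, and the feature importances give the kind of nonlinear-relevance readout discussed above. This is not a replication of the paper's data or results.

```python
# One-step-ahead inflation forecasting with a random forest on lagged
# indicators.  The series are synthetic placeholders, not the Argentine data.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
T = 400
fx_gap = np.zeros(T)
for t in range(1, T):
    fx_gap[t] = 0.8 * fx_gap[t - 1] + rng.normal(scale=0.1)   # persistent placeholder indicator
infl = np.zeros(T)
for t in range(1, T):
    infl[t] = 0.02 + 0.5 * infl[t - 1] + 0.6 * fx_gap[t - 1] + rng.normal(scale=0.05)

df = pd.DataFrame({"infl": infl, "fx_gap": fx_gap})
for lag in (1, 2, 3):
    df[f"infl_l{lag}"] = df["infl"].shift(lag)
    df[f"fx_gap_l{lag}"] = df["fx_gap"].shift(lag)
df = df.dropna()

features = [c for c in df.columns if "_l" in c]
split = int(0.8 * len(df))
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(df[features].iloc[:split], df["infl"].iloc[:split])
pred = rf.predict(df[features].iloc[split:])
print(f"out-of-sample MAE: {np.mean(np.abs(pred - df['infl'].iloc[split:])):.4f}")
print(sorted(zip(rf.feature_importances_.round(3), features), reverse=True)[:3])
```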
Partially Identified Heterogeneous Treatment Effect with Selection: An Application to Gender Gaps
gender gap problem, where even random treatment assignment is affected by
selection bias. By offering a robust alternative free from distributional or
specification assumptions, we bound the treatment effect under the sample
selection model with an exclusion restriction, an assumption whose validity is
tested in the literature. This exclusion restriction allows for further
segmentation of the population into distinct types based on observed and
unobserved characteristics. For each type, we derive the proportions and bound
the gender gap accordingly. Notably, trends in type proportions and gender gap
bounds reveal an increasing proportion of always-working individuals over time,
alongside variations in bounds, including a general decline across time and
consistently higher bounds for those in high-potential wage groups. Further
analysis, considering additional assumptions, highlights persistent gender gaps
for some types, while other types exhibit differing or inconclusive trends.
This underscores the necessity of separating individuals by type to understand
the heterogeneous nature of the gender gap.
arXiv link: http://arxiv.org/abs/2410.01159v2
A Nonparametric Test of Heterogeneous Treatment Effects under Interference
predefined subgroups is challenging when units interact because treatment
effects may vary by pre-treatment variables, post-treatment exposure variables
(that measure the exposure to other units' treatment statuses), or both. Thus,
the conventional HTEs testing procedures may be invalid under interference. In
this paper, I develop statistical methods to infer HTEs and disentangle the
drivers of treatment effects heterogeneity in populations where units interact.
Specifically, I incorporate clustered interference into the potential outcomes
model and propose kernel-based test statistics for the null hypotheses of (i)
no HTEs by treatment assignment (or post-treatment exposure variables) for all
pre-treatment variables values and (ii) no HTEs by pre-treatment variables for
all treatment assignment vectors. I recommend a multiple-testing algorithm to
disentangle the source of heterogeneity in treatment effects. I prove the
asymptotic properties of the proposed test statistics. Finally, I illustrate
the application of the test procedures in an empirical setting using an
experimental data set from a Chinese weather insurance program.
arXiv link: http://arxiv.org/abs/2410.00733v1
Inference for the Marginal Value of Public Funds
summarize them into scalar measures of cost-effectiveness or welfare, such as
the Marginal Value of Public Funds (MVPF). In many settings, microdata
underlying these estimates are unavailable, leaving researchers with only
published estimates and their standard errors. We develop tools for valid
inference on functions of causal effects, such as the MVPF, when the
correlation structure is unknown. Our approach is to construct worst-case
confidence intervals, leveraging experimental designs to tighten them, and to
assess robustness using breakdown analyses. We illustrate our method with MVPFs
for eight policies.
arXiv link: http://arxiv.org/abs/2410.00217v3
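The worst-case construction can be illustrated for an MVPF computed as the ratio of willingness to pay to net government cost, with made-up published estimates: scan the unknown correlation between the two estimates over [-1, 1] and report the widest delta-method confidence interval. This follows the general worst-case logic described above, not the paper's exact procedure, and omits the design-based tightening and breakdown analyses.

```python
# Worst-case confidence interval for an MVPF-style ratio WTP / NetCost when
# only the two point estimates and their standard errors are available and
# the correlation between them is unknown.  The numbers are hypothetical.
import numpy as np

wtp, se_wtp = 1.8, 0.30          # hypothetical estimate of willingness to pay
cost, se_cost = 1.2, 0.25        # hypothetical estimate of net government cost
z = 1.96

def ratio_ci(rho):
    # delta-method variance of WTP / Cost with correlation rho
    grad = np.array([1.0 / cost, -wtp / cost**2])
    cov = np.array([[se_wtp**2, rho * se_wtp * se_cost],
                    [rho * se_wtp * se_cost, se_cost**2]])
    se = np.sqrt(grad @ cov @ grad)
    mvpf = wtp / cost
    return mvpf - z * se, mvpf + z * se

cis = [ratio_ci(rho) for rho in np.linspace(-1, 1, 201)]
lo = min(ci[0] for ci in cis)
hi = max(ci[1] for ci in cis)
print(f"worst-case 95% CI for the MVPF: [{lo:.3f}, {hi:.3f}]")
```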
New Tests of Equal Forecast Accuracy for Factor-Augmented Regressions with Weaker Loadings
accuracy and encompassing by Pitarakis (2023) and Pitarakis (2025), when the
competing forecast specification is that of a factor-augmented regression
model. This should be of interest for practitioners, as there is no theory
justifying the use of these simple and powerful tests in such context. In
pursuit of this, we employ a novel theory to incorporate the empirically
well-documented fact of homogeneously/heterogeneously weak factor loadings, and
track their effect on the forecast comparison problem.
arXiv link: http://arxiv.org/abs/2409.20415v3
Synthetic Difference in Differences for Repeated Cross-Sectional Data
to estimate a causal effect with a latent factor model. However, it relies on
the use of panel data. This paper presents an adaptation of the synthetic
difference-in-differences method for repeated cross-sectional data. The
treatment is considered to be at the group level so that it is possible to
aggregate data by group to compute the two types of synthetic
difference-in-differences weights on these aggregated data. Then, I develop and
compute a third type of weight that accounts for the different number of
observations in each cross-section. Simulation results show that the
performance of the synthetic difference-in-differences estimator is improved
when using the third type of weights on repeated cross-sectional data.
arXiv link: http://arxiv.org/abs/2409.20199v1
Factors in Fashion: Factor Analysis towards the Mode
in high dimensional panel data. Unlike the approximate factor model that
targets for the mean factors, it captures factors that influence the
conditional mode of the distribution of the observables. Statistical inference
is developed with the aid of mode estimation, where the modal factors and the
loadings are estimated through maximizing a kernel-type objective function. An
easy-to-implement alternating maximization algorithm is designed to obtain the
estimators numerically. Two model selection criteria are further proposed to
determine the number of factors. The asymptotic properties of the proposed
estimators are established under some regularity conditions. Simulations
demonstrate the nice finite sample performance of our proposed estimators, even
in the presence of heavy-tailed and asymmetric idiosyncratic error
distributions. Finally, the application to inflation forecasting illustrates
the practical merits of modal factors.
arXiv link: http://arxiv.org/abs/2409.19287v1
Large Bayesian Tensor VARs with Stochastic Volatility
coefficients are arranged as a three-dimensional array or tensor, and this
coefficient tensor is parameterized using a low-rank CP decomposition. We
develop a family of TVARs using a general stochastic volatility specification,
which includes a wide variety of commonly-used multivariate stochastic
volatility and COVID-19 outlier-augmented models. In a forecasting exercise
involving 40 US quarterly variables, we show that these TVARs outperform the
standard Bayesian VAR with the Minnesota prior. The results also suggest that
the parsimonious common stochastic volatility model tends to forecast better
than the more flexible Cholesky stochastic volatility model.
arXiv link: http://arxiv.org/abs/2409.16132v1
Identifying Elasticities in Autocorrelated Time Series Using Causal Graphs
instrumental variables (IV). However, naive IV estimators may be inconsistent
in settings with autocorrelated time series. We argue that causal time graphs
can simplify IV identification and help select consistent estimators. To do so,
we propose to first model the equilibrium condition by an unobserved
confounder, deriving a directed acyclic graph (DAG) while maintaining the
assumption of a simultaneous determination of prices and quantities. We then
exploit recent advances in graphical inference to derive valid IV estimators,
including estimators that achieve consistency by simultaneously estimating
nuisance effects. We further argue that observing significant differences
between the estimates of presumably valid estimators can help to reject false
model assumptions, thereby improving our understanding of underlying economic
dynamics. We apply this approach to the German electricity market, estimating
the price elasticity of demand on simulated and real-world data. The findings
underscore the importance of accounting for structural autocorrelation in
IV-based analysis.
arXiv link: http://arxiv.org/abs/2409.15530v1
Non-linear dependence and Granger causality: A vine copula approach
test for bivariate $k$-Markov stationary processes based on a recently
introduced class of non-linear models, i.e., vine copula models. By means of a
simulation study, we show that the proposed test improves on the statistical
properties of the original test in Jang et al. (2022), and also of other
previous methods, constituting an excellent tool for testing Granger causality
in the presence of non-linear dependence structures. Finally, we apply our test
to study the pairwise relationships between energy consumption, GDP and
investment in the U.S. and, notably, we find that Granger-causality runs two
ways between GDP and energy consumption.
arXiv link: http://arxiv.org/abs/2409.15070v2
Inequality Sensitive Optimal Treatment Assignment
mean $m$ is the outcome level such that the evaluator is indifferent between
the distribution of outcomes and a society in which everyone obtains an outcome
of $ee$. For an inequality averse evaluator, $ee < m$. In this paper, I extend
the optimal treatment choice framework in Manski (2024) to the case where the
welfare evaluation is made using egalitarian equivalent measures, and derive
optimal treatment rules for the Bayesian, maximin and minimax regret inequality
averse evaluators. I illustrate how the methodology operates in the context of
the JobCorps education and training program for disadvantaged youth (Schochet,
Burghardt, and McConnell 2008) and in Meager (2022)'s Bayesian meta analysis of
the microcredit literature.
arXiv link: http://arxiv.org/abs/2409.14776v2
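A worked example of the egalitarian equivalent itself, under an Atkinson-type (CRRA) social evaluator with inequality-aversion parameter eps, a functional form assumed here purely for illustration: ee equals the mean under inequality neutrality (eps = 0) and falls below it as aversion rises. The treatment-choice framework of the paper is not reproduced.

```python
# Egalitarian equivalent under an assumed Atkinson-type (CRRA) evaluator:
# ee = (mean of y^(1-eps))^(1/(1-eps)), with the geometric mean at eps = 1.
import numpy as np

y = np.array([10.0, 20.0, 30.0, 100.0])   # toy outcome distribution
m = y.mean()

def egalitarian_equivalent(outcomes, eps):
    if eps == 1.0:
        return np.exp(np.log(outcomes).mean())          # geometric mean
    return (np.mean(outcomes ** (1.0 - eps))) ** (1.0 / (1.0 - eps))

for eps in (0.0, 0.5, 1.0, 2.0):
    ee = egalitarian_equivalent(y, eps)
    print(f"eps = {eps:>3}: ee = {ee:6.2f}  (mean m = {m:.2f})")
```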
The continuous-time limit of quasi score-driven volatility models
Score-Driven (QSD) models that characterize volatility. As the sampling
frequency increases and the time interval tends to zero, the model weakly
converges to a continuous-time stochastic volatility model where the two
Brownian motions are correlated, thereby capturing the leverage effect in the
market. Subsequently, we identify that a necessary condition for non-degenerate
correlation is that the distribution of driving innovations differs from that
of computing score, and at least one being asymmetric. We then illustrate this
with two typical examples. As an application, the QSD model is used as an
approximation for correlated stochastic volatility diffusions and quasi maximum
likelihood estimation is performed. Simulation results confirm the method's
effectiveness, particularly in estimating the correlation coefficient.
arXiv link: http://arxiv.org/abs/2409.14734v2
Mining Causality: AI-Assisted Search for Instrumental Variables
causal inference. Finding IVs is a heuristic and creative process, and
justifying its validity -- especially exclusion restrictions -- is largely
rhetorical. We propose using large language models (LLMs) to search for new IVs
through narratives and counterfactual reasoning, similar to how a human
researcher would. The stark difference, however, is that LLMs can dramatically
accelerate this process and explore an extremely large search space. We
demonstrate how to construct prompts to search for potentially valid IVs. We
contend that multi-step and role-playing prompting strategies are effective for
simulating the endogenous decision-making processes of economic agents and for
navigating language models through the realm of real-world scenarios, rather
than anchoring them within the narrow realm of academic discourses on IVs. We
apply our method to three well-known examples in economics: returns to
schooling, supply and demand, and peer effects. We then extend our strategy to
finding (i) control variables in regression and difference-in-differences and
(ii) running variables in regression discontinuity designs.
arXiv link: http://arxiv.org/abs/2409.14202v3
A simple but powerful tail index regression
conditional tail index of heavy tailed distributions. In this framework, the
tail index is computed from an auxiliary linear regression model that
facilitates estimation and inference based on established econometric methods,
such as ordinary least squares (OLS), least absolute deviations, or
M-estimation. We show theoretically and via simulations that OLS provides
interesting results. Our Monte Carlo results highlight the adequate finite
sample properties of the OLS tail index estimator computed from the proposed
new framework and contrast its behavior to that of tail index estimates
obtained by maximum likelihood estimation of exponential regression models,
which is one of the approaches currently in use in the literature. An empirical
analysis of the impact of determinants of the conditional left- and right-tail
indexes of commodities' return distributions highlights the empirical relevance
of our proposed approach. The novel framework's flexibility allows for
extensions and generalizations in various directions, empowering researchers
and practitioners to straightforwardly explore a wide range of research
questions.
arXiv link: http://arxiv.org/abs/2409.13531v1
Dynamic tail risk forecasting: what do realized skewness and kurtosis add?
including realized skewness and kurtosis in "additive" and "multiplicative"
models. Utilizing a panel of 960 US stocks, we conduct diagnostic tests, employ
scoring functions, and implement rolling window forecasting to evaluate the
performance of Value at Risk (VaR) and Expected Shortfall (ES) forecasts.
Additionally, we examine the impact of the window length on forecast accuracy.
We propose model specifications that incorporate realized skewness and kurtosis
for enhanced precision. Our findings provide insights into the importance of
considering skewness and kurtosis in tail risk modeling, contributing to the
existing literature and offering practical implications for risk practitioners
and researchers.
arXiv link: http://arxiv.org/abs/2409.13516v1
Testing for equal predictive accuracy with strong dependence
presence of autocorrelation in the loss differential. We show that the power of
the Diebold and Mariano (1995) test decreases as the dependence increases,
making it more difficult to obtain statistically significant evidence of
superior predictive ability against less accurate benchmarks. We also find
that, after a certain threshold, the test has no power and the correct null
hypothesis is spuriously rejected. Taken together, these results caution
practitioners to carefully consider the dependence properties of the loss
differential before applying the Diebold and Mariano (1995) test.
arXiv link: http://arxiv.org/abs/2409.12662v1
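To see the mechanics, the sketch below computes a Diebold-Mariano statistic with a Newey-West long-run variance and simulates loss differentials that share the same mean advantage but have increasing AR(1) dependence, so the effect of dependence on the rejection frequency can be read off directly. This illustrates the mechanism only; it does not reproduce the paper's formal results.

```python
# Diebold-Mariano statistic with a Newey-West (Bartlett) long-run variance,
# applied to simulated loss differentials with a fixed mean advantage and
# increasing AR(1) dependence.
import numpy as np

def dm_stat(d, n_lags=None):
    T = len(d)
    if n_lags is None:
        n_lags = int(np.floor(4 * (T / 100) ** (2 / 9)))   # common bandwidth rule of thumb
    dc = d - d.mean()
    lrv = dc @ dc / T
    for k in range(1, n_lags + 1):
        w = 1.0 - k / (n_lags + 1.0)                       # Bartlett kernel weight
        lrv += 2.0 * w * (dc[k:] @ dc[:-k]) / T
    return d.mean() / np.sqrt(lrv / T)

rng = np.random.default_rng(8)
T, mu, n_rep = 500, 0.10, 500                  # same mean loss differential in every design
for phi in (0.0, 0.5, 0.9, 0.95):
    rejections = 0
    for _ in range(n_rep):
        e = rng.normal(size=T)
        d = np.empty(T)
        d[0] = mu + e[0]
        for t in range(1, T):
            d[t] = mu + phi * (d[t - 1] - mu) + e[t]
        rejections += abs(dm_stat(d)) > 1.96
    print(f"phi = {phi:4.2f}: rejection frequency = {rejections / n_rep:.2f}")
```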
Parameters on the boundary in predictive regression
regressions when the parameter of interest may lie on the boundary of the
parameter space, here defined by means of a smooth inequality constraint. For
instance, this situation occurs when the definition of the parameter space
allows for the cases of either no predictability or sign-restricted
predictability. We show that in this context constrained estimation gives rise
to bootstrap statistics whose limit distribution is, in general, random, and
thus distinct from the limit null distribution of the original statistics of
interest. This is due to both (i) the possible location of the true parameter
vector on the boundary of the parameter space, and (ii) the possible
non-stationarity of the posited predicting (resp. Granger-causing) variable. We
discuss a modification of the standard fixed-regressor wild bootstrap scheme
where the bootstrap parameter space is shifted by a data-dependent function in
order to eliminate the portion of limiting bootstrap randomness attributable to
the boundary, and prove validity of the associated bootstrap inference under
non-stationarity of the predicting variable as the only remaining source of
limiting bootstrap randomness. Our approach, which is initially presented in a
simple location model, has bearing on inference in parameter-on-the-boundary
situations beyond the predictive regression problem.
arXiv link: http://arxiv.org/abs/2409.12611v1
Robust Bond Risk Premia Predictability Test in the Quantiles
of bond risk premia, which only considers mean regressions, this paper
investigates whether the yield curve represented by CP factor (Cochrane and
Piazzesi, 2005) contains all available information about future bond returns in
a predictive quantile regression with many other macroeconomic variables. In
this study, we introduce the Trend in Debt Holding (TDH) as a novel predictor,
testing it alongside established macro indicators such as Trend Inflation (TI)
(Cieslak and Povala, 2015), and macro factors from Ludvigson and Ng (2009). A
significant challenge in this study is the invalidity of traditional quantile
model inference approaches, given the high persistence of many macro variables
involved. Furthermore, the existing methods addressing this issue do not
perform well in the marginal test with many highly persistent predictors. Thus,
we suggest a robust inference approach, whose size and power performance are
shown to be better than existing tests. Using data from 1980-2022, the
macro-spanning hypothesis is strongly supported at center quantiles by the
empirical finding that the CP factor has predictive power while all other macro
variables have negligible predictive power in this case. On the other hand, the
evidence against the macro-spanning hypothesis is found at tail quantiles, in
which TDH has predictive power at right-tail quantiles while TI has predictive
power at both tail quantiles. Finally, we show that the in-sample and
out-of-sample predictive performance of the proposed method is better than
that of existing methods.
arXiv link: http://arxiv.org/abs/2410.03557v1
A Way to Synthetic Triple Difference
with triple difference to address violations of the parallel trends assumption.
By transforming triple difference into a DID structure, we can apply synthetic
control to a triple-difference framework, enabling more robust estimates when
parallel trends are violated across multiple dimensions. The proposed procedure
is applied to a real-world dataset to illustrate when and how we should apply
this practice, while cautions are presented afterwards. This method contributes
to improving causal inference in policy evaluations and offers a valuable tool
for researchers dealing with heterogeneous treatment effects across subgroups.
arXiv link: http://arxiv.org/abs/2409.12353v2
Simple robust two-stage estimation and inference for generalized impulse responses and multi-horizon causality
for generalized impulse responses (GIRs). GIRs encompass all coefficients in a
multi-horizon linear projection model of future outcomes of y on lagged values
(Dufour and Renault, 1998), which include the Sims' impulse response. The
conventional use of Least Squares (LS) with heteroskedasticity- and
autocorrelation-consistent covariance estimation is less precise and often
results in unreliable finite sample tests, further complicated by the selection
of bandwidth and kernel functions. Our two-stage method surpasses the LS
approach in terms of estimation efficiency and inference robustness. The
robustness stems from our proposed covariance matrix estimates, which eliminate
the need to correct for serial correlation in the multi-horizon projection
residuals. Our method accommodates non-stationary data and allows the
projection horizon to grow with sample size. Monte Carlo simulations
demonstrate our two-stage method outperforms the LS method. We apply the
two-stage method to investigate the GIRs, implement multi-horizon Granger
causality test, and find that economic uncertainty exerts both short-run (1-3
months) and long-run (30 months) effects on economic activities.
arXiv link: http://arxiv.org/abs/2409.10820v1
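For comparison with the paper's two-stage proposal, the sketch below shows the conventional single-step approach it improves on: a least-squares local projection of y at horizon h on a shock, with Newey-West (HAC) standard errors, on simulated AR(1) data where the true horizon-h response is known.

```python
# Conventional least-squares local projection of y_{t+h} on a shock with
# Newey-West (HAC) standard errors -- the LS baseline discussed above; the
# two-stage estimator itself is not reproduced.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T, h = 600, 3
shock = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + shock[t] + 0.3 * rng.normal()

Y = y[1 + h:]                                              # y_{t+h}, t = 1, ..., T-h-1
X = sm.add_constant(np.column_stack([shock[1:T - h],       # shock_t
                                     y[:T - h - 1]]))      # control: y_{t-1}
res = sm.OLS(Y, X).fit(cov_type="HAC", cov_kwds={"maxlags": h})
print(f"horizon-{h} response: {res.params[1]:.3f} (true {0.6 ** h:.3f}), "
      f"HAC s.e. {res.bse[1]:.3f}")
```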
GPT takes the SAT: Tracing changes in Test Difficulty and Math Performance of Students
effectiveness and relevance are increasingly questioned. This paper enhances
Synthetic Control methods by introducing "Transformed Control", a novel method
that employs Large Language Models (LLMs) powered by Artificial Intelligence to
generate control groups. We utilize OpenAI's API to generate a control group
where GPT-4, or ChatGPT, takes multiple SATs annually from 2008 to 2023. This
control group helps analyze shifts in SAT math difficulty over time, starting
from the baseline year of 2008. Using parallel trends, we calculate the Average
Difference in Scores (ADS) to assess changes in high school students' math
performance. Our results indicate a significant decrease in the difficulty of
the SAT math section over time, alongside a decline in students' math
performance. The analysis shows a 71-point drop in the rigor of SAT math from
2008 to 2023, with student performance decreasing by 36 points, resulting in a
107-point total divergence in average student math performance. We investigate
possible mechanisms for this decline in math proficiency, such as changing
university selection criteria, increased screen time, grade inflation, and
worsening adolescent mental health. Disparities among demographic groups show a
104-point drop for White students, 84 points for Black students, and 53 points
for Asian students. Male students saw a 117-point reduction, while female
students had a 100-point decrease.
arXiv link: http://arxiv.org/abs/2409.10750v1
Why you should also use OLS estimation of tail exponents
rank-size regressions, the usual recommendation is to use the Hill MLE with a
small-sample correction instead, due to its unbiasedness and efficiency. In
this paper, we advocate that you should also apply OLS in empirical
applications. On the one hand, we demonstrate that, with a small-sample
correction, the OLS estimator is also unbiased. On the other hand, we show that
the MLE assigns significantly greater weight to smaller observations. This
suggests that the OLS estimator may outperform the MLE in cases where the
distribution is (i) strictly Pareto but only in the upper tail or (ii)
regularly varying rather than strictly Pareto. We substantiate our theoretical
findings with Monte Carlo simulations and real-world applications,
demonstrating the practical relevance of the OLS method in estimating tail
exponents.
arXiv link: http://arxiv.org/abs/2409.10448v2
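A generic illustration of the two estimators discussed above on a simulated Pareto upper tail: the Hill estimator based on the k largest observations versus the log rank-size OLS regression with the usual shift of 1/2 in the rank (Gabaix and Ibragimov, 2011). This is not a replication of the paper's simulations.

```python
# Hill estimator vs. rank-size OLS (with the rank - 1/2 shift) for the tail
# exponent of a simulated Pareto upper tail.
import numpy as np

rng = np.random.default_rng(10)
alpha_true, n, k = 2.0, 100_000, 1_000          # tail exponent, sample size, tail size

x = rng.pareto(alpha_true, size=n) + 1.0        # classical Pareto(alpha) with scale 1
tail = np.sort(x)[-k:]                          # k largest observations, ascending

# Hill estimator based on the k upper order statistics (threshold = k-th largest)
hill = (k - 1) / np.sum(np.log(tail / tail.min()))

# OLS of log(rank - 1/2) on log(size); the slope is approximately -alpha
ranks = np.arange(k, 0, -1)                     # rank 1 = largest observation
slope, _ = np.polyfit(np.log(tail), np.log(ranks - 0.5), 1)
ols = -slope

print(f"true alpha = {alpha_true}, Hill = {hill:.3f}, rank-size OLS = {ols:.3f}")
```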
Econometric Inference for High Dimensional Predictive Regressions
adversely affect the desirable asymptotic normality and invalidate the standard
inferential procedure based on the $t$-statistic. The desparsified LASSO has
emerged as a well-known remedy for this issue. In the context of high
dimensional predictive regression, the desparsified LASSO faces an additional
challenge: the Stambaugh bias arising from nonstationary regressors. To restore
the standard inferential procedure, we propose a novel estimator called
IVX-desparsified LASSO (XDlasso). XDlasso eliminates the shrinkage bias and the
Stambaugh bias simultaneously and does not require prior knowledge about the
identities of nonstationary and stationary regressors. We establish the
asymptotic properties of XDlasso for hypothesis testing, and our theoretical
findings are supported by Monte Carlo simulations. Applying our method to
real-world applications from the FRED-MD database -- which includes a rich set
of control variables -- we investigate two important empirical questions: (i)
the predictability of the U.S. stock returns based on the earnings-price ratio,
and (ii) the predictability of the U.S. inflation using the unemployment rate.
arXiv link: http://arxiv.org/abs/2409.10030v2
A Simple and Adaptive Confidence Interval when Nuisance Parameters Satisfy an Inequality
parameter is nonnegative, possibly a regression coefficient or a treatment
effect. This paper focuses on the case that there is only one inequality and
proposes a confidence interval that is particularly attractive, called the
inequality-imposed confidence interval (IICI). The IICI is simple. It does not
require simulations or tuning parameters. The IICI is adaptive. It reduces to
the usual confidence interval (calculated by adding and subtracting the
standard error times the $1 - \alpha/2$ standard normal quantile) when the
inequality is sufficiently slack. When the inequality is sufficiently violated,
the IICI reduces to an equality-imposed confidence interval (the usual
confidence interval for the submodel where the inequality holds with equality).
Also, the IICI is uniformly valid and has (weakly) shorter length than the
usual confidence interval; it is never longer. The first empirical application
considers a linear regression when a coefficient is known to be nonpositive. A
second empirical application considers an instrumental variables regression
when the endogeneity of a regressor is known to be nonnegative.
arXiv link: http://arxiv.org/abs/2409.09962v1
Estimating Wage Disparities Using Foundation Models
instead of training specialized models from scratch, foundation models are
first trained on massive datasets before being adapted or fine-tuned to make
predictions on smaller datasets. Initially developed for text, foundation
models have also excelled at making predictions about social science data.
However, while many estimation problems in the social sciences use prediction
as an intermediate step, they ultimately require different criteria for
success. In this paper, we develop methods for fine-tuning foundation models to
perform these estimation problems. We first characterize an omitted variable
bias that can arise when a foundation model is only fine-tuned to maximize
predictive accuracy. We then provide a novel set of conditions for fine-tuning
under which estimates derived from a foundation model are root-n-consistent.
Based on this theory, we develop new fine-tuning algorithms that empirically
mitigate this omitted variable bias. To demonstrate our ideas, we study gender
wage decomposition. This is a statistical estimation problem from econometrics
where the goal is to decompose the gender wage gap into components that can and
cannot be explained by career histories of workers. Classical methods for
decomposing the wage gap employ simple predictive models of wages which
condition on coarse summaries of career history that may omit factors that are
important for explaining the gap. Instead, we use a custom-built foundation
model to decompose the gender wage gap, which captures a richer representation
of career history. Using data from the Panel Study of Income Dynamics, we find
that career history explains more of the gender wage gap than standard
econometric models can measure, and we identify elements of career history that
are omitted by standard models but are important for explaining the wage gap.
arXiv link: http://arxiv.org/abs/2409.09894v2
Structural counterfactual analysis in macroeconomics: theory and inference
macroeconomic counterfactuals related to policy path deviation: hypothetical
trajectory and policy intervention. Our model-free approach is built on a
structural vector moving-average (SVMA) model that relies solely on the
identification of policy shocks, thereby eliminating the need to specify an
entire structural model. Analytical solutions are derived for the
counterfactual parameters, and statistical inference for these parameter
estimates is provided using the Delta method. By utilizing external
instruments, we introduce a projection-based method for the identification,
estimation, and inference of these parameters. This approach connects our
counterfactual analysis with the Local Projection literature. A
simulation-based approach with a nonlinear model is provided to aid in
addressing the Lucas critique. The innovative model-free methodology is applied
in three
counterfactual studies on the U.S. monetary policy: (1) a historical scenario
analysis for a hypothetical interest rate path in the post-pandemic era, (2) a
future scenario analysis under either hawkish or dovish interest rate policy,
and (3) an evaluation of the policy intervention effect of an oil price shock
by zeroing out the systematic responses of the interest rate.
arXiv link: http://arxiv.org/abs/2409.09577v1
Unconditional Randomization Tests for Interference
between units when conducting causal inference or designing policy. However,
testing for interference presents significant econometric challenges,
particularly due to complex clustering patterns and dependencies that can
invalidate standard methods. This paper introduces the pairwise
imputation-based randomization test (PIRT), a general and robust framework for
assessing the existence and extent of interference in experimental settings.
PIRT employs unconditional randomization testing and pairwise comparisons,
enabling straightforward implementation and ensuring finite-sample validity
under minimal assumptions about network structure. The method's practical value
is demonstrated through an application to a large-scale policing experiment in
Bogota, Colombia (Blattman et al., 2021), which evaluates the effects of
hotspot policing on crime at the street segment level. The analysis reveals
that increased police patrolling in hotspots significantly displaces violent
crime, but not property crime. Simulations calibrated to this context further
underscore the power and robustness of PIRT.
arXiv link: http://arxiv.org/abs/2409.09243v3
The Clustered Dose-Response Function Estimator for continuous treatment with heterogeneous treatment effects
heterogeneous effects even at identical treatment intensities. Taken together,
these characteristics pose significant challenges for identifying causal
effects, as no existing estimator can provide an unbiased estimate of the
average causal dose-response function. To address this gap, we introduce the
Clustered Dose-Response Function (Cl-DRF), a novel estimator designed to
discern the continuous causal relationships between treatment intensity and the
dependent variable across different subgroups. This approach leverages both
theoretical and data-driven sources of heterogeneity and operates under relaxed
versions of the conditional independence and positivity assumptions, which are
required to be met only within each identified subgroup. To demonstrate the
capabilities of the Cl-DRF estimator, we present both simulation evidence and
an empirical application examining the impact of European Cohesion funds on
economic growth.
arXiv link: http://arxiv.org/abs/2409.08773v1
Machine Learning and Econometric Approaches to Fiscal Policies: Understanding Industrial Investment Dynamics in Uruguay (1974-2010)
in Uruguay from 1974 to 2010. Using a mixed-method approach that combines
econometric models with machine learning techniques, the study investigates
both the short-term and long-term effects of fiscal benefits on industrial
investment. The results confirm the significant role of fiscal incentives in
driving long-term industrial growth, while also highlighting the importance of
a stable macroeconomic environment, public investment, and access to credit.
Machine learning models provide additional insights into nonlinear interactions
between fiscal benefits and other macroeconomic factors, such as exchange
rates, emphasizing the need for tailored fiscal policies. The findings have
important policy implications, suggesting that fiscal incentives, when combined
with broader economic reforms, can effectively promote industrial development
in emerging economies.
arXiv link: http://arxiv.org/abs/2410.00002v1
Bayesian Dynamic Factor Models for High-dimensional Matrix-valued Time Series
accommodates time-varying volatility, outliers, and cross-sectional correlation
in the idiosyncratic components. For model comparison, we employ an
importance-sampling estimator of the marginal likelihood based on the
cross-entropy method to determine: (1) the optimal dimension of the factor
matrix; (2) whether a vector- or matrix-valued structure is more suitable; and
(3) whether an approximate or exact factor model is favored by the data.
Through a series of Monte Carlo experiments, we demonstrate the accuracy of the
factor estimates and the effectiveness of the marginal likelihood estimator in
correctly identifying the true model. Applications to macroeconomic and
financial datasets illustrate the model's ability to capture key features in
matrix-valued time series.
arXiv link: http://arxiv.org/abs/2409.08354v3
Sensitivity analysis of the perturbed utility stochastic traffic equilibrium
utility route choice (PURC) model and the accompanying stochastic traffic
equilibrium model. We provide general results that determine the marginal
change in link flows following a marginal change in link costs across the
network in the cases of flow-independent and flow-dependent link costs. We
derive analytical sensitivity expressions for the Jacobian of the individual
optimal PURC flow and equilibrium link flows with respect to link cost
parameters under mild differentiability assumptions. Numerical examples
illustrate the robustness of our method, demonstrating its use for estimating
equilibrium link flows after link cost shifts, identifying critical design
parameters, and quantifying uncertainty in performance predictions. The
findings have implications for network design, pricing strategies, and policy
analysis in transportation planning and economics, providing a bridge between
theoretical models and real-world applications.
arXiv link: http://arxiv.org/abs/2409.08347v2
Trends and biases in the social cost of carbon
the social cost of carbon is around $200/tC with a large, right-skewed
uncertainty and trending up. The pure rate of time preference and the inverse
of the elasticity of intertemporal substitution are key assumptions, the total
impact of 2.5K warming less so. The social cost of carbon is much higher if
climate change is assumed to affect economic growth rather than the level of
output and welfare. The literature is dominated by a relatively small network
of authors, based in a few countries. Publication and citation bias have pushed
the social cost of carbon up.
arXiv link: http://arxiv.org/abs/2409.08158v1
Bootstrap Adaptive Lasso Solution Path Unit Root Tests
unit root tests of Arnold and Reinschl\"ussel (2024) arXiv:2404.06205 to
improve finite sample properties and extend their applicability to a
generalised framework, allowing for non-stationary volatility. Numerical
evidence shows the bootstrap to improve the tests' precision for error
processes that promote spurious rejections of the unit root null, depending on
the detrending procedure. The bootstrap mitigates finite-sample size
distortions and restores asymptotically valid inference when the data features
time-varying unconditional variance. We apply the bootstrap tests to real
residential property prices of the top six Eurozone economies and find evidence
of stationarity to be period-specific, supporting the conjecture that
exuberance in the housing market characterises the development of Euro-era
residential property prices in the recent past.
arXiv link: http://arxiv.org/abs/2409.07859v1
Testing for a Forecast Accuracy Breakdown under Long Memory
time series and provide theoretical and simulation evidence on the memory
transfer from the time series to the forecast residuals. The proposed method
uses a double sup-Wald test against the alternative of a structural break in
the mean of an out-of-sample loss series. To address the problem of estimating
the long-run variance under long memory, a robust estimator is applied. The
corresponding breakpoint results from a long memory robust CUSUM test. The
finite sample size and power properties of the test are evaluated in a Monte
Carlo simulation. A monotonic power function is obtained for the fixed
forecasting scheme. In our practical application, we find that the global
energy crisis that began in 2021 led to a forecast break in European
electricity prices, while the results for the U.S. are mixed.
arXiv link: http://arxiv.org/abs/2409.07087v1
Estimation and Inference for Causal Functions with Multiway Clustered Data
class of causal functions, such as the conditional average treatment effects
and the continuous treatment effects, under multiway clustering. The causal
function is identified as a conditional expectation of an adjusted
(Neyman-orthogonal) signal that depends on high-dimensional nuisance
parameters. We propose a two-step procedure where the first step uses machine
learning to estimate the high-dimensional nuisance parameters. The second step
projects the estimated Neyman-orthogonal signal onto a dictionary of basis
functions whose dimension grows with the sample size. For this two-step
procedure, we propose both the full-sample and the multiway cross-fitting
estimation approaches. A functional limit theory is derived for these
estimators. To construct the uniform confidence bands, we develop a novel
resampling procedure, called the multiway cluster-robust sieve score bootstrap,
that extends the sieve score bootstrap (Chen and Christensen, 2018) to the
novel setting with multiway clustering. Extensive numerical simulations
showcase that our methods achieve desirable finite-sample behaviors. We apply
the proposed methods to analyze the causal relationship between mistrust levels
in Africa and the historical slave trade. Our analysis rejects the null
hypothesis of uniformly zero effects and reveals heterogeneous treatment
effects, with significant impacts at higher levels of trade volumes.
arXiv link: http://arxiv.org/abs/2409.06654v1
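The two-step logic above can be sketched as follows (hypothetical data and
learner choices, and without the multiway cluster-robust sieve score bootstrap):
cross-fit the nuisance functions, form an AIPW-type Neyman-orthogonal signal,
and project it onto a polynomial dictionary in the covariate of interest.

import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
n, p = 2000, 5
X = rng.normal(size=(n, p))
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))          # treatment confounded via X[:, 0]
Y = (1.0 + X[:, 0]) * D + X[:, 1] + rng.normal(size=n)   # CATE(v) = 1 + v in the first covariate

psi = np.zeros(n)                                        # Neyman-orthogonal (AIPW) signal
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    e_hat = np.clip(RandomForestClassifier(min_samples_leaf=20, random_state=0)
                    .fit(X[tr], D[tr]).predict_proba(X[te])[:, 1], 0.05, 0.95)
    mu1 = RandomForestRegressor(min_samples_leaf=20, random_state=0).fit(
        X[tr][D[tr] == 1], Y[tr][D[tr] == 1]).predict(X[te])
    mu0 = RandomForestRegressor(min_samples_leaf=20, random_state=0).fit(
        X[tr][D[tr] == 0], Y[tr][D[tr] == 0]).predict(X[te])
    psi[te] = (mu1 - mu0 + D[te] * (Y[te] - mu1) / e_hat
               - (1 - D[te]) * (Y[te] - mu0) / (1 - e_hat))

# Second step: project the estimated signal onto a polynomial dictionary in X[:, 0]
basis = PolynomialFeatures(degree=3).fit_transform(X[:, [0]])
proj = LinearRegression(fit_intercept=False).fit(basis, psi)
grid = np.linspace(-2, 2, 9).reshape(-1, 1)
print(np.round(proj.predict(PolynomialFeatures(degree=3).fit_transform(grid)), 2))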
Enhancing Preference-based Linear Bandits via Human Response Time
queries as pairs of options and collecting binary choices. Although binary
choices are simple and widely used, they provide limited information about
preference strength. To address this, we leverage human response times, which
are inversely related to preference strength, as an additional signal. We
propose a computationally efficient method that combines choices and response
times to estimate human utility functions, grounded in the EZ diffusion model
from psychology. Theoretical and empirical analyses show that for queries with
strong preferences, response times complement choices by providing extra
information about preference strength, leading to significantly improved
utility estimation. We incorporate this estimator into preference-based linear
bandits for fixed-budget best-arm identification. Simulations on three
real-world datasets demonstrate that using response times significantly
accelerates preference learning compared to choice-only approaches. Additional
materials, such as code, slides, and talk video, are available at
https://shenlirobot.github.io/pages/NeurIPS24.html
arXiv link: http://arxiv.org/abs/2409.05798v4
Uniform Estimation and Inference for Nonparametric Partitioning-Based M-Estimators
of nonparametric partitioning-based M-estimators. The main theoretical results
include: (i) uniform consistency for convex and non-convex objective functions;
(ii) rate-optimal uniform Bahadur representations; (iii) rate-optimal uniform
(and mean square) convergence rates; (iv) valid strong approximations and
feasible uniform inference methods; and (v) extensions to functional
transformations of underlying estimators. Uniformity is established over both
the evaluation point of the nonparametric functional parameter and a Euclidean
parameter indexing the class of loss functions. The results also account
explicitly for the smoothness degree of the loss function (if any), and allow
for a possibly non-identity (inverse) link function. We illustrate the
theoretical and methodological results in four examples: quantile regression,
distribution regression, $L_p$ regression, and Logistic regression. Many other
possibly non-smooth, nonlinear, generalized, robust M-estimation settings are
covered by our results. We provide detailed comparisons with the existing
literature and demonstrate substantive improvements: we achieve the best (in
some cases optimal) known results under improved (in some cases minimal)
requirements in terms of regularity conditions and side rate restrictions. The
supplemental appendix reports complementary technical results that may be of
independent interest, including a novel uniform strong approximation result
based on Yurinskii's coupling.
arXiv link: http://arxiv.org/abs/2409.05715v2
The Surprising Robustness of Partial Least Squares
with high dimensional problems in which the number of observations is limited
given the number of independent variables. In this article, we show that PLS
can perform better than ordinary least squares (OLS), least absolute shrinkage
and selection operator (LASSO) and ridge regression in forecasting quarterly
gross domestic product (GDP) growth, covering the period from 2000 to 2023. In
fact, through dimension reduction, PLS proved to be effective in lowering the
out-of-sample forecasting error, especially since 2020. For the period
2000-2019, the four methods produce similar results, suggesting that PLS is a
valid regularisation technique like LASSO or ridge.
arXiv link: http://arxiv.org/abs/2409.05713v1
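A minimal sketch of the kind of horse race described above, on synthetic data
rather than the GDP series used in the article, comparing out-of-sample RMSE of
PLS against OLS, LASSO, and ridge with scikit-learn.

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression, LassoCV, RidgeCV
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n, p = 120, 60                                  # few observations relative to predictors
X = rng.normal(size=(n, p))
beta = np.concatenate([rng.normal(size=10), np.zeros(p - 10)])
y = X @ beta + rng.normal(scale=2.0, size=n)
X_tr, X_te, y_tr, y_te = X[:80], X[80:], y[:80], y[80:]

models = {
    "PLS (3 components)": PLSRegression(n_components=3),
    "OLS": LinearRegression(),
    "LASSO": LassoCV(cv=5),
    "Ridge": RidgeCV(),
}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    rmse = mean_squared_error(y_te, np.ravel(m.predict(X_te))) ** 0.5
    print(f"{name:>20s}: out-of-sample RMSE = {rmse:.3f}")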
Bellwether Trades: Characteristics of Trades influential in Predicting Future Price Movements in Markets
identify the characteristics of trades that contain valuable information.
First, we demonstrate the effectiveness of our optimized neural network
predictor in accurately predicting future market movements. Then, we utilize
the information from this successful neural network predictor to pinpoint the
individual trades within each data point (trading window) that had the most
impact on the optimized neural network's prediction of future price movements.
This approach helps us uncover important insights about the heterogeneity in
information content provided by trades of different sizes, venues, trading
contexts, and over time.
arXiv link: http://arxiv.org/abs/2409.05192v1
Difference-in-Differences with Multiple Events
there is a second event confounding the target event. When the events are
correlated, the treatment and the control group are unevenly exposed to the
effects of the second event, causing an omitted event bias. To address this
bias, I propose a two-stage DiD design. In the first stage, I estimate the
combined effects of both treatments using a control group that is neither
treated nor confounded. In the second stage, I isolate the effects of the
target treatment by leveraging a parallel treatment effect assumption and a
control group that is treated but not yet confounded. Finally, I apply this
method to revisit the effect of minimum wage increases on teen employment using
state-level hikes between 2010 and 2020. I find that the Medicaid expansion
under the ACA is a significant confounder: controlling for this bias reduces
the short-term estimate of the minimum wage effect by two-thirds.
arXiv link: http://arxiv.org/abs/2409.05184v3
DEPLOYERS: An agent based modeling tool for multi country real world data
agent-based macroeconomics modeling (ABM) framework, capable of deploying and
simulating a full economic system (individual workers, goods and services firms,
government, central and private banks, financial market, external sectors)
whose structure and activity analysis reproduce the desired calibration data,
which can be, for example, a Social Accounting Matrix (SAM), a Supply-Use Table
(SUT), or an Input-Output Table (IOT). Here we extend our previous work to a
multi-country version and show an example using data from a 46-countries
64-sectors FIGARO Inter-Country IOT. The simulation of each country runs on a
separate thread or CPU core to simulate the activity of one step (month, week,
or day) and then interacts (updates imports, exports, transfers) with that
country's foreign partners, and proceeds to the next step. This interaction can
be chosen to be aggregated (a single row and column IO account) or
disaggregated (64 rows and columns) with each partner. A typical run simulates
thousands of individuals and firms engaged in their monthly activity and then
records the results, much like a survey of the country's economic system. This
data can then be subjected to, for example, an Input-Output analysis to find
out the sources of observed stylized effects as a function of time in the
detailed and realistic modeling environment that can be easily implemented in
an ABM framework.
arXiv link: http://arxiv.org/abs/2409.04876v1
Improving the Finite Sample Estimation of Average Treatment Effects using Double/Debiased Machine Learning with Propensity Score Calibration
estimating causal effects. One machine learning approach that can be used for
estimating an average treatment effect is Double/debiased machine learning
(DML) (Chernozhukov et al., 2018). This approach uses a double-robust score
function that relies on the prediction of nuisance functions, such as the
propensity score, which is the probability of treatment assignment conditional
on covariates. Estimators relying on double-robust score functions are highly
sensitive to errors in propensity score predictions. Machine learners increase
the severity of this problem as they tend to over- or underestimate these
probabilities. Several calibration approaches have been proposed to improve
probabilistic forecasts of machine learners. This paper investigates the use of
probability calibration approaches within the DML framework. Simulation results
demonstrate that calibrating propensity scores may significantly reduce the
root mean squared error of DML estimates of the average treatment effect in
finite samples. We showcase it in an empirical example and provide conditions
under which calibration does not alter the asymptotic properties of the DML
estimator.
arXiv link: http://arxiv.org/abs/2409.04874v2
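A hedged sketch of the idea above: plug a probability-calibrated propensity model
into a cross-fitted AIPW/DML estimate of the ATE. The data-generating process,
learners, and isotonic calibration choice are illustrative assumptions, not the
paper's specification.

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
n, p = 4000, 10
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
D = rng.binomial(1, e)
Y = 1.0 * D + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)   # true ATE = 1.0

def aipw_ate(calibrate):
    psi = np.zeros(n)
    for tr, te in KFold(5, shuffle=True, random_state=0).split(X):
        clf = GradientBoostingClassifier(random_state=0)
        if calibrate:                       # isotonic calibration of the propensity model
            clf = CalibratedClassifierCV(clf, method="isotonic", cv=3)
        clf.fit(X[tr], D[tr])
        e_hat = np.clip(clf.predict_proba(X[te])[:, 1], 0.01, 0.99)
        m1 = GradientBoostingRegressor(random_state=0).fit(X[tr][D[tr] == 1], Y[tr][D[tr] == 1])
        m0 = GradientBoostingRegressor(random_state=0).fit(X[tr][D[tr] == 0], Y[tr][D[tr] == 0])
        mu1, mu0 = m1.predict(X[te]), m0.predict(X[te])
        psi[te] = (mu1 - mu0 + D[te] * (Y[te] - mu1) / e_hat
                   - (1 - D[te]) * (Y[te] - mu0) / (1 - e_hat))
    return psi.mean()

print("ATE, uncalibrated propensity:", round(aipw_ate(False), 3))
print("ATE, calibrated propensity:  ", round(aipw_ate(True), 3))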
Horowitz-Manski-Lee Bounds with Multilayered Sample Selection
the presence of firm heterogeneity. When training affects the sorting of
workers to firms, sample selection is no longer binary but is "multilayered".
This paper extends the canonical Heckman (1979) sample selection model -- which
assumes selection is binary -- to a setting where it is multilayered. In this
setting Lee bounds set identifies a total effect that combines a
weighted-average of the causal effect of job training on wage rates across
firms with a weighted-average of the contrast in wages between different firms
for a fixed level of training. Thus, Lee bounds set identifies a
policy-relevant estimand only when firms pay homogeneous wages and/or when job
training does not affect worker sorting across firms. We derive analytic
expressions for sharp bounds for the causal effect of job training on wage
rates at each firm that leverage information on firm-specific wages. We
illustrate our partial identification approach with two empirical applications
to job training experiments. Our estimates demonstrate that even when
conventional Lee bounds are strictly positive, our within-firm bounds can be
tight around 0, showing that the canonical Lee bounds may capture only a pure
sorting effect of job training.
arXiv link: http://arxiv.org/abs/2409.04589v2
An MPEC Estimator for the Sequential Search Model
search models, using the MPEC (Mathematical Programming with Equilibrium
Constraints) approach. This method enhances numerical accuracy while avoiding
ad hoc components and errors related to equilibrium conditions. Monte Carlo
simulations show that the estimator performs better in small samples, with
lower bias and root-mean-squared error, though less effectively in large
samples. Despite these mixed results, the MPEC approach remains valuable for
identifying candidate parameters comparable to the benchmark, without relying
on ad hoc look-up tables, as it generates the table through solved equilibrium
constraints.
arXiv link: http://arxiv.org/abs/2409.04378v1
Extreme Quantile Treatment Effects under Endogeneity: Evaluating Policy Effects for the Most Vulnerable Individuals
extreme quantile treatment effects (QTEs) in the presence of endogeneity. Our
approach is applicable to a broad range of empirical research designs,
including instrumental variables design and regression discontinuity design,
among others. By leveraging regular variation and subsampling, the method
ensures robust performance even in extreme tails, where data may be sparse or
entirely absent. Simulation studies confirm the theoretical robustness of our
approach. Applying our method to assess the impact of job training provided by
the Job Training Partnership Act (JTPA), we find significantly negative QTEs
for the lowest quantiles (i.e., the most disadvantaged individuals),
contrasting with previous literature that emphasizes positive QTEs for
intermediate quantiles.
arXiv link: http://arxiv.org/abs/2409.03979v1
Performance of Empirical Risk Minimization For Principal Component Regression
minimization for principal component regression. Our analysis is nonparametric,
in the sense that the relation between the prediction target and the predictors
is not specified. In particular, we do not rely on the assumption that the
prediction target is generated by a factor model. In our analysis we consider
the cases in which the largest eigenvalues of the covariance matrix of the
predictors grow linearly in the number of predictors (strong signal regime) or
sublinearly (weak signal regime). The main result of this paper shows that
empirical risk minimization for principal component regression is consistent
for prediction and, under appropriate conditions, it achieves near-optimal
performance in both the strong and weak signal regimes.
arXiv link: http://arxiv.org/abs/2409.03606v2
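For concreteness, a small sketch of principal component regression evaluated by
out-of-sample prediction error; the synthetic data here happen to follow a
factor structure, which the paper's analysis explicitly does not require.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
n, p, k = 400, 100, 3
F = rng.normal(size=(n, k))                     # latent components driving the predictors
X = F @ rng.normal(size=(k, p)) + 0.5 * rng.normal(size=(n, p))
y = F @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.5, size=n)

pcr = make_pipeline(PCA(n_components=k), LinearRegression())   # PCA step, then least squares
pcr.fit(X[:300], y[:300])
mse = mean_squared_error(y[300:], pcr.predict(X[300:]))
print(f"out-of-sample prediction MSE with {k} principal components: {mse:.3f}")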
Automatic Pricing and Replenishment Strategies for Vegetable Products Based on Data Analysis and Nonlinear Programming
limited shelf life, and their quality deteriorates with time. Most vegetable
varieties, if not sold on the day of delivery, become difficult to sell the
following day. Therefore, retailers usually perform daily quantitative
replenishment based on historical sales data and demand conditions. Vegetable
pricing typically uses a "cost-plus pricing" method, with retailers often
discounting products affected by transportation loss and quality decline. In
this context, reliable market demand analysis is crucial as it directly impacts
replenishment and pricing decisions. Given the limited retail space, a rational
sales mix becomes essential. This paper first uses data analysis and
visualization techniques to examine the distribution patterns and
interrelationships of vegetable sales quantities by category and individual
item, based on provided data on vegetable types, sales records, wholesale
prices, and recent loss rates. Next, it constructs a functional relationship
between total sales volume and cost-plus pricing for vegetable categories,
forecasts future wholesale prices using the ARIMA model, and establishes a
sales profit function and constraints. A nonlinear programming model is then
developed and solved to provide daily replenishment quantities and pricing
strategies for each vegetable category for the upcoming week. Further, we
optimize the profit function and constraints based on the actual sales
conditions and requirements, providing replenishment quantities and pricing
strategies for individual items on July 1 to maximize retail profit. Finally,
to better formulate replenishment and pricing decisions for vegetable products,
we discuss and forecast the data that retailers need to collect, and analyze
how the collected data can be applied to the above issues.
arXiv link: http://arxiv.org/abs/2409.09065v1
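A toy version of the pricing/replenishment step for a single category, with a
hypothetical linear demand curve, a loss rate, and a wholesale-price forecast
treated as given; the paper's demand model, constraints, and data are richer
than this.

import numpy as np
from scipy.optimize import minimize_scalar

wholesale = 4.0     # forecast wholesale price (e.g. from an ARIMA model), yuan/kg
loss_rate = 0.08    # share of stock lost to transport damage and quality decline
a, b = 120.0, 8.0   # hypothetical linear demand: quantity sold = a - b * price

def neg_profit(price):
    demand = max(a - b * price, 0.0)
    stock = demand / (1.0 - loss_rate)          # order enough so usable stock meets demand
    return -(price * demand - wholesale * stock)

res = minimize_scalar(neg_profit, bounds=(wholesale, 15.0), method="bounded")
price_opt = res.x
stock_opt = max(a - b * price_opt, 0.0) / (1.0 - loss_rate)
print(f"price ~ {price_opt:.2f} yuan/kg, replenishment ~ {stock_opt:.1f} kg, "
      f"daily profit ~ {-res.fun:.1f} yuan")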
Momentum Dynamics in Competitive Sports: A Multi-Model Analysis Using TOPSIS and Logistic Regression
the use of the TOPSIS model and 0-1 logistic regression model. First, the
TOPSIS model is employed to evaluate the performance of two tennis players,
with visualizations used to analyze the situation's evolution at every moment
in the match, explaining how "momentum" manifests in sports. Then, the 0-1
logistic regression model is utilized to verify the impact of "momentum" on
match outcomes, demonstrating that fluctuations in player performance and the
successive occurrence of successes are not random. Additionally, this paper
examines the indicators that influence the reversal of game situations by
analyzing key match data and testing the accuracy of the models with match
data. The findings show that the model accurately explains the conditions
during matches and can be generalized to other sports competitions. Finally,
the strengths, weaknesses, and potential future improvements of the model are
discussed.
arXiv link: http://arxiv.org/abs/2409.02872v1
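A minimal TOPSIS scoring step of the kind described above, applied to a
hypothetical matrix of per-moment match indicators; the criteria and weights are
made up for illustration.

import numpy as np

# rows = moments in the match, columns = criteria (points won, serve %, unforced errors)
X = np.array([[12, 0.71, 3],
              [9, 0.62, 5],
              [15, 0.78, 2],
              [8, 0.55, 6]], dtype=float)
weights = np.array([0.5, 0.3, 0.2])
benefit = np.array([True, True, False])        # higher is better, except for errors

R = X / np.linalg.norm(X, axis=0)              # vector-normalize each criterion
V = R * weights
ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
d_plus = np.linalg.norm(V - ideal, axis=1)
d_minus = np.linalg.norm(V - anti, axis=1)
closeness = d_minus / (d_plus + d_minus)       # TOPSIS score in [0, 1]
print(np.round(closeness, 3))                  # higher score = momentum in the player's favor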
The Impact of Data Elements on Narrowing the Urban-Rural Consumption Gap in China: Mechanisms and Policy Analysis
development, directly reflects the imbalance in urban and rural economic and
social development. Data elements, as an important component of New Quality
Productivity, are of significant importance in promoting economic development
and improving people's living standards in the information age. This study,
through the analysis of fixed-effects regression models, system GMM regression
models, and the intermediate effect model, found that the development level of
data elements to some extent promotes the narrowing of the urban-rural
consumption gap. At the same time, the intermediate variable of urban-rural
income gap plays an important role between data elements and consumption gap,
with a significant intermediate effect. The results of the study indicate that
the advancement of data elements can promote the balance of urban and rural
residents' consumption levels by reducing the urban-rural income gap, providing
theoretical support and policy recommendations for achieving common prosperity
and promoting coordinated urban-rural development. Building upon this, this
paper emphasizes the complex correlation between the development of data
elements and the urban-rural consumption gap, and puts forward policy
suggestions such as promoting the development of the data element market,
strengthening the construction of the digital economy and e-commerce, and
promoting integrated urban-rural development. Overall, the development of data
elements is not only an important path to reducing the urban-rural consumption
gap but also one of the key drivers of balanced economic and social development
in China. This study has both theoretical
and practical significance for understanding the mechanism of the urban-rural
consumption gap and improving policies for urban-rural economic development.
arXiv link: http://arxiv.org/abs/2409.02662v1
The Application of Green GDP and Its Impact on Global Economy and Environment: Analysis of GGDP based on SEEA model
the System of Environmental-Economic Accounting (SEEA) model to evaluate its
impact on global climate mitigation and economic health. GGDP is proposed as a
superior measure to traditional GDP by incorporating natural resource
consumption, environmental pollution control, and degradation factors. The
study develops a GGDP model and employs grey correlation analysis and grey
prediction models to assess its relationship with these factors. Key findings
demonstrate that replacing GDP with GGDP can positively influence climate
change, particularly in reducing CO2 emissions and stabilizing global
temperatures. The analysis further explores the implications of GGDP adoption
across developed and developing countries, with specific predictions for China
and the United States. The results indicate a potential increase in economic
levels for developing countries, while developed nations may experience a
decrease. Additionally, the shift to GGDP is shown to significantly reduce
natural resource depletion and population growth rates in the United States,
suggesting broader environmental and economic benefits. This paper highlights
the universal applicability of the GGDP model and its potential to enhance
environmental and economic policies globally.
arXiv link: http://arxiv.org/abs/2409.02642v1
Fitting an Equation to Data Impartially
scientific law) to data involving multiple variables. Ordinary (least squares)
regression is not suitable for this because the estimated relationship will
differ according to which variable is chosen as being dependent, and the
dependent variable is unrealistically assumed to be the only variable which has
any measurement error (noise). We present a very general method for estimating
a linear functional relationship between multiple noisy variables, which are
treated impartially, i.e. no distinction between dependent and independent
variables. The data are not assumed to follow any distribution, but all
variables are treated as being equally reliable. Our approach extends the
geometric mean functional relationship to multiple dimensions. This is
especially useful with variables measured in different units, as it is
naturally scale-invariant, whereas orthogonal regression is not. This is
because our approach is not based on minimizing distances, but on the symmetric
concept of correlation. The estimated coefficients are easily obtained from the
covariances or correlations, and correspond to geometric means of associated
least squares coefficients. The ease of calculation will hopefully allow
widespread application of impartial fitting to estimate relationships in a
neutral way.
arXiv link: http://arxiv.org/abs/2409.02573v1
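For intuition, the two-variable special case of such an impartial fit is the
geometric mean functional relationship, whose slope is the geometric mean of the
y-on-x OLS slope and the reciprocal of the x-on-y OLS slope, signed by the
correlation; the paper's multivariate, scale-invariant extension is not
reproduced here.

import numpy as np

rng = np.random.default_rng(4)
true = rng.normal(size=300)
x = true + 0.3 * rng.normal(size=300)          # both variables observed with noise
y = 2.0 * true + 0.6 * rng.normal(size=300)

r = np.corrcoef(x, y)[0, 1]
b_yx = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # OLS slope of y on x
b_xy = np.cov(x, y)[0, 1] / np.var(y, ddof=1)  # OLS slope of x on y
slope_gm = np.sign(r) * np.sqrt(b_yx / b_xy)   # geometric mean of b_yx and 1/b_xy = sd(y)/sd(x)
print(f"y-on-x OLS: {b_yx:.3f}, 1/(x-on-y OLS): {1 / b_xy:.3f}, impartial: {slope_gm:.3f}")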
Double Machine Learning at Scale to Predict Causal Impact of Customer Actions
to inform both short- and long-term investment decisions of various types. In
this paper, we apply the double machine learning (DML) methodology to estimate
the CI values across 100s of customer actions of business interest and 100s of
millions of customers. We operationalize DML through a causal ML library based
on Spark with a flexible, JSON-driven model configuration approach to estimate
CI at scale (i.e., across hundreds of actions and millions of customers). We
outline the DML methodology and implementation, and associated benefits over
the traditional potential outcomes based CI model. We show population-level as
well as customer-level CI values along with confidence intervals. The
validation metrics show a 2.2% gain over the baseline methods and a 2.5X gain
in the computational time. Our contribution is to advance the scalable
application of CI, while also providing an interface that allows faster
experimentation, cross-platform support, ability to onboard new use cases, and
improves accessibility of underlying code for partner teams.
arXiv link: http://arxiv.org/abs/2409.02332v1
Distribution Regression Difference-In-Differences
in the difference-in-differences (DiD) design. Our procedure is particularly
useful when the treatment effect differs across the distribution of the outcome
variable. Our proposed estimator easily incorporates covariates and,
importantly, can be extended to settings where the treatment potentially
affects the joint distribution of multiple outcomes. Our key identifying
restriction is that the counterfactual distribution of the treated in the
untreated state has no interaction effect between treatment and time. This
assumption results in a parallel trend assumption on a transformation of the
distribution. We highlight the relationship between our procedure and
assumptions with the changes-in-changes approach of Athey and Imbens (2006). We
also reexamine the Card and Krueger (1994) study of the impact of minimum wages
on employment to illustrate the utility of our approach.
arXiv link: http://arxiv.org/abs/2409.02311v2
Variable selection in convex nonparametric least squares via structured Lasso: An application to the Swedish electricity distribution networks
squares (CNLS). Whereas the least absolute shrinkage and selection operator
(Lasso) is a popular technique for least squares, its variable selection
performance is unknown in CNLS problems. In this work, we investigate the
performance of the Lasso estimator and find that it is usually unable to select
variables efficiently. Exploiting the unique structure of the subgradients in
CNLS, we develop a structured Lasso method by combining $\ell_1$-norm and
$\ell_{\infty}$-norm. The relaxed version of the structured Lasso is proposed
for achieving model sparsity and predictive performance simultaneously, where
we can control the two effects--variable selection and model shrinkage--using
separate tuning parameters. A Monte Carlo study is implemented to verify the
finite sample performance of the proposed approaches. We also use real data
from Swedish electricity distribution networks to illustrate the effects of the
proposed variable selection techniques. The results from the simulation and
application confirm that the proposed structured Lasso performs favorably,
generally leading to sparser and more accurate predictive models, relative to
the conventional Lasso methods in the literature.
arXiv link: http://arxiv.org/abs/2409.01911v2
Double Machine Learning meets Panel Data -- Promises, Pitfalls, and Potential Solutions
relax functional form assumptions if used within appropriate frameworks.
However, most of these frameworks assume settings with cross-sectional data,
whereas researchers often have access to panel data, which in traditional
methods helps to deal with unobserved heterogeneity between units. In this
paper, we explore how we can adapt double/debiased machine learning (DML)
(Chernozhukov et al., 2018) for panel data in the presence of unobserved
heterogeneity. This adaptation is challenging because DML's cross-fitting
procedure assumes independent data and the unobserved heterogeneity is not
necessarily additively separable in settings with nonlinear observed
confounding. We assess the performance of several intuitively appealing
estimators in a variety of simulations. While we find violations of the
cross-fitting assumptions to be largely inconsequential for the accuracy of the
effect estimates, many of the considered methods fail to adequately account for
the presence of unobserved heterogeneity. However, we find that using
predictive models based on the correlated random effects approach (Mundlak,
1978) within DML leads to accurate coefficient estimates across settings, given
a sample size that is large relative to the number of observed confounders. We
also show that the influence of the unobserved heterogeneity on the observed
confounders plays a significant role for the performance of most alternative
methods.
arXiv link: http://arxiv.org/abs/2409.01266v1
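One simple variant of the correlated-random-effects idea above: augment the
observed confounders with their unit-level means (the Mundlak device) and run a
cross-fitted partially linear DML regression, splitting by unit. The
data-generating process and learners below are hypothetical.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(5)
n_units, T = 300, 6
unit = np.repeat(np.arange(n_units), T)
alpha = rng.normal(size=n_units)                         # unobserved unit heterogeneity
X = alpha[unit] + 0.5 * rng.normal(size=n_units * T)     # observed confounder, correlated with alpha
D = X + 0.5 * alpha[unit] + rng.normal(size=n_units * T) # treatment affected by alpha and X
Y = 1.0 * D + np.sin(X) + alpha[unit] + rng.normal(size=n_units * T)   # true coefficient = 1

df = pd.DataFrame({"unit": unit, "X": X, "D": D, "Y": Y})
df["X_bar"] = df.groupby("unit")["X"].transform("mean")  # Mundlak / correlated-random-effects term
Z = df[["X", "X_bar"]].to_numpy()

res_Y, res_D = np.zeros(len(df)), np.zeros(len(df))
for tr, te in GroupKFold(n_splits=5).split(Z, groups=df["unit"]):
    rf_y = RandomForestRegressor(min_samples_leaf=20, random_state=0).fit(Z[tr], df["Y"].iloc[tr])
    rf_d = RandomForestRegressor(min_samples_leaf=20, random_state=0).fit(Z[tr], df["D"].iloc[tr])
    res_Y[te] = df["Y"].to_numpy()[te] - rf_y.predict(Z[te])
    res_D[te] = df["D"].to_numpy()[te] - rf_d.predict(Z[te])

theta = (res_D @ res_Y) / (res_D @ res_D)                # partially linear DML coefficient
print(f"estimated treatment coefficient: {theta:.3f} (true value 1.0)")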
Bandit Algorithms for Policy Learning: Methods, Implementation, and Welfare-performance
sample for the estimation of an optimal treatment assignment policy -- is a
commonly assumed framework of policy learning. An arguably more realistic but
challenging scenario is a dynamic setting in which the planner performs
experimentation and exploitation simultaneously with subjects that arrive
sequentially. This paper studies bandit algorithms for learning an optimal
individualised treatment assignment policy. Specifically, we study
applicability of the EXP4.P (Exponential weighting for Exploration and
Exploitation with Experts) algorithm developed by Beygelzimer et al. (2011) to
policy learning. Assuming that the class of policies has a finite
Vapnik-Chervonenkis dimension and that the number of subjects to be allocated
is known, we present a high probability welfare-regret bound of the algorithm.
To implement the algorithm, we use an incremental enumeration algorithm for
hyperplane arrangements. We perform extensive numerical analysis to assess the
algorithm's sensitivity to its tuning parameters and its welfare-regret
performance. Further simulation exercises are calibrated to the National Job
Training Partnership Act (JTPA) Study sample to determine how the algorithm
performs when applied to economic data. Our findings highlight various
computational challenges and suggest that the limited welfare gain from the
algorithm is due to substantial heterogeneity in causal effects in the JTPA
data.
arXiv link: http://arxiv.org/abs/2409.00379v1
Weighted Regression with Sybil Networks
assumes multiple identities -- is a pervasive feature. This complicates
experiments, as off-the-shelf regression estimators at least assume known
network topologies (if not fully independent observations), whereas Sybil network
topologies in practice are often unknown. The literature has exclusively
focused on techniques to detect Sybil networks, leading many experimenters to
subsequently exclude suspected networks entirely before estimating treatment
effects. I present a more efficient solution in the presence of these suspected
Sybil networks: a weighted regression framework that applies weights based on
the probabilities that sets of observations are controlled by single actors. I
show in the paper that the MSE-minimizing solution is to set the weight matrix
equal to the inverse of the expected network topology. I demonstrate the
methodology on simulated data, and then I apply the technique to a competition
with suspected Sybil networks run on the Sui blockchain and show reductions in
the standard error of the estimate by 6 - 24%.
arXiv link: http://arxiv.org/abs/2408.17426v3
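A minimal sketch of the weighted-regression idea above: generalized least
squares with the weight matrix set to the inverse of an assumed expected
same-actor network among observations. The cluster structure and probabilities
are hypothetical.

import numpy as np

rng = np.random.default_rng(6)
n = 9
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Expected network topology: 1 on the diagonal, p off-diagonal for pairs of
# observations suspected to be controlled by the same actor (a Sybil cluster).
Omega = np.eye(n)
suspected_cluster, p_same = [0, 1, 2], 0.8
for i in suspected_cluster:
    for j in suspected_cluster:
        if i != j:
            Omega[i, j] = p_same

W = np.linalg.inv(Omega)                                # weight matrix
beta_gls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)    # weighted (GLS) estimate
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print("OLS:", np.round(beta_ols, 3), " weighted:", np.round(beta_gls, 3))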
State Space Model of Realized Volatility under the Existence of Dependent Market Microstructure Noise
in finance. Realized Volatility (RV) is an estimator of the volatility
calculated using high-frequency observed prices. RV has lately attracted
considerable attention in econometrics and mathematical finance. However, it is
known that high-frequency data includes observation errors called market
microstructure noise (MN). Nagakura and Watanabe [2015] proposed a state space
model that resolves RV into true volatility and the influence of MN. In this
paper, we assume dependent MN that is autocorrelated and correlated with
returns, as reported by Hansen and Lunde [2006], extend the results of Nagakura
and Watanabe [2015], and compare models using both simulated and actual data.
arXiv link: http://arxiv.org/abs/2408.17187v1
Sensitivity Analysis for Dynamic Discrete Choice Models
factor, are being fixed instead of being estimated. This paper proposes two
sensitivity analysis procedures for dynamic discrete choice models with respect
to the fixed parameters. First, I develop a local sensitivity measure that
estimates the change in the target parameter for a unit change in the fixed
parameter. This measure is fast to compute as it does not require model
re-estimation. Second, I propose a global sensitivity analysis procedure that
uses model primitives to study the relationship between target parameters and
fixed parameters. I show how to apply the sensitivity analysis procedures of
this paper through two empirical applications.
arXiv link: http://arxiv.org/abs/2408.16330v1
Marginal homogeneity tests with panel data
distributions are homogeneous or time-invariant. Marginal homogeneity is
relevant in economic settings such as dynamic discrete games. In this paper, we
propose several tests for the hypothesis of marginal homogeneity and
investigate their properties. We consider an asymptotic framework in which the
number of individuals n in the panel diverges, and the number of periods T is
fixed. We implement our tests by comparing a studentized or non-studentized
T-sample version of the Cramer-von Mises statistic with a suitable critical
value. We propose three methods to construct the critical value: asymptotic
approximations, the bootstrap, and time permutations. We show that the first
two methods result in asymptotically exact hypothesis tests. The permutation
test based on a non-studentized statistic is asymptotically exact when T=2, but
is asymptotically invalid when T>2. In contrast, the permutation test based on
a studentized statistic is always asymptotically exact. Finally, under a
time-exchangeability assumption, the permutation test is exact in finite
samples, both with and without studentization.
arXiv link: http://arxiv.org/abs/2408.15862v1
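A sketch of the T = 2 time-permutation test described above, using a
non-studentized two-sample Cramer-von Mises-type statistic and random swaps of
the two periods within each individual; the studentized version and the T > 2
case follow the paper, not this toy implementation.

import numpy as np

rng = np.random.default_rng(7)
n = 500
y1 = rng.normal(size=n)                       # outcome in period 1
y2 = rng.normal(loc=0.15, size=n)             # outcome in period 2 (slightly shifted)

def cvm_stat(a, b):
    pooled = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), pooled, side="right") / len(a)
    Fb = np.searchsorted(np.sort(b), pooled, side="right") / len(b)
    return np.mean((Fa - Fb) ** 2)            # Cramer-von Mises-type distance between ECDFs

t_obs = cvm_stat(y1, y2)
perm_stats = []
for _ in range(999):
    swap = rng.random(n) < 0.5                # permute time labels within each individual
    a = np.where(swap, y2, y1)
    b = np.where(swap, y1, y2)
    perm_stats.append(cvm_stat(a, b))
p_value = (1 + np.sum(np.array(perm_stats) >= t_obs)) / (1 + len(perm_stats))
print(f"CvM statistic = {t_obs:.5f}, permutation p-value = {p_value:.3f}")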
BayesSRW: Bayesian Sampling and Re-weighting approach for variance reduction
limited resources prevent exhaustive measurement across all subjects. We
consider a setting where samples are drawn from multiple groups, each following
a distribution with unknown mean and variance parameters. We introduce a novel
sampling strategy, motivated simply by the Cauchy-Schwarz inequality, which
minimizes the variance of the population mean estimator by allocating samples
proportionally to both the group size and the standard deviation. This approach
improves the efficiency of sampling by focusing resources on groups with
greater variability, thereby enhancing the precision of the overall estimate.
Additionally, we extend our method to a two-stage Bayesian sampling procedure,
named BayesSRW, where a preliminary stage is used to estimate the
variance, which then informs the optimal allocation of the remaining sampling
budget. Through simulation examples, we demonstrate the effectiveness of our
approach in reducing estimation uncertainty and providing more reliable
insights in applications ranging from user experience surveys to
high-dimensional peptide array studies.
arXiv link: http://arxiv.org/abs/2408.15454v1
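A small sketch of the allocation rule described above: sample each group in
proportion to group size times group standard deviation, then combine group
means with population-share weights. The second-stage Bayesian updating is
omitted and the numbers are hypothetical.

import numpy as np

rng = np.random.default_rng(8)
group_sizes = np.array([5000, 3000, 2000])
group_sd = np.array([1.0, 4.0, 0.5])          # in the preliminary stage these would be estimated
group_mean = np.array([10.0, 12.0, 9.0])
budget = 600

alloc = group_sizes * group_sd                # allocate proportionally to size x std. deviation
n_g = np.maximum(2, np.round(budget * alloc / alloc.sum()).astype(int))
print("sample sizes per group:", n_g)         # more variable groups get more of the budget

# draw the allocated samples and form the size-weighted estimate of the population mean
est = sum(w * rng.normal(m, s, k).mean()
          for w, m, s, k in zip(group_sizes / group_sizes.sum(),
                                group_mean, group_sd, n_g))
print(f"estimated population mean: {est:.3f} "
      f"(true: {np.average(group_mean, weights=group_sizes):.3f})")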
The effects of data preprocessing on probability of default model fairness
learning models has become a critical concern, especially given the potential
for biased predictions that disproportionately affect certain demographic
groups. This study investigates the impact of data preprocessing, with a
specific focus on Truncated Singular Value Decomposition (SVD), on the fairness
and performance of probability of default models. Using a comprehensive dataset
sourced from Kaggle, various preprocessing techniques, including SVD, were
applied to assess their effect on model accuracy, discriminatory power, and
fairness.
arXiv link: http://arxiv.org/abs/2408.15452v1
Double/Debiased CoCoLASSO of Treatment Effects with Mismeasured High-Dimensional Control Variables
with additive measurement error, a prevalent challenge in modern econometrics.
We introduce the Double/Debiased Convex Conditioned LASSO (Double/Debiased
CoCoLASSO), which extends the double/debiased machine learning framework to
accommodate mismeasured covariates. Our principal contributions are threefold.
(1) We construct a Neyman-orthogonal score function that remains valid under
measurement error, incorporating a bias correction term to account for
error-induced correlations. (2) We propose a method of moments estimator for
the measurement error variance, enabling implementation without prior knowledge
of the error covariance structure. (3) We establish the $\sqrt{N}$-consistency
and asymptotic normality of our estimator under general conditions, allowing
for both the number of covariates and the magnitude of measurement error to
increase with the sample size. Our theoretical results demonstrate the
estimator's efficiency within the class of regularized high-dimensional
estimators accounting for measurement error. Monte Carlo simulations
corroborate our asymptotic theory and illustrate the estimator's robust
performance across various levels of measurement error. Notably, our
covariance-oblivious approach nearly matches the efficiency of methods that
assume known error variance.
arXiv link: http://arxiv.org/abs/2408.14671v1
Modeling the Dynamics of Growth in Master-Planned Communities
housing development at a master-planned community during a transition from high
to low growth. Our approach draws on detailed historical data to model the
dynamics of the market participants, producing results that are entirely
data-driven and free of bias. While traditional time series forecasting methods
often struggle to account for nonlinear regime changes in growth, our approach
successfully captures the onset of buildout as well as external economic
shocks, such as the 1990 and 2008-2011 recessions and the 2021 post-pandemic
boom.
This research serves as a valuable tool for urban planners, homeowner
associations, and property stakeholders aiming to navigate the complexities of
growth at master-planned communities during periods of both system stability
and instability.
arXiv link: http://arxiv.org/abs/2408.14214v2
Endogenous Treatment Models with Social Interactions: An Application to the Impact of Exercise on Self-Esteem
interactions in both the treatment and outcome equations. We model the
interactions between individuals in an internally consistent manner via a game
theoretic approach based on discrete Bayesian games. This introduces a
substantial computational burden in estimation which we address through a
sequential version of the nested fixed point algorithm. We also provide some
relevant treatment effects, and procedures for their estimation, which capture
the impact on both the individual and the total sample. Our empirical
application examines the impact of an individual's exercise frequency on her
level of self-esteem. We find that an individual's exercise frequency is
influenced by her expectation of her friends'. We also find that an
individual's level of self-esteem is affected by her level of exercise and, at
relatively lower levels of self-esteem, by the expectation of her friends'
self-esteem.
arXiv link: http://arxiv.org/abs/2408.13971v1
Inference on Consensus Ranking of Distributions
consensus favors one distribution over another (of earnings, productivity,
asset returns, test scores, etc.). Specifically, given a sample from each of
two distributions, I propose statistical inference methods to learn about the
set of utility functions for which the first distribution has higher expected
utility than the second distribution. With high probability, an "inner"
confidence set is contained within this true set, while an "outer" confidence
set contains the true set. Such confidence sets can be formed by inverting a
proposed multiple testing procedure that controls the familywise error rate.
Theoretical justification comes from empirical process results, given that very
large classes of utility functions are generally Donsker (subject to finite
moments). The theory additionally justifies a uniform (over utility functions)
confidence band of expected utility differences, as well as tests with a
utility-based "restricted stochastic dominance" as either the null or
alternative hypothesis. Simulated and empirical examples illustrate the
methodology.
arXiv link: http://arxiv.org/abs/2408.13949v1
Cross-sectional Dependence in Idiosyncratic Volatility
dependence in the idiosyncratic volatilities of assets using high frequency
data. We first consider the estimation of standard measures of dependence in
the idiosyncratic volatilities such as covariances and correlations. Naive
estimators of these measures are biased due to the use of the error-laden
estimates of idiosyncratic volatilities. We provide bias-corrected estimators
and the relevant asymptotic theory. Next, we introduce an idiosyncratic
volatility factor model, in which we decompose the variation in idiosyncratic
volatilities into two parts: the variation related to the systematic factors
such as the market volatility, and the residual variation. Again, naive
estimators of the decomposition are biased, and we provide bias-corrected
estimators. We also provide the asymptotic theory that allows us to test
whether the residual (non-systematic) components of the idiosyncratic
volatilities exhibit cross-sectional dependence. We apply our methodology to
the S&P 100 index constituents, and document strong cross-sectional dependence
in their idiosyncratic volatilities. We consider two different sets of
idiosyncratic volatility factors, and find that neither can fully account for
the cross-sectional dependence in idiosyncratic volatilities. For each model,
we map out the network of dependencies in residual (non-systematic)
idiosyncratic volatilities across all stocks.
arXiv link: http://arxiv.org/abs/2408.13437v2
Difference-in-differences with as few as two cross-sectional units -- A new perspective to the democracy-growth debate
effects. This challenge, for example, crops up in studies of the impact of
democracy on economic growth, where findings vary substantially due to
differences in country composition. To address this challenge, this paper
introduces a Difference-in-Differences (DiD) estimator that leverages temporal
variation in the data to estimate unit-specific average treatment effects on
the treated (ATT) with as few as two cross-sectional units. Under weak
identification and temporal dependence conditions, the proposed DiD estimator
is shown to be asymptotically normal. The method is further complemented with
an identification test that, unlike pre-trends tests, is more powerful and can
detect violations of parallel trends in post-treatment periods. Empirical
results using the DiD estimator suggest Benin's economy would have been 6.3%
smaller on average over the 1993-2018 period had she not democratised.
arXiv link: http://arxiv.org/abs/2408.13047v4
Machine Learning and the Yield Curve: Tree-Based Macroeconomic Regime Switching
dynamic Nelson-Siegel (DNS) yield-curve model. In particular, we customize the
tree-growing algorithm to partition macroeconomic variables based on the DNS
model's marginal likelihood, thereby identifying regime-shifting patterns in
the yield curve. Compared to traditional Markov-switching models, our model
offers clear economic interpretation via macroeconomic linkages and ensures
computational simplicity. In an empirical application to U.S. Treasury yields,
we find (1) important yield-curve regime switching, and (2) evidence that
macroeconomic variables have predictive power for the yield curve when the
federal funds rate is high, but not in other regimes, thereby refining the
notion of yield curve "macro-spanning".
arXiv link: http://arxiv.org/abs/2408.12863v2
A nested nonparametric logit model for microtransit revenue management supplemented with citywide synthetic data
accessibility, reduce congestion, and enhance flexibility. However, its
heterogeneous impacts across travelers necessitate better tools for
microtransit forecasting and revenue management, especially when actual usage
data are limited. We propose a nested nonparametric model for joint travel mode
and ride pass subscription choice, estimated using marginal subscription data
and synthetic populations. The model improves microtransit choice modeling by
(1) leveraging citywide synthetic data for greater spatiotemporal granularity,
(2) employing an agent-based estimation approach to capture heterogeneous user
preferences, and (3) integrating mode choice parameters into subscription
choice modeling. We apply our methodology to a case study in Arlington, TX,
using synthetic data from Replica Inc. and microtransit data from Via. Our
model accurately predicts the number of subscribers in the upper branch and
achieves a high McFadden R2 in the lower branch (0.603 for weekday trips and
0.576 for weekend trips), while also retrieving interpretable elasticities and
consumer surplus. We further integrate the model into a simulation-based
framework for microtransit revenue management. For the ride pass pricing
policy, our simulation results show that reducing the price of the weekly pass
($25 -> $18.9) and monthly pass ($80 -> $71.5) would surprisingly increase
total revenue by $127 per day. For the subsidy policy, our simulation results
show that a 100% fare discount would reduce car trips to AT&T Stadium by 61 for
a game event and increase microtransit trips to Medical City Arlington by 82, but
require subsidies of $533 per event and $483 per day, respectively.
arXiv link: http://arxiv.org/abs/2408.12577v2
Momentum Informed Inflation-at-Risk
which has seen it researched extensively. Surprisingly, the same cannot be said for Inflation-at-Risk, even though both tails, deflation and high inflation, are of key concern to policymakers; it has received comparatively little research. This paper tackles that gap and provides estimates of Inflation-at-Risk. The key insight of the paper is that inflation is best characterised by a combination of two types of nonlinearity: quantile variation and conditioning on the momentum of inflation.
arXiv link: http://arxiv.org/abs/2408.12286v1
Enhancing Causal Discovery in Financial Networks with Piecewise Quantile Regression
within the price series of speculative assets. Across the various methods used
to infer these networks, there is a general reliance on predictive modelling to
capture cross-correlation effects. These methods usually model the flow of
mean-response information, or the propagation of volatility and risk within the
market. Such techniques, though insightful, do not fully capture the broader
distribution-level causality that is possible within speculative markets. This
paper introduces a novel approach, combining quantile regression with a
piecewise linear embedding scheme - allowing us to construct causality networks
that identify the complex tail interactions inherent to financial markets.
Applying this method to 260 cryptocurrency return series, we uncover
significant tail-tail causal effects and substantial causal asymmetry. We
identify a propensity for coins to be self-influencing, with comparatively
sparse cross-variable effects. Assessing all link types in conjunction, Bitcoin
stands out as the primary influencer - a nuance that is missed in conventional
linear mean-response analyses. Our findings introduce a comprehensive framework
for modelling distributional causality, paving the way towards more holistic
representations of causality in financial markets.
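As a rough illustration of the ingredients, the sketch below runs a plain lower-tail quantile regression of one simulated return series on its own lag and a lagged driver series using statsmodels' QuantReg; it omits the paper's piecewise linear embedding, and the variable names and data-generating process are illustrative assumptions only.
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.regression.quantile_regression import QuantReg

    rng = np.random.default_rng(4)
    T = 1000
    btc = rng.standard_normal(T)                                 # driver series (e.g., Bitcoin returns)
    alt = 0.4 * np.concatenate(([0.0], btc[:-1])) + rng.standard_normal(T)

    y = alt[1:]                                                  # response at time t
    X = sm.add_constant(np.column_stack([alt[:-1], btc[:-1]]))   # own lag and driver lag
    res = QuantReg(y, X).fit(q=0.05)                             # 5% (lower-tail) quantile regression
    print(res.params, res.pvalues)   # a significant driver-lag coefficient suggests a tail spillover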
arXiv link: http://arxiv.org/abs/2408.12210v1
An Econometric Analysis of Large Flexible Cryptocurrency-mining Consumers in Electricity Markets
firms, with individual consumption levels reaching 700MW. This study examines
the behavior of these firms in Texas, focusing on how their consumption is
influenced by cryptocurrency conversion rates, electricity prices, local
weather, and other factors. We transform the skewed electricity consumption
data of these firms, perform correlation analysis, and apply a seasonal
autoregressive moving average model for analysis. Our findings reveal that,
surprisingly, short-term mining electricity consumption is not directly
correlated with cryptocurrency conversion rates. Instead, the primary
influencers are the temperature and electricity prices. These firms also
respond to avoid transmission and distribution network (T&D) charges - commonly
referred to as four coincident peak (4CP) charges - during the summer months.
As the scale of these firms is likely to surge in future years, the developed
electricity consumption model can be used to generate public, synthetic
datasets to understand the overall impact on the power grid. The developed
model could also lead to better pricing mechanisms to effectively use the
flexibility of these resources towards improving power grid reliability.
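For readers unfamiliar with this model class, the sketch below fits a seasonal ARMA model with exogenous weather and price regressors to simulated hourly load data via statsmodels' SARIMAX; the data, frequencies, and parameter values are illustrative assumptions rather than the study's dataset.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    rng = np.random.default_rng(5)
    n = 24 * 30                                    # one month of hourly observations
    idx = pd.date_range("2023-06-01", periods=n, freq="h")
    temp = 30 + 5 * np.sin(2 * np.pi * np.arange(n) / 24) + rng.normal(0, 1, n)
    price = 40 + 10 * rng.standard_normal(n)
    load = 500 - 3 * temp - 0.8 * price + rng.normal(0, 10, n)   # simulated mining load (MW)
    df = pd.DataFrame({"load": load, "temp": temp, "price": price}, index=idx)

    # seasonal ARMA with daily seasonality plus exogenous temperature and price regressors
    model = SARIMAX(df["load"], exog=df[["temp", "price"]],
                    order=(1, 0, 1), seasonal_order=(1, 0, 1, 24))
    res = model.fit(disp=False)
    print(res.params[["temp", "price"]])           # estimated temperature and price sensitivities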
arXiv link: http://arxiv.org/abs/2408.12014v2
Valuing an Engagement Surface using a Large Scale Dynamic Causal Model
(ES) have become ubiquitous across retail services. These engagement surfaces
perform an increasing range of functions, including recommending new products
for purchase, reminding customers of their orders and providing delivery
notifications. Understanding the causal effect of engagement surfaces on value
driven for customers and businesses remains an open scientific question. In
this paper, we develop a dynamic causal model at scale to disentangle value
attributable to an ES, and to assess its effectiveness. We demonstrate the
application of this model to inform business decision-making by understanding
returns on investment in the ES, and identifying product lines and features
where the ES adds the most value.
arXiv link: http://arxiv.org/abs/2408.11967v1
SPORTSCausal: Spill-Over Time Series Causal Inference
causal inference across various fields, including business analysis, economic
studies, sociology, clinical research, and network learning. The primary
advantage of RCTs over observational studies lies in their ability to
significantly reduce noise from individual variance. However, RCTs depend on
strong assumptions, such as group independence, time independence, and group
randomness, which are not always feasible in real-world applications.
Traditional inferential methods, including analysis of covariance (ANCOVA),
often fail when these assumptions do not hold. In this paper, we propose a
novel approach named Spillover Time
Series Causal (\verb+SPORTSCausal+), which enables the
estimation of treatment effects without relying on these stringent assumptions.
We demonstrate the practical applicability of \verb+SPORTSCausal+ through a
real-world budget-control experiment. In this experiment, data was collected
from both a 5% live experiment and a 50% live experiment using the same
treatment. Due to the spillover effect, the vanilla estimation of the treatment
effect was not robust across different treatment sizes, whereas
\verb+SPORTSCausal+ provided a robust estimation.
arXiv link: http://arxiv.org/abs/2408.11951v1
Actually, There is No Rotational Indeterminacy in the Approximate Factor Model
principal components converge in mean square (up to sign) under the standard
assumptions for $n\to \infty$. Consequently, we have a generic interpretation
of what the principal components estimator is actually identifying and existing
results on factor identification are reinforced and refined. Based on this
result, we provide a new asymptotic theory for the approximate factor model
entirely without rotation matrices. We show that the factor space is consistently estimated with finite $T$ as $n\to \infty$, while consistency of the factors themselves, i.e., the $L^2$ limit of the normalised principal components, requires that both $(n, T)\to \infty$.
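A minimal sketch of the normalised principal components estimator on a simulated approximate factor model; it only checks that the factor space is well recovered for large $n$ and moderate $T$, and does not reproduce the paper's rotation-free asymptotic theory.
    import numpy as np

    rng = np.random.default_rng(7)
    T, n, r = 100, 500, 2
    F = rng.standard_normal((T, r))               # true factors
    L = rng.standard_normal((n, r))               # true loadings
    X = F @ L.T + rng.standard_normal((T, n))     # approximate factor model X = F L' + e

    # normalised principal components: eigen-decomposition of X X' / (n T)
    eigval, eigvec = np.linalg.eigh(X @ X.T / (n * T))
    F_hat = np.sqrt(T) * eigvec[:, ::-1][:, :r]   # top-r components, so that F_hat'F_hat / T = I

    # the estimated factor space spans the true factors (up to sign/rotation)
    coef = np.linalg.lstsq(F_hat, F, rcond=None)[0]
    resid = F - F_hat @ coef
    print("R^2 of true factors on estimated space:", 1 - (resid ** 2).sum() / (F ** 2).sum())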
arXiv link: http://arxiv.org/abs/2408.11676v2
Robust Bayes Treatment Choice with Partial Identification
identification, through the lens of robust (multiple prior) Bayesian analysis.
We use a convenient set of prior distributions to derive ex-ante and ex-post
robust Bayes decision rules, both for decision makers who can randomize and for
decision makers who cannot.
Our main messages are as follows: First, ex-ante and ex-post robust Bayes
decision rules do not tend to agree in general, whether or not randomized rules
are allowed. Second, randomized treatment assignment for some data realizations
can be optimal in both ex-ante and, perhaps more surprisingly, ex-post
problems. Therefore, it is usually with loss of generality to exclude
randomized rules from consideration, even when regret is evaluated ex-post.
We apply our results to a stylized problem where a policy maker uses
experimental data to choose whether to implement a new policy in a population
of interest, but is concerned about the external validity of the experiment at
hand (Stoye, 2012); and to the aggregation of data generated by multiple
randomized control trials in different sites to make a policy choice in a
population for which no experimental data are available (Manski, 2020; Ishihara
and Kitagawa, 2021).
arXiv link: http://arxiv.org/abs/2408.11621v1
Towards an Inclusive Approach to Corporate Social Responsibility (CSR) in Morocco: CGEM's Commitment
environmental concerns into their activities and their relations with
stakeholders. It encompasses all actions aimed at the social good, above and
beyond corporate interests and legal requirements. Various international
organizations, authors and researchers have explored the notion of CSR and
proposed a range of definitions reflecting their perspectives on the concept.
In Morocco, although Moroccan companies are not overwhelmingly embracing CSR,
several factors are encouraging them to integrate the CSR approach not only
into their discourse, but also into their strategies. The CGEM is actively
involved in promoting CSR within Moroccan companies, awarding the "CGEM Label
for CSR" to companies that meet the criteria set out in the CSR Charter. The
process of labeling Moroccan companies is in full expansion. The graphs
presented in this article are broken down according to several criteria, such
as company size, sector of activity and listing on the Casablanca Stock
Exchange, in order to provide an overview of CSR-labeled companies in Morocco.
The approach adopted for this article is a qualitative one aimed at presenting,
firstly, the different definitions of the CSR concept and its evolution over
time. In this way, the study focuses on the Moroccan context to dissect and
analyze the state of progress of CSR integration in Morocco and the various
efforts made by the CGEM to implement it. According to the data, 124 Moroccan
companies have been awarded the CSR label. For a label in existence since 2006,
this figure reflects a certain reluctance on the part of Moroccan companies to
fully implement the CSR approach in their strategies. Nevertheless, Morocco is
in a transitional phase, marked by the gradual adoption of various socially
responsible practices.
arXiv link: http://arxiv.org/abs/2408.11519v1
Inference with Many Weak Instruments and Heterogeneity
model with many potentially weak instruments, in the presence of heterogeneous
treatment effects. I first show that existing test procedures, including those
that are robust to either weak instruments or heterogeneous treatment effects,
can be arbitrarily oversized. I propose a novel and valid test based on a score
statistic and a "leave-three-out" variance estimator. In the presence of
heterogeneity and within the class of tests that are functions of the
leave-one-out analog of a maximal invariant, this test is asymptotically the
uniformly most powerful unbiased test. In two applications to judge and
quarter-of-birth instruments, the proposed inference procedure also yields a
bounded confidence set while some existing methods yield unbounded or empty
confidence sets.
arXiv link: http://arxiv.org/abs/2408.11193v3
Conditional nonparametric variable screening by neural factor regression
effectively screen correlated covariates in high-dimension, we propose a
conditional variable screening test based on non-parametric regression using
neural networks due to their representation power. We ask whether individual covariates have additional contributions given the latent factors or, more generally, a set of variables. Our test statistics are based on the estimated partial derivative of the regression function with respect to the candidate screening variable and an observable proxy for the latent factors. Hence,
our test reveals how much predictors contribute additionally to the
non-parametric regression after accounting for the latent factors. Our
derivative estimator is the convolution of a deep neural network regression
estimator and a smoothing kernel. We demonstrate that when the neural network
size diverges with the sample size, unlike estimating the regression function
itself, it is necessary to smooth the partial derivative of the neural network
estimator to recover the desired convergence rate for the derivative. Moreover,
our screening test achieves asymptotic normality under the null after finely centering our test statistics, which makes the biases negligible, as well as
consistency for local alternatives under mild conditions. We demonstrate the
performance of our test in a simulation study and two real world applications.
arXiv link: http://arxiv.org/abs/2408.10825v1
Gradient Wild Bootstrap for Instrumental Variable Quantile Regressions with Weak and Few Clusters
variable quantile regressions in the framework of a small number of large
clusters in which the number of clusters is viewed as fixed, and the number of
observations for each cluster diverges to infinity. For the Wald inference, we
show that our wild bootstrap Wald test, with or without studentization using
the cluster-robust covariance estimator (CRVE), controls size asymptotically up
to a small error as long as the parameter of endogenous variable is strongly
identified in at least one of the clusters. We further show that the wild
bootstrap Wald test with CRVE studentization is more powerful for distant local
alternatives than that without. Last, we develop a wild bootstrap
Anderson-Rubin (AR) test for the weak-identification-robust inference. We show
it controls size asymptotically up to a small error, even under weak or partial
identification for all clusters. We illustrate the good finite-sample
performance of the new inference methods using simulations and provide an
empirical application to a well-known dataset about US local labor markets.
arXiv link: http://arxiv.org/abs/2408.10686v1
Continuous difference-in-differences with double/debiased machine learning
treatments. Specifically, the average treatment effect on the treated (ATT) at
any level of treatment intensity is identified under a conditional parallel
trends assumption. Estimating the ATT in this framework requires first
estimating infinite-dimensional nuisance parameters, particularly the
conditional density of the continuous treatment, which can introduce
substantial bias. To address this challenge, we propose estimators for the
causal parameters under the double/debiased machine learning framework and
establish their asymptotic normality. Additionally, we provide consistent
variance estimators and construct uniform confidence bands based on a
multiplier bootstrap procedure. To demonstrate the effectiveness of our
approach, we apply our estimators to the 1983 Medicare Prospective Payment
System (PPS) reform studied by Acemoglu and Finkelstein (2008), reframing it as
a DiD with continuous treatment and nonparametrically estimating its effects.
arXiv link: http://arxiv.org/abs/2408.10509v4
kendallknight: An R Package for Efficient Implementation of Kendall's Correlation Coefficient Computation
correlation coefficient computation, significantly improving the processing
time for large datasets without sacrificing accuracy. The kendallknight
package, following Knight (1966) and subsequent literature, reduces the computational complexity, resulting in drastic reductions in computation time,
transforming operations that would take minutes or hours into milliseconds or
minutes, while maintaining precision and correctly handling edge cases and
errors. The package is particularly advantageous in econometric and statistical
contexts where rapid and accurate calculation of Kendall's correlation
coefficient is desirable. Benchmarks demonstrate substantial performance gains
over the base R implementation, especially for large datasets.
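For intuition on where the speed-up comes from, here is a minimal Python sketch (not the package's code) of the Knight-style idea: sort by one variable and count discordant pairs with a merge sort, giving O(n log n) work instead of the O(n^2) pairwise comparison; ties are ignored in this simplified tau-a version.
    import numpy as np
    from scipy.stats import kendalltau

    def count_inversions(a):
        # merge sort that also counts inversions (discordant pairs)
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_l = count_inversions(a[:mid])
        right, inv_r = count_inversions(a[mid:])
        merged, inv, i, j = [], inv_l + inv_r, 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
                inv += len(left) - i              # right[j] is discordant with the rest of left
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged, inv

    def kendall_tau_a(x, y):
        n = len(x)
        y_by_x = [yy for _, yy in sorted(zip(x, y))]   # order y by x
        _, discordant = count_inversions(y_by_x)
        return 1.0 - 4.0 * discordant / (n * (n - 1))

    rng = np.random.default_rng(0)
    x, y = rng.normal(size=2000), rng.normal(size=2000)   # continuous draws, so no ties
    print(kendall_tau_a(x, y), kendalltau(x, y)[0])       # the two values should agree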
arXiv link: http://arxiv.org/abs/2408.09618v5
Experimental Design For Causal Inference Through An Optimization Lens
causal questions across a wide range of applications, including agricultural
experiments, clinical trials, industrial experiments, social experiments, and
digital experiments. Although valuable in such applications, the costs of
experiments often drive experimenters to seek more efficient designs. Recently,
experimenters have started to examine such efficiency questions from an
optimization perspective, as experimental design problems are fundamentally
decision-making problems. This perspective offers a lot of flexibility in
leveraging various existing optimization tools to study experimental design
problems. This manuscript thus aims to examine the foundations of experimental
design problems in the context of causal inference as viewed through an
optimization lens.
arXiv link: http://arxiv.org/abs/2408.09607v2
Anytime-Valid Inference for Double/Debiased Machine Learning of Causal Parameters
years for learning causal/structural parameters, in part due to its flexibility
and adaptability to high-dimensional nuisance functions as well as its ability
to avoid bias from regularization or overfitting. However, the classic
double-debiased framework is only valid asymptotically for a predetermined
sample size, thus lacking the flexibility of collecting more data if sharper
inference is needed, or stopping data collection early if useful inferences can
be made earlier than expected. This can be of particular concern in large scale
experimental studies with huge financial costs or human lives at stake, as well
as in observational studies where the length of confidence intervals does not shrink to zero even with increasing sample size due to partial identifiability
of a structural parameter. In this paper, we present time-uniform counterparts
to the asymptotic DML results, enabling valid inference and confidence
intervals for structural parameters to be constructed at any arbitrary
(possibly data-dependent) stopping time. We provide conditions which are only
slightly stronger than the standard DML conditions, but offer the stronger
guarantee for anytime-valid inference. This facilitates the transformation of
any existing DML method to provide anytime-valid guarantees with minimal
modifications, making it highly adaptable and easy to use. We illustrate our
procedure using two instances: a) local average treatment effect in online
experiments with non-compliance, and b) partial identification of average
treatment effect in observational studies with potential unmeasured
confounding.
arXiv link: http://arxiv.org/abs/2408.09598v2
Deep Learning for the Estimation of Heterogeneous Parameters in Discrete Choice Models
approach of Farrell, Liang, and Misra (2021a), who propose to use deep learning
for the estimation of heterogeneous parameters in economic models, in the
context of discrete choice models. The approach combines the structure imposed
by economic models with the flexibility of deep learning, which ensures the interpretability of results on the one hand, and allows estimating flexible functional forms of observed heterogeneity on the other. For inference
after the estimation with deep learning, Farrell et al. (2021a) derive an
influence function that can be applied to many quantities of interest. We
conduct a series of Monte Carlo experiments that investigate the impact of
regularization on the proposed estimation and inference procedure in the
context of discrete choice models. The results show that the deep learning
approach generally leads to precise estimates of the true average parameters
and that regular robust standard errors lead to invalid inference results,
showing the need for the influence function approach for inference. Without
regularization, the influence function approach can lead to substantial bias
and large estimated standard errors caused by extreme outliers. Regularization
reduces this property and stabilizes the estimation procedure, but at the
expense of inducing an additional bias. The bias in combination with decreasing
variance associated with increasing regularization leads to the construction of
invalid inferential statements in our experiments. Repeated sample splitting,
unlike regularization, stabilizes the estimation approach without introducing
an additional bias, thereby allowing for the construction of valid inferential
statements.
arXiv link: http://arxiv.org/abs/2408.09560v1
Counterfactual and Synthetic Control Method: Causal Inference with Instrumented Principal Component Analysis
framework of counterfactual and synthetic control. Building on the generalized synthetic control method, our instrumented principal component
analysis method instruments factor loadings with predictive covariates rather
than including them as regressors. These instrumented factor loadings exhibit
time-varying dynamics, offering a better economic interpretation. Covariates are instrumented through a transformation matrix, $\Gamma$; when the number of covariates is large, $\Gamma$ can easily be reduced in accordance with a small number of latent factors, which helps us handle high-dimensional datasets effectively and keeps the model parsimonious. Moreover, this novel way of handling covariates is less exposed to model misspecification and achieves better prediction accuracy. Our simulations show that this method is less biased in
the presence of unobserved covariates compared to other mainstream approaches.
In the empirical application, we use the proposed method to evaluate the effect
of Brexit on foreign direct investment to the UK.
arXiv link: http://arxiv.org/abs/2408.09271v2
Externally Valid Selection of Experimental Sites via the k-Median Problem
to best choose where to experiment in order to optimize external validity as a
$k$-median problem, a popular problem in computer science and operations
research. We present conditions under which minimizing the worst-case,
welfare-based regret among all nonrandom schemes that select $k$ sites to
experiment is approximately equal - and sometimes exactly equal - to finding
the $k$ most central vectors of baseline site-level covariates. The $k$-median
problem can be formulated as a linear integer program. Two empirical
applications illustrate the theoretical and computational benefits of the
suggested procedure.
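The k-median integer program referred to above can be written down in a few lines; the sketch below uses the open-source PuLP modeller with simulated site-level covariates, and the distance metric, site count, and choice of k are illustrative assumptions rather than anything prescribed by the paper.
    import numpy as np
    import pulp

    rng = np.random.default_rng(0)
    S, k = 20, 3                                          # candidate sites, sites to select
    Xc = rng.normal(size=(S, 4))                          # baseline site-level covariates
    d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)   # pairwise covariate distances

    prob = pulp.LpProblem("k_median", pulp.LpMinimize)
    y = pulp.LpVariable.dicts("open", range(S), cat="Binary")
    x = pulp.LpVariable.dicts("assign", [(i, j) for i in range(S) for j in range(S)], cat="Binary")

    prob += pulp.lpSum(d[i, j] * x[(i, j)] for i in range(S) for j in range(S))
    for i in range(S):
        prob += pulp.lpSum(x[(i, j)] for j in range(S)) == 1      # each site assigned to one centre
        for j in range(S):
            prob += x[(i, j)] <= y[j]                             # only to a selected centre
    prob += pulp.lpSum(y[j] for j in range(S)) == k

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print("selected experimental sites:", [j for j in range(S) if y[j].value() > 0.5])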
arXiv link: http://arxiv.org/abs/2408.09187v2
Method of Moments Estimation for Affine Stochastic Volatility Models
volatility models. We first address the challenge of calculating moments for
the models by introducing a recursive equation for deriving closed-form
expressions for moments of any order. Consequently, we propose our moment
estimators. We then establish a central limit theorem for our estimators and
derive the explicit formulas for the asymptotic covariance matrix. Finally, we
provide numerical results to validate our method.
arXiv link: http://arxiv.org/abs/2408.09185v1
Revisiting the Many Instruments Problem using Random Matrix Theory
Traditional bias-adjustments are closely connected to the Silverstein equation.
Based on the theory of random matrices, we show that Ridge estimation of the
first-stage parameters reduces the implicit price of bias-adjustments. This
leads to a trade-off, allowing for less costly estimation of the causal effect,
which comes along with improved asymptotic properties. Our theoretical results
nest existing ones on bias approximation and adjustment with ordinary
least-squares in the first-stage regression and, moreover, generalize them to
settings with more instruments than observations. Finally, we derive the
optimal tuning parameter of Ridge regressions in simultaneous equations models,
which comprises the well-known result for single equation models as a special
case with uncorrelated error terms.
arXiv link: http://arxiv.org/abs/2408.08580v2
Quantile and Distribution Treatment Effects on the Treated with Possibly Non-Continuous Outcomes
non-continuous outcomes are either not identified or inference thereon is
infeasible using existing methods. By introducing functional index parallel
trends and no anticipation assumptions, this paper identifies and provides
uniform inference procedures for QTT/DTT. The inference procedure applies under
both the canonical two-group and staggered treatment designs with balanced
panels, unbalanced panels, or repeated cross-sections. Monte Carlo experiments
demonstrate the proposed method's robust and competitive performance, while an
empirical application illustrates its practical utility.
arXiv link: http://arxiv.org/abs/2408.07842v1
Your MMM is Broken: Identification of Nonlinear and Time-varying Effects in Marketing Mix Models
(MMMs), which are aggregate-level models of marketing effectiveness. Often
these models incorporate nonlinear effects, and either implicitly or explicitly
assume that marketing effectiveness varies over time. In this paper, we show
that nonlinear and time-varying effects are often not identifiable from
standard marketing mix data: while certain data patterns may be suggestive of
nonlinear effects, such patterns may also emerge under simpler models that
incorporate dynamics in marketing effectiveness. This lack of identification is
problematic because nonlinearities and dynamics suggest fundamentally different
optimal marketing allocations. We examine this identification issue through
theory and simulations, wherein we explore the exact conditions under which
conflation between the two types of models is likely to occur. In doing so, we
introduce a flexible Bayesian nonparametric model that allows us to both
flexibly simulate and estimate different data-generating processes. We show
that conflating the two types of effects is especially likely in the presence
of autocorrelated marketing variables, which are common in practice, especially
given the widespread use of stock variables to capture long-run effects of
advertising. We illustrate these ideas through numerous empirical applications
to real-world marketing mix data, showing the prevalence of the conflation
issue in practice. Finally, we show how marketers can avoid this conflation, by
designing experiments that strategically manipulate spending in ways that pin
down model form.
arXiv link: http://arxiv.org/abs/2408.07678v1
A Sparse Grid Approach for the Nonparametric Estimation of High-Dimensional Random Coefficient Models
models is the exponential increase of the number of parameters in the number of
random coefficients included into the model. This property, known as the curse
of dimensionality, restricts the application of such estimators to models with
moderately few random coefficients. This paper proposes a scalable
nonparametric estimator for high-dimensional random coefficient models. The
estimator uses a truncated tensor product of one-dimensional hierarchical basis
functions to approximate the underlying random coefficients' distribution. Due
to the truncation, the number of parameters increases at a much slower rate
than in the regular tensor product basis, rendering the nonparametric
estimation of high-dimensional random coefficient models feasible. The derived
estimator allows estimating the underlying distribution with constrained least
squares, making the approach computationally simple and fast. Monte Carlo
experiments and an application to data on the regulation of air pollution
illustrate the good performance of the estimator.
arXiv link: http://arxiv.org/abs/2408.07185v1
Endogeneity Corrections in Binary Outcome Models with Nonlinear Transformations: Identification and Inference
rank-based transformations is proposed. Identification without external
instruments is achieved under one of two assumptions: either the endogenous
regressor is a nonlinear function of one component of the error term,
conditional on the exogenous regressors, or the dependence between the
endogenous and exogenous regressors is nonlinear. Under these conditions, we
prove consistency and asymptotic normality. Monte Carlo simulations and an
application on German insolvency data illustrate the usefulness of the method.
arXiv link: http://arxiv.org/abs/2408.06977v5
Panel Data Unit Root testing: Overview
approaches to testing in cross-sectionally correlated panels are discussed, preceded by an analysis of independent panels. In addition,
methods for testing in the case of non-linearity in the data (for example, in
the case of structural breaks) are presented, as well as methods for testing in
short panels, when the time dimension is small and finite. In conclusion, links
to existing packages that allow implementing some of the described methods are
provided.
arXiv link: http://arxiv.org/abs/2408.08908v1
Estimation and Inference of Average Treatment Effect in Percentage Points under Heterogeneity
as approximations of the average treatment effect (ATE) in percentage points.
This paper highlights the overlooked bias of this approximation under treatment
effect heterogeneity, arising from Jensen's inequality. The issue is
particularly relevant for difference-in-differences designs with
log-transformed outcomes and staggered treatment adoption, where treatment
effects often vary across groups and periods. I propose new estimation and
inference methods for the ATE in percentage points, which are applicable when
treatment effects vary across and within groups. I establish the methods'
large-sample properties and demonstrate their finite-sample performance through
simulations, revealing substantial discrepancies between conventional and
proposed measures. Two empirical applications further underscore the practical
importance of these methods.
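The Jensen gap at issue is easy to see numerically. In the toy sketch below the treatment effects in logs are drawn from a normal distribution (an assumption made purely for illustration), and averaging the exponentiated effects differs from exponentiating the averaged effect.
    import numpy as np

    rng = np.random.default_rng(0)
    beta = rng.normal(loc=0.10, scale=0.20, size=100_000)   # heterogeneous effects in logs

    naive = np.expm1(beta.mean())      # exponentiate the average log effect, then convert to %
    ate_pct = np.expm1(beta).mean()    # average of unit-level percentage effects

    print(f"exp(average log effect) - 1: {naive:.3%}")
    print(f"average percentage effect:   {ate_pct:.3%}")
    # By Jensen's inequality the second exceeds the first whenever effects are heterogeneous.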
arXiv link: http://arxiv.org/abs/2408.06624v2
An unbounded intensity model for point processes
can be locally unbounded without inducing an explosion. In contrast to an
orderly point process, for which the probability of observing more than one
event over a short time interval is negligible, the bursting intensity causes
an extreme clustering of events around the singularity. We propose a
nonparametric approach to detect such bursts in the intensity. It relies on a
heavy traffic condition, which admits inference for point processes over a
finite time interval. With Monte Carlo evidence, we show that our testing
procedure exhibits size control under the null, whereas it has high rejection
rates under the alternative. We implement our approach on high-frequency data
for the EUR/USD spot exchange rate, where the test statistic captures abnormal
surges in trading activity. We detect a nontrivial amount of intensity bursts
in these data and describe their basic properties. Trading activity during an
intensity burst is positively related to volatility, illiquidity, and the
probability of observing a drift burst. The latter effect is reinforced if the
order flow is imbalanced or the price elasticity of the limit order book is
large.
arXiv link: http://arxiv.org/abs/2408.06519v1
Method-of-Moments Inference for GLMs and Doubly Robust Functionals under Proportional Asymptotics
signal-to-noise (SNR) ratio in high-dimensional Generalized Linear Models
(GLMs), and explore their implications in inferring popular estimands such as
average treatment effects in high-dimensional observational studies. Under the
“proportional asymptotic” regime and Gaussian covariates with known
(population) covariance $\Sigma$, we derive Consistent and Asymptotically
Normal (CAN) estimators of our targets of inference through a Method-of-Moments
type of estimators that bypasses estimation of high dimensional nuisance
functions and hyperparameter tuning altogether. Additionally, under
non-Gaussian covariates, we demonstrate universality of our results under
certain additional assumptions on the regression coefficients and $\Sigma$. We
also demonstrate that knowing $\Sigma$ is not essential to our proposed
methodology when the sample covariance matrix estimator is invertible. Finally,
we complement our theoretical results with numerical experiments and
comparisons with existing literature.
arXiv link: http://arxiv.org/abs/2408.06103v3
Correcting invalid regression discontinuity designs with multiple time period data
outcome means at the cutoff, but this assumption often fails when other
treatments or policies are implemented at this cutoff. We characterize the bias
in sharp and fuzzy RD designs due to violations of continuity, and develop a
general identification framework that leverages multiple time periods to
estimate local effects on the (un)treated. We extend the framework to settings
with carry-over effects and time-varying running variables, highlighting
additional assumptions needed for valid causal inference. We propose an
estimation framework that extends the conventional and bias-corrected
single-period local linear regression framework to multiple periods and
different sampling schemes, and study its finite-sample performance in
simulations. Finally, we revisit a prior study on fiscal rules in Italy to
illustrate the practical utility of our approach.
arXiv link: http://arxiv.org/abs/2408.05847v2
Bank Cost Efficiency and Credit Market Structure Under a Volatile Exchange Rate
structure in a cross-section of banks that have non-trivial exposures to
foreign currency (FX) operations. We use unique data on quarterly revaluations
of FX assets and liabilities (Revals) that Russian banks were reporting between
2004 Q1 and 2020 Q2. {\it First}, we document that Revals constitute the
largest part of the banks' total costs, 26.5% on average, with considerable
variation across banks. {\it Second}, we find that stochastic estimates of cost
efficiency are both severely downward biased -- by 30% on average -- and
generally not rank preserving when Revals are ignored, except for the tails, as
our nonparametric copulas reveal. To ensure generalizability to other emerging
market economies, we suggest a two-stage approach that does not rely on Revals
but is able to shrink the downward bias in cost efficiency estimates by
two-thirds. {\it Third}, we show that Revals are triggered by the mismatch in
the banks' FX operations, which, in turn, is driven by household FX deposits
and the instability of the Ruble's exchange rate. {\it Fourth}, we find that the
failure to account for Revals leads to the erroneous conclusion that the credit
market is inefficient, which is driven by the upper quartile of the banks'
distribution by total assets. Revals have considerable negative implications
for financial stability which can be attenuated by the cross-border
diversification of bank assets.
arXiv link: http://arxiv.org/abs/2408.05688v1
Change-Point Detection in Time Series Using Mixed Integer Programming
framework for detection and estimation of structural breaks in time series
regression models. The framework is constructed based on the least squares
problem subject to a penalty on the number of breakpoints. We restate the
$l_0$-penalized regression problem as a quadratic programming problem with
integer- and real-valued arguments and show that MIO is capable of finding
provably optimal solutions using a well-known optimization solver. Compared to
the popular $l_1$-penalized regression (LASSO) and other classical methods, the
MIO framework permits simultaneous estimation of the number and location of
structural breaks as well as regression coefficients, while accommodating the
option of specifying a given or minimal number of breaks. We derive the
asymptotic properties of the estimator and demonstrate its effectiveness
through extensive numerical experiments, confirming a more accurate estimation
of multiple breaks as compared to popular non-MIO alternatives. Two empirical
examples demonstrate usefulness of the framework in applications from business
and economic statistics.
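To convey the flavour of the formulation, here is a stylized mean-shift special case written with cvxpy: a binary variable flags a break between adjacent periods, a big-M constraint ties it to the jump in the level, and the number of breaks is penalized in the objective. The data, big-M bound, and penalty are illustrative assumptions, and a mixed-integer-capable solver is required; this is a sketch, not the paper's MIO implementation.
    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(1)
    y = np.concatenate([np.zeros(20), 2.0 * np.ones(20), 0.5 * np.ones(20)])
    y += 0.2 * rng.standard_normal(60)                  # piecewise-constant mean with two breaks
    T, M, lam = len(y), 10.0, 5.0                       # big-M jump bound and per-break penalty

    mu = cp.Variable(T)                                 # piecewise-constant level (intercept-only regression)
    z = cp.Variable(T - 1, boolean=True)                # z[t] = 1 if a break occurs between t and t+1
    constraints = [cp.abs(mu[1:] - mu[:-1]) <= M * z]
    objective = cp.Minimize(cp.sum_squares(y - mu) + lam * cp.sum(z))
    cp.Problem(objective, constraints).solve(solver=cp.ECOS_BB)   # or any MIQP-capable solver

    print("estimated break locations:", np.where(z.value > 0.5)[0] + 1)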
arXiv link: http://arxiv.org/abs/2408.05665v3
ARMA-Design: Optimal Treatment Allocation Strategies for A/B Testing in Partially Observable Time Series Experiments
treatments over time are frequently employed in many technological companies to
evaluate the performance of a newly developed policy, product, or treatment
relative to a baseline control. In many applications, the experimental units
receive a sequence of treatments over time. To handle these time-dependent
settings, existing A/B testing solutions typically assume a fully observable
experimental environment that satisfies the Markov condition. However, this
assumption often does not hold in practice.
This paper studies the optimal design for A/B testing in partially observable
online experiments. We introduce a controlled (vector) autoregressive moving
average model to capture partial observability. We introduce a small signal
asymptotic framework to simplify the calculation of asymptotic mean squared
errors of average treatment effect estimators under various designs. We develop
two algorithms to estimate the optimal design: one utilizing constrained
optimization and the other employing reinforcement learning. We demonstrate the
superior performance of our designs using two dispatch simulators that
realistically mimic the behaviors of drivers and passengers to create virtual
environments, along with two real datasets from a ride-sharing company. A
Python implementation of our proposal is available at
https://github.com/datake/ARMADesign.
arXiv link: http://arxiv.org/abs/2408.05342v4
What are the real implications for $CO_2$ as generation from renewables increases?
generation in the United States and are expected to continue to grow in the
next decades. In low carbon systems, generation from renewable energy sources
displaces conventional fossil fuel power plants resulting in lower system-level
emissions and emissions intensity. However, we find that intermittent
generation from renewables changes the way conventional thermal power plants
operate, and that the displacement of generation is not one-to-one as expected. Our
work provides a method that allows policy and decision makers to continue to
track the effect of additional renewable capacity and the resulting thermal
power plant operational responses.
arXiv link: http://arxiv.org/abs/2408.05209v1
Vela: A Data-Driven Proposal for Joint Collaboration in Space Exploration
activities and international cooperation through data and infrastructure
sharing in their Sustainable Development Goal 17 (SDG17). Current multilateral
space exploration paradigms, however, are divided between the Artemis and the
Roscosmos-CNSA programs to return to the moon and establish permanent human
settlements. As space agencies work to expand human presence in space, economic
resource consolidation in pursuit of technologically ambitious space
expeditions is the most sensible path to accomplish SDG17. This paper compiles
a budget dataset for the top five federally-funded space agencies: CNSA, ESA,
JAXA, NASA, and Roscosmos. Using time-series econometric analysis methods in
STATA, this work analyzes each agency's economic contributions toward space
exploration. The dataset results are used to propose a multinational space
mission, Vela, for the development of an orbiting space station around Mars in
the late 2030s. Distribution of economic resources and technological
capabilities by the respective space programs are proposed to ensure
programmatic redundancy and increase the odds of success on the given timeline.
arXiv link: http://arxiv.org/abs/2408.04730v1
Difference-in-Differences for Health Policy and Practice: A Review of Modern Methods
inference method in health policy, employed to evaluate the real-world impact
of policies and programs. To estimate treatment effects, DiD relies on the
"parallel trends assumption", that on average treatment and comparison groups
would have had parallel trajectories in the absence of an intervention.
Historically, DiD has been considered broadly applicable and straightforward to
implement, but recent years have seen rapid advancements in DiD methods. This
paper reviews and synthesizes these innovations for medical and health policy
researchers. We focus on four topics: (1) assessing the parallel trends
assumption in health policy contexts; (2) relaxing the parallel trends
assumption when appropriate; (3) employing estimators to account for staggered
treatment timing; and (4) conducting robust inference for analyses in which
normal-based clustered standard errors are inappropriate. For each, we explain
challenges and common pitfalls in traditional DiD and modern methods available
to address these issues.
arXiv link: http://arxiv.org/abs/2408.04617v1
Semiparametric Estimation of Individual Coefficients in a Dyadic Link Formation Model Lacking Observable Characteristics
yet are difficult to estimate in the presence of individual specific effects
and in the absence of distributional assumptions regarding the model noise
component. The availability of (continuously distributed) individual or link
characteristics generally facilitates estimation. Yet, while data on social
networks has recently become more abundant, the characteristics of the entities
involved in the link may not be measured. Adapting the procedure of KS,
I propose to use network data alone in a semiparametric estimation of the
individual fixed effect coefficients, which carry the interpretation of the
individual relative popularity. This entails the possibility to anticipate how
a newly arriving individual will connect in a pre-existing group. The estimator, chosen for its fast convergence, does not impose the monotonicity assumption regarding the model noise component, thereby potentially reversing the order of the fixed effect coefficients. This and other numerical issues can be
conveniently tackled by my novel, data-driven way of normalising the fixed
effects, which proves to outperform a conventional standardisation in many
cases. I demonstrate that the normalised coefficients converge both at the same
rate and to the same limiting distribution as if the true error distribution
was known. The cost of semiparametric estimation is thus purely computational,
while the potential benefits are large whenever the errors have a strongly
convex or strongly concave distribution.
arXiv link: http://arxiv.org/abs/2408.04552v1
Robust Estimation of Regression Models with Potentially Endogenous Outliers via a Modern Optimization Lens
presence of potentially endogenous outliers. Through Monte Carlo simulations,
we demonstrate that existing $L_1$-regularized estimation methods, including
the Huber estimator and the least absolute deviation (LAD) estimator, exhibit
significant bias when outliers are endogenous. Motivated by this finding, we
investigate $L_0$-regularized estimation methods. We propose systematic
heuristic algorithms, notably an iterative hard-thresholding algorithm and a
local combinatorial search refinement, to solve the combinatorial optimization
problem of the \(L_0\)-regularized estimation efficiently. Our Monte Carlo
simulations yield two key results: (i) The local combinatorial search algorithm
substantially improves solution quality compared to the initial
projection-based hard-thresholding algorithm while offering greater
computational efficiency than directly solving the mixed integer optimization
problem. (ii) The $L_0$-regularized estimator demonstrates superior performance
in terms of bias reduction, estimation accuracy, and out-of-sample prediction
errors compared to $L_1$-regularized alternatives. We illustrate the practical
value of our method through an empirical application to stock return
forecasting.
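A compact sketch of the alternating hard-thresholding idea (not the paper's full local combinatorial search): treat the k observations with the largest residuals as outlier shifts, refit OLS on the adjusted outcome, and iterate. The data-generating process and the choice of k are illustrative assumptions.
    import numpy as np

    def l0_robust_ols(X, y, k, n_iter=50):
        gamma = np.zeros(len(y))                        # sparse outlier shifts
        for _ in range(n_iter):
            beta = np.linalg.lstsq(X, y - gamma, rcond=None)[0]
            resid = y - X @ beta
            gamma = np.zeros(len(y))
            flagged = np.argsort(np.abs(resid))[-k:]    # k largest residuals
            gamma[flagged] = resid[flagged]             # hard-threshold: absorb them as shifts
        return beta, np.sort(flagged)

    rng = np.random.default_rng(2)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ np.array([1.0, 2.0, -1.0]) + 0.5 * rng.standard_normal(n)
    y[:5] += 10.0                                       # plant five large outliers

    beta_hat, flagged = l0_robust_ols(X, y, k=5)
    print("coefficients:", np.round(beta_hat, 2), "flagged outliers:", flagged)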
arXiv link: http://arxiv.org/abs/2408.03930v1
Robust Identification in Randomized Experiments with Noncompliance
identify causal effects of a policy. In the local average treatment effect
(LATE) framework, the IV estimand identifies the LATE under three main
assumptions: random assignment, exclusion restriction, and monotonicity.
However, these assumptions are often questionable in many applications, leading
some researchers to doubt the causal interpretation of the IV estimand. This
paper considers a robust identification of causal parameters in a randomized
experiment setting with noncompliance where the standard LATE assumptions could
be violated. We discuss identification under two sets of weaker assumptions:
random assignment and exclusion restriction (without monotonicity), and random
assignment and monotonicity (without exclusion restriction). We derive sharp
bounds on some causal parameters under these two sets of relaxed LATE
assumptions. Finally, we apply our method to revisit the random information
experiment conducted in Bursztyn, Gonz\'alez, and Yanagizawa-Drott (2020) and
find that the standard LATE assumptions are jointly incompatible in this
application. We then estimate the robust identified sets under the two sets of
relaxed assumptions.
arXiv link: http://arxiv.org/abs/2408.03530v4
Efficient Asymmetric Causality Tests
scientific fields. This approach corresponds better to reality since logical
reasons behind asymmetric behavior exist and need to be considered in empirical
investigations. Hatemi-J (2012) introduced the asymmetric causality tests via
partial cumulative sums for positive and negative components of the variables
operating within the vector autoregressive (VAR) model. However, since the
residuals across the equations in the VAR model are not independent, the
ordinary least squares method for estimating the parameters is not efficient.
Additionally, asymmetric causality tests involve different causal parameters (i.e., for positive and negative components); thus, it is crucial to assess not only whether these causal parameters are individually statistically significant, but also whether their difference is statistically significant.
Consequently, tests of difference between estimated causal parameters should
explicitly be conducted, which are neglected in the existing literature. The
purpose of the current paper is to deal with these issues explicitly. An
application is provided, and ten different hypotheses pertinent to the
asymmetric causal interaction between two largest financial markets worldwide
are efficiently tested within a multivariate setting.
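The data construction step behind such tests is simple to illustrate: split a series into partial cumulative sums of its positive and negative changes, which are then fed into a VAR. The sketch below shows only this decomposition on a simulated price series; the efficient estimation and the tests of parameter differences proposed in the paper are not reproduced.
    import numpy as np

    def positive_negative_components(y):
        # partial cumulative sums of the positive and negative changes of a series
        dy = np.diff(y, prepend=y[0])
        return np.cumsum(np.maximum(dy, 0.0)), np.cumsum(np.minimum(dy, 0.0))

    rng = np.random.default_rng(3)
    price = np.cumsum(rng.standard_normal(500))           # simulated random-walk price
    p_pos, p_neg = positive_negative_components(price)
    assert np.allclose(p_pos + p_neg, price - price[0])   # components add back to the series net of its start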
arXiv link: http://arxiv.org/abs/2408.03137v4
A nonparametric test for diurnal variation in spot correlation processes
measured by their spot correlation estimated from high-frequency data, exhibits
a pronounced upward-sloping and almost piecewise linear relationship at the
intraday horizon. Correlation is notably lower (on average less positive) in the morning than in the afternoon. We develop a nonparametric testing
procedure to detect such deterministic variation in a correlation process. The
test statistic has a known distribution under the null hypothesis, whereas it
diverges under the alternative. It is robust against stochastic correlation. We
run a Monte Carlo simulation to discover the finite sample properties of the
test statistic, which are close to the large sample predictions, even for small
sample sizes and realistic levels of diurnal variation. In an application, we
implement the test on a monthly basis for a high-frequency dataset covering the
stock market over an extended period. The test leads to rejection of the null
most of the time. This suggests diurnal variation in the correlation process is
a nontrivial effect in practice.
arXiv link: http://arxiv.org/abs/2408.02757v1
Testing identifying assumptions in Tobit Models
models' identifying assumptions: linear index specification, (joint) normality
of latent errors, and treatment (instrument) exogeneity and relevance. The new
sharp testable equalities can detect all possible observable violations of the
identifying conditions. We propose a testing procedure for the model's validity
using existing inference methods for intersection bounds. Simulation results suggest proper size for large samples and that the test is powerful in detecting large violations of the exogeneity assumption and violations in the error structure. Finally, we review and propose new alternative paths to partially
identify the parameters of interest under less restrictive assumptions.
arXiv link: http://arxiv.org/abs/2408.02573v2
Kullback-Leibler-based characterizations of score-driven updates
last decade. Much of this literature cites the optimality result in Blasques et
al. (2015), which, roughly, states that sufficiently small score-driven updates
are unique in locally reducing the Kullback-Leibler divergence relative to the
true density for every observation. This is at odds with other well-known
optimality results; the Kalman filter, for example, is optimal in a
mean-squared-error sense, but occasionally moves away from the true state. We
show that score-driven updates are, similarly, not guaranteed to improve the
localized Kullback-Leibler divergence at every observation. The seemingly
stronger result in Blasques et al. (2015) is due to their use of an improper
(localized) scoring rule. Even as a guaranteed improvement for every
observation is unattainable, we prove that sufficiently small score-driven
updates are unique in reducing the Kullback-Leibler divergence relative to the
true density in expectation. This positive, albeit weaker, result justifies the
continued use of score-driven models and places their information-theoretic
properties on solid footing.
arXiv link: http://arxiv.org/abs/2408.02391v2
Analysis of Factors Affecting the Entry of Foreign Direct Investment into Indonesia (Case Study of Three Industrial Sectors in Indonesia)
Rp1,207.2 trillion. The largest FDI investment realization by sector was led by
the Basic Metal, Metal Goods, Non-Machinery, and Equipment Industry sector,
followed by the Mining sector and the Electricity, Gas, and Water sector. The
uneven amount of FDI investment realization in each industry and the impact of
the COVID-19 pandemic in Indonesia are the main issues addressed in this study.
This study aims to identify the factors that influence the entry of FDI into
industries in Indonesia and measure the extent of these factors' influence on
the entry of FDI. In this study, classical assumption tests and hypothesis
tests are conducted to investigate whether the research model is robust enough
to provide strategic options nationally. Moreover, this study uses the ordinary
least squares (OLS) method. The results show that the electricity factor does
not influence FDI inflows in the three industries. The Human Development Index
(HDI) factor has a significant negative effect on FDI in the Mining Industry
and a significant positive effect on FDI in the Basic Metal, Metal Goods,
Non-Machinery, and Equipment Industries. However, HDI does not influence FDI in
the Electricity, Gas, and Water Industries in Indonesia.
arXiv link: http://arxiv.org/abs/2408.01985v1
Distributional Difference-in-Differences Models with Multiple Time Periods
entire (or specific parts of the) distribution of the outcome of interest. In
this paper, I provide a method to recover the whole distribution of the
untreated potential outcome for the treated group in non-experimental settings
with staggered treatment adoption by generalizing the existing quantile
treatment effects on the treated (QTT) estimator proposed by Callaway and Li
(2019). Besides the QTT, I consider different approaches that anonymously
summarize the quantiles of the distribution of the outcome of interest (such as
tests for stochastic dominance rankings) without relying on rank invariance
assumptions. The finite-sample properties of the estimator proposed are
analyzed via different Monte Carlo simulations. Despite being slightly biased
for relatively small sample sizes, the proposed method's performance improves substantially as the sample size grows.
arXiv link: http://arxiv.org/abs/2408.01208v2
Distilling interpretable causal trees from causal forests
promise greater flexibility than existing methods that test a few pre-specified
hypotheses. However, one problem these methods can have is that it can be
challenging to extract insights from complicated machine learning models. A
high-dimensional distribution of conditional average treatment effects may give
accurate, individual-level estimates, but it can be hard to understand the
underlying patterns; hard to know what the implications of the analysis are.
This paper proposes the Distilled Causal Tree, a method for distilling a
single, interpretable causal tree from a causal forest. This compares well to
existing methods of extracting a single tree, particularly in noisy data or
high-dimensional data where there are many correlated features. Here it even
outperforms the base causal forest in most simulations. Its estimates are
doubly robust and asymptotically normal just as those of the causal forest are.
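The general distillation idea can be mimicked with off-the-shelf tools, as in the hedged sketch below: estimate CATEs with a forest-based T-learner and then fit one shallow regression tree to those estimates. This is not the paper's Distilled Causal Tree estimator and inherits none of its guarantees; the data and model choices are illustrative assumptions.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.tree import DecisionTreeRegressor, export_text

    rng = np.random.default_rng(6)
    n, p = 2000, 5
    X = rng.normal(size=(n, p))
    T = rng.integers(0, 2, n)                                  # randomized binary treatment
    tau = 1.0 * (X[:, 0] > 0)                                  # true effect depends on the first feature
    y = X[:, 1] + tau * T + rng.normal(0, 1, n)

    m1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[T == 1], y[T == 1])
    m0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[T == 0], y[T == 0])
    cate = m1.predict(X) - m0.predict(X)                       # forest-based CATE estimates

    distilled = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, cate)
    print(export_text(distilled, feature_names=[f"x{j}" for j in range(p)]))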
arXiv link: http://arxiv.org/abs/2408.01023v1
Application of Superconducting Technology in the Electricity Industry: A Game-Theoretic Analysis of Government Subsidy Policies and Power Company Equipment Upgrade Decisions
developed by a Korean research team, on the power equipment industry. Using
evolutionary game theory, the interactions between governmental subsidies and
technology adoption by power companies are modeled. A key innovation of this
research is the introduction of sensitivity analyses concerning time delays and
initial subsidy amounts, which significantly influence the strategic decisions
of both government and corporate entities. The findings indicate that these
factors are critical in determining the rate of technology adoption and the
efficiency of the market as a whole. Due to existing data limitations, the
study offers a broad overview of likely trends and recommends the inclusion of
real-world data for more precise modeling once the material demonstrates
room-temperature superconducting characteristics. The research contributes
foundational insights valuable for future policy design and has significant
implications for advancing the understanding of technology adoption and market
dynamics.
arXiv link: http://arxiv.org/abs/2408.01017v1
Identification and Inference for Synthetic Control Methods with Spillover Effects: Estimating the Economic Cost of the Sudan Split
panel data, particularly when there are few treated units. SCM assumes the
stable unit treatment value assumption (SUTVA), which posits that potential
outcomes are unaffected by the treatment status of other units. However,
interventions often impact not only treated units but also untreated units,
known as spillover effects. This study introduces a novel panel data method
that extends SCM to allow for spillover effects and estimate both treatment and
spillover effects. This method leverages a spatial autoregressive panel data
model to account for spillover effects. We also propose Bayesian inference
methods using Bayesian horseshoe priors for regularization. We apply the
proposed method to two empirical studies: evaluating the effect of the
California tobacco tax on consumption and estimating the economic impact of the
2011 division of Sudan on GDP per capita.
arXiv link: http://arxiv.org/abs/2408.00291v2
Methodological Foundations of Modern Causal Inference in Social Science Research
(modern) causal inference methods to address the causal estimand with
observational/survey data that have been or will be used in social science
research. Mainly, this paper is divided into two parts: inference from
statistical estimand for the causal estimand, in which we reviewed the
assumptions for causal identification and the methodological strategies
addressing the problems if some of the assumptions are violated. We also
discuss the asymptotic analysis relating the empirical measure from the observational data to the theoretical measure and replicate the derivation of the
efficient/doubly robust average treatment effect estimator, which is commonly
used in current social science analysis.
arXiv link: http://arxiv.org/abs/2408.00032v1
Potential weights and implicit causal designs in linear regression
quasi-experimental treatment variation, what do we mean? This paper
characterizes the necessary implications when linear regressions are
interpreted causally. A minimal requirement for causal interpretation is that
the regression estimates some contrast of individual potential outcomes under
the true treatment assignment process. This requirement implies linear
restrictions on the true distribution of treatment. Solving these linear
restrictions leads to a set of implicit designs. Implicit designs are plausible
candidates for the true design if the regression were to be causal. The
implicit designs serve as a framework that unifies and extends existing
theoretical results across starkly distinct settings (including multiple
treatment, panel, and instrumental variables). They lead to new theoretical
insights for widely used but less understood specifications.
arXiv link: http://arxiv.org/abs/2407.21119v3
On the power properties of inference for parameters with interval identified sets
partially-identified parameter of interest with an interval identified set. We
assume the researcher has bounds estimators to construct the CIs proposed by
Stoye (2009), referred to as CI1, CI2, and CI3. We also assume that these
estimators are "ordered": the lower bound estimator is less than or equal to
the upper bound estimator.
Under these conditions, we establish two results. First, we show that CI1 and
CI2 are equally powerful, and both dominate CI3. Second, we consider a
favorable situation in which there are two possible bounds estimators to
construct these CIs, and one is more efficient than the other. One would expect
that the more efficient bounds estimator yields more powerful inference. We
prove that this desirable result holds for CI1 and CI2, but not necessarily for
CI3.
arXiv link: http://arxiv.org/abs/2407.20386v2
Testing for the Asymmetric Optimal Hedge Ratios: With an Application to Bitcoin
institutions, and corporations. Since the pioneering contribution of Johnson
(1960), the optimal hedge ratio based on futures is regularly utilized. The
current paper suggests an explicit and efficient method for testing the null
hypothesis of a symmetric optimal hedge ratio against an asymmetric alternative
one within a multivariate setting. If the null is rejected, the position
dependent optimal hedge ratios can be estimated via the suggested model. This
approach is expected to enhance the accuracy of the implemented hedging
strategies compared to the standard methods since it accounts for the fact that
the source of risk depends on whether the investor is a buyer or a seller of
the risky asset. An application is provided using spot and futures prices of
Bitcoin. The results strongly support the view that the optimal hedge ratio for
this cryptocurrency is position dependent. The investor that is long in Bitcoin
has a much higher conditional optimal hedge ratio compared to the one that is
short in the asset. The difference between the two conditional optimal hedge
ratios is statistically significant, which has important repercussions for
implementing risk management strategies.
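For intuition, here is a minimal Python sketch of the classical minimum-variance hedge ratio that the paper generalizes, plus a crude position-split illustration; the function names and the simple sign-based split are assumptions of this sketch and are not the multivariate conditional model tested in the paper.

    import numpy as np

    def optimal_hedge_ratio(spot_ret, fut_ret):
        # classical symmetric minimum-variance hedge ratio:
        # h* = Cov(spot, futures) / Var(futures)
        cov = np.cov(spot_ret, fut_ret)
        return cov[0, 1] / cov[1, 1]

    def split_hedge_ratios(spot_ret, fut_ret):
        # crude asymmetric illustration: separate ratios on days with
        # positive vs. negative futures returns (not the paper's test)
        up = fut_ret > 0
        return (optimal_hedge_ratio(spot_ret[up], fut_ret[up]),
                optimal_hedge_ratio(spot_ret[~up], fut_ret[~up]))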
arXiv link: http://arxiv.org/abs/2407.19932v2
Improving the Estimation of Lifetime Effects in A/B Testing via Treatment Locality
treatments on short-term outcomes has been well understood and has become the
gold standard in industrial practice. However, as service systems become
increasingly dynamic and personalized, much focus is shifting toward
maximizing long-term outcomes, such as customer lifetime value, through
lifetime exposure to interventions. Our goal is to assess the impact of
treatment and control policies on long-term outcomes from relatively short-term
observations, such as those generated by A/B testing. A key managerial
observation is that many practical treatments are local, affecting only
targeted states while leaving other parts of the policy unchanged. This paper
rigorously investigates whether and how such locality can be exploited to
improve estimation of long-term effects in Markov Decision Processes (MDPs), a
fundamental model of dynamic systems. We first develop optimal inference
techniques for general A/B testing in MDPs and establish corresponding
efficiency bounds. We then propose methods to harness the localized structure
by sharing information on the non-targeted states. Our new estimator can
achieve a linear reduction with the number of test arms for a major part of the
variance without sacrificing unbiasedness. It also matches a tighter variance
lower bound that accounts for locality. Furthermore, we extend our framework to
a broad class of differentiable estimators, which encompasses many widely used
approaches in practice. We show that all such estimators can benefit from
variance reduction through information sharing without increasing their bias.
Together, these results provide both theoretical foundations and practical
tools for conducting efficient experiments in dynamic service systems with
local treatments.
arXiv link: http://arxiv.org/abs/2407.19618v3
Heterogeneous Grouping Structures in Panel Data
panels with latent grouping structure. The assumption of within group
homogeneity is prevalent in this literature, implying that the formation of
groups alleviates cross-sectional heterogeneity, regardless of the prior
knowledge of groups. While the latter hypothesis makes inference powerful, it
can often be restrictive. We allow for models with richer heterogeneity that
can be found both in the cross-section and within a group, without imposing the
simple assumption that all groups must be heterogeneous. We further contribute
to the method proposed by Su et al. (2016), by showing that the model
parameters can be consistently estimated and the groups, while unknown, can be
identifiable in the presence of different types of heterogeneity. Within the
same framework we consider the validity of assuming both cross-sectional and
within group homogeneity, using testing procedures. Simulations demonstrate
good finite-sample performance of the approach in both classification and
estimation, while empirical applications across several datasets provide
evidence of multiple clusters, as well as reject the hypothesis of within group
homogeneity.
arXiv link: http://arxiv.org/abs/2407.19509v1
Using Total Margin of Error to Account for Non-Sampling Error in Election Polls: The Case of Nonresponse
but measurement has focused on the margin of sampling error. Survey
statisticians have long recommended measurement of total survey error by mean
square error (MSE), which jointly measures sampling and non-sampling errors. We
think it reasonable to use the square root of maximum MSE to measure the total
margin of error (TME). Measurement of TME should encompass both sampling error
and all forms of non-sampling error. We suggest that measurement of TME should
be a standard feature in the reporting of polls. To provide a clear
illustration, and because we believe the exceedingly low response rates
commonly obtained by election polls to be a particularly worrisome source of
potential error, we demonstrate how to measure the potential impact of
nonresponse using the concept of TME. We first show how to measure TME when a
pollster lacks any knowledge of the candidate preferences of nonrespondents. We
then extend the analysis to settings where the pollster has partial knowledge
that bounds the preferences of non-respondents. In each setting, we derive a
simple poll estimate that approximately minimizes TME, a midpoint estimate, and
compare it to a conventional poll estimate.
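As a rough illustration (not the paper's exact derivation), the following Python sketch computes Manski-style identification bounds for a candidate's support when nonrespondents' preferences are unknown, the midpoint estimate, and an approximate total margin of error combining worst-case nonresponse bias with sampling error; the function name and the variance treatment are assumptions of this sketch.

    import numpy as np

    def tme_midpoint(p_resp, response_rate, n_resp):
        # p_resp: candidate share among respondents; n_resp: respondent count
        r = response_rate
        lower, upper = r * p_resp, r * p_resp + (1 - r)   # identification interval
        midpoint = (lower + upper) / 2                    # estimate with smallest worst-case bias
        max_bias = (1 - r) / 2                            # worst-case nonresponse bias of the midpoint
        se = np.sqrt(p_resp * (1 - p_resp) / n_resp)      # sampling error of the respondent share
        tme = np.sqrt(max_bias**2 + (r * se)**2)          # square root of an approximate maximum MSE
        return midpoint, tme

    # e.g. 60% support among respondents, 5% response rate, 1000 respondents
    print(tme_midpoint(0.60, 0.05, 1000))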
arXiv link: http://arxiv.org/abs/2407.19339v3
Starting Small: Prioritizing Safety over Efficacy in Randomized Experiments Using the Exact Finite Sample Likelihood
answer questions of “why?” and “what should you have done?” using data from
randomized experiments and a utility function that prioritizes safety over
efficacy. We propose a finite sample Bayesian decision rule and a finite sample
maximum likelihood decision rule. We show that in finite samples from 2 to 50,
it is possible for these rules to achieve better performance according to
established maximin and maximum regret criteria than a rule based on the
Boole-Frechet-Hoeffding bounds. We also propose a finite sample maximum
likelihood criterion. We apply our rules and criterion to an actual clinical
trial that yielded a promising estimate of efficacy, and our results point to
safety as a reason for why results were mixed in subsequent trials.
arXiv link: http://arxiv.org/abs/2407.18206v1
Enhanced power enhancements for testing many moment equalities: Beyond the $2$- and $\infty$-norm
high-dimensional. Tests based on the $2$- and $\infty$-norm have received
considerable attention in such settings, as they are powerful against dense and
sparse alternatives, respectively. The power enhancement principle of Fan et
al. (2015) combines these two norms to construct improved tests that are
powerful against both types of alternatives. In the context of testing whether
a candidate parameter satisfies a large number of moment equalities, we
construct a test that harnesses the strength of all $p$-norms with $p\in[2,
\infty]$. As a result, this test is consistent against strictly more
alternatives than any test based on a single $p$-norm. In particular, our test
is consistent against more alternatives than tests based on the $2$- and
$\infty$-norm, which is what most implementations of the power enhancement
principle target.
We illustrate the scope of our general results by using them to construct a
test that simultaneously dominates the Anderson-Rubin test (based on $p=2$),
tests based on the $\infty$-norm and power enhancement based combinations of
these in terms of consistency in the linear instrumental variable model with
many instruments.
arXiv link: http://arxiv.org/abs/2407.17888v2
Formalising causal inference as prediction on a target population
sciences is the potential outcomes framework due to Neyman and Rubin. In this
framework, observations are thought to be drawn from a distribution over
variables of interest, and the goal is to identify parameters of this
distribution. Even though the stated goal is often to inform decision making on
some target population, there is no straightforward way to include these target
populations in the framework. Instead of modelling the relationship between the
observed sample and the target population, the inductive assumptions in this
framework take the form of abstract sampling and independence assumptions. In
this paper, we develop a version of this framework that construes causal
inference as treatment-wise predictions for finite populations where all
assumptions are testable in retrospect; this means that one can not only test
predictions themselves (without any fundamental problem) but also investigate
sources of error when they fail. Due to close connections to the original
framework, established methods can still be analysed under the new
framework.
arXiv link: http://arxiv.org/abs/2407.17385v3
Identification and inference of outcome conditioned partial effects of general interventions
to as the outcome conditioned partial policy effects (OCPPEs), to
measure the average effect of a general counterfactual intervention of
a target covariate on the individuals in different quantile ranges of the
outcome distribution.
The OCPPE approach is valuable in several aspects: (i) Unlike the
unconditional quantile partial effect (UQPE) that is not $\sqrt{n}$-estimable,
an OCPPE is $\sqrt{n}$-estimable. Analysts can use it to capture heterogeneity
across the unconditional distribution of $Y$ as well as obtain accurate
estimation of the aggregated effect at the upper and lower tails of $Y$. (ii)
The semiparametric efficiency bound for an OCPPE is explicitly derived. (iii)
We propose an efficient debiased estimator for OCPPE, and provide feasible
uniform inference procedures for the OCPPE process. (iv) The efficient doubly
robust score for an OCPPE can be used to optimize infinitesimal nudges to a
continuous treatment by maximizing a quantile specific Empirical Welfare
function. We illustrate the method by analyzing how anti-smoking policies
impact low percentiles of live infants' birthweights.
arXiv link: http://arxiv.org/abs/2407.16950v1
Bayesian modelling of VAR precision matrices using stochastic block networks
the autoregressive coefficients. Introducing shrinkage on the error covariance
matrix is sometimes done but, in the vast majority of cases, without
considering the network structure of the shocks and by placing the prior on the
lower Cholesky factor of the precision matrix. In this paper, we propose a
prior on the VAR error precision matrix directly. Our prior, which resembles a
standard spike and slab prior, models variable inclusion probabilities through
a stochastic block model that clusters shocks into groups. Within groups, the
probability of having relations across group members is higher (inducing less
sparsity) whereas relations across groups imply a lower probability that
members of each group are conditionally related. We show in simulations that
our approach recovers the true network structure well. Using a US macroeconomic
data set, we illustrate how our approach can be used to cluster shocks together
and that this feature leads to improved density forecasts.
arXiv link: http://arxiv.org/abs/2407.16349v1
Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction
distributional treatment effect parameters in randomized experiments.
Randomized experiments have been extensively used to estimate treatment effects
in various scientific fields. However, to gain deeper insights, it is essential
to estimate distributional treatment effects rather than relying solely on
average effects. Our approach incorporates pre-treatment covariates into a
distributional regression framework, utilizing machine learning techniques to
improve the precision of distributional treatment effect estimators. The
proposed approach can be readily implemented with off-the-shelf machine
learning methods and remains valid as long as the nuisance components are
reasonably well estimated. Also, we establish the asymptotic properties of the
proposed estimator and present a uniformly valid inference method. Through
simulation results and real data analysis, we demonstrate the effectiveness of
integrating machine learning techniques in reducing the variance of
distributional treatment effect estimators in finite samples.
arXiv link: http://arxiv.org/abs/2407.16037v1
Big Data Analytics-Enabled Dynamic Capabilities and Market Performance: Examining the Roles of Marketing Ambidexterity and Competitor Pressure
Data Analytics, explores the transformative effect of BDA EDCs on marketing
ambidexterity and firms' market performance in the textile sector of Pakistan's
cities. Focusing specifically on firms that deal directly with customers, it
investigates the nuanced role of BDA EDCs in textile retail firms' capacity to
navigate market dynamics. Emphasizing the exploitation component of marketing
ambidexterity, the study investigates the mediating function of marketing
ambidexterity and the moderating influence of competitive pressure. Using a
survey questionnaire, the study targets key decision makers in textile firms in
Faisalabad, Chiniot and Lahore, Pakistan. PLS-SEM is employed as the analytical
technique, allowing a full examination of the complicated relations between BDA
EDCs, marketing ambidexterity, competitive pressure, and market performance.
The study predicts a positive impact of Big Data on marketing ambidexterity,
with a specific emphasis on exploitation, and expects this
exploitation-oriented marketing ambidexterity to significantly enhance firms'
market performance. This research contributes to the existing literature on
dynamic capabilities-based frameworks from the perspective of the retail
segment of the textile industry. The study emphasizes the role of BDA EDCs in
the retail sector, offering insights into their direct and indirect effects on
market performance within the retail area. Its novelty lies in contextualizing
BDA EDCs in the textile sector of Faisalabad, Lahore and Chiniot, providing a
unique perspective on the effect of BDA on marketing ambidexterity and market
performance. Methodologically, the study draws on multiple samples from the
retail sector to support broader generalizability, contributing practical
insights.
arXiv link: http://arxiv.org/abs/2407.15522v1
Nonlinear Binscatter Methods
the social, behavioral, and biomedical sciences. Available methods rely on a
quantile-based partitioning estimator of the conditional mean regression
function to primarily construct flexible yet interpretable visualization
methods, but they can also be used to estimate treatment effects, assess
uncertainty, and test substantive domain-specific hypotheses. This paper
introduces novel binscatter methods based on nonlinear, possibly nonsmooth
M-estimation methods, covering generalized linear, robust, and quantile
regression models. We provide a host of theoretical results and practical tools
for local constant estimation along with piecewise polynomial and spline
approximations, including (i) optimal tuning parameter (number of bins)
selection, (ii) confidence bands, and (iii) formal statistical tests regarding
functional form or shape restrictions. Our main results rely on novel strong
approximations for general partitioning-based estimators covering random,
data-driven partitions, which may be of independent interest. We demonstrate
our methods with an empirical application studying the relation between the
percentage of individuals without health insurance and per capita income at the
zip-code level. We provide general-purpose software packages implementing our
methods in Python, R, and Stata.
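For readers unfamiliar with the baseline construction, here is a minimal sketch of a canonical (local-constant) binscatter using quantile-spaced bins; the nonlinear, robust, and quantile extensions developed in the paper go well beyond this, and the function name is illustrative.

    import numpy as np

    def binscatter(x, y, n_bins=20):
        # quantile-spaced bin edges, then within-bin means of x and y
        edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
        edges[-1] += 1e-12                         # keep the maximum inside the last bin
        bin_id = np.digitize(x, edges[1:-1])       # bin index in 0..n_bins-1
        xm = np.array([x[bin_id == b].mean() for b in range(n_bins)])
        ym = np.array([y[bin_id == b].mean() for b in range(n_bins)])
        return xm, ym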
arXiv link: http://arxiv.org/abs/2407.15276v1
Weak-instrument-robust subvector inference in instrumental variables regression: A subvector Lagrange multiplier test and properties of subvector Anderson-Rubin confidence sets
instrumental variables regression. We show that it is asymptotically
size-correct under a technical condition. This is the first
weak-instrument-robust subvector test for instrumental variables regression to
recover the degrees of freedom of the commonly used non-weak-instrument-robust
Wald test. Additionally, we provide a closed-form solution for subvector
confidence sets obtained by inverting the subvector Anderson-Rubin test. We
show that they are centered around a k-class estimator. Also, we show that the
subvector confidence sets for single coefficients of the causal parameter are
jointly bounded if and only if Anderson's likelihood-ratio test rejects the
hypothesis that the first-stage regression parameter is of reduced rank, that
is, that the causal parameter is not identified. Finally, we show that if a
confidence set obtained by inverting the Anderson-Rubin test is bounded and
nonempty, it is equal to a Wald-based confidence set with a data-dependent
confidence level, which we compute explicitly.
arXiv link: http://arxiv.org/abs/2407.15256v3
Leveraging Uniformization and Sparsity for Computation and Estimation of Continuous Time Dynamic Discrete Choice Games
computational advantages over discrete-time models. This paper addresses
remaining computational challenges to further improve both model solution and
maximum likelihood estimation. We establish convergence rates for value
iteration and policy evaluation with fixed beliefs, and develop
Newton-Kantorovich methods that exploit analytical Jacobians and sparse matrix
structure. We apply uniformization both to derive a new representation of the
value function that draws direct analogies to discrete-time models and to
enable stable computation of the matrix exponential and its parameter
derivatives for likelihood-based estimation with snapshot data. Critically,
these methods provide a complete chain of analytical derivatives from the
equilibrium value function through the log likelihood function, eliminating
numerical approximations in both model solution and estimation and improving
finite-sample statistical properties. Monte Carlo experiments demonstrate
substantial gains in computational time and estimator accuracy, enabling
estimation of richer models of strategic interaction.
arXiv link: http://arxiv.org/abs/2407.14914v2
Predicting the Distribution of Treatment Effects via Covariate-Adjustment, with an Application to Microcredit
average effects, but of the distribution of treatment effects. The inability to
observe individual counterfactuals makes answering these empirical questions
challenging. I propose an inference approach for points of the distribution of
treatment effects by incorporating predicted counterfactuals through covariate
adjustment. I provide finite-sample valid inference using sample-splitting, and
asymptotically valid inference using cross-fitting, under arguably weak
conditions. Revisiting five randomized controlled trials on microcredit that
reported null average effects, I find important distributional impacts, with
some individuals helped and others harmed by the increased credit access.
arXiv link: http://arxiv.org/abs/2407.14635v3
Spatially-clustered spatial autoregressive models with application to agricultural market concentration in Europe
regression models, namely, the spatially-clustered spatial autoregression
(SCSAR) model, to deal with spatial heterogeneity issues in clustering
procedures. In particular, we extend classical spatial econometrics models,
such as the spatial autoregressive model, the spatial error model, and the
spatially-lagged model, by allowing the regression coefficients to be spatially
varying according to a cluster-wise structure. Cluster memberships and
regression coefficients are jointly estimated through a penalized maximum
likelihood algorithm which encourages neighboring units to belong to the same
spatial cluster with shared regression coefficients. Motivated by the increase
of observed values of the Gini index for the agricultural production in Europe
between 2010 and 2020, the proposed methodology is employed to assess the
presence of local spatial spillovers on the market concentration index for the
European regions in the last decade. Empirical findings support the hypothesis
of fragmentation of the European agricultural market, as the regions can be
well represented by a clustering structure partitioning the continent into
three groups, roughly approximated by a division among Western, North Central
and Southeastern regions. Also, we detect heterogeneous local effects induced
by the selected explanatory variables on the regional market concentration. In
particular, we find that variables associated with social, territorial and
economic relevance of the agricultural sector seem to act differently along the
spatial dimension, across the clusters and with respect to the pooled model,
and along the temporal dimension.
arXiv link: http://arxiv.org/abs/2407.15874v1
Regression Adjustment for Estimating Distributional Treatment Effects in Randomized Controlled Trials
distributional treatment effects in randomized experiments. The distributional
treatment effect provides a more comprehensive understanding of treatment
heterogeneity compared to average treatment effects. We propose a regression
adjustment method that utilizes distributional regression and pre-treatment
information, establishing theoretical efficiency gains without imposing
restrictive distributional assumptions. We develop a practical inferential
framework and demonstrate its advantages through extensive simulations.
Analyzing water conservation policies, our method reveals that behavioral
nudges systematically shift consumption from high to moderate levels. Examining
health insurance coverage, we show the treatment reduces the probability of
zero doctor visits by 6.6 percentage points while increasing the likelihood of
3-6 visits. In both applications, our regression adjustment method
substantially improves precision and identifies treatment effects that were
statistically insignificant under conventional approaches.
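As a sketch of the general idea (not the authors' distributional-regression estimator), the snippet below computes a covariate-adjusted, AIPW-style estimate of the distributional treatment effect P(Y(1) <= y) - P(Y(0) <= y) at a single threshold in a randomized experiment with known assignment probability; the function and variable names are illustrative.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def dte_at_threshold(y, d, X, thresh, p_treat=0.5):
        # indicator outcome at the chosen threshold
        z = (y <= thresh).astype(float)
        # outcome models for the treated and control arms
        m1 = LogisticRegression().fit(X[d == 1], z[d == 1]).predict_proba(X)[:, 1]
        m0 = LogisticRegression().fit(X[d == 0], z[d == 0]).predict_proba(X)[:, 1]
        # augmentation with the known design probability (AIPW in an RCT)
        a1 = m1 + d * (z - m1) / p_treat
        a0 = m0 + (1 - d) * (z - m0) / (1 - p_treat)
        return np.mean(a1 - a0)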
arXiv link: http://arxiv.org/abs/2407.14074v2
Revisiting Randomization with the Cube Method
method, which achieves near-exact covariate balance. This ensures compliance
with standard balance tests and allows for balancing on many covariates,
enabling more precise estimation of treatment effects using pre-experimental
information. We derive theoretical bounds on imbalance as functions of sample
size and covariate dimension, and establish consistency and asymptotic
normality of the resulting estimators. Simulations show substantial
improvements in precision and covariate balance over existing methods,
particularly when the number of covariates is large.
arXiv link: http://arxiv.org/abs/2407.13613v3
Conduct Parameter Estimation in Homogeneous Goods Markets with Equilibrium Existence and Uniqueness Conditions: The Case of Log-linear Specification
incorporating theoretical conditions for the existence and uniqueness of
equilibrium prices when estimating conduct parameters in a log-linear model of
homogeneous goods markets. First, we derive such conditions. Second, Monte Carlo
simulations confirm that in a log-linear model, incorporating the conditions
resolves the problems of implausibly low or negative values of conduct
parameters.
arXiv link: http://arxiv.org/abs/2407.12422v1
Factorial Difference-in-Differences
that extends the canonical difference-in-differences (DID) to settings without
clean controls. Such situations often arise when researchers exploit
cross-sectional variation in a baseline factor and temporal variation in an
event affecting all units. In these applications, the exact estimand is often
unspecified and justification for using the DID estimator is unclear. We
formalize FDID by characterizing its data structure, target parameters, and
identifying assumptions. Framing FDID as a factorial design with two factors --
the baseline factor G and the exposure level Z, we define effect modification
and causal moderation as the associative and causal effects of G on the effect
of Z. Under standard DID assumptions, including no anticipation and parallel
trends, the DID estimator identifies effect modification but not causal
moderation. To identify the latter, we propose an additional factorial parallel
trends assumption. We also show that the canonical DID is a special case of
FDID under an exclusion restriction. We extend the framework to conditionally
valid assumptions and clarify regression-based implementations. We then discuss
extensions to repeated cross-sectional data and continuous G. We illustrate the
approach with an empirical example on the role of social capital in famine
relief in China.
arXiv link: http://arxiv.org/abs/2407.11937v4
Nowcasting R&D Expenditures: A Machine Learning Approach
driving policy. However, traditional data acquisition processes are slow,
subject to delays, and performed at a low frequency. We address this
'ragged-edge' problem with a two-step framework. The first step is a supervised
learning model predicting observed low-frequency figures. We propose a
neural-network-based nowcasting model that exploits mixed-frequency,
high-dimensional data. The second step uses the elasticities derived from the
previous step to interpolate unobserved high-frequency figures. We apply our
method to nowcast countries' yearly research and development (R&D) expenditure
series. These series are collected through infrequent surveys, making them
ideal candidates for this task. We exploit a range of predictors, chiefly
Internet search volume data, and document the relevance of these data in
improving out-of-sample predictions. Furthermore, we leverage the high
frequency of our data to derive monthly estimates of R&D expenditures, which
are currently unobserved. We compare our results with those obtained from the
classical regression-based and the sparse temporal disaggregation methods.
Finally, we validate our results by reporting a strong correlation with monthly
R&D employment data.
arXiv link: http://arxiv.org/abs/2407.11765v1
A nonparametric test for rough volatility
follows a standard semimartingale process, with paths of finite quadratic
variation, or a rough process with paths of infinite quadratic variation. The
test utilizes the fact that volatility is rough if and only if volatility
increments are negatively autocorrelated at high frequencies. It is based on
the sample autocovariance of increments of spot volatility estimates computed
from high-frequency asset return data. By showing a feasible CLT for this
statistic under the null hypothesis of semimartingale volatility paths, we
construct a test with fixed asymptotic size and an asymptotic power equal to
one. The test is derived under very general conditions for the data-generating
process. In particular, it is robust to jumps with arbitrary activity and to
the presence of market microstructure noise. In an application of the test to
SPY high-frequency data, we find evidence for rough volatility.
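A stylized numpy sketch of the ingredient the test is built on: local realized-variance estimates of spot volatility and the lag-1 sample autocovariance of their increments, which should be negative under rough volatility. The block size, the crude spot-volatility proxy, and the function name are assumptions of this sketch; the paper's statistic and its CLT-based critical values are more involved.

    import numpy as np

    def vol_increment_autocov(returns, block=78):
        # local realized variances on non-overlapping blocks as spot-volatility proxies
        n_blocks = len(returns) // block
        rv = np.array([np.sum(returns[i*block:(i+1)*block]**2) for i in range(n_blocks)])
        dv = np.diff(rv)                      # volatility increments
        dv = dv - dv.mean()
        return np.mean(dv[1:] * dv[:-1])      # lag-1 autocovariance (negative under roughness)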
arXiv link: http://arxiv.org/abs/2407.10659v1
The Dynamic, the Static, and the Weak: Factor models and the analysis of high-dimensional time series
models are reviewed and discussed: dynamic versus static loadings, rate-strong
versus rate-weak factors, the concept of weakly common component recently
introduced by Gersing et al. (2023), the irrelevance of cross-sectional
ordering and the assumption of cross-sectional exchangeability, the impact of
undetected strong factors, and the problem of combining common and
idiosyncratic forecasts. Conclusions all point to the advantages of the General
Dynamic Factor Model approach of Forni et al. (2000) over the widely used
Static Approximate Factor Model introduced by Chamberlain and Rothschild
(1983).
arXiv link: http://arxiv.org/abs/2407.10653v3
Reinforcement Learning in High-frequency Market Making
application of reinforcement learning (RL) in high-frequency market making. We
bridge the modern RL theory and the continuous-time statistical models in
high-frequency financial economics. Unlike most existing methodological
literature on developing various RL methods for the market making problem, our
work is a pilot study providing theoretical analysis. We target the
effects of sampling frequency, and find an interesting tradeoff between error
and complexity of RL algorithm when tweaking the values of the time increment
$\Delta$ $-$ as $\Delta$ becomes smaller, the error will be smaller but the
complexity will be larger. We also study the two-player case under the
general-sum game framework and establish the convergence of Nash equilibrium to
the continuous-time game equilibrium as $\Delta\rightarrow0$. The Nash
Q-learning algorithm, which is an online multi-agent RL method, is applied to
solve the equilibrium. Our theories are not only useful for practitioners to
choose the sampling frequency, but also very general and applicable to other
high-frequency financial decision making problems, e.g., optimal executions, as
long as the time-discretization of a continuous-time Markov decision process is
adopted. Monte Carlo simulation evidence supports all of our theories.
arXiv link: http://arxiv.org/abs/2407.21025v2
Low Volatility Stock Portfolio Through High Dimensional Bayesian Cointegration
estimation to construct low volatility portfolios from a large number of
stocks. The proposed Bayesian framework effectively identifies sparse and
important cointegration relationships amongst large baskets of stocks across
various asset spaces, resulting in portfolios with reduced volatility. Such
cointegration relationships persist well over the out-of-sample testing time,
providing practical benefits in portfolio construction and optimization.
Further studies on drawdown and volatility minimization also highlight the
benefits of including cointegrated portfolios as risk management instruments.
arXiv link: http://arxiv.org/abs/2407.10175v1
Estimation of Integrated Volatility Functionals with Kernel Spot Volatility Estimators
estimating integrated volatility functionals. Jacod and Rosenbaum (2013)
studied a plug-in type of estimator based on a Riemann sum approximation of the
integrated functional and a spot volatility estimator with a forward uniform
kernel. Motivated by recent results that show that spot volatility estimators
with general two-side kernels of unbounded support are more accurate, in this
paper, an estimator using a general kernel spot volatility estimator as the
plug-in is considered. A biased central limit theorem for estimating the
integrated functional is established with an optimal convergence rate. Unbiased
central limit theorems for estimators with proper de-biasing terms are also
obtained both at the optimal convergence regime for the bandwidth and when
applying undersmoothing. Our results show that one can significantly reduce the
estimator's bias by adopting a general kernel instead of the standard uniform
kernel. Our proposed bias-corrected estimators are found to maintain remarkable
robustness against bandwidth selection in a variety of sampling frequencies and
functions.
arXiv link: http://arxiv.org/abs/2407.09759v3
Sparse Asymptotic PCA: Identifying Sparse Latent Factors Across Time Horizon in High-Dimensional Time Series
sparse asymptotic Principal Component Analysis (APCA) to analyze the
co-movements of high-dimensional panel data over time. Unlike existing methods
based on sparse PCA, which assume sparsity in the loading matrices, our
approach posits sparsity in the factor processes while allowing non-sparse
loadings. This is motivated by the fact that financial returns typically
exhibit universal and non-sparse exposure to market factors. Unlike the
commonly used $\ell_1$-relaxation in sparse PCA, the proposed sparse APCA
employs a truncated power method to estimate the leading sparse factor and a
sequential deflation method for multi-factor cases under $\ell_0$-constraints.
Furthermore, we develop a data-driven approach to identify the sparsity of risk
factors over the time horizon using a novel cross-sectional cross-validation
method. We establish the consistency of our estimators under mild conditions as
both the dimension $N$ and the sample size $T$ grow. Monte Carlo simulations
demonstrate that the proposed method performs well in finite samples.
Empirically, we apply our method to daily S&P 500 stock returns (2004--2016)
and identify nine risk factors influencing the stock market.
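To illustrate the $\ell_0$-constrained step, here is a generic truncated power iteration for a leading sparse direction of the T x T Gram matrix (sparsity in the time dimension, as in the paper); this is a textbook-style sketch under stated assumptions, not the authors' full estimator with sequential deflation and cross-sectional cross-validation.

    import numpy as np

    def sparse_leading_factor(X, k, n_iter=200):
        # X: T x N panel; returns a unit-norm T-vector with at most k nonzeros
        G = X @ X.T                                # Gram matrix over time
        T = G.shape[0]
        v = np.ones(T) / np.sqrt(T)
        for _ in range(n_iter):
            w = G @ v
            drop = np.argsort(np.abs(w))[:T - k]   # smallest T-k entries in magnitude
            w[drop] = 0.0                          # hard truncation enforces the l0 constraint
            v = w / np.linalg.norm(w)
        return v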
arXiv link: http://arxiv.org/abs/2407.09738v3
Regularizing stock return covariance matrices via multiple testing of correlations
of stock return covariance matrices. The framework allows for the presence of
heavy tails and multivariate GARCH-type effects of unknown form among the stock
returns. The approach involves simultaneous testing of all pairwise
correlations, followed by setting non-statistically significant elements to
zero. This adaptive thresholding is achieved through sign-based Monte Carlo
resampling within multiple testing procedures, controlling either the
traditional familywise error rate, a generalized familywise error rate, or the
false discovery proportion. Subsequent shrinkage ensures that the final
covariance matrix estimate is positive definite and well-conditioned while
preserving the achieved sparsity. Compared to alternative estimators, this new
regularization method demonstrates strong performance in simulation experiments
and real portfolio optimization.
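A simplified Python sketch of the thresholding-plus-shrinkage idea: test all pairwise correlations (here with plain t-tests and a Bonferroni familywise correction instead of the paper's sign-based Monte Carlo resampling), zero out the insignificant ones, and shrink toward the identity until the matrix is positive definite. The function name and the shrinkage rule are assumptions of this sketch.

    import numpy as np
    from scipy import stats

    def regularized_correlation(returns, alpha=0.05):
        T, N = returns.shape
        R = np.corrcoef(returns, rowvar=False)
        # t-statistics and two-sided p-values for each pairwise correlation
        tstat = R * np.sqrt((T - 2) / np.clip(1 - R**2, 1e-12, None))
        pval = 2 * stats.t.sf(np.abs(tstat), df=T - 2)
        n_tests = N * (N - 1) // 2
        R_thr = np.where(pval < alpha / n_tests, R, 0.0)   # Bonferroni thresholding
        np.fill_diagonal(R_thr, 1.0)
        # shrink toward the identity until positive definite
        lam, R_reg = 0.0, R_thr.copy()
        while np.linalg.eigvalsh(R_reg).min() <= 1e-8 and lam < 1.0:
            lam += 0.05
            R_reg = (1 - lam) * R_thr + lam * np.eye(N)
        return R_reg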
arXiv link: http://arxiv.org/abs/2407.09696v1
An Introduction to Permutation Processes (version 0.5)
Department of Statistics at the University of Washington, Seattle. They
comprise the first eight chapters of a book currently in progress.
arXiv link: http://arxiv.org/abs/2407.09664v1
Computationally Efficient Estimation of Large Probit Models
disciplines, including consumer choice data in economics and marketing.
However, the Gaussian latent variable feature of probit models coupled with
identification constraints pose significant computational challenges for its
estimation and inference, especially when the dimension of the discrete
response variable is large. In this paper, we propose a computationally
efficient Expectation-Maximization (EM) algorithm for estimating large probit
models. Our work is distinct from existing methods in two important aspects.
First, instead of simulation or sampling methods, we apply and customize
expectation propagation (EP), a deterministic method originally proposed for
approximate Bayesian inference, to estimate moments of the truncated
multivariate normal (TMVN) in the E (expectation) step. Second, we take
advantage of a symmetric identification condition to transform the constrained
optimization problem in the M (maximization) step into a one-dimensional
problem, which is solved efficiently using Newton's method instead of
off-the-shelf solvers. Our method enables the analysis of correlated choice
data in the presence of more than 100 alternatives, which is a reasonable size
in modern applications, such as online shopping and booking platforms, but has
been difficult in practice with probit models. We apply our probit estimation
method to study ordering effects in hotel search results on Expedia's online
booking platform.
arXiv link: http://arxiv.org/abs/2407.09371v2
An Introduction to Causal Discovery
assessing the impact of predefined treatments (or interventions) on predefined
outcomes, such as the effect of education programs on earnings. Causal
discovery, in contrast, aims to uncover causal relationships among multiple
variables in a data-driven manner, by investigating statistical associations
rather than relying on predefined causal structures. This approach, more common
in computer science, seeks to understand causality in an entire system of
variables, which can be visualized by causal graphs. This survey provides an
introduction to key concepts, algorithms, and applications of causal discovery
from the perspectives of economics and social sciences. It covers fundamental
concepts like d-separation, causal faithfulness, and Markov equivalence,
sketches various algorithms for causal discovery, and discusses the back-door
and front-door criteria for identifying causal effects. The survey concludes
with more specific examples of causal discovery, e.g. for learning all
variables that directly affect an outcome of interest and/or testing
identification of causal effects in observational data.
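For reference, the back-door adjustment mentioned above: if a set of variables $Z$ satisfies the back-door criterion relative to $(X, Y)$, the interventional distribution is identified by

    $P(Y = y \mid do(X = x)) = \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z).$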
arXiv link: http://arxiv.org/abs/2407.08602v1
Comparative analysis of Mixed-Data Sampling (MIDAS) model compared to Lag-Llama model for inflation nowcasting
both public institutions and private agents. This study compares the
performance of a traditional econometric model, Mixed Data Sampling regression,
with one of the newest developments from the field of Artificial Intelligence,
a foundational time series forecasting model based on a Long short-term memory
neural network called Lag-Llama, in their ability to nowcast the Harmonized
Index of Consumer Prices in the Euro area. Two models were compared and
assessed whether the Lag-Llama can outperform the MIDAS regression, ensuring
that the MIDAS regression is evaluated under the best-case scenario using a
dataset spanning from 2010 to 2022. The following metrics were used to evaluate
the models: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE),
Mean Squared Error (MSE), correlation with the target, R-squared and adjusted
R-squared. The results show better performance of the pre-trained Lag-Llama
across all metrics.
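For concreteness, the evaluation metrics listed above can be computed as follows (a plain numpy sketch; the adjusted R-squared additionally penalizes the number of predictors and is omitted here, and the function name is illustrative).

    import numpy as np

    def forecast_metrics(y_true, y_pred):
        y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
        err = y_true - y_pred
        mae = np.mean(np.abs(err))
        mape = np.mean(np.abs(err / y_true)) * 100          # assumes no zero targets
        mse = np.mean(err**2)
        corr = np.corrcoef(y_true, y_pred)[0, 1]
        r2 = 1 - np.sum(err**2) / np.sum((y_true - y_true.mean())**2)
        return {"MAE": mae, "MAPE": mape, "MSE": mse, "corr": corr, "R2": r2}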
arXiv link: http://arxiv.org/abs/2407.08510v1
Production function estimation using subjective expectations data
(1996) tradition require assumptions on input choices. We introduce a new
method that exploits (increasingly available) data on a firm's expectations of
its future output and inputs that allows us to obtain consistent production
function parameter estimates while relaxing these input demand assumptions. In
contrast to dynamic panel methods, our proposed estimator can be implemented on
very short panels (including a single cross-section), and Monte Carlo
simulations show it outperforms alternative estimators when firms' material
input choices are subject to optimization error. Implementing a range of
production function estimators on UK data, we find our proposed estimator
yields results that are either similar to or more credible than commonly-used
alternatives. These differences are larger in industries where material inputs
appear harder to optimize. We show that TFP implied by our proposed estimator
is more strongly associated with future jobs growth than existing methods,
suggesting that failing to adequately account for input endogeneity may
underestimate the degree of dynamic reallocation in the economy.
arXiv link: http://arxiv.org/abs/2407.07988v1
Reduced-Rank Matrix Autoregressive Models: A Medium $N$ Approach
within economic time series. However, this task becomes challenging when we
observe matrix-valued time series, where each dimension may have a different
co-movement structure. We propose reduced-rank regressions with a tensor
structure for the coefficient matrix to provide new insights into co-movements
within and between the dimensions of matrix-valued time series. Moreover, we
relate the co-movement structures to two commonly used reduced-rank models,
namely the serial correlation common feature and the index model. Two empirical
applications involving U.S. states and economic indicators for the Eurozone
and North American countries illustrate how our new tools identify
co-movements.
arXiv link: http://arxiv.org/abs/2407.07973v1
R. A. Fisher's Exact Test Revisited
(1935) pioneering exact test in the context of the Lady Testing Tea experiment.
It unveils a critical implicit assumption in Fisher's calibration: the taster
minimizes expected misclassification given fixed probabilistic information.
Without similar assumptions or an explicit alternative hypothesis, the
rationale behind Fisher's specification of the rejection region remains
unclear.
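For context, the exact calibration in the classic Lady Tasting Tea setup (8 cups, 4 milk-first, the taster selects 4) is hypergeometric; under random guessing, the probability of identifying all four milk-first cups is 1/C(8,4) = 1/70, roughly 0.014.

    from scipy.stats import hypergeom

    # k = 4 correct picks, M = 8 cups, n = 4 milk-first cups, N = 4 cups selected
    p_all_correct = hypergeom.pmf(4, M=8, n=4, N=4)
    print(p_all_correct)   # 0.0142857... = 1/70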
arXiv link: http://arxiv.org/abs/2407.07251v1
The Hidden Subsidy of the Affordable Care Act
medical costs of newly eligible Medicaid enrollees than previously eligible
ones. States could save up to 100% of their per-enrollee costs by reclassifying
original enrollees into the newly eligible group. We examine whether this
fiscal incentive changed states' enrollment practices. We find that Medicaid
expansion caused large declines in the number of beneficiaries enrolled in the
original Medicaid population, suggesting widespread reclassifications. In 2019
alone, this phenomenon affected 4.4 million Medicaid enrollees at a federal
cost of $8.3 billion. Our results imply that reclassifications inflated the
federal cost of Medicaid expansion by 18.2%.
arXiv link: http://arxiv.org/abs/2407.07217v1
Dealing with idiosyncratic cross-correlation when constructing confidence regions for PC factors
asymptotic covariance matrix of the Principal Components (PC) factors valid in
the presence of cross-correlated idiosyncratic components. The proposed
estimator of the asymptotic Mean Square Error (MSE) of PC factors is based on
adaptive thresholding of the sample covariances of the idiosyncratic residuals,
with the threshold based on their individual variances. We compare the finite
sample performance of confidence regions for the PC factors obtained using the
proposed asymptotic MSE with those of available extant asymptotic and bootstrap
regions and show that the former beats all alternative procedures for a wide
variety of idiosyncratic cross-correlation structures.
arXiv link: http://arxiv.org/abs/2407.06883v1
Causes and Electoral Consequences of Political Assassinations: The Role of Organized Crime in Mexico
candidates and mayors. This article argues that these killings are largely
driven by organized crime, aiming to influence candidate selection, control
local governments for rent-seeking, and retaliate against government
crackdowns. Using a new dataset of political assassinations in Mexico from 2000
to 2021 and instrumental variables, we address endogeneity concerns in the
location and timing of government crackdowns. Our instruments include
historical Chinese immigration patterns linked to opium cultivation in Mexico,
local corn prices, and U.S. illicit drug prices. The findings reveal that
candidates in municipalities near oil pipelines face an increased risk of
assassination due to drug trafficking organizations expanding into oil theft,
particularly during elections and fuel price hikes. Government arrests or
killings of organized crime members trigger retaliatory violence, further
endangering incumbent mayors. This political violence has a negligible impact
on voter turnout, as it targets politicians rather than voters. However, voter
turnout increases in areas where authorities disrupt drug smuggling, raising
the chances of the local party being re-elected. These results offer new
insights into how criminal groups attempt to capture local governments and the
implications for democracy under criminal governance.
arXiv link: http://arxiv.org/abs/2407.06733v1
Femicide Laws, Unilateral Divorce, and Abortion Decriminalization Fail to Stop Women from Being Killed in Mexico
gender-based killings of women, a major cause of premature female mortality.
Focusing on Mexico, a pioneer in adopting such legislation, the paper exploits
variations in the enactment of femicide laws and prison sentences across
states. Using the difference-in-differences estimator, the analysis reveals
femicide laws have not impacted femicides, homicides, disappearances, or
suicides of women. Results remain robust when considering differences in prison
sentencing, states introducing unilateral divorce, equitable divorce asset
compensation, or decriminalizing abortion. Findings also hold with synthetic
matching, suggesting laws are insufficient to combat gender-based violence in
contexts of impunity.
arXiv link: http://arxiv.org/abs/2407.06722v2
Conditional Rank-Rank Regression
capturing the relationship between two economic variables. It frequently
features in studies of intergenerational mobility as the resulting coefficient,
capturing the rank correlation between the variables, is easy to interpret and
measures overall persistence. However, in many applications it is common
practice to include other covariates to account for differences in persistence
levels between groups defined by the values of these covariates. In these
instances the resulting coefficients can be difficult to interpret. We propose
the conditional rank-rank regression, which uses conditional ranks instead of
unconditional ranks, to measure average within-group persistence. The
difference between conditional and unconditional rank-rank regression
coefficients can then be interpreted as a measure of between-group persistence.
We develop a flexible estimation approach using distribution regression and
establish a theoretical framework for large sample inference. An empirical
study on intergenerational income mobility in Switzerland demonstrates the
advantages of this approach. The study reveals stronger intergenerational
persistence between fathers and sons compared to fathers and daughters, with
the within-group persistence explaining 62% of the overall income persistence
for sons and 52% for daughters. Smaller families and those with highly educated
fathers exhibit greater persistence in economic status.
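As background, here is a minimal sketch of the unconditional rank-rank regression that the paper generalizes: regress child income ranks on parent income ranks, with the slope (essentially the Spearman rank correlation) serving as the usual persistence measure. The conditional version proposed in the paper replaces these with ranks computed conditional on covariates via distribution regression; the function name is illustrative.

    import numpy as np
    from scipy.stats import rankdata

    def rank_rank_slope(parent_income, child_income):
        rp = rankdata(parent_income) / len(parent_income)   # parent percentile ranks
        rc = rankdata(child_income) / len(child_income)     # child percentile ranks
        rp_c = rp - rp.mean()
        return np.sum(rp_c * (rc - rc.mean())) / np.sum(rp_c**2)   # OLS slope of rc on rp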
arXiv link: http://arxiv.org/abs/2407.06387v3
Dynamic Matrix Factor Models for High Dimensional Time Series
are prevalent in various fields such as economics, finance, and engineering.
Such matrix time series data are often observed in high dimensions. Matrix
factor models are employed to reduce the dimensionality of such data, but they
lack the capability to make predictions without specified dynamics in the
latent factor process. To address this issue, we propose a two-component
dynamic matrix factor model that extends the standard matrix factor model by
incorporating a matrix autoregressive structure for the low-dimensional latent
factor process. This two-component model injects prediction capability to the
matrix factor model and provides deeper insights into the dynamics of
high-dimensional matrix time series. We present the estimation procedures of
the model and their theoretical properties, as well as empirical analysis of
the estimation procedures via simulations, and a case study of New York City
taxi data, demonstrating the performance and usefulness of the model.
arXiv link: http://arxiv.org/abs/2407.05624v1
Methodology for Calculating CO2 Absorption by Tree Planting for Greening Projects
which play an important role in climate change mitigation, this paper examines
a formula for estimating the amount of carbon fixation for greening activities
in urban areas through tree planting. The usefulness of the proposed formula
was examined through calculations based on actual data collected via on-site
surveys of a greening company. A series of calculation results
suggest that this formula may be useful. Recognizing carbon credits for green
businesses for the carbon sequestration of their projects is an important
incentive not only as part of environmental improvement and climate change
action, but also to improve the health and well-being of local communities and
to generate economic benefits. This study is a pioneering exploration of the
methodology.
arXiv link: http://arxiv.org/abs/2407.05596v1
A Convexified Matching Approach to Imputation and Individualized Inference
and individualized inference inspired by computational optimal transport. Our
method integrates favorable features from mainstream imputation approaches:
optimal matching, regression imputation, and synthetic control. We impute
counterfactual outcomes based on convex combinations of observed outcomes,
defined based on an optimal coupling between the treated and control data sets.
The optimal coupling problem is considered a convex relaxation to the
combinatorial optimal matching problem. We estimate granular-level individual
treatment effects while maintaining a desirable aggregate-level summary by
properly constraining the coupling. We construct transparent, individual
confidence intervals for the estimated counterfactual outcomes. We devise fast
iterative entropic-regularized algorithms to solve the optimal coupling problem
that scales favorably when the number of units to match is large. Entropic
regularization plays a crucial role in both inference and computation; it helps
control the width of the individual confidence intervals and design fast
optimization algorithms.
arXiv link: http://arxiv.org/abs/2407.05372v1
A Short Note on Event-Study Synthetic Difference-in-Differences Estimators
(SDID) estimators. I show that, in simple and staggered adoption designs,
estimators from Arkhangelsky et al. (2021) can be disaggregated into dynamic
treatment effect estimators, comparing the lagged outcome differentials of
treated and synthetic controls to their pre-treatment average. Estimators
presented in this note can be computed using the sdid_event Stata package.
arXiv link: http://arxiv.org/abs/2407.09565v2
Learning control variables and instruments for causal analysis in observational data
suitable control variables and instruments for assessing the causal effect of a
treatment on an outcome in observational data, if they exist. Our approach
tests the joint existence of instruments, which are associated with the
treatment but not directly with the outcome (at least conditional on
observables), and suitable control variables, conditional on which the
treatment is exogenous, and learns the partition of instruments and control
variables from the observed data. The detection of sets of instruments and
control variables relies on the condition that proper instruments are
conditionally independent of the outcome given the treatment and suitable
control variables. We establish the consistency of our method for detecting
control variables and instruments under certain regularity conditions,
investigate the finite sample performance through a simulation study, and
provide an empirical application to labor market data from the Job Corps study.
arXiv link: http://arxiv.org/abs/2407.04448v2
Overeducation under different macroeconomic conditions: The case of Spanish university graduates
early careers of Spanish university graduates. We investigate the role played
by the business cycle and field of study and their interaction in shaping both
phenomena. We also analyse the relevance of specific types of knowledge and
skills as driving factors in reducing overeducation risk. We use data from the
Survey on the Labour Insertion of University Graduates (EILU) conducted by the
Spanish National Statistics Institute in 2014 and 2019. The survey collects
rich information on cohorts that graduated in the 2009/2010 and 2014/2015
academic years during the Great Recession and the subsequent economic recovery,
respectively. Our results show, first, the relevance of the economic scenario
when graduates enter the labour market. Graduation during a recession increased
overeducation risk and persistence. Second, a clear heterogeneous pattern
occurs across fields of study, with health sciences graduates displaying better
performance in terms of both overeducation incidence and persistence and less
impact of the business cycle. Third, we find evidence that some transversal
skills (language, IT, management) can help to reduce overeducation risk in the
absence of specific knowledge required for the job, thus indicating some kind
of compensatory role. Finally, our findings have important policy implications.
Overeducation, and more importantly overeducation persistence, imply a
non-neglectable misallocation of resources. Therefore, policymakers need to
address this issue in the design of education and labour market policies.
arXiv link: http://arxiv.org/abs/2407.04437v1
Under the null of valid specification, pre-tests cannot make post-test inference liberal
some conditions. Suppose also that we can at least partly test these conditions
with specification tests. We consider the common practice of conducting
inference on the parameter of interest conditional on not rejecting these
tests. We show that if the tested conditions hold, conditional inference is
valid, though possibly conservative. This holds generally, without imposing any
assumption on the asymptotic dependence between the estimator of the parameter
of interest and the specification test.
arXiv link: http://arxiv.org/abs/2407.03725v2
When can weak latent factors be statistically inferred?
theory for principal component analysis (PCA) under the weak factor model that
allow for cross-sectionally dependent idiosyncratic components under the nearly
minimal factor strength relative to the noise level or signal-to-noise ratio.
Our theory is applicable regardless of the relative growth rate between the
cross-sectional dimension $N$ and temporal dimension $T$. This more realistic
assumption and noticeable result require a completely new technical device, as
the commonly-used leave-one-out trick is no longer applicable to the case with
cross-sectional dependence. Another notable advancement of our theory is on PCA
inference $ - $ for example, under the regime where $N\asymp T$, we show that
the asymptotic normality for the PCA-based estimator holds as long as the
signal-to-noise ratio (SNR) grows faster than a polynomial rate of $\log N$.
This finding significantly surpasses prior work that required a polynomial rate
of $N$. Our theory is entirely non-asymptotic, offering finite-sample
characterizations for both the estimation error and the uncertainty level of
statistical inference. A notable technical innovation is our closed-form
first-order approximation of PCA-based estimator, which paves the way for
various statistical tests. Furthermore, we apply our theories to design
easy-to-implement statistics for validating whether given factors fall in the
linear spans of unknown latent factors, testing structural breaks in the factor
loadings for an individual unit, checking whether two units have the same risk
exposures, and constructing confidence intervals for systematic risks. Our
empirical studies uncover insightful correlations between our test results and
economic cycles.
arXiv link: http://arxiv.org/abs/2407.03616v3
Finely Stratified Rerandomization Designs
stratified rerandomization designs, which use baseline covariates to match
units into groups (e.g. matched pairs), then rerandomize within-group treatment
assignments until a balance criterion is satisfied. We show that finely
stratified rerandomization does partially linear regression adjustment by
design, providing nonparametric control over the stratified covariates and
linear control over the rerandomized covariates. We introduce several new forms
of rerandomization, allowing for imbalance metrics based on nonlinear
estimators, and proposing a minimax scheme that minimizes the computational
cost of rerandomization subject to a bound on estimation error. While the
asymptotic distribution of GMM estimators under stratified rerandomization is
generically non-normal, we show how to restore asymptotic normality using
ex-post linear adjustment tailored to the stratification. We derive new
variance bounds that enable conservative inference on finite population causal
parameters, and provide asymptotically exact inference on their superpopulation
counterparts.
arXiv link: http://arxiv.org/abs/2407.03279v3
Wild inference for wild SVARs with application to heteroscedasticity-based IV
functions (IRF) for persistent data. Existing multiple-parameter inference
requires cumbersome pretesting for unit roots, cointegration, and trends with
subsequent stationarization. To avoid pretesting, we propose a novel
dependent wild bootstrap procedure for simultaneous inference on IRF
using local projections (LP) estimated in levels in possibly
nonstationary and heteroscedastic SVARs. The bootstrap also
allows efficient smoothing of LP estimates.
We study IRF to US monetary policy identified using FOMC meetings count as an
instrument for heteroscedasticity of monetary shocks. We validate our method
using DSGE model simulations and alternative SVAR methods.
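A bare-bones sketch of local projections in levels, the object the proposed bootstrap is built around: for each horizon h, regress y_{t+h} on a constant and the identified shock and collect the slope. Controls, lag augmentation, smoothing, and the dependent wild bootstrap of the paper are omitted; names are illustrative.

    import numpy as np

    def local_projection_irf(y, shock, horizons=12):
        irf = []
        for h in range(horizons + 1):
            yh = y[h:]                                        # y_{t+h}
            Z = np.column_stack([np.ones(len(y) - h), shock[:len(y) - h]])
            beta, *_ = np.linalg.lstsq(Z, yh, rcond=None)
            irf.append(beta[1])                               # IRF at horizon h
        return np.array(irf)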
arXiv link: http://arxiv.org/abs/2407.03265v2
Conditional Forecasts in Large Bayesian VARs with Multiple Equality and Inequality Constraints
the future paths of some other variables, are used routinely by empirical
macroeconomists in a number of applied settings. In spite of this, the existing
algorithms used to generate conditional forecasts tend to be very
computationally intensive, especially when working with large Vector
Autoregressions or when multiple linear equality and inequality constraints are
imposed at once. We introduce a novel precision-based sampler that is fast,
scales well, and yields conditional forecasts subject to both linear equality and
inequality constraints. We show in a simulation study that the proposed method
produces forecasts that are identical to those from the existing algorithms but
in a fraction of the time. We then illustrate the performance of our method in
a large Bayesian Vector Autoregression where we simultaneously impose a mix of
linear equality and inequality constraints on the future trajectories of key US
macroeconomic indicators over the 2020--2022 period.
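For intuition only, the generic building block behind conditional forecasting is
Gaussian conditioning on linear equality constraints. The sketch below is not
the paper's precision-based sampler (which works with sparse precision matrices
and also handles inequality constraints); it simply draws forecast paths from
N(mu, Sigma) subject to R y = r, with a toy mean, covariance, and constraint set
chosen purely for illustration.

    import numpy as np

    def conditional_forecast_draws(mu, Sigma, R, r, n_draws=1000, seed=None):
        # Draw from N(mu, Sigma) conditional on the equality constraints R y = r
        # (textbook Gaussian conditioning; not the paper's precision-based sampler)
        rng = np.random.default_rng(seed)
        K = Sigma @ R.T @ np.linalg.inv(R @ Sigma @ R.T)       # "gain" matrix
        mu_c = mu + K @ (r - R @ mu)
        Sigma_c = Sigma - K @ (R @ Sigma)
        return rng.multivariate_normal(mu_c, Sigma_c, size=n_draws, method="svd")

    # toy example: 3-step-ahead forecasts of 2 variables (stacked length 6),
    # conditioning on the first variable being fixed at 1 in every period
    mu = np.zeros(6)
    Sigma = 0.5 * np.eye(6) + 0.5                 # equicorrelated toy covariance
    R = np.zeros((3, 6)); R[0, 0] = R[1, 2] = R[2, 4] = 1.0
    draws = conditional_forecast_draws(mu, Sigma, R, np.ones(3), seed=0)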
arXiv link: http://arxiv.org/abs/2407.02262v1
How do financial variables impact public debt growth in China? An empirical study based on Markov regime-switching model
exacerbated fiscal shocks and soaring public debt levels, which raises concerns
about the stability and sustainability of China's public debt growth in the
future. This paper employs the Markov regime-switching model with time-varying
transition probability (TVTP-MS) to investigate the growth pattern of China's
public debt and the impact of financial variables such as credit, house prices
and stock prices on the growth of public debt. We identify two distinct regimes
of China's public debt: a surge regime with a high growth rate and high
volatility, and a steady regime with a low growth rate and low volatility. The
main results are twofold. On the one hand, an increase in the growth rate of
the financial variables helps to moderate the growth rate of public debt,
whereas the effects differ between the two regimes. More specifically, the
impacts of credit and house prices are significant in the surge regime, whereas
stock prices affect public debt growth significantly in the steady regime. On
the other hand, a higher growth rate of financial variables also increases the
probability of public debt either staying in or switching to the steady regime.
These findings highlight the necessity of aligning financial adjustments with
the prevailing public debt regime when developing sustainable fiscal policies.
arXiv link: http://arxiv.org/abs/2407.02183v1
Macroeconomic Forecasting with Large Language Models
Language Models (LLMs) against traditional macro time series forecasting
approaches. In recent times, LLMs have surged in popularity for forecasting due
to their ability to capture intricate patterns in data and quickly adapt across
very different domains. However, their effectiveness in forecasting
macroeconomic time series data compared to conventional methods remains an area
of interest. To address this, we conduct a rigorous evaluation of LLMs against
traditional macro forecasting methods, using as common ground the FRED-MD
database. Our findings provide valuable insights into the strengths and
limitations of LLMs in forecasting macroeconomic time series, shedding light on
their applicability in real-world scenarios.
arXiv link: http://arxiv.org/abs/2407.00890v4
Three Scores and 15 Years (1948-2023) of Rao's Score Test: A Brief History
likelihood ratio and Wald test statistics. In spite of the optimality
properties of the score statistic shown in Rao and Poti (1946), the Rao score
(RS) test remained unnoticed for almost 20 years. Today, the RS test is part of
the “Holy Trinity” of hypothesis testing and has found its place in the
Statistics and Econometrics textbooks and related software. Reviewing the
history of the RS test, we note that remarkable test statistics proposed in the
literature earlier than or around the time of Rao (1948), mostly from intuition, such
as Pearson's (1900) goodness-of-fit test, Moran's (1948) I test for spatial dependence,
and Durbin and Watson's (1950) test for serial correlation, can be given an RS test
statistic interpretation. At the same time, recent developments in robust
hypothesis testing under certain forms of misspecification make the RS test an
active area of research in Statistics and Econometrics. From our brief account
of the history of the RS test, we conclude that its impact on science goes far
beyond its calendar starting point, with promising future research activities
for many years to come.
arXiv link: http://arxiv.org/abs/2406.19956v3
Vector AutoRegressive Moving Average Models: A Review
general model class for analyzing dynamics among multiple time series. While
VARMA models encompass the Vector AutoRegressive (VAR) models, they are far less
popular in empirical applications than the latter. Can this phenomenon be
explained fully by the simplicity of VAR models? Perhaps many users of VAR
models have not fully appreciated what VARMA models can provide. The goal of
this review is to provide a comprehensive resource for researchers and
practitioners seeking insights into the advantages and capabilities of VARMA
models. We start by reviewing the identification challenges inherent to VARMA
models, covering both classical and modern identification schemes, and we
continue along the same lines regarding the estimation, specification, and diagnosis
of VARMA models. We then highlight the practical utility of VARMA models in
terms of Granger Causality analysis, forecasting and structural analysis as
well as recent advances and extensions of VARMA models to further facilitate
their adoption in practice. Finally, we discuss some interesting future
research directions in which VARMA models can fulfill their potential in
applications compared to their VAR subclass.
arXiv link: http://arxiv.org/abs/2406.19702v1
Factor multivariate stochastic volatility models of high dimension
of dimensionality inherent to multivariate volatility processes, we develop a
factor model-based multivariate stochastic volatility (fMSV) framework that
relies on two viewpoints: sparse approximate factor model and sparse factor
loading matrix. We propose a two-stage estimation procedure for the fMSV model:
the first stage obtains the estimators of the factor model, and the second
stage estimates the MSV part using the estimated common factor variables. We
derive the asymptotic properties of the estimators. Simulated experiments are
performed to assess the forecasting performances of the covariance matrices.
The empirical analysis based on vectors of asset returns illustrates that the
fMSV models outperform competing conditional covariance models in terms of
forecasting performance.
arXiv link: http://arxiv.org/abs/2406.19033v1
A Note on Identification of Match Fixed Effects as Interpretable Unobserved Match Affinity
interaction terms involving dummy variables for two elements, lack
identification without specific restrictions on parameters. Consequently, the
coefficients typically reported as relative match fixed effects by statistical
software are not interpretable. To address this, we establish normalization
conditions that enable identification of match fixed effect parameters as
interpretable indicators of unobserved match affinity, facilitating comparisons
among observed matches. Using data from middle school students in the 2007
Trends in International Mathematics and Science Study (TIMSS), we highlight the
distribution of comparable match fixed effects within a specific school.
arXiv link: http://arxiv.org/abs/2406.18913v3
Online Distributional Regression
and have led to the development of online learning algorithms. Many fields,
such as supply chain management, weather and meteorology, energy markets, and
finance, have pivoted towards using probabilistic forecasts. This results in
the need not only for accurate learning of the expected value but also for
learning the conditional heteroskedasticity and conditional moments. Against
this backdrop, we present a methodology for online estimation of regularized,
linear distributional models. The proposed algorithm is based on a combination
of recent developments for the online estimation of LASSO models and the
well-known GAMLSS framework. We provide a case study on day-ahead electricity
price forecasting, in which we show the competitive performance of the
incremental estimation combined with strongly reduced computational effort. Our
algorithms are implemented in a computationally efficient Python package ondil.
arXiv link: http://arxiv.org/abs/2407.08750v3
LABOR-LLM: Language-Based Occupational Representations with Large Language Models
that predicts a worker's next job as a function of career history (an
"occupation model"). CAREER was initially estimated ("pre-trained") using a
large, unrepresentative resume dataset, which served as a "foundation model,"
and parameter estimation was continued ("fine-tuned") using data from a
representative survey. CAREER had better predictive performance than
benchmarks. This paper considers an alternative where the resume-based
foundation model is replaced by a large language model (LLM). We convert
tabular data from the survey into text files that resemble resumes and
fine-tune the LLMs on these text files with the objective of predicting the
next token (word). The resulting fine-tuned LLM is used as an input to an
occupation model. Its predictive performance surpasses all prior models. We
demonstrate the value of fine-tuning and further show that by adding more
career data from a different population, fine-tuning smaller LLMs surpasses the
performance of fine-tuning larger models.
arXiv link: http://arxiv.org/abs/2406.17972v3
Forecast Relative Error Decomposition
well-suited for the analysis of shocks in nonlinear dynamic models. They
include the Forecast Relative Error Decomposition (FRED), Forecast Error
Kullback Decomposition (FEKD) and Forecast Error Laplace Decomposition (FELD).
These measures are preferable to the traditional Forecast Error Variance
Decomposition (FEVD) because they account for nonlinear dependence in both a
serial and cross-sectional sense. This is illustrated by applications to
dynamic models for qualitative data, count data, stochastic volatility and
cyberrisk.
arXiv link: http://arxiv.org/abs/2406.17708v1
Estimation and Inference for CP Tensor Factor Models
researchers in economics and finance. We consider the estimation and inference
of high-dimensional tensor factor models, where each dimension of the tensor
diverges. Our focus is on a factor model that admits CP-type tensor
decomposition, which allows for non-orthogonal loading vectors. Based on the
contemporary covariance matrix, we propose an iterative simultaneous projection
estimation method. Our estimator is robust to weak dependence among factors and
weak correlation across different dimensions in the idiosyncratic shocks. We
establish an inferential theory, demonstrating both consistency and asymptotic
normality under relaxed assumptions. Within a unified framework, we consider
two eigenvalue ratio-based estimators for the number of factors in a tensor
factor model and justify their consistency. Simulation studies confirm the
theoretical results and an empirical application to sorted portfolios reveals
three important factors: a market factor, a long-short factor, and a volatility
factor.
arXiv link: http://arxiv.org/abs/2406.17278v2
Efficient two-sample instrumental variable estimators with change points and near-weak identification
regressors where the parameters of interest change across two samples. If the
first stage is common, we show how to use this information to obtain more
efficient two-sample GMM estimators than the standard split-sample GMM, even in
the presence of near-weak instruments. We also propose two tests to detect
change points in the parameters of interest, depending on whether the
first stage is common or not. We derive the limiting distribution of these
tests and show that they have non-trivial power even under weaker and possibly
time-varying identification patterns. The finite sample properties of our
proposed estimators and testing procedures are illustrated in a series of
Monte-Carlo experiments, and in an application to the open-economy New
Keynesian Phillips curve. Our empirical analysis using US data provides strong
support for a New Keynesian Phillips curve with incomplete pass-through and
reveals important time variation in the relationship between inflation and
exchange rate pass-through.
arXiv link: http://arxiv.org/abs/2406.17056v1
F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data
businesses, especially during high-stake sales events. However, the limited
availability of historical data from these peak periods poses a significant
challenge for traditional forecasting methods. In this paper, we propose a
novel approach that leverages strategically chosen proxy data reflective of
potential sales patterns from similar entities during non-peak periods,
enriched by features learned from a graph neural network (GNN)-based
forecasting model, to predict demand during peak events. We formulate the
demand prediction as a meta-learning problem and develop the Feature-based
First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm that leverages
proxy data from non-peak periods and GNN-generated relational metadata to learn
feature-specific layer parameters, thereby adapting to demand forecasts for
peak events. Theoretically, we show that by considering domain similarities
through task-specific metadata, our model achieves improved generalization,
where the excess risk decreases as the number of training tasks increases.
Empirical evaluations on large-scale industrial datasets demonstrate the
superiority of our approach. Compared to existing state-of-the-art models, our
method demonstrates a notable improvement in demand prediction accuracy,
reducing the Mean Absolute Error by 26.24% on an internal vending machine
dataset and by 1.04% on the publicly accessible JD.com dataset.
arXiv link: http://arxiv.org/abs/2406.16221v1
Testing for Restricted Stochastic Dominance under Survey Nonresponse with Panel Data: Theory and an Evaluation of Poverty in Australia
dominance testing under survey nonresponse that integrates the partial
identification approach to incomplete data and design-based inference for
complex survey data. We propose a novel inference procedure for restricted
$s$th-order stochastic dominance, tailored to accommodate a broad spectrum of
nonresponse assumptions. The method uses pseudo-empirical likelihood to
formulate the test statistic and compares it to a critical value from the
chi-squared distribution with one degree of freedom. We detail the procedure's
asymptotic properties under both null and alternative hypotheses, establishing
its uniform validity under the null and consistency against various
alternatives. Using the Household, Income and Labour Dynamics in Australia
survey, we demonstrate the procedure's utility in a sensitivity analysis of
temporal poverty comparisons among Australian households.
arXiv link: http://arxiv.org/abs/2406.15702v1
Identification and Estimation of Causal Effects in High-Frequency Event Studies
effects by high-frequency event study regressions, which have been used widely
in the recent macroeconomics, financial economics and political economy
literatures. The high-frequency event study method regresses changes in an
outcome variable on a measure of unexpected changes in a policy variable in a
narrow time window around an event or a policy announcement (e.g., a 30-minute
window around an FOMC announcement). We show that, contrary to popular belief,
the narrow size of the window is not sufficient for identification. Rather, the
population regression coefficient identifies a causal estimand when (i) the
effect of the policy shock on the outcome does not depend on the other
variables (separability) and (ii) the surprise component of the news or event
dominates all other variables that are present in the event window (relative
exogeneity). Technically, the latter condition requires the ratio between the
variance of the policy shock and that of the other variables to be infinite in
the event window. Under these conditions, we establish the causal meaning of
the event study estimand corresponding to the regression coefficient and the
consistency and asymptotic normality of the event study estimator. Notably,
this standard linear regression estimator is robust to general forms of
nonlinearity. We apply our results to Nakamura and Steinsson's (2018a) analysis
of the real economic effects of monetary policy, providing a simple empirical
procedure to analyze the extent to which the standard event study estimator
adequately estimates causal effects of interest.
arXiv link: http://arxiv.org/abs/2406.15667v5
The disruption index suffers from citation inflation and is confounded by shifts in scholarly citation practice
monitoring the efficiency and competitiveness of the knowledge economy. To this
end, a disruption index (CD) was recently developed and applied to publication
and patent citation networks (Wu et al., Nature 2019; Park et al., Nature
2023). Here we show that CD systematically decreases over time due to secular
growth in research and patent production, following two distinct mechanisms
unrelated to innovation -- one behavioral and the other structural. Whereas the
behavioral explanation reflects shifts associated with techno-social factors
(e.g. self-citation practices), the structural explanation follows from
`citation inflation' (CI), an inextricable feature of real citation networks
attributable to increasing reference list lengths, which causes CD to
systematically decrease. We demonstrate this causal link by way of mathematical
deduction, computational simulation, multi-variate regression, and
quasi-experimental comparison of the disruptiveness of PNAS versus PNAS Plus
articles, which differ only in their lengths. Accordingly, we analyze CD data
available in the SciSciNet database and find that disruptiveness incrementally
increased from 2005 to 2015, that the negative relationship between disruption
and team size is remarkably small in overall effect size, and that it shifts
from negative to positive for team sizes of $\geq$ 8 coauthors.
arXiv link: http://arxiv.org/abs/2406.15311v1
Difference-in-Differences when Parallel Trends Holds Conditional on Covariates
estimation strategies when the parallel trends assumption holds after
conditioning on covariates. We consider empirically relevant settings where the
covariates can be time-varying, time-invariant, or both. We uncover a number of
weaknesses of commonly used two-way fixed effects (TWFE) regressions in this
context, even in applications with only two time periods. In addition to some
weaknesses due to estimating linear regression models that are similar to cases
with cross-sectional data, we also point out a collection of additional issues,
which we refer to as hidden linearity bias, that arise because the
transformations used to eliminate the unit fixed effect also transform the
covariates (e.g., taking first differences can result in the estimating
equation including only the change in covariates over time, not their level,
and can also drop time-invariant covariates altogether). We provide simple
diagnostics for assessing how susceptible a TWFE regression is to hidden
linearity bias based on reformulating the TWFE regression as a weighting
estimator. Finally, we propose simple alternative estimation strategies that
can circumvent these issues.
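The covariate-transformation point is easy to see in a two-period toy example
(my own illustration, not the paper's diagnostics): after first-differencing, a
time-invariant covariate becomes a column of zeros and a time-varying covariate
enters only through its change.

    import numpy as np

    # two periods, 3 units: columns are (time-invariant z, time-varying x)
    z = np.array([1.0, 2.0, 3.0])
    x_t1 = np.array([0.5, 1.0, 1.5])
    x_t2 = np.array([0.7, 0.9, 2.0])

    W_t1 = np.column_stack([z, x_t1])
    W_t2 = np.column_stack([z, x_t2])

    dW = W_t2 - W_t1     # the covariate matrix after first-differencing
    # dW[:, 0] is identically zero: the time-invariant covariate z drops out
    # dW[:, 1] equals x_t2 - x_t1: only the change in x remains, not its level
    print(dW)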
arXiv link: http://arxiv.org/abs/2406.15288v2
MIDAS-QR with 2-Dimensional Structure
growth-at-risk models in the literature. Most of the research has focused on
imposing structure on the high-frequency lags when estimating MIDAS-QR models
akin to what is done in mean models. However, imposing structure only on the
lag dimension can induce quantile variation that would otherwise
not be there. In this paper, we extend the framework by introducing structure on
both the lag dimension and the quantile dimension. In this way, we are able to
shrink unnecessary quantile variation in the high-frequency variables. This
leads to more gradual lag profiles in both dimensions compared to the MIDAS-QR
and UMIDAS-QR. We show that this proposed method leads to further gains in
nowcasting and forecasting on a pseudo-out-of-sample exercise on US data.
arXiv link: http://arxiv.org/abs/2406.15157v1
Statistical Inference and A/B Testing in Fisher Markets and Paced Auctions
equilibrium models: linear Fisher market (LFM) equilibrium and first-price
pacing equilibrium (FPPE). LFM arises from fair resource allocation systems
such as allocation of food to food banks and notification opportunities to
different types of notifications. For LFM, we assume that the data observed is
captured by the classical finite-dimensional Fisher market equilibrium, and its
steady-state behavior is modeled by a continuous limit Fisher market. The
second type of equilibrium we study, FPPE, arises from internet advertising
where advertisers are constrained by budgets and advertising opportunities are
sold via first-price auctions. For platforms that use pacing-based methods to
smooth out the spending of advertisers, FPPE provides a hindsight-optimal
configuration of the pacing method. We propose a statistical framework for the
FPPE model, in which a continuous limit FPPE models the steady-state behavior
of the auction platform, and a finite FPPE provides the data to estimate
primitives of the limit FPPE. Both LFM and FPPE have an Eisenberg-Gale convex
program characterization, the pillar upon which we derive our statistical
theory. We start by deriving basic convergence results for the finite market to
the limit market. We then derive asymptotic distributions, and construct
confidence intervals. Furthermore, we establish the asymptotic local minimax
optimality of estimation based on finite markets. We then show that the theory
can be used for conducting statistically valid A/B testing on auction
platforms. Synthetic and semi-synthetic experiments verify the validity and
practicality of our theory.
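Both equilibrium notions rest on an Eisenberg-Gale convex program. As a concrete
illustration (not the authors' estimation code), the sketch below computes the
equilibrium allocation of a small linear Fisher market with budgets B and
valuations V using cvxpy, recovering equilibrium prices as the dual variables of
the unit-supply constraints; the market size and numbers are arbitrary.

    import cvxpy as cp
    import numpy as np

    # toy linear Fisher market: 3 buyers, 4 goods, unit supply of each good
    B = np.array([1.0, 2.0, 1.5])                 # budgets
    V = np.array([[0.6, 0.1, 0.2, 0.1],           # buyer valuations v_ij
                  [0.2, 0.5, 0.1, 0.2],
                  [0.1, 0.3, 0.4, 0.2]])

    X = cp.Variable(V.shape, nonneg=True)          # allocation x_ij
    u = cp.sum(cp.multiply(V, X), axis=1)          # linear utilities u_i
    supply = cp.sum(X, axis=0) <= 1                # each good has unit supply

    # Eisenberg-Gale program: maximize budget-weighted log utilities
    prob = cp.Problem(cp.Maximize(cp.sum(cp.multiply(B, cp.log(u)))), [supply])
    prob.solve()

    prices = supply.dual_value                     # equilibrium prices (duals)
    allocation = X.value

The statistical framework in the paper then treats an observed finite market as
a sample from a continuous limit market; the convex program above is only the
deterministic building block.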
arXiv link: http://arxiv.org/abs/2406.15522v3
Movement Prediction-Adjusted Naive Forecast: Is the Naive Baseline Unbeatable in Financial Time Series Forecasting?
difficult benchmark to surpass because of the stochastic nature of the data.
Motivated by this challenge, this study introduces the movement
prediction-adjusted naive forecast (MPANF), a forecast combination method that
systematically refines the naive forecast by incorporating directional
information. In particular, MPANF adjusts the naive forecast with an increment
formed by three components: the in-sample mean absolute increment as the base
magnitude, the movement prediction as the sign, and a coefficient derived from
the in-sample movement prediction accuracy as the scaling factor. The
experimental results on eight financial time series, using the RMSE, MAE, MAPE,
and sMAPE, show that with a movement prediction accuracy of approximately 0.55,
MPANF generally outperforms common benchmarks, including the naive forecast,
naive forecast with drift, IMA(1,1), and linear regression. These findings
indicate that MPANF has the potential to outperform the naive baseline when
reliable movement predictions are available.
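A minimal sketch of the adjustment described above. The three components follow
the abstract; the exact mapping from in-sample movement accuracy to the scaling
coefficient is not spelled out there, so the rule max(2*accuracy - 1, 0) used
below is my own illustrative assumption.

    import numpy as np

    def mpanf_forecast(y_insample, movement_pred_next, movement_preds_insample):
        # Movement prediction-adjusted naive forecast (illustrative reconstruction)
        # y_insample:              in-sample series (1d array)
        # movement_pred_next:      predicted direction of the next change (+1 or -1)
        # movement_preds_insample: predicted directions for the in-sample changes
        increments = np.diff(y_insample)
        base_magnitude = np.abs(increments).mean()   # in-sample mean absolute increment
        actual_dirs = np.sign(increments)
        accuracy = np.mean(movement_preds_insample == actual_dirs)
        scale = max(2.0 * accuracy - 1.0, 0.0)       # assumed accuracy-to-scale mapping
        naive = y_insample[-1]
        return naive + scale * base_magnitude * movement_pred_next

With a directional accuracy of roughly 0.55, the adjustment is a small nudge on
top of the naive forecast, which is consistent with the modest but systematic
gains reported above.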
arXiv link: http://arxiv.org/abs/2406.14469v10
Estimating Treatment Effects under Recommender Interference: A Structured Neural Networks Approach
personalized content. To evaluate updates to recommender systems targeting
content creators, platforms frequently rely on creator-side randomized
experiments. The treatment effect measures the change in outcomes when a new
algorithm is implemented compared to the status quo. We show that the standard
difference-in-means estimator can lead to biased estimates due to recommender
interference that arises when treated and control creators compete for
exposure. We propose a "recommender choice model" that describes which item
gets exposed from a pool containing both treated and control items. By
combining a structural choice model with neural networks, this framework
directly models the interference pathway while accounting for rich
viewer-content heterogeneity. We construct a debiased estimator of the
treatment effect and prove it is $\sqrt n$-consistent and asymptotically normal
with potentially correlated samples. We validate our estimator's empirical
performance with a field experiment on the Weixin short-video platform. In addition
to the standard creator-side experiment, we conduct a costly double-sided
randomization design to obtain a benchmark estimate free from interference
bias. We show that the proposed estimator yields results comparable to the
benchmark, whereas the standard difference-in-means estimator can exhibit
significant bias and even produce reversed signs.
arXiv link: http://arxiv.org/abs/2406.14380v3
Temperature in the Iberian Peninsula: Trend, seasonality, and heterogeneity
the dynamic evolution of bivariate systems of centre and log-range temperatures
obtained monthly from minimum/maximum temperatures observed at a given
location. In doing so, the centre and log-range temperature are decomposed into
potentially stochastic trends, seasonal, and transitory components. Since our
model encompasses deterministic trends and seasonal components as limiting
cases, we contribute to the debate on whether stochastic or deterministic
components better represent the trend and seasonal components. The methodology
is applied to the centre and log-range temperatures observed at four locations
in the Iberian Peninsula, namely Barcelona, Coruña, Madrid, and Seville.
We show that, at each location, the centre temperature can be represented by a
smooth integrated random walk with time-varying slope, while a stochastic level
better represents the log-range. We also show that centre and log-range
temperature are unrelated. The methodology is then extended to simultaneously
model centre and log-range temperature observed at several locations in the
Iberian Peninsula. We fit a multi-level dynamic factor model to extract
potential commonalities among centre (log-range) temperature while also
allowing for heterogeneity in different areas in the Iberian Peninsula. We show
that, although the commonality in trends of average temperature is
considerable, the regional components are also relevant.
arXiv link: http://arxiv.org/abs/2406.14145v1
Estimating Time-Varying Parameters of Various Smoothness in Linear Models via Kernel Regression
using kernel regression. Our contributions are threefold. First, we consider a
broad class of time-varying parameters including deterministic smooth
functions, the rescaled random walk, structural breaks, the threshold model and
their mixtures. We show that those time-varying parameters can be consistently
estimated by kernel regression. Our analysis exploits the smoothness of the
time-varying parameter quantified by a single parameter. The second
contribution is to reveal that the bandwidth used in kernel regression
determines a trade-off between the rate of convergence and the size of the
class of time-varying parameters that can be estimated. We demonstrate that an
improper choice of the bandwidth yields biased estimation, and argue that the
bandwidth should be selected according to the smoothness of the time-varying
parameter. Our third contribution is to propose a data-driven procedure for
bandwidth selection that is adaptive to the smoothness of the time-varying
parameter.
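As a concrete illustration of the estimator being analyzed (a generic sketch,
not the authors' code, and without their data-driven bandwidth rule), the
function below computes local-constant kernel estimates of beta(t/T) in
y_t = x_t' beta(t/T) + u_t by kernel-weighted least squares in rescaled time;
the Gaussian kernel and the bandwidth h are illustrative choices.

    import numpy as np

    def tv_beta_kernel(y, X, h, kernel=lambda u: np.exp(-0.5 * u**2)):
        # Local-constant kernel estimates of beta(t/T); returns a (T, k) array
        T, k = X.shape
        grid = np.arange(1, T + 1) / T
        betas = np.empty((T, k))
        for i, tau in enumerate(grid):
            w = kernel((grid - tau) / h)                    # weights in rescaled time
            Xw = X * w[:, None]
            betas[i] = np.linalg.solve(Xw.T @ X, Xw.T @ y)  # weighted least squares
        return betas

    # toy example: a single regressor with a smoothly varying coefficient
    rng = np.random.default_rng(0)
    T = 500
    x = rng.normal(size=(T, 1))
    beta_true = np.sin(2 * np.pi * np.arange(1, T + 1) / T)
    y = x[:, 0] * beta_true + 0.5 * rng.normal(size=T)
    beta_hat = tv_beta_kernel(y, x, h=0.1)   # larger h: smoother but more biased

The role of h in the code mirrors the trade-off described above: a wide
bandwidth suits smooth deterministic paths, while abrupt breaks call for a
narrower one.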
arXiv link: http://arxiv.org/abs/2406.14046v4
Testing identification in mediation and dynamic treatment models
dynamic treatment models that is based on two sets of observed variables,
namely covariates to be controlled for and suspected instruments, building on
the test by Huber and Kueck (2022) for single treatment models. We consider
models with a sequential assignment of a treatment and a mediator to assess the
direct treatment effect (net of the mediator), the indirect treatment effect
(via the mediator), or the joint effect of both treatment and mediator. We
establish testable conditions for identifying such effects in observational
data. These conditions jointly imply (1) the exogeneity of the treatment and
the mediator conditional on covariates and (2) the validity of distinct
instruments for the treatment and the mediator, meaning that the instruments do
not directly affect the outcome (other than through the treatment or mediator)
and are unconfounded given the covariates. Our framework extends to
post-treatment sample selection or attrition problems when replacing the
mediator by a selection indicator for observing the outcome, enabling joint
testing of the selectivity of treatment and attrition. We propose a machine
learning-based test to control for covariates in a data-driven manner and
analyze its finite sample performance in a simulation study. Additionally, we
apply our method to Slovak labor market data and find that our testable
implications are not rejected for a sequence of training programs typically
considered in dynamic treatment evaluations.
arXiv link: http://arxiv.org/abs/2406.13826v1
Bayesian Inference for Multidimensional Welfare Comparisons
how Bayesian inference can be used to make multivariate welfare comparisons. A
four-dimensional distribution for the well-being attributes income, mental
health, education, and happiness are estimated via Bayesian Markov chain Monte
Carlo using unit-record data taken from the Household, Income and Labour
Dynamics in Australia survey. Marginal distributions of beta and gamma mixtures
and discrete ordinal distributions are combined using a copula. Improvements in
both well-being generally and poverty magnitude are assessed using posterior
means of single-index measures and posterior probabilities of stochastic
dominance. The conditions for stochastic dominance depend on the class of
utility functions that is assumed to define a social welfare function and the
number of attributes in the utility function. Three classes of utility
functions are considered, and posterior probabilities of dominance are computed
for one, two, and four-attribute utility functions for three time intervals
within the period 2001 to 2019.
arXiv link: http://arxiv.org/abs/2406.13395v1
Testing for Underpowered Literatures
they been run on larger samples? I show how to estimate the expected number of
statistically significant results that a set of experiments would have reported
had their sample sizes all been counterfactually increased. The proposed
deconvolution estimator is asymptotically normal and adjusts for publication
bias. Unlike related methods, this approach requires no assumptions of any kind
about the distribution of true intervention treatment effects and allows for
point masses. Simulations find good coverage even when the t-score is only
approximately normal. An application to randomized trials (RCTs) published in
economics journals finds that doubling every sample would increase the power of
t-tests by 7.2 percentage points on average. This effect is smaller than for
non-RCTs and comparable to systematic replications in laboratory psychology
where previous studies enabled more accurate power calculations. This suggests
that RCTs are on average relatively insensitive to sample size increases.
Research funders who wish to raise power should generally consider sponsoring
better-measured and higher quality experiments -- rather than only larger ones.
arXiv link: http://arxiv.org/abs/2406.13122v3
Model-Based Inference and Experimental Design for Interference Using Partial Network Data
individual is not affected by the treatment statuses of others; however, in many
real-world applications, treatments can have an effect on many others beyond
the immediately treated. Interference can generically be thought of as mediated
through some network structure. In many empirically relevant situations
however, complete network data (required to adjust for these spillover effects)
are too costly or logistically infeasible to collect. Partially or indirectly
observed network data (e.g., subsamples, aggregated relational data (ARD),
egocentric sampling, or respondent-driven sampling) reduce the logistical and
financial burden of collecting network data, but the statistical properties of
treatment effect adjustments from these design strategies are only beginning to
be explored. In this paper, we present a framework for the estimation and
inference of treatment effect adjustments using partial network data through
the lens of structural causal models. We also illustrate procedures to assign
treatments using only partial network data, with the goal of either minimizing
estimator variance or optimally seeding. We derive single network asymptotic
results applicable to a variety of choices for an underlying graph model. We
validate our approach using simulated experiments on observed graphs with
applications to information diffusion in India and Malawi.
arXiv link: http://arxiv.org/abs/2406.11940v1
Dynamically Consistent Analysis of Realized Covariations in Term Structure Models
nonparametrically and robustly, staying consistent with a general no-arbitrage
setting. This is, in particular, motivated by the problem of identifying the
number of statistically relevant factors in the bond market under minimal
conditions. We apply this method in an empirical study which suggests that a
high number of factors is needed to describe the term structure evolution and
that the term structure of volatility varies over time.
arXiv link: http://arxiv.org/abs/2406.19412v1
Resilience of international oil trade networks under extreme event shock-recovery simulations
situation has become increasingly complex and severe. Assessing the resilience
of the international oil trade network (iOTN) is crucial for evaluating its
ability to withstand extreme shocks and recover thereafter, ensuring energy
security. We overcome the limitations of discrete historical data by
developing a simulation model for extreme event shock-recovery in the iOTNs. We
introduce a network efficiency indicator to measure oil resource allocation
efficiency and evaluate network performance. We then construct a resilience index
to explore the resilience of the iOTNs along the dimensions of resistance and
recoverability. Our findings indicate that extreme events can lead to sharp
declines in performance of the iOTNs, especially when economies with
significant trading positions and relations suffer shocks. The upward trend in
recoverability and resilience reflects the self-organizing nature of the iOTNs,
demonstrating its capacity for optimizing its own structure and functionality.
Unlike traditional energy security research based solely on discrete historical
data or resistance indicators, our model evaluates resilience from multiple
dimensions, offering insights for global energy governance systems while
providing diverse perspectives for various economies to mitigate risks and
uphold energy security.
arXiv link: http://arxiv.org/abs/2406.11467v1
Management Decisions in Manufacturing using Causal Machine Learning -- To Rework, or not to Rework?
policies in manufacturing systems. We consider a single production stage within
a multistage, lot-based system that allows for optional rework steps. While the
rework decision depends on an intermediate state of the lot and system, the
final product inspection, and thus the assessment of the actual yield, is
delayed until production is complete. Repair steps are applied uniformly to the
lot, potentially improving some of the individual items while degrading others.
The challenge is thus to balance potential yield improvement with the rework
costs incurred. Given the inherently causal nature of this decision problem, we
propose a causal model to estimate yield improvement. We apply methods from
causal machine learning, in particular double/debiased machine learning (DML)
techniques, to estimate conditional treatment effects from data and derive
policies for rework decisions. We validate our decision model using real-world
data from opto-electronic semiconductor manufacturing, achieving a yield
improvement of 2 - 3% during the color-conversion process of white
light-emitting diodes (LEDs).
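The causal step relies on double/debiased machine learning. As a generic
illustration (not the authors' lot-level model), the sketch below cross-fits
nuisance functions with random forests and computes the interactive-model
AIPW/DML estimate of the average effect of a binary rework decision d on yield
y given pre-rework features X; the learners, fold count, and synthetic data are
placeholders.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
    from sklearn.model_selection import KFold

    def dml_ate(y, d, X, n_folds=5, seed=0):
        # Cross-fitted AIPW (interactive-model DML) estimate of E[Y(1) - Y(0)]
        psi = np.empty(len(y))
        for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
            m1 = RandomForestRegressor(random_state=seed).fit(X[train][d[train] == 1], y[train][d[train] == 1])
            m0 = RandomForestRegressor(random_state=seed).fit(X[train][d[train] == 0], y[train][d[train] == 0])
            ps = RandomForestClassifier(random_state=seed).fit(X[train], d[train])
            e = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)   # rework propensity
            g1, g0 = m1.predict(X[test]), m0.predict(X[test])
            psi[test] = (g1 - g0
                         + d[test] * (y[test] - g1) / e
                         - (1 - d[test]) * (y[test] - g0) / (1 - e))
        return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))   # ATE and its std. error

    # synthetic usage with a known treatment effect of 2
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    d = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
    y = 2.0 * d + X[:, 0] + rng.normal(size=500)
    print(dml_ate(y, d, X))

In the application, the analogous conditional effect estimates (rather than a
single average) are what drive the lot-by-lot rework policy.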
arXiv link: http://arxiv.org/abs/2406.11308v1
Impact of the Availability of ChatGPT on Software Development: A Synthetic Difference in Differences Estimation using GitHub Data
significantly impacted software development. Utilizing novel data from GitHub
Innovation Graph, we hypothesize that ChatGPT enhances software production
efficiency. Utilizing natural experiments where some governments banned
ChatGPT, we employ Difference-in-Differences (DID), Synthetic Control (SC), and
Synthetic Difference-in-Differences (SDID) methods to estimate its effects. Our
findings indicate a significant positive impact on the number of git pushes,
repositories, and unique developers per 100,000 people, particularly for
high-level, general purpose, and shell scripting languages. These results
suggest that AI tools like ChatGPT can substantially boost developer
productivity, though further analysis is needed to address potential downsides
such as low quality code and privacy concerns.
arXiv link: http://arxiv.org/abs/2406.11046v1
EM Estimation of Conditional Matrix Variate $t$ Distributions
Battulga (2024a). In this paper, we propose a new version of the conditional
matrix variate Student's $t$ distribution. The paper provides EM algorithms
that estimate the parameters of the conditional matrix variate Student's $t$
distribution, covering both general cases and special cases with a Minnesota prior.
arXiv link: http://arxiv.org/abs/2406.10837v3
Randomization Inference: Theory and Applications
Permutation tests are treated as an important special case. Under a certain
group invariance property, referred to as the “randomization hypothesis,”
randomization tests achieve exact control of the Type I error rate in finite
samples. Although this unequivocal precision is very appealing, the range of
problems that satisfy the randomization hypothesis is somewhat limited. We show
that randomization tests are often asymptotically, or approximately, valid and
efficient in settings that deviate from the conditions required for
finite-sample error control. When randomization tests fail to offer even
asymptotic Type I error control, their asymptotic validity may be restored by
constructing an asymptotically pivotal test statistic. Randomization tests can
then provide exact error control for tests of highly structured hypotheses with
good performance in a wider class of problems. We give a detailed overview of
several prominent applications of randomization tests, including two-sample
permutation tests, regression, and conformal inference.
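As a concrete instance of the exact finite-sample case, the sketch below
implements a standard two-sample permutation test of equality of distributions
with the difference in means as the test statistic; under random assignment the
resulting p-value is exact up to Monte Carlo error. The data and the number of
permutations are illustrative.

    import numpy as np

    def permutation_test(x, y, n_perm=9999, seed=None):
        # Two-sample permutation test, difference in means as the test statistic
        rng = np.random.default_rng(seed)
        pooled = np.concatenate([x, y])
        n_x = len(x)
        observed = x.mean() - y.mean()
        count = 0
        for _ in range(n_perm):
            perm = rng.permutation(pooled)               # re-randomize group labels
            stat = perm[:n_x].mean() - perm[n_x:].mean()
            count += abs(stat) >= abs(observed)
        return (count + 1) / (n_perm + 1)                # two-sided p-value

    rng = np.random.default_rng(0)
    p = permutation_test(rng.normal(0.5, 1, 40), rng.normal(0.0, 1, 40), seed=1)

When only a weaker null is of interest (e.g., equal means with possibly unequal
variances), the prescription above applies: studentize the statistic so that it
is asymptotically pivotal before permuting.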
arXiv link: http://arxiv.org/abs/2406.09521v2
Multidimensional clustering in judge designs
identities that are implicitly or explicitly used as instrumental variables.
The usual method to analyse judge designs, via a leave-out mean instrument,
eliminates this many-instrument bias only if the data are clustered in at
most one dimension. What is left out of the mean defines this clustering
dimension. The way most judge designs cluster their standard errors, however,
implies that there are additional clustering dimensions, so that a
many-instrument bias remains. We propose two estimators that are free of
many-instrument bias, also in multidimensionally clustered judge designs. The
first generalises the one-dimensional cluster jackknife instrumental variable
estimator by removing from this estimator the additional bias terms due to the
extra dependence in the data. The second models all but one clustering
dimension with fixed effects, and we show how these numerous fixed effects can be
removed without introducing extra bias. A Monte Carlo experiment and a
revisitation of two judge designs show the empirical relevance of properly
accounting for multidimensional clustering in estimation.
arXiv link: http://arxiv.org/abs/2406.09473v1
Jackknife inference with two-way clustering
to assume that the disturbances are clustered in two dimensions. However, the
finite-sample properties of two-way cluster-robust tests and confidence
intervals are often poor. We discuss several ways to improve inference with
two-way clustering. Two of these are existing methods for avoiding, or at least
ameliorating, the problem of undefined standard errors when a cluster-robust
variance matrix estimator (CRVE) is not positive definite. One is a new method
that always avoids the problem. More importantly, we propose a family of new
two-way CRVEs based on the cluster jackknife. Simulations for models with
two-way fixed effects suggest that, in many cases, the cluster-jackknife CRVE
combined with our new method yields surprisingly accurate inferences. We
provide a simple software package, twowayjack for Stata, that implements our
recommended variance estimator.
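To fix ideas, here is a minimal one-way cluster-jackknife variance calculation
for OLS (my own illustration; the paper's contribution is the two-way extension
and its positive-definiteness fix, which this sketch does not reproduce): the
coefficients are re-estimated leaving out one cluster at a time, and the
variance is built from the deviations of the delete-one-cluster estimates.

    import numpy as np

    def ols(X, y):
        return np.linalg.solve(X.T @ X, X.T @ y)

    def cluster_jackknife_vcov(X, y, cluster_ids):
        # One-way delete-one-cluster jackknife variance estimator for OLS
        beta_full = ols(X, y)
        clusters = np.unique(cluster_ids)
        G = len(clusters)
        betas = np.array([ols(X[cluster_ids != g], y[cluster_ids != g]) for g in clusters])
        dev = betas - beta_full             # deviations of delete-one-cluster estimates
        return beta_full, (G - 1) / G * dev.T @ dev

    # toy usage: 20 clusters of 30 observations with a cluster-level shock
    rng = np.random.default_rng(0)
    g = np.repeat(np.arange(20), 30)
    X = np.column_stack([np.ones(600), rng.normal(size=600)])
    y = X @ np.array([1.0, 0.5]) + rng.normal(size=20)[g] + rng.normal(size=600)
    beta, V = cluster_jackknife_vcov(X, y, g)
    se = np.sqrt(np.diag(V))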
arXiv link: http://arxiv.org/abs/2406.08880v1
Identification and Inference on Treatment Effects under Covariate-Adaptive Randomization and Imperfect Compliance
randomization (CAR) (e.g., stratified block randomization) and commonly suffer
from imperfect compliance. This paper studies the identification and inference
for the average treatment effect (ATE) and the average treatment effect on the
treated (ATT) in such RCTs with a binary treatment.
We first develop characterizations of the identified sets for both estimands.
Since data are generally not i.i.d. under CAR, these characterizations do not
follow from existing results. We then provide consistent estimators of the
identified sets and asymptotically valid confidence intervals for the
parameters. Our asymptotic analysis leads to concrete practical recommendations
regarding how to estimate the treatment assignment probabilities that enter the
estimated bounds. For the ATE bounds, using sample analog assignment
frequencies is more efficient than relying on the true assignment
probabilities. For the ATT bounds, the most efficient approach is to use the
true assignment probability for the probabilities in the numerator and the
sample analog for those in the denominator.
arXiv link: http://arxiv.org/abs/2406.08419v3
Positive and negative word of mouth in the United States
sentiment to other consumers about a business. While this process has long been
recognized as a type of promotion for businesses, the value of word of mouth is
questionable. This study examines how word of mouth correlates with
demographic variables, including the role of trust of business owners.
Education level, region of residence, and income level were found to be
significant predictors of positive word of mouth. Although the results
generally suggest that the majority of respondents do not engage in word of
mouth, there are valuable insights to be learned.
arXiv link: http://arxiv.org/abs/2406.08279v1
HARd to Beat: The Overlooked Impact of Rolling Windows in the Era of Machine Learning
(HAR) model compared to machine learning (ML) techniques across an
unprecedented dataset of 1,455 stocks. Our analysis focuses on the role of
fitting schemes, particularly the training window and re-estimation frequency,
in determining the HAR model's performance. Despite extensive hyperparameter
tuning, ML models fail to surpass the linear benchmark set by HAR when
utilizing a refined fitting approach for the latter. Moreover, the simplicity
of HAR allows for an interpretable model with drastically lower computational
costs. We assess performance using QLIKE, MSE, and realized utility metrics,
finding that HAR consistently outperforms its ML counterparts when both rely
solely on realized volatility and VIX as predictors. Our results underscore the
importance of a correctly specified fitting scheme. They suggest that properly
fitted HAR models provide superior forecasting accuracy, establishing robust
guidelines for their practical application and use as a benchmark. This study
not only reaffirms the efficacy of the HAR model but also provides a critical
perspective on the practical limitations of ML approaches in realized
volatility forecasting.
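For reference, the HAR benchmark is simply an OLS regression of next-day
realized volatility on daily, weekly, and monthly averages of past realized
volatility (optionally augmented with the VIX, as in the paper). A minimal
sketch, with the rolling-window refitting that the paper argues is crucial left
to the user; the synthetic series is a placeholder.

    import numpy as np

    def har_design(rv):
        # HAR regressors: RV_d (lag 1), RV_w (5-day mean), RV_m (22-day mean)
        rows = []
        for t in range(22, len(rv)):
            rows.append([1.0, rv[t - 1], rv[t - 5:t].mean(), rv[t - 22:t].mean()])
        return np.array(rows), rv[22:]

    def har_fit_predict(rv_train, rv_recent):
        # Fit HAR by OLS on a (rolling) training window, forecast the next day's RV
        X, y = har_design(rv_train)
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        x_next = np.array([1.0, rv_recent[-1], rv_recent[-5:].mean(), rv_recent[-22:].mean()])
        return x_next @ beta

    # usage inside a rolling scheme: refit on a fixed-length window before each forecast
    rv = np.abs(np.random.default_rng(0).normal(size=1000)) + 0.1
    forecast = har_fit_predict(rv, rv[-22:])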
arXiv link: http://arxiv.org/abs/2406.08041v1
Did Harold Zuercher Have Time-Separable Preferences?
for non-separable time preferences, generalizing the well-known Rust (1987)
model. Under weak conditions, we show the existence of value functions and
hence well-defined optimal choices. We construct a contraction mapping of the
value function and propose an estimation method similar to Rust's nested fixed
point algorithm. Finally, we apply the framework to the bus engine replacement
data. We improve the fit of the data with our general model and reject the null
hypothesis that Harold Zuercher has separable time preferences. Misspecifying
an agent's preferences as time-separable when they are not leads to biased
inferences about structural parameters (such as the agent's risk attitudes) and
misleading policy recommendations.
arXiv link: http://arxiv.org/abs/2406.07809v1
Cluster GARCH
distributions that is applicable in high-dimensional systems. The model is
called Cluster GARCH because it can accommodate cluster structures in the
conditional correlation matrix and in the tail dependencies. The expressions
for the log-likelihood function and its derivatives are tractable, and the
latter facilitate a score-driven model for the dynamic correlation structure. We
apply the Cluster GARCH model to daily returns for 100 assets and find it
outperforms existing models, both in-sample and out-of-sample. Moreover, the
convolution-t distribution provides a better empirical performance than the
conventional multivariate t-distribution.
arXiv link: http://arxiv.org/abs/2406.06860v1
Robustness to Missing Data: Breakdown Point Analysis
plausible that the data are missing (completely) at random. This paper proposes
a methodology for studying the robustness of results drawn from incomplete
datasets. Selection is measured as the squared Hellinger divergence between the
distributions of complete and incomplete observations, which has a natural
interpretation. The breakdown point is defined as the minimal amount of
selection needed to overturn a given result. Reporting point estimates and
lower confidence intervals of the breakdown point is a simple, concise way to
communicate the robustness of a result. An estimator of the breakdown point of
a result drawn from a generalized method of moments model is proposed and shown
root-n consistent and asymptotically normal under mild assumptions. Lower
confidence intervals of the breakdown point are simple to construct. The paper
concludes with a simulation study illustrating the finite sample performance of
the estimators in several common models.
arXiv link: http://arxiv.org/abs/2406.06804v1
Data-Driven Switchback Experiments: Theoretical Tradeoffs and Empirical Bayes Designs
single aggregate unit. The design problem is to partition the continuous time
space into intervals and switch treatments between intervals, in order to
minimize the estimation error of the treatment effect. We show that the
estimation error depends on four factors: carryover effects, periodicity,
serially correlated outcomes, and impacts from simultaneous experiments. We
derive a rigorous bias-variance decomposition and show the tradeoffs of the
estimation error from these factors. The decomposition provides three new
insights in choosing a design: First, balancing the periodicity between treated
and control intervals reduces the variance; second, switching less frequently
reduces the bias from carryover effects while increasing the variance from
correlated outcomes, and vice versa; third, randomizing interval start and end
points reduces both bias and variance from simultaneous experiments. Combining
these insights, we propose a new empirical Bayes design approach. This approach
uses prior data and experiments for designing future experiments. We illustrate
this approach using real data from a ride-sharing platform, yielding a design
that reduces MSE by 33% compared to the status quo design used on the platform.
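A minimal sketch of the mechanics being optimized (illustrative only; the
empirical Bayes step that chooses the number of intervals and their placement
from prior data is the paper's contribution and is not reproduced here):
partition the horizon into intervals with jittered boundaries and flip an
independent coin per interval. The jitter range and interval count are
placeholders.

    import numpy as np

    def switchback_assignment(horizon, n_intervals, seed=None):
        # Jittered interval boundaries plus one independent treatment draw per interval;
        # returns a length-`horizon` array of 0/1 treatment indicators
        rng = np.random.default_rng(seed)
        base = np.linspace(0, horizon, n_intervals + 1)
        jitter = rng.uniform(-0.3, 0.3, size=n_intervals - 1) * (horizon / n_intervals)
        bounds = np.concatenate([[0], np.sort(base[1:-1] + jitter), [horizon]]).astype(int)
        assignment = np.empty(horizon, dtype=int)
        for start, end in zip(bounds[:-1], bounds[1:]):
            assignment[start:end] = rng.integers(0, 2)
        return assignment

    d = switchback_assignment(horizon=7 * 24, n_intervals=14, seed=0)  # hourly data, one week

In line with the insights above, a practical design would additionally check
that treated and control intervals cover the daily or weekly cycle in a
balanced way before accepting a draw.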
arXiv link: http://arxiv.org/abs/2406.06768v1
Data-Driven Real-time Coupon Allocation in the Online Platform
discount rates. However, advancements in machine learning and the availability
of abundant customer data now enable platforms to provide real-time customized
coupons to individuals. In this study, we partner with Meituan, a leading
shopping platform, to develop a real-time, end-to-end coupon allocation system
that is fast and effective in stimulating demand while adhering to marketing
budgets when faced with uncertain traffic from a diverse customer base.
Leveraging comprehensive customer and product features, we estimate Conversion
Rates (CVR) under various coupon values and employ isotonic regression to
ensure the monotonicity of predicted CVRs with respect to coupon value. Using
calibrated CVR predictions as input, we propose a Lagrangian Dual-based
algorithm that efficiently determines optimal coupon values for each arriving
customer within 50 milliseconds. We theoretically and numerically investigate
the model performance under parameter misspecifications and apply a control
loop to adapt to real-time updated information, thereby better adhering to the
marketing budget. Finally, we demonstrate through large-scale field experiments
and observational data that our proposed coupon allocation algorithm
outperforms traditional approaches in terms of both higher conversion rates and
increased revenue. As of May 2024, Meituan has implemented our framework to
distribute coupons to over 100 million users across more than 110 major cities
in China, resulting in an additional CNY 8 million in annual profit. We
demonstrate how to integrate a machine learning prediction model for estimating
customer CVR, a Lagrangian Dual-based coupon value optimizer, and a control
system to achieve real-time coupon delivery while dynamically adapting to
random customer arrival patterns.
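The two modelling ingredients named above are standard and easy to sketch
(placeholders only, not the production system): isotonic regression enforces
that predicted CVR is non-decreasing in the coupon value, and the per-customer
coupon choice maximizes predicted conversion minus a Lagrange multiplier lam
times expected coupon cost. The coupon grid, the assumption that coupon cost is
incurred only on conversion, and the value of lam are all illustrative.

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    coupon_values = np.array([0.0, 1.0, 2.0, 3.0, 5.0])

    def calibrate_cvr_curve(raw_cvr_preds):
        # Make a customer's predicted CVRs non-decreasing in the coupon value
        iso = IsotonicRegression(increasing=True, out_of_bounds="clip")
        return iso.fit_transform(coupon_values, raw_cvr_preds)

    def choose_coupon(calibrated_cvr, lam):
        # Pick the coupon maximizing CVR minus lam * expected coupon cost,
        # where cost is assumed to be paid only when the customer converts
        scores = calibrated_cvr - lam * coupon_values * calibrated_cvr
        return coupon_values[int(np.argmax(scores))]

    # example for one arriving customer; lam would come from the offline dual problem
    raw = np.array([0.050, 0.061, 0.060, 0.072, 0.080])   # model CVR per coupon value
    best_coupon = choose_coupon(calibrate_cvr_curve(raw), lam=0.02)

In the deployed system described above, lam plays the role of the budget dual
variable and is updated by the control loop as realized spend is observed.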
arXiv link: http://arxiv.org/abs/2406.05987v3
Heterogeneous Treatment Effects in Panel Data
treatment effects using panel data with general treatment patterns. Many
existing methods either do not utilize the potential underlying structure in
panel data or have limitations in the allowable treatment patterns. In this
work, we propose and evaluate a new method that first partitions observations
into disjoint clusters with similar treatment effects using a regression tree,
and then leverages the (assumed) low-rank structure of the panel data to
estimate the average treatment effect for each cluster. Our theoretical results
establish the convergence of the resulting estimates to the true treatment
effects. Computation experiments with semi-synthetic data show that our method
achieves superior accuracy compared to alternative approaches, using a
regression tree with no more than 40 leaves. Hence, our method provides more
accurate and interpretable estimates than alternative methods.
arXiv link: http://arxiv.org/abs/2406.05633v1
Causal Interpretation of Regressions With Ranks
it is common to transform the key variables into percentile ranks. Yet, it
remains unclear what the regression coefficient estimates with ranks of the
outcome or the treatment. In this paper, we derive effective causal estimands
for a broad class of commonly-used regression methods, including the ordinary
least squares (OLS), two-stage least squares (2SLS), difference-in-differences
(DiD), and regression discontinuity designs (RDD). Specifically, we introduce a
novel primitive causal estimand, the Rank Average Treatment Effect (rank-ATE),
and prove that it serves as the building block of the effective estimands of
all the aforementioned econometrics methods. For 2SLS, DiD, and RDD, we show
that direct applications to outcome ranks identify parameters that are
difficult to interpret. To address this issue, we develop alternative methods
to identify more interpretable causal parameters.
arXiv link: http://arxiv.org/abs/2406.05548v1
Strong Approximations for Empirical Processes Indexed by Lipschitz Functions
processes indexed by classes of functions based on $d$-variate random vectors
($d\geq1$). First, a uniform Gaussian strong approximation is established for
general empirical processes indexed by possibly Lipschitz functions, improving
on previous results in the literature. In the setting considered by Rio (1994),
and if the function class is Lipschitzian, our result improves the
approximation rate $n^{-1/(2d)}$ to $n^{-1/\max\{d,2\}}$, up to a
$polylog(n)$ term, where $n$ denotes the sample size.
Remarkably, we establish a valid uniform Gaussian strong approximation at the
rate $n^{-1/2}\log n$ for $d=2$, which was previously known to be valid only
for univariate ($d=1$) empirical processes via the celebrated Hungarian
construction (Koml\'os et al., 1975). Second, a uniform Gaussian strong
approximation is established for multiplicative separable empirical processes
indexed by possibly Lipschitz functions, which addresses some outstanding
problems in the literature (Chernozhukov et al., 2014, Section 3). Finally, two
other uniform Gaussian strong approximation results are presented when the
function class is a sequence of Haar basis based on quasi-uniform partitions.
Applications to nonparametric density and regression estimation are discussed.
arXiv link: http://arxiv.org/abs/2406.04191v2
GLOBUS: Global building renovation potential by 2070
building sector accounted for 34% and 37% of global energy consumption and
carbon emissions in 2021, respectively. The building sector, the final piece to
be addressed in the transition to net-zero carbon emissions, requires a
comprehensive, multisectoral strategy for reducing emissions. Until now, the
absence of data on global building floorspace has impeded the measurement of
building carbon intensity (carbon emissions per floorspace) and the
identification of ways to achieve carbon neutrality for buildings. For this
study, we develop a global building stock model (GLOBUS) to fill that data gap.
Our study's primary contribution lies in providing a dataset of global building
stock turnover using scenarios that incorporate various levels of building
renovation. By unifying the evaluation indicators, the dataset empowers
building science researchers to perform comparative analyses based on
floorspace. Specifically, the building stock dataset establishes a reference
for measuring carbon emission intensity and decarbonization intensity of
buildings within different countries. Further, we emphasize the sufficiency of
existing buildings by incorporating building renovation into the model.
Renovation can minimize the need to expand the building stock, thereby
bolstering decarbonization of the building sector.
arXiv link: http://arxiv.org/abs/2406.04133v1
Comments on B. Hansen's Reply to "A Comment on: `A Modern Gauss-Markov Theorem'", and Some Related Discussion
P\"otscher and Preinerstorfer (2024, published in Econometrica) we have tried
to clear up the confusion introduced in Hansen (2022a) and in the earlier
versions Hansen (2021a,b). Unfortunately, Hansen's (2024) reply to P\"otscher
and Preinerstorfer (2024) further adds to the confusion. While we are already
somewhat tired of the matter, for the sake of the econometrics community we
feel compelled to provide clarification. We also add a comment on Portnoy
(2023), a "correction" to Portnoy (2022), as well as on Lei and Wooldridge
(2022).
arXiv link: http://arxiv.org/abs/2406.03971v1
Decision synthesis in monetary policy
uncertainties that complicate modelling. In response, decision-makers consider
multiple models that provide different predictions and policy recommendations,
which are then synthesized into a policy decision. In this setting, we develop
Bayesian predictive decision synthesis (BPDS) to formalize monetary policy
decision processes. BPDS draws on recent developments in model combination and
statistical decision theory that yield new opportunities in combining multiple
models, emphasizing the integration of decision goals, expectations and
outcomes into the model synthesis process. Our case study concerns central bank
policy decisions about target interest rates with a focus on implications for
multi-step macroeconomic forecasting. This application also motivates new
methodological developments in conditional forecasting and BPDS, presented and
developed here.
arXiv link: http://arxiv.org/abs/2406.03321v2
Identification of structural shocks in Bayesian VEC models with two-state Markov-switching heteroskedasticity
identified by two-state Markovian breaks in conditional covariances. The
resulting structural VEC specification with Markov-switching heteroskedasticity
(SVEC-MSH) is formulated in the so-called B-parameterization, in which the
prior distribution is specified directly for the matrix of the instantaneous
reactions of the endogenous variables to structural innovations. We discuss
some caveats pertaining to the identification conditions presented earlier in
the literature on stationary structural VAR-MSH models, and revise the
restrictions to actually ensure the unique global identification through the
two-state heteroskedasticity. To enable the posterior inference in the proposed
model, we design an MCMC procedure, combining the Gibbs sampler and the
Metropolis-Hastings algorithm. The methodology is illustrated with both
simulated and real-world data examples.
arXiv link: http://arxiv.org/abs/2406.03053v2
Is local opposition taking the wind out of the energy transition?
potential threat to the energy transition. According to widespread belief,
mostly based on anecdotal evidence, local communities tend to oppose the
construction of energy plants because of the associated negative externalities
(the so-called 'not in my backyard' or NIMBY phenomenon). Using administrative
data on wind
turbine installation and electoral outcomes across municipalities located in
the South of Italy during 2000-19, we estimate the impact of wind turbines'
installation on incumbent regional governments' electoral support during the
next elections. Our main findings, derived by a wind-speed based instrumental
variable strategy, point in the direction of a mild and not statistically
significant electoral backlash for right-wing regional administrations and of a
strong and statistically significant positive reinforcement for left-wing
regional administrations. Based on our analysis, the hypothesis of an electoral
effect of NIMBY-type behavior in connection with the development of wind
turbines appears not to be supported by the data.
arXiv link: http://arxiv.org/abs/2406.03022v1
When does IV identification not restrict outcomes?
without requiring any restrictions on the distribution of potential outcomes,
or how those outcomes are correlated with selection behavior. This enables IV
models to allow for arbitrary heterogeneity in treatment effects and the
possibility of selection on gains in the outcome. I provide a necessary and
sufficient condition for treatment effects to be point identified in a manner
that does not restrict outcomes, when the instruments take a finite number of
values. The condition generalizes the well-known LATE monotonicity assumption,
and unifies a wide variety of other known IV identification results. The result
also yields a brute-force approach to reveal all selection models that allow
for point identification of treatment effects without restricting outcomes, and
then enumerate all of the identified parameters within each such selection
model. The search uncovers new selection models that yield identification,
provides impossibility results for others, and offers opportunities to relax
assumptions on selection used in existing literature. An application considers
the identification of complementarities between two cross-randomized
treatments, obtaining a necessary and sufficient condition on selection for
local average complementarities among compliers to be identified in a manner
that does not restrict outcomes. I use this result to revisit two empirical
settings, one in which the data are incompatible with this restriction on
selection, and another in which the data are compatible with the restriction.
arXiv link: http://arxiv.org/abs/2406.02835v6
The Impact of Acquisition on Product Quality in the Console Gaming Industry
sector, has witnessed a wave of consolidation in recent years, epitomized by
Microsoft's high-profile acquisitions of Activision Blizzard and Zenimax. This
study investigates the repercussions of such mergers on consumer welfare and
innovation within the gaming landscape, focusing on product quality as a key
metric. Through a comprehensive analysis employing a difference-in-difference
model, the research evaluates the effects of acquisition on game review
ratings, drawing from a dataset comprising over 16,000 console games released
between 2000 and 2023. The research addresses key assumptions underlying the
difference-in-difference methodology, including parallel trends and spillover
effects, to ensure the robustness of the findings. The DID results suggest a
positive and statistically significant impact of acquisition on game review
ratings, when controlling for genre and release year. The study contributes to
the literature by offering empirical evidence on the direct consequences of
industry consolidation on consumer welfare and competition dynamics within the
gaming sector.
arXiv link: http://arxiv.org/abs/2406.02525v1
Enabling Decision-Making with the Modified Causal Forest: Policy Trees for Treatment Assignment
disciplines, such as medicine, economics, and business. This paper provides
guidance to practitioners on how to implement a decision tree designed to
address treatment assignment policies using an interpretable and non-parametric
algorithm. Our Policy Tree is motivated by the method proposed by Zhou, Athey,
and Wager (2023), distinguishing itself in the policy score calculation, the
incorporation of constraints, and the handling of categorical and continuous
variables.
We demonstrate the usage of the Policy Tree for multiple, discrete treatments
on data sets from different fields. The Policy Tree is available in Python's
open-source package mcf (Modified Causal Forest).
arXiv link: http://arxiv.org/abs/2406.02241v1
A sequential test procedure for the choice of the number of regimes in multivariate nonlinear models
regimes in nonlinear multivariate autoregressive models. The procedure relies
on tests of linearity and of no additional nonlinearity for both multivariate
smooth transition and threshold autoregressive models. We conduct a simulation study
to evaluate the finite-sample properties of the proposed test in small samples.
Our findings indicate that the test exhibits satisfactory size properties, with
the rescaled version of the Lagrange Multiplier test statistics demonstrating
the best performance in most simulation settings. The sequential procedure is
also applied to two empirical cases, the US monthly interest rates and
Icelandic river flows. In both cases, the detected number of regimes aligns
well with the existing literature.
arXiv link: http://arxiv.org/abs/2406.02152v1
Random Subspace Local Projections
projections with many controls. Random subspace methods have their roots in the
machine learning literature and are implemented by averaging over regressions
estimated over different combinations of subsets of these controls. We document
three key results: (i) Our approach can successfully recover the impulse
response functions across Monte Carlo experiments representative of different
macroeconomic settings and identification schemes. (ii) Our results suggest
that random subspace methods are more accurate than other dimension reduction
methods if the underlying large dataset has a factor structure similar to
typical macroeconomic datasets such as FRED-MD. (iii) Our approach leads to
differences in the estimated impulse response functions relative to benchmark
methods when applied to two widely studied empirical applications.
arXiv link: http://arxiv.org/abs/2406.01002v1
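As a rough illustration of the random subspace idea described above, the sketch below averages the horizon-h local-projection coefficient on a shock over regressions that each draw a random subset of controls. The array names, subset size, and number of draws are illustrative assumptions, not choices taken from the paper.

import numpy as np

def random_subspace_lp(y, x, w, horizon, n_draws=100, subset_size=10, seed=0):
    """Average the horizon-h local-projection coefficient on the shock x over
    regressions that each use a random subset of the columns of w as controls."""
    rng = np.random.default_rng(seed)
    t_obs = len(y)
    y_h = y[horizon:]                       # outcome shifted h periods ahead
    x_0 = x[:t_obs - horizon]
    w_0 = w[:t_obs - horizon]
    betas = []
    for _ in range(n_draws):
        cols = rng.choice(w_0.shape[1], size=subset_size, replace=False)
        regressors = np.column_stack([np.ones(t_obs - horizon), x_0, w_0[:, cols]])
        coef, *_ = np.linalg.lstsq(regressors, y_h, rcond=None)
        betas.append(coef[1])               # coefficient on the shock variable
    return float(np.mean(betas))            # subspace-averaged impulse response at horizon h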
A Robust Residual-Based Test for Structural Changes in Factor Models
testing procedure for detecting structural changes in factor models, which is
powerful against both smooth and abrupt structural changes with unknown break
dates. The proposed test is robust against the over-specified number of
factors, and serially and cross-sectionally correlated error processes. A new
central limit theorem is given for the quadratic forms of panel data with
dependence over both dimensions, thereby filling a gap in the literature. We
establish the asymptotic properties of the proposed test statistic, and
accordingly develop a simulation-based scheme to select critical values in order
to improve finite sample performance. Through extensive simulations and a
real-world application, we confirm our theoretical results and demonstrate that
the proposed test exhibits desirable size and power in practice.
arXiv link: http://arxiv.org/abs/2406.00941v2
Comparing Experimental and Nonexperimental Methods: What Lessons Have We Learned Four Decades After LaLonde (1986)?
estimates to experimental benchmarks (LaLonde 1986). He concluded that the
nonexperimental methods at the time could not systematically replicate
experimental benchmarks, casting doubt on their credibility. Following
LaLonde's critical assessment, there have been significant methodological
advances and practical changes, including (i) an emphasis on the
unconfoundedness assumption separated from functional form considerations, (ii)
a focus on the importance of overlap in covariate distributions, (iii) the
introduction of propensity score-based methods leading to doubly robust
estimators, (iv) methods for estimating and exploiting treatment effect
heterogeneity, and (v) a greater emphasis on validation exercises to bolster
research credibility. To demonstrate the practical lessons from these advances,
we reexamine the LaLonde data. We show that modern methods, when applied in
contexts with sufficient covariate overlap, yield robust estimates for the
adjusted differences between the treatment and control groups. However, this
does not imply that these estimates are causally interpretable. To assess their
credibility, validation exercises (such as placebo tests) are essential,
whereas goodness-of-fit tests alone are inadequate. Our findings highlight the
importance of closely examining the assignment process, carefully inspecting
overlap, and conducting validation exercises when analyzing causal effects with
nonexperimental data.
arXiv link: http://arxiv.org/abs/2406.00827v3
On the modelling and prediction of high-dimensional functional time series
functional time series, where the number of function-valued time series $p$ is
large in relation to the length of time series $n$. Our first step performs an
eigenanalysis of a positive definite matrix, which leads to a one-to-one linear
transformation for the original high-dimensional functional time series, and
the transformed curve series can be segmented into several groups such that any
two subseries from any two different groups are uncorrelated both
contemporaneously and serially. Consequently, in our second step those groups
are handled separately without loss of information on the overall linear
dynamic structure. The second step is devoted to establishing a
finite-dimensional dynamical structure for all the transformed functional time
series within each group. Furthermore, the finite-dimensional structure is
represented by that of a vector time series. Modelling and forecasting for the
original high-dimensional functional time series are realized via those for the
vector time series in all the groups. We investigate the theoretical properties
of our proposed methods, and illustrate the finite-sample performance through
both extensive simulation and two real datasets.
arXiv link: http://arxiv.org/abs/2406.00700v1
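The eigenanalysis-based transformation above is easiest to see in the simpler vector (non-functional) case. The sketch below accumulates lagged autocovariances into a positive semi-definite matrix, uses its eigenvectors as the linear transformation, and returns the transformed series whose cross-correlation structure can then be inspected to form groups. The lag window and the omission of the grouping step are illustrative simplifications of the method described above.

import numpy as np

def segmentation_transform(y, max_lag=5):
    """y: T x p array of a (demeaned) vector time series. Returns the
    transformed series and the orthogonal transformation matrix."""
    t_obs, p = y.shape
    w = np.zeros((p, p))
    for k in range(max_lag + 1):
        s_k = y[k:].T @ y[:t_obs - k] / (t_obs - k)   # lag-k autocovariance
        w += s_k @ s_k.T                              # positive semi-definite accumulation
    _, vectors = np.linalg.eigh(w)                    # eigenanalysis of the accumulated matrix
    return y @ vectors, vectors                       # groups are formed afterwards by inspecting
                                                      # cross-correlations among the columns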
Cluster-robust jackknife and bootstrap inference for logistic regression models
Inference based on the most commonly-used cluster-robust variance matrix
estimator (CRVE) can be very unreliable. We study several alternatives.
Conceptually the simplest of these, but also the most computationally
demanding, involves jackknifing at the cluster level. We also propose a
linearized version of the cluster-jackknife variance matrix estimator as well
as linearized versions of the wild cluster bootstrap. The linearizations are
based on empirical scores and are computationally efficient. Our results can
readily be generalized to other binary response models. We also discuss a new
Stata software package called logitjack which implements these procedures.
Simulation results strongly favor the new methods, and two empirical examples
suggest that it can be important to use them in practice.
arXiv link: http://arxiv.org/abs/2406.00650v2
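A minimal sketch of the delete-one-cluster jackknife idea for a logit model is given below, using statsmodels. It is a generic illustration of cluster-level jackknifing, not the logitjack package or the linearized estimators described above; the DataFrame layout and column names are assumptions.

import numpy as np
import statsmodels.api as sm

def cluster_jackknife_logit(df, y_col, x_cols, cluster_col):
    """Delete-one-cluster jackknife variance matrix for logit coefficients."""
    X_full = sm.add_constant(df[x_cols])
    full_fit = sm.Logit(df[y_col], X_full).fit(disp=0)
    clusters = df[cluster_col].unique()
    G = len(clusters)
    leave_out = []
    for g in clusters:
        sub = df[df[cluster_col] != g]                 # drop one whole cluster at a time
        X_sub = sm.add_constant(sub[x_cols])
        leave_out.append(sm.Logit(sub[y_col], X_sub).fit(disp=0).params.to_numpy())
    leave_out = np.array(leave_out)
    centered = leave_out - leave_out.mean(axis=0)
    V_jack = (G - 1) / G * centered.T @ centered       # cluster-jackknife variance matrix
    return full_fit.params.to_numpy(), V_jack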
Portfolio Optimization with Robust Covariance and Conditional Value-at-Risk Constraints
framework. In this study, we explored various methods to obtain robust
covariance estimators that are less susceptible to financial data noise. We
evaluated the performance of a large-cap portfolio using various forms of the
Ledoit shrinkage covariance and the robust Gerber covariance matrix over the
period 2012 to 2022. Out-of-sample performance indicates that robust covariance
estimators can outperform the market capitalization-weighted benchmark
portfolio, particularly during bull markets. The Gerber covariance with Mean
Absolute Deviation (MAD) emerged as the top performer. However, robust
estimators do not manage tail risk well under extreme market conditions such as
the Covid-19 period. To control for tail risk, we add a constraint on
Conditional Value-at-Risk (CVaR) to make more conservative decisions on risk
exposure. Additionally, we incorporated the unsupervised K-means clustering
algorithm into the optimization algorithm (Nested Clustering Optimization,
NCO). This not only helps mitigate the numerical instability of the
optimization algorithm but also contributes to lower drawdowns.
arXiv link: http://arxiv.org/abs/2406.00610v1
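As a small illustration of plugging a shrinkage covariance estimator into portfolio construction, the sketch below computes unconstrained minimum-variance weights from a Ledoit-Wolf estimate. The estimator mirrors one of the robust covariances discussed above, while the weighting rule is illustrative and ignores the CVaR constraint and clustering steps.

import numpy as np
from sklearn.covariance import LedoitWolf

def min_variance_weights(returns):
    """returns: T x N array of asset returns; gives unconstrained
    minimum-variance weights based on a Ledoit-Wolf shrinkage covariance."""
    cov = LedoitWolf().fit(returns).covariance_
    inv = np.linalg.pinv(cov)
    ones = np.ones(cov.shape[0])
    return inv @ ones / (ones @ inv @ ones)   # weights sum to one by construction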
Financial Deepening and Economic Growth in Select Emerging Markets with Currency Board Systems: Theory and Evidence
countries with currency board systems and raises some questions about the
connection between financial development and growth in currency board systems.
Most of those cases are long past episodes of what we would now call emerging
markets. However, the paper also looks at Hong Kong, the currency board system
that is one of the world's largest and most advanced financial markets. The
global financial crisis of 2008-09 created doubts about the efficiency of
financial markets in advanced economies, including in Hong Kong, and unsettled
the previous consensus that a large financial sector would be more stable than
a smaller one.
arXiv link: http://arxiv.org/abs/2406.00472v1
Optimizing hydrogen and e-methanol production through Power-to-X integration in biogas plants
hydrogen and electrofuels infrastructure. These fuels will be crucial as
energy carriers and balancing agents for renewable energy variability.
Large-scale production requires more renewable capacity, and various
Power-to-X (PtX) concepts are emerging in renewable-rich countries. However,
sourcing renewable carbon to scale carbon-based electrofuels is a significant
challenge. This
study explores a PtX hub that sources renewable CO2 from biogas plants,
integrating renewable energy, hydrogen production, and methanol synthesis on
site. This concept creates an internal market for energy and materials,
interfacing with the external energy system. The size and operation of the PtX
hub were optimized, considering integration with local energy systems and a
potential hydrogen grid. The levelized costs of hydrogen and methanol were
estimated for a 2030 start, considering new legislation on renewable fuels of
non-biological origin (RFNBOs). Our results show the PtX hub can rely mainly on
on-site renewable energy, selling excess electricity to the grid. A local
hydrogen grid connection improves operations, and the behind-the-meter market
lowers energy prices, buffering against market variability. We found methanol
costs could be below 650 euros per ton and hydrogen production costs below 3
euros per kg, with standalone methanol plants costing 23 per cent more. The CO2
recovery to methanol production ratio is crucial, with over 90 per cent
recovery requiring significant investment in CO2 and H2 storage. Overall, our
findings support planning PtX infrastructures integrated with the agricultural
sector as a cost-effective way to access renewable carbon.
arXiv link: http://arxiv.org/abs/2406.00442v1
From day-ahead to mid and long-term horizons with econometric electricity price forecasting models
carbon and power prices, with electricity reaching up to 40 times the
pre-crisis average. This had dramatic consequences for operational and risk
management prompting the need for robust econometric models for mid to
long-term electricity price forecasting. After a comprehensive literature
analysis, we identify key challenges and address them with novel approaches: 1)
Fundamental information is incorporated by constraining coefficients with
bounds derived from fundamental models offering interpretability; 2) Short-term
regressors such as load and renewables can be used in long-term forecasts by
incorporating their seasonal expectations to stabilize the model; 3) Unit root
behavior of power prices, induced by fuel prices, can be managed by estimating
same-day relationships and projecting them forward. We develop interpretable
models for a range of forecasting horizons from one day to one year ahead,
providing guidelines on robust modeling frameworks and key explanatory
variables for each horizon. Our study, focused on Europe's largest energy
market, Germany, analyzes hourly electricity prices using regularized
regression methods and generalized additive models.
arXiv link: http://arxiv.org/abs/2406.00326v2
Transforming Japan Real Estate
significant investment opportunities. Accurate rent and price forecasting could
provide a substantial competitive edge. This paper explores using alternative
data variables to predict real estate performance in 1100 Japanese
municipalities. A comprehensive house price index was created, covering all
municipalities from 2005 to the present, using a dataset of over 5 million
transactions. This core dataset was enriched with economic factors spanning
decades, allowing for price trajectory predictions.
The findings show that alternative data variables can indeed forecast real
estate performance effectively. Investment signals based on these variables
yielded notable returns with low volatility. For example, the net migration
ratio delivered an annualized return of 4.6% with a Sharpe ratio of 1.5.
Taxable income growth and new dwellings ratio also performed well, with
annualized returns of 4.1% (Sharpe ratio of 1.3) and 3.3% (Sharpe ratio of
0.9), respectively. When combined with transformer models to predict
risk-adjusted returns 4 years in advance, the model achieved an R-squared score
of 0.28, explaining nearly 30% of the variation in future municipality prices.
These results highlight the potential of alternative data variables in real
estate investment. They underscore the need for further research to identify
more predictive factors. Nonetheless, the evidence suggests that such data can
provide valuable insights into real estate price drivers, enabling more
informed investment decisions in the Japanese market.
arXiv link: http://arxiv.org/abs/2405.20715v1
Multidimensional spatiotemporal clustering -- An application to environmental sustainability scores in Europe
in facilitating the transition to a green and low-carbon intensity economy.
However, companies located in different areas may be subject to different
sustainability and environmental risks and policies. Henceforth, the main
objective of this paper is to investigate the spatial and temporal pattern of
the sustainability evaluations of European firms. We leverage on a large
dataset containing information about companies' sustainability performances,
measured by MSCI ESG ratings, and geographical coordinates of firms in Western
Europe between 2013 and 2023. By means of a modified version of the Chavent et
al. (2018) hierarchical algorithm, we conduct a spatial clustering analysis,
combining sustainability and spatial information, and a spatiotemporal
clustering analysis, which combines the time dynamics of multiple
sustainability features and spatial dissimilarities, to detect groups of firms
with homogeneous sustainability performance. We are able to build
cross-national and cross-industry clusters with remarkable differences in terms
of sustainability scores. Among other results, in the spatio-temporal analysis,
we observe a high degree of geographical overlap among clusters, indicating
that the temporal dynamics in sustainability assessment are relevant within a
multidimensional approach. Our findings help to capture the diversity of ESG
ratings across Western Europe and may assist practitioners and policymakers in
evaluating companies facing different sustainability-linked risks in different
areas.
arXiv link: http://arxiv.org/abs/2405.20191v1
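A minimal sketch of spatially constrained clustering in the spirit of the analysis above: mix a feature-based dissimilarity with a geographic one and apply hierarchical clustering. The convex mixing weight, Euclidean distances, and Ward linkage are illustrative assumptions; the paper itself uses a modified version of the Chavent et al. (2018) algorithm.

from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

def spatial_feature_clusters(features, coordinates, n_clusters=5, alpha=0.3):
    """features: n x k array of firm-level scores; coordinates: n x 2 array of
    geographic positions; alpha: weight given to geographic dissimilarity."""
    d_feat = pdist(features)                              # dissimilarity in scores
    d_geo = pdist(coordinates)                            # geographic dissimilarity
    d_mix = (1 - alpha) * d_feat / d_feat.max() + alpha * d_geo / d_geo.max()
    tree = linkage(d_mix, method="ward")                  # hierarchical clustering on the mix
    return fcluster(tree, t=n_clusters, criterion="maxclust")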
The ARR2 prior: flexible predictive prior definition for Bayesian auto-regressions
in Bayesian time-series models and their induced $R^2$. Compared to other
priors designed for time-series models, the ARR2 prior allows for flexible and
intuitive shrinkage. We derive the prior for pure auto-regressive models, and
extend it to auto-regressive models with exogenous inputs, and state-space
models. Through both simulations and real-world modelling exercises, we
demonstrate the efficacy of the ARR2 prior in improving sparse and reliable
inference, while showing greater inference quality and predictive performance
than other shrinkage priors. An open-source implementation of the prior is
provided.
arXiv link: http://arxiv.org/abs/2405.19920v3
The Political Resource Curse Redux
authors identified a new channel to investigate whether the windfalls of
resources are unambiguously beneficial to society, both with theory and
empirical evidence. This paper revisits the framework with a new dataset.
Specifically, we implemented a regression discontinuity design and a
difference-in-differences specification.
arXiv link: http://arxiv.org/abs/2405.19897v1
Modelling and Forecasting Energy Market Volatility Using GARCH and Machine Learning Approach
GARCH-family models and machine learning algorithms in modeling and forecasting
the volatility of major energy commodities: crude oil, gasoline, heating oil,
and natural gas. It uses a comprehensive dataset incorporating financial,
macroeconomic, and environmental variables to assess predictive performance and
discusses volatility persistence and transmission across these commodities.
Aspects of volatility persistence and transmission, traditionally examined by
GARCH-class models, are jointly explored using the SHAP (Shapley Additive
exPlanations) method. The findings reveal that machine learning models
demonstrate superior out-of-sample forecasting performance compared to
traditional GARCH models. Machine learning models tend to underpredict, while
GARCH models tend to overpredict energy market volatility, suggesting a hybrid
use of both types of models. There is volatility transmission from crude oil to
the gasoline and heating oil markets. The volatility transmission in the
natural gas market is less prevalent.
arXiv link: http://arxiv.org/abs/2405.19849v1
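To make the comparison above concrete, the sketch below produces one-step-ahead volatility forecasts from a GARCH(1,1) model (via the arch package) and from a simple machine-learning regression on lagged squared returns. The random-forest learner, lag length, and the assumption that returns are a pandas Series in percent are illustrative, not the paper's specification.

import numpy as np
import pandas as pd
from arch import arch_model
from sklearn.ensemble import RandomForestRegressor

def garch_forecast(returns):
    """One-step-ahead volatility from a GARCH(1,1) fit."""
    result = arch_model(returns, vol="GARCH", p=1, q=1).fit(disp="off")
    return float(np.sqrt(result.forecast(horizon=1).variance.iloc[-1, 0]))

def ml_forecast(returns, lags=5):
    """One-step-ahead volatility from a random forest on lagged squared returns."""
    sq = (returns ** 2).reset_index(drop=True)
    X = pd.concat([sq.shift(i) for i in range(1, lags + 1)], axis=1).dropna()
    y = sq.loc[X.index]
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X.to_numpy(), y.to_numpy())
    latest = sq.iloc[-lags:].to_numpy()[::-1].reshape(1, -1)  # most recent lags first
    return float(np.sqrt(model.predict(latest)[0]))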
Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data
viewing the problem as a conditional stochastic optimization problem. In the
context of least-squares instrumental variable regression, our algorithms
require neither matrix inversions nor mini-batches and provide a fully online
approach for performing instrumental variable regression with streaming data.
When the true model is linear, we derive rates of convergence in expectation of
order $O(\log T/T)$ and $O(1/T^{1-\iota})$ for any $\iota>0$ under the
availability of two-sample and one-sample oracles, respectively, where $T$ is
the number of iterations. Importantly,
under the availability of the two-sample oracle, our procedure avoids
explicitly modeling and estimating the relationship between confounder and the
instrumental variables, demonstrating the benefit of the proposed approach over
recent works based on reformulating the problem as minimax optimization
problems. Numerical experiments are provided to corroborate the theoretical
results.
arXiv link: http://arxiv.org/abs/2405.19463v1
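The following is a generic, fully online stochastic-approximation update for the just-identified linear IV moment condition E[z(y - x'theta)] = 0, included only to illustrate the streaming setting above. It is not the paper's two-sample or one-sample oracle algorithm, and its stability requires the instrument-regressor cross-moment matrix to be well behaved.

import numpy as np

def streaming_iv(data_stream, dim, step=lambda t: 1.0 / (t + 10.0)):
    """data_stream yields tuples (x, z, y) with regressors x and instruments z
    of length dim; theta is updated one observation at a time."""
    theta = np.zeros(dim)
    for t, (x, z, y) in enumerate(data_stream):
        residual = y - x @ theta
        theta = theta + step(t) * residual * z   # stochastic step toward E[z(y - x'theta)] = 0
    return theta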
Generalized Neyman Allocation for Locally Minimax Optimal Best-Arm Identification
for fixed-budget best-arm identification (BAI). We propose the Generalized
Neyman Allocation (GNA) algorithm and demonstrate that its worst-case upper
bound on the probability of misidentifying the best arm aligns with the
worst-case lower bound under the small-gap regime, where the gap between the
expected outcomes of the best and suboptimal arms is small. Our lower and upper
bounds are tight, matching exactly, including constant terms, within the
small-gap regime. The GNA algorithm generalizes the Neyman allocation for
two-armed bandits (Neyman, 1934; Kaufmann et al., 2016) and refines existing
BAI algorithms, such as those proposed by Glynn & Juneja (2004). By proposing
an asymptotically minimax optimal algorithm, we address the longstanding open
issue in BAI (Kaufmann, 2020) and treatment choice (Kasy & Sautmann, 2021) by
restricting a class of distributions to the small-gap regimes.
arXiv link: http://arxiv.org/abs/2405.19317v4
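For intuition about what the GNA algorithm generalizes, here is a sketch of the classical two-armed Neyman allocation for fixed-budget best-arm identification: sample arms in proportion to their estimated outcome standard deviations, then recommend the arm with the higher sample mean. The pilot phase and the Gaussian example are illustrative assumptions.

import numpy as np

def neyman_allocation_bai(draw_arm, budget, pilot=20, seed=0):
    """draw_arm(k, rng) returns one outcome from arm k in {0, 1}."""
    rng = np.random.default_rng(seed)
    samples = {k: [draw_arm(k, rng) for _ in range(pilot)] for k in (0, 1)}
    sd = np.array([np.std(samples[k], ddof=1) for k in (0, 1)])
    share = sd / sd.sum()                      # Neyman allocation: sample in proportion to sd
    remaining = budget - 2 * pilot
    for k in (0, 1):
        for _ in range(int(round(share[k] * remaining))):
            samples[k].append(draw_arm(k, rng))
    return int(np.argmax([np.mean(samples[k]) for k in (0, 1)]))   # recommended best arm

# Example: two Gaussian arms with a small gap in means and unequal variances.
best = neyman_allocation_bai(lambda k, rng: rng.normal([0.0, 0.1][k], [1.0, 2.0][k]), budget=1000)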
Synthetic Potential Outcomes and Causal Mixture Identifiability
represented as a “mixture model” with a single latent class influencing all
of the observed covariates. Heterogeneity can be resolved at multiple levels by
grouping populations according to different notions of similarity. This paper
proposes grouping with respect to the causal response of an intervention or
perturbation on the system. This definition is distinct from previous notions,
such as similar covariate values (e.g. clustering) or similar correlations
between covariates (e.g. Gaussian mixture models). To solve the problem, we
“synthetically sample” from a counterfactual distribution using higher-order
multi-linear moments of the observable data. To understand how these “causal
mixtures” fit in with more classical notions, we develop a hierarchy of
mixture identifiability.
arXiv link: http://arxiv.org/abs/2405.19225v4
Transmission Channel Analysis in Dynamic Models
of dynamic models. We formulate our approach both using graph theory and
potential outcomes, which we show to be equivalent. Our method, labelled
Transmission Channel Analysis (TCA), allows for the decomposition of total
effects captured by impulse response functions into the effects flowing through
transmission channels, thereby providing a quantitative assessment of the
strength of various well-defined channels. We establish that this requires no
additional identification assumptions beyond the identification of the
structural shock whose effects the researcher wants to decompose. Additionally,
we prove that impulse response functions are sufficient statistics for the
computation of transmission effects. We demonstrate the empirical relevance of
TCA for policy evaluation by decomposing the effects of policy shocks arising
from a variety of popular macroeconomic models.
arXiv link: http://arxiv.org/abs/2405.18987v3
Difference-in-Discontinuities: Estimation, Inference and Validity Tests
difference-in-discontinuities design (DiDC). Despite its increasing use in
applied research, there are currently limited studies of its properties. The
method combines elements of regression discontinuity (RDD) and
difference-in-differences (DiD) designs, allowing researchers to eliminate the
effects of potential confounders at the discontinuity. We formalize the
difference-in-discontinuity theory by stating the identification assumptions
and proposing a nonparametric estimator, deriving its asymptotic properties and
examining the scenarios in which the DiDC has desirable bias properties when
compared to the standard RDD. We also provide comprehensive tests for one of
the identification assumptions of the DiDC. Monte Carlo simulation studies show
that the estimators have good performance in finite samples. Finally, we
revisit Grembi et al. (2016), which studies the effects of relaxing fiscal rules
on public finance outcomes in Italian municipalities. The results show that the
proposed estimator exhibits substantially smaller confidence intervals for the
estimated effects.
arXiv link: http://arxiv.org/abs/2405.18531v1
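A minimal sketch of the difference-in-discontinuities contrast: estimate the regression-discontinuity jump in the post period and subtract the jump in the pre period, each via local linear regressions around the cutoff. The uniform kernel, fixed bandwidth, and absence of bias correction are simplifying assumptions, not the paper's estimator.

import numpy as np

def rd_jump(running, outcome, cutoff=0.0, bandwidth=1.0):
    """Local linear estimate of the jump in outcome at the cutoff."""
    def boundary_value(side_mask):
        design = np.column_stack([np.ones(side_mask.sum()), running[side_mask] - cutoff])
        coef, *_ = np.linalg.lstsq(design, outcome[side_mask], rcond=None)
        return coef[0]                                   # intercept = limit at the cutoff
    window = np.abs(running - cutoff) <= bandwidth
    return (boundary_value(window & (running >= cutoff))
            - boundary_value(window & (running < cutoff)))

def diff_in_disc(running, outcome, post):
    """post: boolean array marking the post-treatment period."""
    return rd_jump(running[post], outcome[post]) - rd_jump(running[~post], outcome[~post])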
Semi-nonparametric models of multidimensional matching: an optimal transport approach
focusing on worker-job matching. We generalize the parametric model proposed by
Lindenlaub (2017), which relies on the assumption of joint normality of
observed characteristics of workers and jobs. In our paper, we allow
unrestricted distributions of characteristics and show identification of the
production technology, and equilibrium wage and matching functions using tools
from optimal transport theory. Given identification, we propose efficient,
consistent, asymptotically normal sieve estimators. We revisit Lindenlaub's
empirical application and show that, between 1990 and 2010, the U.S. economy
experienced much larger technological progress favoring cognitive abilities
than the original findings suggest. Furthermore, our flexible model
specifications provide a significantly better fit for patterns in the evolution
of wage inequality.
arXiv link: http://arxiv.org/abs/2405.18089v1
Dyadic Regression with Sample Selection
analysis. Dyadic data often include many zeros in the main outcomes due to the
underlying network formation process. This not only contaminates popular
estimators used in practice but also complicates the inference due to the
dyadic dependence structure. We extend Kyriazidou (1997)'s approach to dyadic
data and characterize the asymptotic distribution of our proposed estimator.
The convergence rates are $\sqrt{n}$ or $\sqrt{n^{2}h_{n}}$, depending on the
degeneracy of the H\'{a}jek projection part of the estimator, where $n$ is the
number of nodes and $h_{n}$ is a bandwidth. We propose a bias-corrected
confidence interval and a variance estimator that adapts to the degeneracy. A
Monte Carlo simulation shows the good finite sample performance of our
estimator and highlights the importance of bias correction in both asymptotic
regimes when the fraction of zeros in outcomes varies. We illustrate our
procedure using data from Moretti and Wilson (2017)'s paper on migration.
arXiv link: http://arxiv.org/abs/2405.17787v3
Count Data Models with Heterogeneous Peer Effects under Rational Expectations
expectations. The model accounts for heterogeneity in peer effects through
groups based on observed characteristics. Identification is based on the
condition, familiar from linear models, of having friends' friends who are not
direct friends, which I show extends to a broad class of nonlinear models.
Parameters are estimated
using a nested pseudo-likelihood approach. An empirical application on
students' extracurricular participation reveals that females are more
responsive to peers than males. An easy-to-use R package, CDatanet, is
available for implementing the model.
arXiv link: http://arxiv.org/abs/2405.17290v2
Estimating treatment-effect heterogeneity across sites, in multi-site randomized experiments with few units per site
per site, an Empirical-Bayes estimator can be used to estimate the variance of
the treatment effect across sites. When this estimator indicates that treatment
effects do vary, we propose estimators of the coefficients from regressions of
site-level effects on site-level characteristics that are unobserved but can be
unbiasedly estimated, such as sites' average outcome without treatment, or
site-specific treatment effects on mediator variables. In experiments with
imperfect compliance, we show that the sign of the correlation between local
average treatment effects (LATEs) and site-level characteristics is identified,
and we propose a partly testable assumption under which the variance of LATEs
is identified. We use our results to revisit Behaghel et al (2014), who study
the effect of counseling programs on job seekers' job-finding rate, in 200 job
placement agencies in France. We find considerable treatment-effect
heterogeneity, both for intention to treat and LATE effects, and the treatment
effect is negatively correlated with sites' job-finding rate without treatment.
arXiv link: http://arxiv.org/abs/2405.17254v3
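The first step described above rests on a simple variance decomposition: the dispersion of noisy site-level estimates equals the dispersion of true effects plus average sampling noise. A minimal sketch of the resulting method-of-moments calculation follows; the inputs and the truncation at zero are illustrative, not the paper's exact Empirical-Bayes estimator.

import numpy as np

def cross_site_effect_variance(site_estimates, site_std_errors):
    """Dispersion of true site effects = dispersion of estimates - average noise."""
    site_estimates = np.asarray(site_estimates, dtype=float)
    site_std_errors = np.asarray(site_std_errors, dtype=float)
    total_var = np.var(site_estimates, ddof=1)        # variance of noisy site-level estimates
    noise_var = np.mean(site_std_errors ** 2)         # average sampling variance
    return max(total_var - noise_var, 0.0)            # truncated at zero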
Mixing it up: Inflation at risk
was crucial for guiding monetary policy during the recent high inflation
period. However, existing methodologies often provide limited insights by
focusing solely on specific percentiles of the forecast distribution. In
contrast, this paper introduces a comprehensive framework that examines how
economic indicators impact the entire forecast distribution of macroeconomic
variables, facilitating the decomposition of the overall risk outlook into its
underlying drivers. Additionally, the framework allows for the construction of
risk measures that align with central bank preferences, serving as valuable
summary statistics. Applied to the recent inflation surge, the framework
reveals that U.S. inflation risk was primarily influenced by the recovery of
the U.S. business cycle and surging commodity prices, partially mitigated by
adjustments in monetary policy and credit spreads.
arXiv link: http://arxiv.org/abs/2405.17237v2
Quantifying the Reliance of Black-Box Decision-Makers on Variables of Interest
decision-makers rely on variables of interest. The framework adapts a
permutation-based measure of variable importance from the explainable machine
learning literature. With an emphasis on applicability, I present some of the
framework's theoretical and computational properties, explain how reliance
computations have policy implications, and work through an illustrative
example. In the empirical application to interruptions by Supreme Court
Justices during oral argument, I find that the effect of gender is more muted
compared to the existing literature's estimate; I then use this paper's
framework to compare Justices' reliance on gender and alignment to their
reliance on experience, which are incomparable using regression coefficients.
arXiv link: http://arxiv.org/abs/2405.17225v1
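A rough sketch of a permutation-based reliance measure of the kind adapted above: shuffle the column of interest and record how often the black-box decision changes. The disagreement-rate metric and the function signature are illustrative assumptions.

import numpy as np

def permutation_reliance(decide, cases, column, n_permutations=50, seed=0):
    """decide: callable mapping a 2D array of cases to an array of decisions;
    column: index of the variable of interest."""
    rng = np.random.default_rng(seed)
    baseline = np.asarray(decide(cases))
    flips = []
    for _ in range(n_permutations):
        shuffled = cases.copy()
        shuffled[:, column] = rng.permutation(shuffled[:, column])  # break the link to the variable
        flips.append(np.mean(np.asarray(decide(shuffled)) != baseline))
    return float(np.mean(flips))            # average share of decisions that change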
Statistical Mechanism Design: Robust Pricing, Estimation, and Inference
consumer uncertainty. We propose a novel data-based approach for firms facing
unknown consumer type distributions. Unlike existing methods, we assume firms
only observe a finite sample of consumers' types. We introduce
empirically optimal mechanisms, a simple and intuitive class of
sample-based mechanisms with strong finite-sample revenue guarantees.
Furthermore, we leverage our results to develop a toolkit for statistical
inference on profits. Our approach allows one to reliably estimate the profits
associated with any particular mechanism, to construct confidence intervals,
and, more generally, to conduct valid hypothesis testing.
arXiv link: http://arxiv.org/abs/2405.17178v1
Cross-border cannibalization: Spillover effects of wind and solar energy on interconnected European electricity markets
with increasing market shares, as is now evident across European electricity
markets. At the same time, these markets have become more interconnected. In
this paper, we empirically study the multiple cross-border effects on the value
of renewable energy: on one hand, interconnection is a flexibility resource
that allows exporting energy when it is locally abundant, benefiting
renewables. On the other hand, wind and solar radiation are correlated across
space, so neighboring supply adds to the local one to depress domestic prices.
We estimate both effects, using spatial panel regression on electricity market
data from 2015 to 2023 from 30 European bidding zones. We find that domestic
wind and solar value is not only depressed by domestic, but also by neighboring
renewables expansion. The better interconnected a market is, the smaller the
effect of domestic but the larger the effect of neighboring renewables. While
wind value is stabilized by interconnection, solar value is not. If wind market
share increases both at home and in neighboring markets by one percentage
point, the value factor of wind energy is reduced by just over 1 percentage
point. For solar, the reduction is almost 4 percentage points.
arXiv link: http://arxiv.org/abs/2405.17166v1
Estimating Dyadic Treatment Effects with Unknown Confounders
effects with dyadic data. Under the assumption that the treatments follow an
exchangeable distribution, our approach allows for the presence of any
unobserved confounding factors that potentially cause endogeneity of treatment
choice without requiring additional information other than the treatments and
outcomes. Building on the literature of graphon estimation in network data
analysis, we propose a neighborhood kernel smoothing method for estimating
dyadic average treatment effects. We also develop a permutation inference
method for testing the sharp null hypothesis. Under certain regularity
conditions, we derive the rate of convergence of the proposed estimator and
demonstrate the size control property of our test. We apply our method to
international trade data to assess the impact of free trade agreements on
bilateral trade flows.
arXiv link: http://arxiv.org/abs/2405.16547v1
Two-way fixed effects instrumental variable regressions in staggered DID-IV designs
regressions, leveraging variation in the timing of policy adoption across units
as an instrument for treatment. This paper studies the properties of the TWFEIV
estimator in staggered instrumented difference-in-differences (DID-IV) designs.
We show that in settings with the staggered adoption of the instrument across
units, the TWFEIV estimator can be decomposed into a weighted average of all
possible two-group/two-period Wald-DID estimators. Under staggered DID-IV
designs, a causal interpretation of the TWFEIV estimand hinges on the stable
effects of the instrument on the treatment and the outcome over time. We
illustrate the use of our decomposition theorem for the TWFEIV estimator
through an empirical application.
arXiv link: http://arxiv.org/abs/2405.16467v1
Dynamic Latent-Factor Model with High-Dimensional Asset Characteristics
a dynamic latent-factor model with high-dimensional asset characteristics, that
is, the number of characteristics is on the order of the sample size. Utilizing
the Double Selection Lasso estimator, our procedure employs regularization to
eliminate characteristics with low signal-to-noise ratios yet maintains
asymptotically valid inference for asset pricing tests. The crypto asset class
is well-suited for applying this model given the limited number of tradable
assets and years of data as well as the rich set of available asset
characteristics. The empirical results present out-of-sample pricing abilities
and risk-adjusted returns for our novel estimator as compared to benchmark
methods. We provide an inference procedure for measuring the risk premium of an
observable nontradable factor, and employ this to find that the
inflation-mimicking portfolio in the crypto asset class has positive risk
compensation.
arXiv link: http://arxiv.org/abs/2405.15721v1
Empirical Crypto Asset Pricing
and study the drivers of crypto asset returns through the lens of univariate
factors. We argue crypto assets are a new, attractive, and independent asset
class. In a novel and rigorously built panel of crypto assets, we examine
the pricing ability of sixty-three asset characteristics to find rich signal
content across the characteristics and at several future horizons. Only
univariate financial factors (i.e., functions of previous returns) were
associated with statistically significant long-short strategies, suggestive of
speculatively driven returns as opposed to more fundamental pricing factors.
arXiv link: http://arxiv.org/abs/2405.15716v1
Generating density nowcasts for U.S. GDP growth with deep learning: Bayes by Backprop and Monte Carlo dropout
(ANNs) can outperform the dynamic factor model (DFM) in terms of the accuracy
of GDP nowcasts. Compared to the DFM, the performance advantage of these highly
flexible, nonlinear estimators is particularly evident in periods of recessions
and structural breaks. From the perspective of policy-makers, however, nowcasts
are the most useful when they are conveyed with uncertainty attached to them.
While the DFM and other classical time series approaches analytically derive
the predictive (conditional) distribution for GDP growth, ANNs can only produce
point nowcasts based on their default training procedure (backpropagation). To
fill this gap, first in the literature, we adapt two different deep learning
algorithms that enable ANNs to generate density nowcasts for U.S. GDP growth:
Bayes by Backprop and Monte Carlo dropout. The accuracy of point nowcasts,
defined as the mean of the empirical predictive distribution, is evaluated
relative to a naive constant growth model for GDP and a benchmark DFM
specification. Using a 1D CNN as the underlying ANN architecture, both
algorithms outperform those benchmarks during the evaluation period (2012:Q1 --
2022:Q4). Furthermore, both algorithms are able to dynamically adjust the
location (mean), scale (variance), and shape (skew) of the empirical predictive
distribution. The results indicate that both Bayes by Backprop and Monte Carlo
dropout can effectively augment the scope and functionality of ANNs, rendering
them a fully compatible and competitive alternative for classical time series
approaches.
arXiv link: http://arxiv.org/abs/2405.15579v1
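A minimal PyTorch sketch of Monte Carlo dropout, one of the two approaches above: keep dropout layers active at prediction time and treat repeated forward passes as draws from the predictive distribution. The small fully connected network is an illustrative stand-in for the paper's 1D CNN.

import torch
import torch.nn as nn

class DropoutRegressor(nn.Module):
    def __init__(self, n_features, p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Dropout(p),
            nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.net(x)

def mc_dropout_density(model, x, n_samples=500):
    """Collect repeated stochastic forward passes as draws from the predictive
    distribution; returns their mean and standard deviation."""
    model.train()                           # train mode keeps the Dropout layers stochastic
    with torch.no_grad():
        draws = torch.stack([model(x) for _ in range(n_samples)])
    return draws.mean().item(), draws.std().item()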
Modularity, Higher-Order Recombination, and New Venture Success
natural, and technological systems robust to exploratory failure. We consider
this in the context of emerging business organizations, which can be understood
as complex systems. We build a theory of organizational emergence as
higher-order, modular recombination wherein successful start-ups assemble novel
combinations of successful modular components, rather than engage in the
lower-order combination of disparate, singular components. Lower-order
combinations are critical for long-term socio-economic transformation, but
manifest diffuse benefits requiring support as public goods. Higher-order
combinations facilitate rapid experimentation and attract private funding. We
evaluate this with U.S. venture-funded start-ups over 45 years using company
descriptions. We build a dynamic semantic space with word embedding models
constructed from evolving business discourse, which allow us to measure the
modularity of and distance between new venture components. Using event history
models, we demonstrate that ventures are more likely to achieve successful IPOs and
high-priced acquisitions when they combine diverse modules of clustered
components. We demonstrate how higher-order combination enables venture success
by accelerating firm development and diversifying investment, and we reflect on
its implications for social innovation.
arXiv link: http://arxiv.org/abs/2405.15042v1
On the Identifying Power of Monotonicity for Average Treatment Effects
Pearl (1993, 1997) establish that the monotonicity condition of Imbens and
Angrist (1994) has no identifying power beyond instrument exogeneity for
average potential outcomes and average treatment effects in the sense that
adding it to instrument exogeneity does not decrease the identified sets for
those parameters whenever those restrictions are consistent with the
distribution of the observable data. This paper shows that this phenomenon
holds in a broader setting with a multi-valued outcome, treatment, and
instrument, under an extension of the monotonicity condition that we refer to
as generalized monotonicity. We further show that this phenomenon holds for any
restriction on treatment response that is stronger than generalized
monotonicity provided that these stronger restrictions do not restrict
potential outcomes. Importantly, many models of potential treatments previously
considered in the literature imply generalized monotonicity, including the
types of monotonicity restrictions considered by Kline and Walters (2016),
Kirkeboen et al. (2016), and Heckman and Pinto (2018), and the restriction that
treatment selection is determined by particular classes of additive random
utility models. We show through a series of examples that restrictions on
potential treatments can provide identifying power beyond instrument exogeneity
for average potential outcomes and average treatment effects when the
restrictions imply that the generalized monotonicity condition is violated. In
this way, our results shed light on the types of restrictions required to help
identify average potential outcomes and average treatment effects.
arXiv link: http://arxiv.org/abs/2405.14104v3
Exogenous Consideration and Extended Random Utility
considered alternatives. I relate a consideration set additive random utility
model to classic discrete choice and the extended additive random utility
model, in which utility can be $-\infty$ for infeasible alternatives. When
observable utility shifters are bounded, all three models are observationally
equivalent. Moreover, they have the same counterfactual bounds and welfare
formulas for changes in utility shifters like price. For attention
interventions, welfare cannot change in the full consideration model but is
completely unbounded in the limited consideration model. The identified set for
consideration set probabilities has a minimal width for any bounded support of
shifters, but with unbounded support it is a point: identification "towards"
infinity does not resemble identification "at" infinity.
arXiv link: http://arxiv.org/abs/2405.13945v1
Some models are useful, but for how long?: A decision theoretic approach to choosing when to refit large-scale prediction models
or machine learning (ML) are increasingly common across a variety of industries
and scientific domains. Despite their effectiveness, training AI and ML tools
at scale can cost tens or hundreds of thousands of dollars (or more); and even
after a model is trained, substantial resources must be invested to keep models
up-to-date. This paper presents a decision-theoretic framework for deciding
when to refit an AI/ML model when the goal is to perform unbiased statistical
inference using partially AI/ML-generated data. Drawing on portfolio
optimization theory, we treat the decision of {\it recalibrating} a model or
statistical inference versus {\it refitting} the model as a choice between
“investing” in one of two “assets.” One asset, recalibrating the model
based on another model, is quick and relatively inexpensive but bears
uncertainty from sampling and may not be robust to model drift. The other
asset, {\it refitting} the model, is costly but removes the drift concern
(though not statistical uncertainty from sampling). We present a framework for
balancing these two potential investments while preserving statistical
validity. We evaluate the framework using simulation and data on electricity
usage and predicting flu trends.
arXiv link: http://arxiv.org/abs/2405.13926v2
Integrating behavioral experimental findings into dynamical models to inform social change interventions
involves stimulating the large-scale adoption of new products or behaviors.
Research traditions that focus on individual decision making suggest that
achieving this objective requires better identifying the drivers of individual
adoption choices. On the other hand, computational approaches rooted in
complexity science focus on maximizing the propagation of a given product or
behavior throughout social networks of interconnected adopters. The integration
of these two perspectives -- although advocated by several research communities
-- has remained elusive so far. Here we show how achieving this integration
could inform seeding policies to facilitate the large-scale adoption of a given
behavior or product. Drawing on complex contagion and discrete choice theories,
we propose a method to estimate individual-level thresholds to adoption, and
validate its predictive power in two choice experiments. By integrating the
estimated thresholds into computational simulations, we show that
state-of-the-art seeding methods for social influence maximization might be
suboptimal if they neglect individual-level behavioral drivers, which can be
corrected through the proposed experimental method.
arXiv link: http://arxiv.org/abs/2405.13224v1
Conditional Choice Probability Estimation of Dynamic Discrete Choice Models with 2-period Finite Dependence
introducing a novel characterization of finite dependence within dynamic
discrete choice models, demonstrating that numerous models display 2-period
finite dependence. We recast finite dependence as a problem of sequentially
searching for weights and introduce a computationally efficient method for
determining these weights by utilizing the Kronecker product structure embedded
in state transitions. With the estimated weights, we develop a computationally
attractive Conditional Choice Probability estimator with 2-period finite
dependence. The computational efficacy of our proposed estimator is
demonstrated through Monte Carlo simulations.
arXiv link: http://arxiv.org/abs/2405.12467v1
Estimating the Impact of Social Distance Policy in Mitigating COVID-19 Spread with Factor-Based Imputation Approach
transmission of the COVID-19 spread. We build a model that measures the
relative frequency and geographic distribution of the virus growth rate and
provides hypothetical infection distribution in the states that enacted the
social distancing policies, where we control time-varying, observed and
unobserved, state-level heterogeneities. Using panel data on infection and
deaths in all US states from February 20 to April 20, 2020, we find that
stay-at-home orders and other types of social distancing policies significantly
reduced the growth rate of infection and deaths. We show that the effects are
time-varying and range from the weakest at the beginning of policy intervention
to the strongest by the end of our sample period. We also found that social
distancing policies were more effective in states with higher income, better
education, more white people, more democratic voters, and higher CNN
viewership.
arXiv link: http://arxiv.org/abs/2405.12180v1
Instrumented Difference-in-Differences with Heterogeneous Treatment Effects
instrument for treatment. This paper formalizes the underlying identification
strategy as an instrumented difference-in-differences (DID-IV). In this design,
a Wald-DID estimand, which scales the DID estimand of the outcome by the DID
estimand of the treatment, captures the local average treatment effect on the
treated (LATET). We extend the canonical DID-IV design to multiple period
settings with the staggered adoption of the instrument across units. Moreover,
we propose a credible estimation method in this design that is robust to
treatment effect heterogeneity. We illustrate the empirical relevance of our
findings, estimating returns to schooling in the United Kingdom. In this
application, the two-way fixed effects instrumental variable regression, the
conventional approach to implement DID-IV designs, yields a negative estimate.
By contrast, our estimation method indicates a substantial gain from schooling.
arXiv link: http://arxiv.org/abs/2405.12083v5
Comparing predictive ability in presence of instability over a very short time
affects only a short period of time. We demonstrate that global tests do not
perform well in this case, as they were not designed to capture very
short-lived instabilities, and their power vanishes altogether when the
magnitude of the shock is very large. We then discuss and propose approaches
that are more suitable to detect such situations, such as nonparametric methods
(S test or MAX procedure). We illustrate these results in different Monte Carlo
exercises and in evaluating the nowcast of the quarterly US nominal GDP from
the Survey of Professional Forecasters (SPF) against a naive benchmark of no
growth, over the period that includes the GDP instability brought by the
Covid-19 crisis. We recommend that the forecaster should not pool the sample,
but exclude the short periods of high local instability from the evaluation
exercise.
arXiv link: http://arxiv.org/abs/2405.11954v1
Revisiting Day-ahead Electricity Price: Simple Model Save Millions
welfare, yet current methods often fall short in forecast accuracy. We observe
that commonly used time series models struggle to exploit the prior relation
between price and demand-supply, which, we find, contributes substantially to a
reliable electricity price forecaster. Leveraging this prior, we propose a
simple piecewise linear model that significantly enhances forecast accuracy by
directly deriving prices from readily forecastable demand-supply values.
Experiments in the day-ahead electricity markets of Shanxi province and ISO New
England reveal that such forecasts could potentially save residents millions of
dollars a year compared to existing methods. Our findings underscore the value
of suitably integrating time series modeling with economic prior for enhanced
electricity price forecasting accuracy.
arXiv link: http://arxiv.org/abs/2405.14893v2
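A minimal sketch of a piecewise linear mapping from a forecastable demand-supply variable to price, in the spirit of the simple model above. The fixed breakpoints and the single regressor are illustrative assumptions; the paper's exact specification may differ.

import numpy as np

def fit_piecewise_linear(gap, price, breakpoints):
    """Fit price = a_j + b_j * gap within each segment defined by breakpoints."""
    bins = np.digitize(gap, breakpoints)
    params = {}
    for j in np.unique(bins):
        mask = bins == j
        design = np.column_stack([np.ones(mask.sum()), gap[mask]])
        params[j], *_ = np.linalg.lstsq(design, price[mask], rcond=None)
    return params

def predict_price(params, breakpoints, gap_new):
    """Assumes gap_new falls in a segment that was observed during fitting."""
    j = int(np.digitize([gap_new], breakpoints)[0])
    intercept, slope = params[j]
    return intercept + slope * gap_new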
Testing Sign Congruence Between Two Parameters
sign, assuming that (asymptotically) normal estimators of
$(\mu_1,\mu_2)$ are available. Examples of this problem include the
analysis of heterogeneous treatment effects, causal interpretation of
reduced-form estimands, meta-studies, and mediation analysis. A number of tests
were recently proposed. We recommend a test that is simple and rejects more
often than many of these recent proposals. Like all other tests in the
literature, it is conservative if the truth is near $(0,0)$ and therefore also
biased. To clarify whether these features are avoidable, we also provide a test
that is unbiased and has exact size control on the boundary of the null
hypothesis, but which has counterintuitive properties and hence we do not
recommend. We use the test to improve p-values in Kowalski (2022) from
information contained in that paper's main text and to establish statistical
significance of some key estimates in Dippel et al. (2021).
arXiv link: http://arxiv.org/abs/2405.11759v4
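One simple, valid construction for this problem is sketched below: declare the two parameters to share a sign only when both estimates are individually significant in the same direction. This intersection-union-style check is conservative near (0,0), as discussed above, and is offered as a generic illustration rather than the paper's recommended test.

from scipy.stats import norm

def same_sign_test(est1, se1, est2, se2, alpha=0.05):
    """Reject the null of differing (or zero) signs only if both t-statistics
    are individually significant in the same direction."""
    z = norm.ppf(1 - alpha)
    t1, t2 = est1 / se1, est2 / se2
    return min(t1, t2) > z or max(t1, t2) < -z   # True = conclude the signs agree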
Transfer Learning for Spatial Autoregressive Models with Application to U.S. Presidential Election Prediction
presidential election analysis, especially for swing states. The state-level
analysis also faces significant challenges of limited spatial data
availability. To address the challenges of spatial dependence and small sample
sizes in predicting U.S. presidential election results using spatially
dependent data, we propose a novel transfer learning framework within the SAR
model, called tranSAR. Classical SAR model estimation often loses accuracy
with small target data samples. Our framework enhances estimation and
prediction by leveraging information from similar source data. We introduce a
two-stage algorithm, consisting of a transferring stage and a debiasing stage,
to estimate parameters and establish theoretical convergence rates for the
estimators. Additionally, if the informative source data are unknown, we
propose a transferable source detection algorithm using spatial residual
bootstrap to maintain spatial dependence and derive its detection consistency.
Simulation studies show our algorithm substantially improves the classical
two-stage least squares estimator. We demonstrate our method's effectiveness in
predicting outcomes in U.S. presidential swing states, where it outperforms
traditional methods. In addition, our tranSAR model predicts that the
Democratic party will win the 2024 U.S. presidential election.
arXiv link: http://arxiv.org/abs/2405.15600v2
The Logic of Counterfactuals and the Epistemology of Causal Inference
inference based on the Rubin causal model (Rubin 1974), which merits broader
attention in philosophy. This model, in fact, presupposes a logical principle
of counterfactuals, Conditional Excluded Middle (CEM), the locus of a pivotal
debate between Stalnaker (1968) and Lewis (1973) on the semantics of
counterfactuals. Proponents of CEM should recognize that this connection points
to a new argument for CEM -- a Quine-Putnam indispensability argument grounded
in the Nobel-winning applications of the Rubin model in health and social
sciences. To advance the dialectic, I challenge this argument with an updated
Rubin causal model that retains its successes while dispensing with CEM. This
novel approach combines the strengths of the Rubin causal model and a causal
model familiar in philosophy, the causal Bayes net. The takeaway: deductive
logic and inductive inference, often studied in isolation, are deeply
interconnected.
arXiv link: http://arxiv.org/abs/2405.11284v3
Macroeconomic Factors, Industrial Indexes and Bank Spread in Brazil
and industrial indexes influenced the total Brazilian banking spread between
March 2011 and March 2015. This paper considers subclassification of industrial
activities in Brazil. Monthly time series data were used in multivariate linear
regression models using Eviews (7.0). Eighteen variables were considered as
candidates to be determinants. Variables which positively influenced bank
spread are: default, IPIs (Industrial Production Indexes) for capital goods,
intermediate goods, durable consumer goods, semi-durable and non-durable
goods, the Selic rate, GDP, the unemployment rate and the EMBI+. Variables
which influence negatively are: consumer and general consumer goods IPIs, the
IPCA, the balance of the loan portfolio and the retail sales index. A
significance level of 5% was used.
The main conclusion of this work is that the progress of industry, job creation
and consumption can reduce bank spread. Keywords: Credit. Bank spread.
Macroeconomics. Industrial Production Indexes. Finance.
arXiv link: http://arxiv.org/abs/2405.10655v1
Overcoming Medical Overuse with AI Assistance: An Experimental Investigation
mitigating medical overtreatment, a significant issue characterized by
unnecessary interventions that inflate healthcare costs and pose risks to
patients. We conducted a lab-in-the-field experiment at a medical school,
utilizing a novel medical prescription task, manipulating monetary incentives
and the availability of AI assistance among medical students using a
three-by-two factorial design. We tested three incentive schemes: Flat
(constant pay regardless of treatment quantity), Progressive (pay increases
with the number of treatments), and Regressive (penalties for overtreatment) to
assess their influence on the adoption and effectiveness of AI assistance. Our
findings demonstrate that AI significantly reduced overtreatment rates by up to
62% in the Regressive incentive conditions where (prospective) physician and
patient interests were most aligned. Diagnostic accuracy improved by 17% to
37%, depending on the incentive scheme. Adoption of AI advice was high, with
approximately half of the participants modifying their decisions based on AI
input across all settings. For policy implications, we quantified the monetary
(57%) and non-monetary (43%) incentives of overtreatment and highlighted AI's
potential to mitigate non-monetary incentives and enhance social welfare. Our
results provide valuable insights for healthcare administrators considering AI
integration into healthcare systems.
arXiv link: http://arxiv.org/abs/2405.10539v1
Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions
adoption of AI agents in retail. This paper presents comprehensive simulations
of customer shopping behaviors for the purpose of benchmarking reinforcement
learning (RL) agents that optimize coupon targeting. The difficulty of this
learning problem is largely driven by the sparsity of customer purchase events.
We trained agents using offline batch data comprising summarized customer
purchase histories to help mitigate this effect. Our experiments revealed that
contextual bandit and deep RL methods that are less prone to over-fitting the
sparse reward distributions significantly outperform static policies. This
study offers a practical framework for simulating AI agents that optimize the
entire retail customer journey. It aims to inspire the further development of
simulation tools for retail AI systems.
arXiv link: http://arxiv.org/abs/2405.10469v1
Optimal Text-Based Time-Series Indices
optimal way--typically, indices that maximize the contemporaneous relation or
the predictive performance with respect to a target variable, such as
inflation. We illustrate our methodology with a corpus of news articles from
the Wall Street Journal by optimizing text-based indices focusing on tracking
the VIX index and inflation expectations. Our results highlight the superior
performance of our approach compared to existing indices.
arXiv link: http://arxiv.org/abs/2405.10449v1
Comprehensive Causal Machine Learning
granularity provides substantial value to decision makers. Comprehensive
machine learning approaches to causal effect estimation allow a single method
to be used for estimation and inference of causal mean effects at all levels
of granularity. Focusing on selection-on-observables,
this paper compares three such approaches, the modified causal forest (mcf),
the generalized random forest (grf), and double machine learning (dml). It also
compares the theoretical properties of the approaches and provides proven
theoretical guarantees for the mcf. The findings indicate that dml-based
methods excel for average treatment effects at the population level (ATE) and
group level (GATE) with few groups, when selection into treatment is not too
strong. However, for finer causal heterogeneity, explicitly outcome-centred
forest-based approaches are superior. The mcf has three additional benefits:
(i) It is the most robust estimator in cases when dml-based approaches
underperform because of substantial selection into treatment; (ii) it is the
best estimator for GATEs when the number of groups gets larger; and (iii), it
is the only estimator that is internally consistent, in the sense that
low-dimensional causal ATEs and GATEs are obtained as aggregates of
finer-grained causal parameters.
arXiv link: http://arxiv.org/abs/2405.10198v2
Double Robustness of Local Projections and Some Unpleasant VARithmetic
autoregression (VAR) model. The conventional local projection (LP) confidence
interval has correct coverage even when the misspecification is so large that
it can be detected with probability approaching 1. This result follows from a
"double robustness" property analogous to that of popular partially linear
regression estimators. In contrast, the conventional VAR confidence interval
with short-to-moderate lag length can severely undercover, even for
misspecification that is small, economically plausible, and difficult to detect
statistically. There is no free lunch: the VAR confidence interval has robust
coverage only if the lag length is so large that the interval is as wide as the
LP interval.
arXiv link: http://arxiv.org/abs/2405.09509v2
Identifying Heterogeneous Decision Rules From Choices When Menus Are Unobserved
distributed across the population, we describe what can be inferred robustly
about the distribution of preferences (or more general decision rules). We
strengthen and generalize existing results on such identification and provide
an alternative analytical approach to study the problem. We show further that
our model and results are applicable, after suitable reinterpretation, to other
contexts. One application is to the robust identification of the distribution
of updating rules given only the population distribution of beliefs and limited
information about heterogeneous information sources.
arXiv link: http://arxiv.org/abs/2405.09500v1
Optimizing Sales Forecasts through Automated Integration of Market Indicators
historical demand, this work investigates the potential of data-driven
techniques to automatically select and integrate market indicators for
improving customer demand predictions. By adopting an exploratory methodology,
we integrate macroeconomic time series, such as national GDP growth, from the
Eurostat database into Neural Prophet and SARIMAX
forecasting models. Suitable time series are automatically identified through
different state-of-the-art feature selection methods and applied to sales data
from our industrial partner. The results show that forecasts can be
significantly enhanced by incorporating external information. Notably, the
potential of feature selection methods stands out, especially due to their
capability for automation without expert knowledge and manual selection effort.
In particular, the Forward Feature Selection technique consistently yielded
superior forecasting accuracy for both SARIMAX and Neural Prophet across
different company sales datasets. A comparative analysis of the errors of the
two forecasting models, Neural Prophet and SARIMAX, shows that neither
demonstrates a significant superiority over the other.
arXiv link: http://arxiv.org/abs/2406.07564v1
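A minimal sketch of the kind of forward selection over external indicators
described above, using statsmodels' SARIMAX; the AIC-based scoring, the
(1,1,1) order, the candidate-variable names, and the simulated data are
illustrative assumptions rather than the paper's exact pipeline.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

def forward_select_exog(y, candidates, order=(1, 1, 1), max_vars=3):
    """Greedy forward selection of exogenous regressors for a SARIMAX model,
    scored by in-sample AIC (an illustrative criterion; out-of-sample
    forecast error could be used instead)."""
    selected, remaining = [], list(candidates.columns)
    best_aic = SARIMAX(y, order=order).fit(disp=False).aic
    while remaining and len(selected) < max_vars:
        scores = {}
        for col in remaining:
            exog = candidates[selected + [col]]
            scores[col] = SARIMAX(y, exog=exog, order=order).fit(disp=False).aic
        best_col = min(scores, key=scores.get)
        if scores[best_col] >= best_aic:
            break  # no remaining candidate improves the fit
        best_aic = scores[best_col]
        selected.append(best_col)
        remaining.remove(best_col)
    return selected, best_aic

# Usage with simulated sales and two candidate macro indicators.
rng = np.random.default_rng(1)
n = 120
gdp = pd.Series(np.cumsum(rng.normal(size=n)), name="gdp_growth")
noise = pd.Series(rng.normal(size=n), name="unrelated")
sales = pd.Series(0.8 * gdp + rng.normal(size=n), name="sales")
print(forward_select_exog(sales, pd.concat([gdp, noise], axis=1)))
```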
Bounds on the Distribution of a Sum of Two Random Variables: Revisiting a problem of Kolmogorov with application to Individual Treatment Effects
marginal distributions $F$ and $G$ for random variables $X,Y$ respectively,
characterize the set of compatible distribution functions for the sum $Z=X+Y$.
Bounds on the distribution function for $Z$ were first given by Makarov (1982)
and Rüschendorf (1982) independently. Frank et al. (1987) provided a solution
to the same problem using copula theory. However, though these authors obtain
the same bounds, they make different assertions concerning their sharpness. In
addition, their solutions leave some open problems in the case when the given
marginal distribution functions are discontinuous. These issues have led to
some confusion and erroneous statements in subsequent literature, which we
correct.
Kolmogorov's problem is closely related to inferring possible distributions
for individual treatment effects $Y_1 - Y_0$ given the marginal distributions
of $Y_1$ and $Y_0$; the latter being identified from a randomized experiment.
We use our new insights to sharpen and correct the results due to Fan and Park
(2010) concerning individual treatment effects, and to fill some other logical
gaps.
arXiv link: http://arxiv.org/abs/2405.08806v2
Variational Bayes and non-Bayesian Updating
model of non-Bayesian updating.
arXiv link: http://arxiv.org/abs/2405.08796v2
Latent group structure in linear panel data models with endogenous regressors
endogenous regressors and a latent group structure in the coefficients. We
consider instrumental variables estimation of the group-specific coefficient
vector. We show that direct application of the Kmeans algorithm to the
generalized method of moments objective function does not yield unique
estimates. We develop and theoretically justify new two-stage estimation
methods that apply the Kmeans algorithm to a regression of the dependent
variable on predicted values of the endogenous regressors. The results of Monte
Carlo simulations demonstrate that two-stage estimation with the first stage
modeled using a latent group structure achieves good classification accuracy,
even if the true first-stage regression is fully heterogeneous. We apply our
estimation methods to revisiting the relationship between income and democracy.
arXiv link: http://arxiv.org/abs/2405.08687v1
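The two-stage idea (predict the endogenous regressor first, then cluster on a
regression against the predicted values) can be sketched as follows; the
pooled first stage, the unit-wise slopes fed to k-means, and the simulated
design are illustrative simplifications, not the authors' exact estimator.

```python
import numpy as np
from sklearn.cluster import KMeans

def two_stage_kmeans(y, x, z, n_groups=2):
    """Illustrative two-stage grouping for panel data with an instrument.

    y, x, z have shape (N, T): N units over T periods; x is the endogenous
    regressor and z an instrument.
    Stage 1: predict x from z with a pooled first-stage regression.
    Stage 2: compute unit-wise slopes of y on the predicted x and cluster
    them with k-means to recover the latent group structure.
    """
    # Stage 1: pooled first-stage regression x = pi * z + error.
    pi = (z.ravel() @ x.ravel()) / (z.ravel() @ z.ravel())
    x_hat = pi * z

    # Stage 2: unit-by-unit slope of y on x_hat, then k-means on the slopes.
    slopes = np.array([
        (x_hat[i] @ y[i]) / (x_hat[i] @ x_hat[i]) for i in range(y.shape[0])
    ])
    labels = KMeans(n_clusters=n_groups, n_init=10,
                    random_state=0).fit_predict(slopes.reshape(-1, 1))
    return labels, slopes

# Usage: two latent groups with slopes 1.0 and -1.0.
rng = np.random.default_rng(2)
N, T = 100, 20
z = rng.normal(size=(N, T))
x = 0.9 * z + rng.normal(scale=0.5, size=(N, T))
beta = np.where(np.arange(N) < 50, 1.0, -1.0)[:, None]
y = beta * x + rng.normal(size=(N, T))
labels, _ = two_stage_kmeans(y, x, z)
print(np.bincount(labels))
```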
Predicting NVIDIA's Next-Day Stock Price: A Comparative Analysis of LSTM, MLP, ARIMA, and ARIMA-GARCH Models
markets, bearing significant implications for investors, traders, and financial
institutions. Amid the ongoing AI revolution, NVIDIA has emerged as a key
player driving innovation across various sectors. Given its prominence, we
chose NVIDIA as the subject of our study.
arXiv link: http://arxiv.org/abs/2405.08284v1
Random Utility Models with Skewed Random Components: the Smallest versus Largest Extreme Value Distribution
a random utility component following a largest extreme value Type I (LEVI)
distribution. What if, instead, the random component follows its mirror image
-- the smallest extreme value Type I (SEVI) distribution? Differences between
these specifications, closely tied to the random component's skewness, can be
quite profound. For the same preference parameters, the two RUMs, equivalent
with only two choice alternatives, diverge progressively as the number of
alternatives increases, resulting in substantially different estimates and
predictions for key measures, such as elasticities and market shares.
The LEVI model imposes the well-known independence-of-irrelevant-alternatives
property, while SEVI does not. Instead, the SEVI choice probability for a
particular option involves enumerating all subsets that contain this option.
The SEVI model, though more complex to estimate, is shown to have
computationally tractable closed-form choice probabilities. Much of the paper
delves into explicating the properties of the SEVI model and exploring
implications of the random component's skewness.
Conceptually, the difference between the LEVI and SEVI models centers on
whether information, known only to the agent, is more likely to increase or
decrease the systematic utility parameterized using observed attributes. LEVI
does the former; SEVI the latter. An immediate implication is that if choice is
characterized by SEVI random components, then the observed choice is more
likely to correspond to the systematic-utility-maximizing choice than if
characterized by LEVI. Examining standard empirical examples from different
applied areas, we find that the SEVI model outperforms the LEVI model,
suggesting the relevance of its inclusion in applied researchers' toolkits.
arXiv link: http://arxiv.org/abs/2405.08222v2
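The divergence between the two error specifications is easy to see by Monte
Carlo: simulate the same systematic utilities with Gumbel (LEVI) errors and
with their mirror image (SEVI) and compare the implied choice shares. The
specific utility values below are arbitrary, and simulation is used here only
for illustration; the paper derives closed-form SEVI probabilities.

```python
import numpy as np

def choice_shares(v, n_draws=200_000, skew="LEVI", seed=0):
    """Monte Carlo choice shares for a random utility model u = v + eps.

    skew="LEVI": eps is largest extreme value Type I (standard Gumbel),
    giving the usual multinomial logit probabilities.
    skew="SEVI": eps is the mirror image (smallest extreme value Type I),
    obtained here as the negative of a Gumbel draw.
    """
    rng = np.random.default_rng(seed)
    eps = rng.gumbel(size=(n_draws, len(v)))
    if skew == "SEVI":
        eps = -eps
    choices = np.argmax(v + eps, axis=1)
    return np.bincount(choices, minlength=len(v)) / n_draws

# With two alternatives the models coincide; with more they diverge.
v_small = np.array([0.0, 1.0])
v_large = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
for v in (v_small, v_large):
    print("LEVI:", np.round(choice_shares(v, skew="LEVI"), 3),
          "SEVI:", np.round(choice_shares(v, skew="SEVI"), 3))
```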
Simultaneous Inference for Local Structural Parameters with Random Forests
moment equations. The intervals are built around a class of nonparametric
regression algorithms based on subsampled kernels. This class encompasses
various forms of subsampled random forest regression, including Generalized
Random Forests (Athey et al., 2019). Although simultaneous validity is often
desirable in practice -- for example, for fine-grained characterization of
treatment effect heterogeneity -- only confidence intervals that confer
pointwise guarantees were previously available. Our work closes this gap. As a
by-product, we obtain several new order-explicit results on the concentration
and normal approximation of high-dimensional U-statistics.
arXiv link: http://arxiv.org/abs/2405.07860v3
Robust Estimation and Inference for High-Dimensional Panel Data Models
conducting robust estimation and inference about the parameters of interest
involved in a high-dimensional panel data framework. Specifically, (1) we allow
for non-Gaussian, serially and cross-sectionally correlated and heteroskedastic
error processes, (2) we develop an estimation method for high-dimensional
long-run covariance matrix using a thresholded estimator, (3) we also allow for
the number of regressors to grow faster than the sample size.
Methodologically and technically, we develop two Nagaev-type
concentration inequalities: one for a partial sum and the other for a quadratic
form, subject to a set of easily verifiable conditions. Leveraging these two
inequalities, we derive a non-asymptotic bound for the LASSO estimator, achieve
asymptotic normality via the node-wise LASSO regression, and establish a sharp
convergence rate for the thresholded heteroskedasticity and autocorrelation
consistent (HAC) estimator.
We demonstrate the practical relevance of these theoretical results by
investigating a high-dimensional panel data model with interactive effects.
Moreover, we conduct extensive numerical studies using simulated and real data
examples.
arXiv link: http://arxiv.org/abs/2405.07420v3
Kernel Three Pass Regression Filter
When these predictors share common underlying dynamics, an approximate latent
factor model provides a powerful characterization of their co-movements
(Bai, 2003). These latent factors succinctly summarize the data and can also be
used for prediction, alleviating the curse of dimensionality in
high-dimensional prediction exercises, see Stock & Watson (2002a). However,
forecasting using these latent factors suffers from two potential drawbacks.
First, not all pervasive factors among the set of predictors may be relevant,
and using all of them can lead to inefficient forecasts. The second shortcoming
is the assumption of linear dependence of predictors on the underlying factors.
The first issue can be addressed by using some form of supervision, which leads
to the omission of irrelevant information. One example is the three-pass
regression filter proposed by Kelly & Pruitt (2015). We extend their framework
to cases where the form of dependence might be nonlinear by developing a new
estimator, which we refer to as the Kernel Three-Pass Regression Filter
(K3PRF). This alleviates the aforementioned second shortcoming. The estimator
is computationally efficient and performs well empirically. The short-term
performance matches or exceeds that of established models, while the long-term
performance shows significant improvement.
arXiv link: http://arxiv.org/abs/2405.07292v3
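For readers unfamiliar with the baseline, the linear three-pass regression
filter of Kelly & Pruitt (2015) can be written in a few lines; the kernel
extension proposed in the paper would replace these linear passes with kernel
regressions, which is not attempted here. Using the target as its own proxy is
an illustrative simplification.

```python
import numpy as np

def three_pass_regression_filter(X, y):
    """Baseline linear 3PRF with the target used as its own proxy.

    X: (T, N) panel of predictors, y: (T,) target.
    Pass 1: time-series regression of each predictor on the proxy -> loadings.
    Pass 2: cross-section regression of X_t on the loadings -> factor F_t.
    Pass 3: predictive regression of y on the extracted factor.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    phi = Xc.T @ yc / (yc @ yc)        # Pass 1 loadings, shape (N,)
    F = Xc @ phi / (phi @ phi)         # Pass 2 factor, shape (T,)
    beta = (F @ yc) / (F @ F)          # Pass 3 predictive slope
    fitted = y.mean() + beta * F
    return fitted, F, phi

# Usage: one latent factor driving 80 predictors and the target.
rng = np.random.default_rng(3)
T, N = 200, 80
f = rng.normal(size=T)
X = np.outer(f, rng.normal(size=N)) + rng.normal(scale=0.5, size=(T, N))
y = 2.0 * f + rng.normal(scale=0.5, size=T)
fitted, F, phi = three_pass_regression_filter(X, y)
print("corr(fitted, y) =", round(np.corrcoef(fitted, y)[0, 1], 3))
```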
On the Ollivier-Ricci curvature as fragility indicator of the stock markets
the Ollivier-Ricci curvature has been proposed. We study analytical and
empirical properties of such indicator, test its elasticity with respect to
different parameters and provide heuristics for the parameters involved. We
show when and how the indicator accurately describes a financial crisis. We
also propose an alternate method for calculating the indicator using a specific
sub-graph with special curvature properties.
arXiv link: http://arxiv.org/abs/2405.07134v1
Identifying Peer Effects in Networks with Unobserved Effort and Isolated Students
proxy variables when actual effort is unobserved. For instance, in education,
academic effort is often proxied by GPA. We propose an alternative approach
that circumvents this approximation. Our framework distinguishes unobserved
shocks to GPA that do not affect effort from preference shocks that do affect
effort levels. We show that peer effects estimates obtained using our approach
can differ significantly from classical estimates (where effort is
approximated) if the network includes isolated students. Applying our approach
to data on high school students in the United States, we find that peer effect
estimates relying on GPA as a proxy for effort are 40% lower than those
obtained using our approach.
arXiv link: http://arxiv.org/abs/2405.06850v1
Generalization Issues in Conjoint Experiment: Attention and Salience
real-world scenarios? This question lies at the heart of social science
studies. External validity primarily assesses whether experimental effects
persist across different settings, implicitly presuming the consistency of
experimental effects with their real-life counterparts. However, we argue that
this presumed consistency may not always hold, especially in experiments
involving multi-dimensional decision processes, such as conjoint experiments.
We introduce a formal model to elucidate how attention and salience effects
lead to three types of inconsistencies between experimental findings and
real-world phenomena: amplified effect magnitude, effect sign reversal, and
effect importance reversal. We derive testable hypotheses from each theoretical
outcome and test these hypotheses using data from various existing conjoint
experiments and our own experiments. Drawing on our theoretical framework, we
propose several recommendations for experimental design aimed at enhancing the
generalizability of survey experiment findings.
arXiv link: http://arxiv.org/abs/2405.06779v3
A Sharp Test for the Judge Leniency Design
leniency design. We characterize a set of sharp testable implications, which
exploit all the relevant information in the observed data distribution to
detect violations of the judge leniency design assumptions. The proposed sharp
test is asymptotically valid and consistent and will not make discordant
recommendations. When the judge's leniency design assumptions are rejected, we
propose a way to salvage the model using partial monotonicity and exclusion
assumptions, under which a variant of the Local Instrumental Variable (LIV)
estimand can recover the Marginal Treatment Effect. Simulation studies show our
test outperforms existing non-sharp tests by significant margins. We apply our
test to assess the validity of the judge leniency design using data from
Stevenson (2018), and it rejects the validity for three crime categories:
robbery, drug selling, and drug possession.
arXiv link: http://arxiv.org/abs/2405.06156v1
Advancing Distribution Decomposition Methods Beyond Common Supports: Applications to Racial Wealth Disparities
distribution of a variable of interest between two groups into a portion
explained by covariates and a residual portion. The method that I propose
relaxes the overlapping supports assumption, allowing the groups being compared
to not necessarily share exactly the same covariate support. I illustrate my
method revisiting the black-white wealth gap in the U.S. as a function of labor
income and other variables. Traditionally used decomposition methods would trim
(or assign zero weight to) observations that lie outside the common covariate
support region. On the other hand, by allowing all observations to contribute
to the existing wealth gap, I find that otherwise trimmed observations
contribute from 3% to 19% to the overall wealth gap, at different portions of
the wealth distribution.
arXiv link: http://arxiv.org/abs/2405.05759v1
Sequential Validation of Treatment Heterogeneity
develop tests for the presence of treatment heterogeneity. The resulting
sequential validation approach can be instantiated using various validation
metrics, such as BLPs, GATES, QINI curves, etc., and provides an alternative to
cross-validation-like cross-fold application of these metrics.
arXiv link: http://arxiv.org/abs/2405.05534v1
Causal Duration Analysis with Diff-in-Diff
outcomes are indicators that an individual has reached an absorbing state. For
example, they may indicate whether an individual has exited a period of
unemployment, passed an exam, left a marriage, or had their parole revoked. The
parallel trends assumption that underpins difference-in-differences generally
fails in such settings. We suggest identifying conditions that are analogous to
those of difference-in-differences but apply to hazard rates rather than mean
outcomes. These alternative assumptions motivate estimators that retain the
simplicity and transparency of standard diff-in-diff, and we suggest analogous
specification tests. Our approach can be adapted to general linear restrictions
between the hazard rates of different groups, motivating duration analogues of
the triple differences and synthetic control methods. We apply our procedures
to examine the impact of a policy that increased the generosity of unemployment
benefits, using a cross-cohort comparison.
arXiv link: http://arxiv.org/abs/2405.05220v1
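As a concrete illustration of applying the parallel-trends idea to hazard
rates rather than mean outcomes, the snippet below computes group-by-period
discrete-time hazards (exits divided by the number still at risk) and takes a
difference-in-differences of those hazards. The additive contrast in hazards
and the hypothetical counts are purely illustrative; the paper's assumptions
and estimators are more general.

```python
def hazard(exits, at_risk):
    """Discrete-time hazard: number exiting in the period / number at risk."""
    return exits / at_risk

def hazard_did(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences applied to hazard rates instead of means.
    Each argument is an (exits, at_risk) pair for one group-period cell."""
    return ((hazard(*treat_post) - hazard(*treat_pre))
            - (hazard(*ctrl_post) - hazard(*ctrl_pre)))

# Hypothetical (exits, at_risk) counts per group and period.
print(hazard_did(treat_pre=(120, 1000), treat_post=(90, 800),
                 ctrl_pre=(150, 1000), ctrl_post=(140, 850)))
```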
SVARs with breaks: Identification and inference
characterized by structural breaks (SVAR-WB). Together with standard
restrictions on the parameters and on functions of them, we also consider
constraints across the different regimes. Such constraints can be either (a) in
the form of stability restrictions, indicating that not all the parameters or
impulse responses are subject to structural changes, or (b) in terms of
inequalities regarding particular characteristics of the SVAR-WB across the
regimes. We show that all these kinds of restrictions provide benefits in terms
of identification. We derive conditions for point and set identification of the
structural parameters of the SVAR-WB, mixing equality, sign, rank and stability
restrictions, as well as constraints on forecast error variances (FEVs). As
point identification, when achieved, holds locally but not globally, there will
be a set of isolated structural parameters that are observationally equivalent
in the parametric space. In this respect, both common frequentist and Bayesian
approaches produce unreliable inference as the former focuses on just one of
these observationally equivalent points, while for the latter on a
non-vanishing sensitivity to the prior. To overcome these issues, we propose
alternative approaches for estimation and inference that account for all
admissible observationally equivalent structural parameters. Moreover, we
develop a pure Bayesian and a robust Bayesian approach for doing inference in
set-identified SVAR-WBs. Both the theory of identification and inference are
illustrated through a set of examples and an empirical application on the
transmission of US monetary policy over the great inflation and great
moderation regimes.
arXiv link: http://arxiv.org/abs/2405.04973v1
Testing the Fairness-Accuracy Improvability of Algorithms
benefits or harms of the algorithm fall disproportionately on certain social
groups. Addressing an algorithm's disparate impact can be challenging, however,
because it is often unclear whether it is possible to reduce this impact
without sacrificing other objectives of the organization, such as accuracy or
profit. Establishing the improvability of algorithms with respect to multiple
criteria is of both conceptual and practical interest: in many settings,
disparate impact that would otherwise be prohibited under US federal law is
permissible if it is necessary to achieve a legitimate business interest. The
question is how a policy-maker can formally substantiate, or refute, this
"necessity" defense. In this paper, we provide an econometric framework for
testing the hypothesis that it is possible to improve on the fairness of an
algorithm without compromising on other pre-specified objectives. Our proposed
test is simple to implement and can be applied under any exogenous constraint
on the algorithm space. We establish the large-sample validity and consistency
of our test, and microfound the test's robustness to manipulation based on a
game between a policymaker and the analyst. Finally, we apply our approach to
evaluate a healthcare algorithm originally considered by Obermeyer et al.
(2019), and quantify the extent to which the algorithm's disparate impact can
be reduced without compromising the accuracy of its predictions.
arXiv link: http://arxiv.org/abs/2405.04816v4
Difference-in-Differences Estimators When No Unit Remains Untreated
are untreated at period one, and receive strictly positive doses at period two.
First, we consider designs with some quasi-untreated units, with a period-two
dose local to zero. We show that under a parallel-trends assumption, a weighted
average of slopes of units' potential outcomes is identified by a
difference-in-difference estimand using quasi-untreated units as the control
group. We leverage results from the regression-discontinuity-design literature
to propose a nonparametric estimator. Then, we propose estimators for designs
without quasi-untreated units. Finally, we propose a test of the
homogeneous-effect assumption underlying two-way-fixed-effects regressions.
arXiv link: http://arxiv.org/abs/2405.04465v5
Detailed Gender Wage Gap Decompositions: Controlling for Worker Unobserved Heterogeneity Using Network Theory
allowed for the identification and estimation of detailed wage gap
decompositions. In this context, building reliable counterfactuals requires
using tighter controls to ensure that similar workers are correctly identified
by making sure that important unobserved variables such as skills are
controlled for, as well as comparing only workers with similar observable
characteristics. This paper contributes to the wage decomposition literature in
two main ways: (i) developing an economically principled, network-based
approach to control for unobserved worker skill heterogeneity in the presence
of potential discrimination; and (ii) extending existing generic decomposition
tools to accommodate a potential lack of overlapping covariate supports between
the groups being compared, which is likely to be the norm in more detailed
decompositions. We illustrate the methodology by decomposing the gender wage
gap in Brazil.
arXiv link: http://arxiv.org/abs/2405.04365v1
A Primer on the Analysis of Randomized Experiments and a Survey of some Recent Advances
of randomized experiments. The emergence of this literature may seem surprising
given the widespread use and long history of experiments as the "gold standard"
in program evaluation, but this body of work has revealed many subtle aspects
of randomized experiments that may have been previously unappreciated. This
article provides an overview of some of these topics, primarily focused on
stratification, regression adjustment, and cluster randomization.
arXiv link: http://arxiv.org/abs/2405.03910v2
A quantile-based nonadditive fixed effects model
heterogeneous causal effects. Similar to the standard fixed effects (FE) model, my
model allows arbitrary dependence between regressors and unobserved
heterogeneity, but it generalizes the additive separability of standard FE to
allow the unobserved heterogeneity to enter nonseparably. Similar to structural
quantile models, my model's random coefficient vector depends on an unobserved,
scalar "rank" variable, in which outcomes (excluding an additive noise term)
are monotonic at a particular value of the regressor vector, which is much
weaker than the conventional monotonicity assumption that must hold at all
possible values. This rank is assumed to be stable over time, which is often
more economically plausible than the panel quantile studies that assume
individual rank is iid over time. It uncovers the heterogeneous causal effects
as functions of the rank variable. I provide identification and estimation
results, establishing uniform consistency and uniform asymptotic normality of
the heterogeneous causal effect function estimator. Simulations show reasonable
finite-sample performance and show my model complements fixed effects quantile
regression. Finally, I illustrate the proposed methods by examining the causal
effect of a country's oil wealth on its military defense spending.
arXiv link: http://arxiv.org/abs/2405.03826v1
Tuning parameter selection in econometrics
nonparametric and $\ell_1$-penalized estimation. For the nonparametric
estimation, I consider the methods of Mallows, Stein, Lepski, cross-validation,
penalization, and aggregation in the context of series estimation. For the
$\ell_1$-penalized estimation, I consider the methods based on the theory of
self-normalized moderate deviations, bootstrap, Stein's unbiased risk
estimation, and cross-validation in the context of Lasso estimation. I explain
the intuition behind each of the methods and discuss their comparative
advantages. I also give some extensions.
arXiv link: http://arxiv.org/abs/2405.03021v1
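Of the selection rules surveyed, cross-validation for the Lasso penalty is the
simplest to demonstrate; below is a minimal sketch using scikit-learn's
LassoCV on simulated sparse data, shown only as a generic illustration of
penalty-level selection, not as a method proposed in the paper.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Simulated sparse regression: only the first 3 of 50 coefficients are non-zero.
rng = np.random.default_rng(5)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.normal(size=n)

# 10-fold cross-validation over a grid of penalty levels chosen by sklearn.
model = LassoCV(cv=10, random_state=0).fit(X, y)
print("selected penalty:", round(model.alpha_, 4))
print("non-zero coefficients:", np.flatnonzero(model.coef_))
```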
A Network Simulation of OTC Markets with Multiple Agents
(OTC) financial market in which trades are intermediated solely by market
makers and agent visibility is constrained to a network topology. Dynamics,
such as changes in price, result from agent-level interactions that
ubiquitously occur via market maker agents acting as liquidity providers. Two
additional agents are considered: trend investors use a deep convolutional
neural network paired with a deep Q-learning framework to inform trading
decisions by analysing price history; and value investors use a static
price-target to determine their trade directions and sizes. We demonstrate that
our novel inclusion of a network topology with market makers facilitates
explorations into various market structures. First, we present the model and an
overview of its mechanics. Second, we validate our findings via comparison to
the real-world: we demonstrate a fat-tailed distribution of price changes,
auto-correlated volatility, a skew negatively correlated to market maker
positioning, predictable price-history patterns and more. Finally, we
demonstrate that our network-based model can lend insights into the effect of
market-structure on price-action. For example, we show that markets with
sparsely connected intermediaries can have a critical point of fragmentation,
beyond which the market forms distinct clusters and arbitrage becomes rapidly
possible between the prices of different market makers. A discussion is
provided on future work that would be beneficial.
arXiv link: http://arxiv.org/abs/2405.02480v1
Identifying and exploiting alpha in linear asset pricing models with strong, semi-strong, and latent factors
vector we denote by $\phi$, which is identified from the cross-section
regression of the alphas of individual securities on the vector of factor
loadings. If $\phi$ is non-zero, one can construct "$\phi$-portfolios" which
exploit the systematic components of non-zero alpha. We show that, for known
values of the betas and when $\phi$ is non-zero, there exist $\phi$-portfolios
that dominate mean-variance portfolios. The paper then proposes a two-step
bias-corrected estimator of $\phi$
and derives its asymptotic distribution allowing for idiosyncratic pricing
errors, weak missing factors, and weak error cross-sectional dependence. Small
sample results from extensive Monte Carlo experiments show that the proposed
estimator has the correct size with good power properties. The paper also
provides an empirical application to a large number of U.S. securities with
risk factors selected from a large number of potential risk factors according
to their strength, constructs $\phi$-portfolios, and compares their Sharpe
ratios to those of mean-variance and S&P 500 portfolios.
arXiv link: http://arxiv.org/abs/2405.02217v4
Testing for an Explosive Bubble using High-Frequency Volatility
we develop a test for explosive behavior in financial asset prices at a low
frequency when prices are sampled at a higher frequency. The test exploits the
volatility information in the high-frequency data. The method consists of
devolatizing log-asset price increments with realized volatility measures and
performing a supremum-type recursive Dickey-Fuller test on the devolatized
sample. The proposed test has a nuisance-parameter-free asymptotic distribution
and is easy to implement. We study the size and power properties of the test in
Monte Carlo simulations. A real-time date-stamping strategy based on the
devolatized sample is proposed for the origination and conclusion dates of the
explosive regime. Conditions under which the real-time date-stamping strategy
is consistent are established. The test and the date-stamping strategy are
applied to study explosive behavior in cryptocurrency and stock markets.
arXiv link: http://arxiv.org/abs/2405.02087v1
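The recipe of devolatizing increments and then running a supremum-type
recursive Dickey-Fuller statistic can be sketched as follows. The
realized-volatility construction (daily sums of squared intraday returns), the
minimum window fraction, and the use of statsmodels' adfuller are illustrative
choices, and no critical values or date-stamping rules from the paper are
reproduced.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def devolatized_sadf(log_prices_intraday, obs_per_day, min_frac=0.3):
    """Sketch of a sup-ADF statistic on devolatized daily increments.

    log_prices_intraday: 1-D array of intraday log prices.
    obs_per_day: number of intraday observations per day.
    """
    r = np.diff(log_prices_intraday)
    days = len(r) // obs_per_day
    r = r[:days * obs_per_day].reshape(days, obs_per_day)

    daily_increment = r.sum(axis=1)
    realized_vol = np.sqrt((r ** 2).sum(axis=1))        # realized volatility
    devol = np.cumsum(daily_increment / realized_vol)   # devolatized log price

    # Supremum of forward-expanding-window Dickey-Fuller statistics.
    t0 = max(int(min_frac * days), 10)
    stats = [adfuller(devol[:t], regression="c", autolag=None, maxlag=0)[0]
             for t in range(t0, days + 1)]
    return max(stats)

# Usage with a simulated random-walk price, 78 intraday observations per day.
rng = np.random.default_rng(6)
p = np.cumsum(rng.normal(scale=0.001, size=78 * 250))
print(round(devolatized_sadf(p, obs_per_day=78), 3))
```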
Unleashing the Power of AI: Transforming Marketing Decision-Making in Heavy Machinery with Machine Learning, Radar Chart Simulation, and Markov Chain Analysis
the heavy machinery industry, specifically focusing on production management.
The study integrates machine learning techniques like Ridge Regression, Markov
chain analysis, and radar charts to optimize North American Crawler Cranes
market production processes. Ridge Regression enables growth pattern
identification and performance assessment, facilitating comparisons and
addressing industry challenges. Markov chain analysis evaluates risk factors,
aiding in informed decision-making and risk management. Radar charts simulate
benchmark product designs, enabling data-driven decisions for production
optimization. This interdisciplinary approach equips decision-makers with
transformative insights, enhancing competitiveness in the heavy machinery
industry and beyond. By leveraging these techniques, companies can
revolutionize their production management strategies, driving success in
diverse markets.
arXiv link: http://arxiv.org/abs/2405.01913v1
Synthetic Controls with spillover effects: A comparative study
modification of the Synthetic Control Method (SCM) designed to improve its
predictive performance by utilizing control units affected by the treatment in
question. This method is then compared to other SCM modifications: SCM without
any modifications, SCM after removing all spillover-affected units, Inclusive
SCM, and the SP SCM model. For the comparison, Monte Carlo simulations are
utilized, generating artificial datasets with known counterfactuals and
comparing the predictive performance of the methods. Generally, the Inclusive
SCM performed best in all settings and is relatively simple to implement. The
Iterative SCM, introduced in this paper, was in close seconds, with a small
difference in performance and a simpler implementation.
arXiv link: http://arxiv.org/abs/2405.01645v1
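For reference, the baseline SCM weights that all of the compared variants
build on can be obtained by constrained least squares; the sketch below fits
standard weights with no spillover adjustment, and only gestures at the
iterative idea in a closing comment. Function names and the simulated panel
are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def scm_weights(Y_treated_pre, Y_controls_pre):
    """Standard SCM weights: nonnegative, summing to one, chosen to match the
    treated unit's pre-treatment path (no spillover adjustment here)."""
    J = Y_controls_pre.shape[1]

    def loss(w):
        return np.sum((Y_treated_pre - Y_controls_pre @ w) ** 2)

    constraints = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * J
    res = minimize(loss, x0=np.full(J, 1.0 / J), bounds=bounds,
                   constraints=constraints, method="SLSQP")
    return res.x

# Usage: 20 pre-treatment periods, 8 control units.
rng = np.random.default_rng(7)
Y_controls_pre = rng.normal(size=(20, 8)).cumsum(axis=0)
true_w = np.array([0.5, 0.3, 0.2, 0, 0, 0, 0, 0])
Y_treated_pre = Y_controls_pre @ true_w + rng.normal(scale=0.1, size=20)
print(np.round(scm_weights(Y_treated_pre, Y_controls_pre), 2))
# An iterative variant in the spirit of the paper would first estimate and
# remove spillover effects from affected control units, then re-fit the weights.
```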
Designing Algorithmic Recommendations to Achieve Human-AI Complementarity
However, the design and analysis of algorithms often focus on predicting
outcomes and do not explicitly model their effect on human decisions. This
discrepancy between the design and role of algorithmic assistants becomes
particularly concerning in light of empirical evidence that suggests that
algorithmic assistants repeatedly fail to improve human decisions. In this
article, we formalize the design of recommendation algorithms that assist human
decision-makers without making restrictive ex-ante assumptions about how
recommendations affect decisions. We formulate an algorithmic-design problem
that leverages the potential-outcomes framework from causal inference to model
the effect of recommendations on a human decision-maker's binary treatment
choice. Within this model, we introduce a monotonicity assumption that leads to
an intuitive classification of human responses to the algorithm. Under this
assumption, we can express the human's response to algorithmic recommendations
in terms of their compliance with the algorithm and the active decision they
would take if the algorithm sends no recommendation. We showcase the utility of
our framework using an online experiment that simulates a hiring task. We argue
that our approach can make sense of the relative performance of different
recommendation algorithms in the experiment and can help design solutions that
realize human-AI complementarity. Finally, we leverage our approach to derive
minimax optimal recommendation algorithms that can be implemented with machine
learning using limited training data.
arXiv link: http://arxiv.org/abs/2405.01484v2
Dynamic Local Average Treatment Effects
that arise in applications such as digital recommendations and adaptive medical
trials. These are settings where decision makers encourage individuals to take
treatments over time, but adapt encouragements based on previous
encouragements, treatments, states, and outcomes. Importantly, individuals may
not comply with encouragements based on unobserved confounders. For settings
with binary treatments and encouragements, we provide nonparametric
identification, estimation, and inference for Dynamic Local Average Treatment
Effects (LATEs), which are expected values of multiple time period treatment
effect contrasts for the respective complier subpopulations. Under One Sided
Noncompliance and sequential extensions of the assumptions in Imbens and
Angrist (1994), we show that one can identify Dynamic LATEs that correspond to
treating at single time steps. In Staggered Adoption settings, we show that the
assumptions are sufficient to identify Dynamic LATEs for treating in multiple
time periods. Moreover, this result extends to any setting where the effect of
a treatment in one period is uncorrelated with the compliance event in a
subsequent period.
arXiv link: http://arxiv.org/abs/2405.01463v3
Demystifying Inference after Adaptive Experiments
policy and/or the decision to stop the experiment to the data observed so far.
This has the potential to improve outcomes for study participants within the
experiment, to improve the chance of identifying best treatments after the
experiment, and to avoid wasting data. Seen as an experiment (rather than just
a continually optimizing system) it is still desirable to draw statistical
inferences with frequentist guarantees. The concentration inequalities and
union bounds that generally underlie adaptive experimentation algorithms can
yield overly conservative inferences, but at the same time the asymptotic
normality we would usually appeal to in non-adaptive settings can be imperiled
by adaptivity. In this article we aim to explain why, how, and when adaptivity
is in fact an issue for inference and, when it is, understand the various ways
to fix it: reweighting to stabilize variances and recover asymptotic normality,
always-valid inference based on joint normality of an asymptotic limiting
sequence, and characterizing and inverting the non-normal distributions induced
by adaptivity.
arXiv link: http://arxiv.org/abs/2405.01281v1
Asymptotic Properties of the Distributional Synthetic Controls
(DSC) proposed by Gunsilius (2023) provides estimates of quantile treatment
effects, thus enabling researchers to comprehensively understand the impact
of interventions in causal inference. However, the asymptotic properties of
DSC have not been established. In this paper, we first establish the DSC
estimator's asymptotic optimality, in the sense that the treatment effect
estimator given
by DSC achieves the lowest possible squared prediction error among all
potential estimators from averaging quantiles of control units. We then
establish the convergence rate of the DSC weights. A significant aspect of our
research is that we find the DSC synthesis forms an optimal weighted average,
particularly in situations where it is impractical to perfectly fit the treated
unit's quantiles through the weighted average of the control units' quantiles.
Simulation results verify our theoretical insights.
arXiv link: http://arxiv.org/abs/2405.00953v2
De-Biasing Models of Biased Decisions: A Comparison of Methods Using Mortgage Application Data
approval of loan applications. However, they may inherit bias against protected
groups from the data they are trained on. This paper adds counterfactual
(simulated) ethnic bias to real data on mortgage application decisions, and
shows that this bias is replicated by a machine learning model (XGBoost) even
when ethnicity is not used as a predictive variable. Next, several other
de-biasing methods are compared: averaging over prohibited variables, taking
the most favorable prediction over prohibited variables (a novel method), and
jointly minimizing errors as well as the association between predictions and
prohibited variables. De-biasing can recover some of the original decisions,
but the results are sensitive to whether the bias is effected through a proxy.
arXiv link: http://arxiv.org/abs/2405.00910v1
Optimal Bias-Correction and Valid Inference in High-Dimensional Ridge Regression: A Closed-Form Solution
inherent bias poses a significant and longstanding challenge, compromising both
statistical efficiency and scalability across various applications. To tackle
this critical issue, we introduce an iterative strategy to correct bias
effectively when the dimension $p$ is less than the sample size $n$. For $p>n$,
our method optimally mitigates the bias such that any remaining bias in the
proposed de-biased estimator is unattainable through linear transformations of
the response data. To address the remaining bias when $p>n$, we employ a
Ridge-Screening (RS) method, producing a reduced model suitable for bias
correction. Crucially, under certain conditions, the true model is nested
within our selected one, highlighting RS as a novel variable selection
approach. Through rigorous analysis, we establish the asymptotic properties and
valid inferences of our de-biased ridge estimators for both $p<n$ and $p>n$,
where both $p$ and $n$ may increase towards infinity, along with the number of
iterations. We further validate these results using simulated and real-world
data examples. Our method offers a transformative solution to the bias
challenge in ridge regression inferences across various disciplines.
arXiv link: http://arxiv.org/abs/2405.00424v2
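One standard way to see why an iterative correction can work in the $p<n$
regime is the fixed-point form of the ridge bias; the snippet below implements
that textbook-style iteration as an illustration only, not the paper's
estimator, and omits the $p>n$ Ridge-Screening step entirely.

```python
import numpy as np

def iterative_ridge_debias(X, y, lam, n_iter=50):
    """Textbook-style iterative bias correction for ridge regression (p < n).

    The ridge bias satisfies E[b_ridge] = beta - lam * (X'X + lam I)^{-1} beta,
    which suggests the fixed-point iteration
        b_{k+1} = b_ridge + lam * (X'X + lam I)^{-1} b_k.
    """
    p = X.shape[1]
    A_inv = np.linalg.inv(X.T @ X + lam * np.eye(p))
    b_ridge = A_inv @ X.T @ y
    b = b_ridge.copy()
    for _ in range(n_iter):
        b = b_ridge + lam * A_inv @ b
    return b_ridge, b

# Usage: the corrected estimate moves from the shrunken ridge fit toward OLS.
rng = np.random.default_rng(8)
n, p = 300, 10
X = rng.normal(size=(n, p))
beta = np.linspace(1.0, 2.0, p)
y = X @ beta + rng.normal(size=n)
b_ridge, b_corrected = iterative_ridge_debias(X, y, lam=50.0)
print("ridge bias:    ", round(np.linalg.norm(b_ridge - beta), 3))
print("corrected bias:", round(np.linalg.norm(b_corrected - beta), 3))
```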
Estimating Heterogeneous Treatment Effects with Item-Level Outcome Data: Insights from Item Response Theory
causal inference research. However, when outcomes are latent variables assessed
via psychometric instruments such as educational tests, standard methods ignore
the potential HTE that may exist among the individual items of the outcome
measure. Failing to account for "item-level" HTE (IL-HTE) can lead to both
underestimated standard errors and identification challenges in the estimation
of treatment-by-covariate interaction effects. We demonstrate how Item Response
Theory (IRT) models that estimate a treatment effect for each assessment item
can both address these challenges and provide new insights into HTE generally.
This study articulates the theoretical rationale for the IL-HTE model and
demonstrates its practical value using 75 datasets from 48 randomized
controlled trials containing 5.8 million item responses in economics,
education, and health research. Our results show that the IL-HTE model reveals
item-level variation masked by single-number scores, provides more meaningful
standard errors in many settings, allows for estimates of the generalizability
of causal effects to untested items, resolves identification problems in the
estimation of interaction effects, and provides estimates of standardized
treatment effect sizes corrected for attenuation due to measurement error.
arXiv link: http://arxiv.org/abs/2405.00161v4
Identification by non-Gaussianity in structural threshold and smooth transition vector autoregressive models
statistically identified if the shocks are mutually independent and at most one
of them is Gaussian. This extends a known identification result for linear
structural vector autoregressions to a time-varying impact matrix. We also
propose an estimation method, show how a blended identification strategy can be
adopted to address weak identification, and establish a sufficient condition
for ergodic stationarity. The introduced methods are implemented in the
accompanying R package sstvars. Our empirical application finds that a positive
climate policy uncertainty shock reduces production and raises inflation under
both low and high economic policy uncertainty, but its effects, particularly on
inflation, are stronger during the latter.
arXiv link: http://arxiv.org/abs/2404.19707v5
Percentage Coefficient (bp) -- Effect Size Analysis (Theory Paper 1)
additional and alternative estimator of effect size for regression analysis.
This paper retraces the theory behind the estimator. It's posited that an
estimator must first serve the fundamental function of enabling researchers and
readers to comprehend an estimand, the target of estimation. It may then serve
the instrumental function of enabling researchers and readers to compare two or
more estimands. Defined as the regression coefficient when dependent variable
(DV) and independent variable (IV) are both on conceptual 0-1 percentage
scales, percentage coefficients (bp) feature 1) clearly comprehendible
interpretation and 2) equitable scales for comparison. The coefficient (bp)
serves the two functions effectively and efficiently. It thus serves needs
unserved by other indicators, such as raw coefficient (bw) and standardized
beta.
Another premise of the functionalist theory is that "effect" is not a
monolithic concept. Rather, it is a collection of concepts, each of which
measures a component of the conglomerate called "effect", thereby serving a
subfunction. Regression coefficient (b), for example, indicates the unit change
in DV associated with a one-unit increase in IV, thereby measuring one aspect
called unit effect, aka efficiency. Percentage coefficient (bp) indicates the
percentage change in DV associated with a whole scale increase in IV. It is not
meant to be an all-encompassing indicator of an all-encompassing concept, but
rather a comprehendible and comparable indicator of efficiency, a key aspect of
effect.
arXiv link: http://arxiv.org/abs/2404.19495v2
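The definition is easy to operationalize: rescale both variables to conceptual
0-1 ranges and read off the OLS slope. In the sketch below the 0-1 endpoints
are supplied explicitly (falling back to the observed range is only an
illustrative stand-in), and the example data are hypothetical.

```python
import numpy as np

def percentage_coefficient(x, y, x_range=None, y_range=None):
    """Slope of y on x after both are put on 0-1 scales (the bp of the paper).
    x_range / y_range are the conceptual (min, max) of each variable; if None,
    the observed range is used as an illustrative stand-in."""
    x_lo, x_hi = x_range if x_range else (x.min(), x.max())
    y_lo, y_hi = y_range if y_range else (y.min(), y.max())
    xp = (x - x_lo) / (x_hi - x_lo)
    yp = (y - y_lo) / (y_hi - y_lo)
    xp_c = xp - xp.mean()
    return float((xp_c @ (yp - yp.mean())) / (xp_c @ xp_c))

# Example: a 0-100 attitude score regressed on a 1-7 Likert item.
rng = np.random.default_rng(9)
iv = rng.integers(1, 8, size=500).astype(float)
dv = 10 + 8 * iv + rng.normal(scale=5, size=500)
print(round(percentage_coefficient(iv, dv, x_range=(1, 7),
                                   y_range=(0, 100)), 3))
```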
Orthogonal Bootstrap: Efficient Simulation of Input Uncertainty
it can be computationally expensive when the number of samples is large. We
propose a new approach called Orthogonal Bootstrap that reduces the
number of required Monte Carlo replications. We decompose the target being
simulated into two parts: the non-orthogonal part, which has a
closed-form result known as the Infinitesimal Jackknife, and the orthogonal
part, which is easier to simulate. We theoretically and numerically show
that Orthogonal Bootstrap significantly reduces the computational cost of
Bootstrap while improving empirical accuracy and maintaining the same width of
the constructed interval.
arXiv link: http://arxiv.org/abs/2404.19145v2
A Locally Robust Semiparametric Approach to Examiner IV Designs
effects using the popular examiner IV design, in the presence of many examiners
and possibly many covariates relative to the sample size. The key ingredient of
this approach is an orthogonal moment function that is robust to biases and
local misspecification from the first step estimation of the examiner IV. I
derive the orthogonal moment function and show that it delivers multiple
robustness where the outcome model or at least one of the first step components
is misspecified but the estimating equation remains valid. The proposed
framework not only allows for estimation of the examiner IV in the presence of
many examiners and many covariates relative to sample size, using a wide range
of nonparametric and machine learning techniques including LASSO, Dantzig,
neural networks and random forests, but also delivers root-n consistent
estimation of the parameter of interest under mild assumptions.
arXiv link: http://arxiv.org/abs/2404.19144v1
Optimal Treatment Allocation under Constraints
level, optimally allocating treatments to recipients is complex even when
potential outcomes are known. We present an algorithm for multi-arm treatment
allocation problems that is guaranteed to find the optimal allocation in
strongly polynomial time, and which is able to handle arbitrary potential
outcomes as well as constraints on treatment requirement and capacity. Further,
starting from an arbitrary allocation, we show how to optimally re-allocate
treatments in a Pareto-improving manner. To showcase our results, we use data
from Danish nurse home visiting for infants. We estimate nurse specific
treatment effects for children born 1959-1967 in Copenhagen, comparing nurses
against each other. We exploit random assignment of newborn children to nurses
within a district to obtain causal estimates of nurse-specific treatment
effects using causal machine learning. Using these estimates, and treating the
Danish nurse home visiting program as a case of an optimal treatment allocation
problem (where a treatment is a nurse), we document room for significant
productivity improvements by optimally re-allocating nurses to children. Our
estimates suggest that optimal allocation of nurses to children could have
improved average yearly earnings by USD 1,815 and length of education by around
two months.
arXiv link: http://arxiv.org/abs/2404.18268v1
Testing for Asymmetric Information in Insurance with Deep Learning
Chiappori and Salanie (2000) has been applied in many insurance markets. Most
of the literature focuses on the special case of constant correlation; it also
relies on restrictive parametric specifications for the choice of coverage and
the occurrence of claims. We relax these restrictions by estimating conditional
covariances and correlations using deep learning methods. We test the positive
correlation property by using the intersection test of Chernozhukov, Lee, and
Rosen (2013) and the "sorted groups" test of Chernozhukov, Demirer, Duflo, and
Fernandez-Val (2023). Our results confirm earlier findings that the correlation
between risk and coverage is small. Random forests and gradient boosting trees
produce similar results to neural networks.
arXiv link: http://arxiv.org/abs/2404.18207v1
Sequential monitoring for explosive volatility regimes
(timely) detect changes in a GARCH(1,1) model. Whilst our methodologies can be
applied for the general analysis of changepoints in GARCH(1,1) sequences, they
are in particular designed to detect changes from stationarity to explosivity
or vice versa, thus allowing one to check for volatility bubbles. Our statistics
can be applied irrespective of whether the historical sample is stationary or
not, and indeed without prior knowledge of the regime of the observations
before and after the break. In particular, we construct our detectors as the
CUSUM process of the quasi-Fisher scores of the log likelihood function. In
order to ensure timely detection, we then construct our boundary function
(exceeding which would indicate a break) by including a weighting sequence
which is designed to shorten the detection delay in the presence of a
changepoint. We consider two types of weights: a lighter set of weights, which
ensures timely detection in the presence of changes occurring early, but not
too early after the end of the historical sample; and a heavier set of weights,
called Rényi weights, which are designed to ensure timely detection in the
presence of changepoints occurring very early in the monitoring horizon. In
both cases, we derive the limiting distribution of the detection delays,
indicating the expected delay for each set of weights. Our theoretical results
are validated via a comprehensive set of simulations, and an empirical
application to daily returns of individual stocks.
arXiv link: http://arxiv.org/abs/2404.17885v1
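A stylized sequential monitor in the CUSUM-with-weighted-boundary spirit is
sketched below. The detector cumulates a generic score (demeaned squared
returns standardized on the historical sample) rather than the quasi-Fisher
scores of the GARCH(1,1) likelihood, and the boundary uses a simple weighted
form with an arbitrary critical constant; none of the paper's calibrated
boundaries or delay results are reproduced.

```python
import numpy as np

def sequential_cusum_monitor(hist, stream, gamma=0.25, critical=3.0):
    """Stylized open-ended CUSUM monitor.

    hist:   historical (training) returns, assumed free of change.
    stream: incoming returns monitored one observation at a time.
    Scores are demeaned squared returns standardized on the historical
    sample (a stand-in for quasi-Fisher scores); the boundary
    critical * sqrt(m) * (1 + k/m) * (k/(k+m))**gamma, with m = len(hist),
    is an illustrative weighted boundary, not the paper's calibrated one.
    """
    m = len(hist)
    mu, sigma = np.mean(hist ** 2), np.std(hist ** 2)
    cusum = 0.0
    for k, x in enumerate(stream, start=1):
        cusum += (x ** 2 - mu) / sigma
        boundary = critical * np.sqrt(m) * (1 + k / m) * (k / (k + m)) ** gamma
        if abs(cusum) > boundary:
            return k  # detection delay in monitoring observations
    return None

# Usage: the return variance doubles 100 observations into monitoring.
rng = np.random.default_rng(10)
hist = rng.normal(scale=1.0, size=500)
stream = np.concatenate([rng.normal(scale=1.0, size=100),
                         rng.normal(scale=2.0, size=400)])
print("detected at monitoring step:", sequential_cusum_monitor(hist, stream))
```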
A Nonresponse Bias Correction using Nonrandom Followup with an Application to the Gender Entrepreneurship Gap
multiple attempts to contact subjects affect whether researchers observe
variables without affecting the variables themselves. Our procedure produces
point estimates of population averages using selected samples without requiring
randomized incentives or assuming selection bias cancels out for any
within-respondent comparisons. Applying our correction to a 16% response rate
survey of University of Wisconsin-Madison undergraduates, we estimate a 15
percentage point male-female entrepreneurial intention gap. Our estimates
attribute the 20 percentage point uncorrected within-respondent gap to positive
bias for men and negative bias for women, highlighting the value of
within-group nonresponse corrections.
arXiv link: http://arxiv.org/abs/2404.17693v2
Overidentification in Shift-Share Designs
employed to assign a causal interpretation to two stage least squares (TSLS)
estimators based on Bartik instruments. For homogeneous effects models applied
to short panels, our analysis yields testable implications previously noted in
the literature for the two major available identification strategies. We
propose overidentification tests for these restrictions that remain valid in
high dimensional regimes and are robust to heteroskedasticity and clustering.
We further show that homogeneous effect models in short panels, and their
corresponding overidentification tests, are of central importance by
establishing that: (i) In heterogeneous effects models, interpreting TSLS as a
positively weighted average of treatment effects can impose implausible
assumptions on the distribution of the data; and (ii) Alternative identifying
strategies relying on long panels can prove uninformative in short panel
applications. We highlight the empirical relevance of our results by examining
the viability of Bartik instruments for identifying the effect of rising
Chinese import competition on US local labor markets.
arXiv link: http://arxiv.org/abs/2404.17049v1
A joint test of unconfoundedness and common trends
assumptions to identify the average treatment effect on the treated in a
two-period panel data setting: unconfoundedness and common trends. Under the
unconfoundedness assumption, treatment assignment and post-treatment outcomes
are independent, conditional on control variables and pre-treatment outcomes,
which motivates including pre-treatment outcomes in the set of controls.
Conversely, under the common trends assumption, the trend and the treatment
assignment are independent, conditional on control variables. This motivates
employing a Difference-in-Differences (DiD) approach by comparing the
differences between pre- and post-treatment outcomes of the treatment and
control group. Given the non-nested nature of these assumptions and their often
ambiguous plausibility in empirical settings, we propose a joint test using a
doubly robust statistic that can be combined with machine learning to control
for observed confounders in a data-driven manner. We discuss various causal
models that imply the satisfaction of either common trends, unconfoundedness,
or both assumptions jointly, and we investigate the finite sample properties of
our test through a simulation study. Additionally, we apply the proposed method
to five empirical examples using publicly available datasets and find the test
to reject the null hypothesis in two cases.
arXiv link: http://arxiv.org/abs/2404.16961v3
Correlations versus noise in the NFT market
leveraging blockchain technology, mirroring the dynamics of the cryptocurrency
market. The current study is based on the capitalization changes and
transaction volumes across a large number of token collections on the Ethereum
platform. In order to deepen the understanding of the market dynamics, the
collection-collection dependencies are examined by using the multivariate
formalism of detrended correlation coefficient and correlation matrix. It
appears that correlation strength is lower here than that observed in
previously studied markets. Consequently, the eigenvalue spectra of the
correlation matrix more closely follow the Marchenko-Pastur distribution,
still, some departures indicating the existence of correlations remain. The
comparison of results obtained from the correlation matrix built from the
Pearson coefficients and, independently, from the detrended cross-correlation
coefficients suggests that the global correlations in the NFT market arise from
higher frequency fluctuations. Corresponding minimal spanning trees (MSTs) for
capitalization variability exhibit a scale-free character while, for the number
of transactions, they are somewhat more decentralized.
arXiv link: http://arxiv.org/abs/2404.15495v2
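A minimal sketch of the minimal-spanning-tree step described above, using synthetic returns and ordinary Pearson correlations with the usual distance transform; the detrended cross-correlation coefficients analyzed in the paper are not implemented here.

```python
# Sketch: MST from a Pearson correlation matrix of synthetic "collection" returns.
# The paper's detrended cross-correlation coefficients are not implemented here.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
n_assets, n_obs = 20, 500
returns = rng.standard_normal((n_obs, n_assets))            # placeholder data

corr = np.corrcoef(returns, rowvar=False)                    # Pearson correlation matrix
dist = np.sqrt(2.0 * (1.0 - corr))                           # standard correlation-to-distance map

G = nx.Graph()
for i in range(n_assets):
    for j in range(i + 1, n_assets):
        G.add_edge(i, j, weight=dist[i, j])

mst = nx.minimum_spanning_tree(G)                            # tree used to study market topology
degrees = sorted(dict(mst.degree()).values(), reverse=True)
print("MST edges:", mst.number_of_edges(), "largest degrees:", degrees[:5])
```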
Quantifying the Internal Validity of Weighted Estimands
parameters that can be expressed as weighted averages of the underlying
heterogeneous treatment effects. The popular ordinary least squares (OLS),
two-stage least squares (2SLS), and two-way fixed effects (TWFE) estimands are
all special cases within our framework. Our focus is on answering two questions
concerning weighted estimands. First, under what conditions can they be
interpreted as the average treatment effect for some (possibly latent)
subpopulation? Second, when these conditions are satisfied, what is the upper
bound on the size of that subpopulation, either in absolute terms or relative
to a target population of interest? We argue that this upper bound provides a
valuable diagnostic for empirical research. When a given weighted estimand
corresponds to the average treatment effect for a small subset of the
population of interest, we say its internal validity is low. Our paper develops
practical tools to quantify the internal validity of weighted estimands. We
also apply these tools to revisit a prominent study of the effects of
unilateral divorce laws on female suicide.
arXiv link: http://arxiv.org/abs/2404.14603v4
Stochastic Volatility in Mean: Efficient Analysis by a Generalized Mixture Sampler
stochastic volatility in mean (SVM) models. Extending the highly efficient
Markov chain Monte Carlo mixture sampler for the SV model proposed in Kim et
al. (1998) and Omori et al. (2007), we develop an accurate approximation of the
non-central chi-squared distribution as a mixture of thirty normal
distributions. Under this mixture representation, we sample the parameters and
latent volatilities in one block. We also detail a correction of the small
approximation error by using additional Metropolis-Hastings steps. The proposed
method is extended to the SVM model with leverage. The methodology and models
are applied to excess holding yields and S&P500 returns in empirical studies,
and the SVM models are shown to outperform other volatility models based on
marginal likelihoods.
arXiv link: http://arxiv.org/abs/2404.13986v2
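A rough illustration of the mixture-approximation idea in the abstract above: a 30-component normal mixture fitted by EM to the log of a non-central chi-squared draw. This is only a generic sketch on an illustrative transformation; the paper derives its mixture analytically and uses it inside a block MCMC sampler.

```python
# Illustrative only: approximate the log of a non-central chi-squared(1, lam) draw by a
# 30-component normal mixture fitted with EM.  The paper constructs its mixture analytically
# for use inside an MCMC sampler; this sketch just shows the mixture-approximation idea.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
lam = 0.5                                                    # hypothetical non-centrality parameter
draws = np.log(rng.noncentral_chisquare(df=1.0, nonc=lam, size=50_000))

gm = GaussianMixture(n_components=30, covariance_type="diag", random_state=0)
gm.fit(draws.reshape(-1, 1))

# Mixture weights, means and variances that would stand in for the exact density in a sampler.
order = np.argsort(gm.means_.ravel())
for w, m, v in zip(gm.weights_[order][:5], gm.means_.ravel()[order][:5],
                   gm.covariances_.ravel()[order][:5]):
    print(f"weight={w:.3f}  mean={m:.3f}  var={v:.3f}")
```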
Identification and Estimation of Nonseparable Triangular Equations with Mismeasured Instruments
marginal effect of an endogenous variable $X$ on the outcome variable $Y$,
given a potentially mismeasured instrument variable $W^*$, without assuming
linearity or separability of the functions governing the relationship between
observables and unobservables. To address the challenges arising from the
co-existence of measurement error and nonseparability, I first employ the
deconvolution technique from the measurement error literature to identify the
joint distribution of $Y, X, W^*$ using two error-laden measurements of $W^*$.
I then recover the structural derivative of the function of interest and the
"Local Average Response" (LAR) from the joint distribution via the "unobserved
instrument" approach in Matzkin (2016). I also propose nonparametric estimators
for these parameters and derive their uniform rates of convergence. Monte Carlo
exercises show evidence that the estimators I propose have good finite sample
performance.
arXiv link: http://arxiv.org/abs/2404.13735v1
How do applied researchers use the Causal Forest? A methodological review of a method
applied researchers across 133 peer-reviewed papers. It shows that the emerging
best practice relies heavily on the approach and tools created by the original
authors of the causal forest such as their grf package and the approaches given
by them in examples. Generally researchers use the causal forest on a
relatively low-dimensional dataset relying on observed controls or in some
cases experiments to identify effects. There are several common ways to then
communicate results -- by mapping out the univariate distribution of
individual-level treatment effect estimates, displaying variable importance
results for the forest and graphing the distribution of treatment effects
across covariates that are important either for theoretical reasons or because
they have high variable importance. Some deviations from this common practice
are interesting and deserve further development and use. Others are unnecessary
or even harmful. The paper concludes by reflecting on the emerging best
practice for causal forest use and paths for future research.
arXiv link: http://arxiv.org/abs/2404.13356v2
An economically-consistent discrete choice model with flexible utility specification based on artificial neural networks
discrete choice modelling. However, specifying the utility function of RUM
models is not straightforward and has a considerable impact on the resulting
interpretable outcomes and welfare measures. In this paper, we propose a new
discrete choice model based on artificial neural networks (ANNs) named
"Alternative-Specific and Shared weights Neural Network (ASS-NN)", which
provides a further balance between flexible utility approximation from the data
and consistency with two assumptions: RUM theory and fungibility of money
(i.e., "one euro is one euro"). Therefore, the ASS-NN can derive
economically-consistent outcomes, such as marginal utilities or willingness to
pay, without explicitly specifying the utility functional form. Using a Monte
Carlo experiment and empirical data from the Swissmetro dataset, we show that
ASS-NN outperforms (in terms of goodness of fit) conventional multinomial logit
(MNL) models under different utility specifications. Furthermore, we show how
the ASS-NN is used to derive marginal utilities and willingness to pay
measures.
arXiv link: http://arxiv.org/abs/2404.13198v1
On the Asymmetric Volatility Connectedness
volatility to other variables compared to the rate that it is receiving. The
idea is based on the percentage of variance decomposition from one variable to
the others, which is estimated by making use of a VAR model. Diebold and Yilmaz
(2012, 2014) suggested estimating this simple and useful measure of percentage
risk spillover impact. Their method is symmetric by nature, however. The
current paper offers an alternative asymmetric approach for measuring the
volatility spillover direction, which is based on estimating the asymmetric
variance decompositions introduced by Hatemi-J (2011, 2014). This approach
accounts explicitly for the asymmetric property in the estimations, which
accords better with reality. An application is provided to capture the
potential asymmetric volatility spillover impacts between the three largest
financial markets in the world.
arXiv link: http://arxiv.org/abs/2404.12997v2
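For concreteness, a numpy sketch of the symmetric, Cholesky-based spillover index from the forecast error variance decomposition of a fitted VAR(1) on synthetic data; the asymmetric variance decompositions of Hatemi-J that the paper builds on are not implemented here.

```python
# Sketch: symmetric Diebold-Yilmaz-style total spillover index from a Cholesky FEVD of a
# VAR(1) fitted by OLS to synthetic data.  The asymmetric decompositions are not implemented.
import numpy as np

rng = np.random.default_rng(2)
k, T, H = 3, 1000, 10                                        # variables, sample size, horizon
A_true = np.array([[0.5, 0.1, 0.0],
                   [0.0, 0.4, 0.2],
                   [0.1, 0.0, 0.3]])
y = np.zeros((T, k))
for t in range(1, T):
    y[t] = y[t - 1] @ A_true.T + rng.standard_normal(k)

# OLS estimate of the VAR(1) coefficient matrix and residual covariance.
X, Y = y[:-1], y[1:]
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
resid = Y - X @ A_hat.T
Sigma = resid.T @ resid / (T - 1 - k)

# MA coefficients Psi_h = A^h and Cholesky-orthogonalised FEVD at horizon H.
P = np.linalg.cholesky(Sigma)
contrib = np.zeros((k, k))
Psi = np.eye(k)
for h in range(H):
    theta = Psi @ P
    contrib += theta ** 2
    Psi = A_hat @ Psi
fevd = contrib / contrib.sum(axis=1, keepdims=True)          # row i: variance shares of variable i

total_spillover = 100.0 * (fevd.sum() - np.trace(fevd)) / k  # off-diagonal share, in percent
print("FEVD:\n", np.round(fevd, 3))
print(f"Total spillover index: {total_spillover:.1f}%")
```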
The modified conditional sum-of-squares estimator for fractionally integrated models
bias of the conditional sum-of-squares (CSS) estimator in a stationary or
non-stationary type-II ARFIMA ($p_1$,$d$,$p_2$) model. We derive expressions
for the estimator's bias and show that the leading term can be easily removed
by a simple modification of the CSS objective function. We call this new
estimator the modified conditional sum-of-squares (MCSS) estimator. We show
theoretically and by means of Monte Carlo simulations that its performance
relative to that of the CSS estimator is markedly improved even for small
sample sizes. Finally, we revisit three classical short datasets that have in
the past been described by ARFIMA($p_1$,$d$,$p_2$) models with constant term,
namely the post-Second World War real GNP data, the extended Nelson-Plosser
data, and the Nile data.
arXiv link: http://arxiv.org/abs/2404.12882v2
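As a minimal illustration of the baseline objective, the sketch below computes the plain CSS estimate of $d$ in a type-II ARFIMA(0,$d$,0) model on simulated data; the paper's bias-removing modification of the objective (MCSS) is not shown.

```python
# Sketch: conditional sum-of-squares (CSS) estimation of d in a type-II ARFIMA(0,d,0) model.
# The paper's modified objective (MCSS), which removes the leading bias term, is not shown.
import numpy as np
from scipy.optimize import minimize_scalar

def frac_diff(x, d):
    """Apply the type-II fractional difference (1-L)^d using truncated binomial weights."""
    n = len(x)
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k                    # recursion for (1-L)^d coefficients
    out = np.empty(n)
    for t in range(n):
        out[t] = np.dot(w[: t + 1], x[t::-1])
    return out

def css(d, x):
    e = frac_diff(x, d)                                      # residuals under candidate d
    return np.sum(e ** 2)

# Simulate a fractionally integrated series by inverting the filter: x = (1-L)^{-d0} eps.
rng = np.random.default_rng(3)
d0, n = 0.3, 500
eps = rng.standard_normal(n)
x = frac_diff(eps, -d0)

res = minimize_scalar(css, bounds=(-0.49, 0.99), args=(x,), method="bounded")
print(f"true d = {d0}, CSS estimate = {res.x:.3f}")
```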
Two-step Estimation of Network Formation Models with Unobserved Heterogeneities and Strategic Interactions
of incomplete information, where the latent payoff of forming a link between
two individuals depends on the structure of the network, as well as private
information on agents' attributes. I allow agents' private unobserved
attributes to be correlated with observed attributes through individual fixed
effects. Using data from a single large network, I propose a two-step estimator
for the model primitives. In the first step, I estimate agents' equilibrium
beliefs of other people's choice probabilities. In the second step, I plug in
the first-step estimator to the conditional choice probability expression and
estimate the model parameters and the unobserved individual fixed effects
together using Joint MLE. Assuming that the observed attributes are discrete, I
show that the first-step estimator is uniformly consistent with rate
$N^{-1/4}$, where $N$ is the total number of linking proposals. I also show
that the second-step estimator converges asymptotically to a normal
distribution at the same rate.
arXiv link: http://arxiv.org/abs/2404.12581v1
Axiomatic modeling of fixed proportion technologies
critical for efficient resource allocation and firm strategy. There are
important examples of fixed proportion technologies where certain inputs are
non-substitutable and/or certain outputs are non-transformable. However, there
is widespread confusion about the appropriate modeling of fixed proportion
technologies in data envelopment analysis. We point out and rectify several
misconceptions in the existing literature, and show how fixed proportion
technologies can be correctly incorporated into the axiomatic framework. A
Monte Carlo study is performed to demonstrate the proposed solution.
arXiv link: http://arxiv.org/abs/2404.12462v2
(Empirical) Bayes Approaches to Parallel Trends
violations of parallel trends. In the Bayes approach, the researcher specifies
a prior over both the pre-treatment violations of parallel trends
$\delta_{pre}$ and the post-treatment violations $\delta_{post}$. The
researcher then updates their posterior about the post-treatment bias
$\delta_{post}$ given an estimate of the pre-trends $\delta_{pre}$. This allows
them to form posterior means and credible sets for the treatment effect of
interest, $\tau_{post}$. In the EB approach, the prior on the violations of
parallel trends is learned from the pre-treatment observations. We illustrate
these approaches in two empirical applications.
arXiv link: http://arxiv.org/abs/2404.11839v1
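A stylised scalar version of the Bayes step described above, with a hypothetical joint normal prior on $(\delta_{pre},\delta_{post})$ and a normally distributed pre-trend estimate; all numbers are made up and this is not the authors' implementation.

```python
# Stylised sketch of the Bayes step: place a joint normal prior on the pre- and post-treatment
# violations of parallel trends and update delta_post after observing a noisy pre-trend estimate.
import numpy as np

# Hypothetical prior: (delta_pre, delta_post) ~ N(0, Sigma) with positive correlation.
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])

# "Data": estimated pre-trend and its sampling variance, plus the event-study estimate.
delta_pre_hat, se_pre2 = 0.6, 0.2 ** 2
tau_hat, se_tau2 = 1.5, 0.3 ** 2

# Posterior of delta_pre given delta_pre_hat ~ N(delta_pre, se_pre2), then propagate to
# delta_post through the prior covariance (standard normal-normal conditioning).
post_var_pre = 1.0 / (1.0 / Sigma[0, 0] + 1.0 / se_pre2)
post_mean_pre = post_var_pre * delta_pre_hat / se_pre2
slope = Sigma[0, 1] / Sigma[0, 0]                            # regression of delta_post on delta_pre
post_mean_post = slope * post_mean_pre
post_var_post = Sigma[1, 1] - slope * Sigma[0, 1] + slope ** 2 * post_var_pre

# Debias the event-study coefficient by the posterior mean of delta_post (heuristic combination).
tau_post_mean = tau_hat - post_mean_post
tau_post_sd = np.sqrt(se_tau2 + post_var_post)
print(f"posterior mean of delta_post = {post_mean_post:.3f}")
print(f"bias-adjusted effect = {tau_post_mean:.3f} (sd {tau_post_sd:.3f})")
```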
Regret Analysis in Threshold Policy Design
an observable characteristic exceeds a certain threshold. They are widespread
across multiple domains, including welfare programs, taxation, and clinical
medicine. This paper examines the problem of designing threshold policies using
experimental data, when the goal is to maximize the population welfare. First,
I characterize the regret - a measure of policy optimality - of the Empirical
Welfare Maximizer (EWM) policy, popular in the literature. Next, I introduce
the Smoothed Welfare Maximizer (SWM) policy, which improves the EWM's regret
convergence rate under an additional smoothness condition. The two policies are
compared by studying how differently their regrets depend on the population
distribution, and investigating their finite sample performances through Monte
Carlo simulations. In many contexts, the SWM policy guarantees larger welfare
than the EWM. An empirical illustration demonstrates how the treatment
recommendations of the two policies may differ in practice.
arXiv link: http://arxiv.org/abs/2404.11767v2
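A small sketch of the EWM threshold choice using inverse-propensity-weighted welfare from simulated experimental data, together with a smoothed (logistic-kernel) variant in the spirit of the SWM; the paper's exact smoothing construction differs.

```python
# Sketch: choosing a treatment threshold from experimental data.  EWM maximises an IPW
# estimate of welfare over candidate thresholds; the "smoothed" variant replaces the hard
# indicator with a logistic kernel, in the spirit of the SWM (exact construction differs).
import numpy as np

rng = np.random.default_rng(4)
n, e = 2000, 0.5                                             # sample size, known propensity score
x = rng.uniform(0, 1, n)                                     # eligibility characteristic
d = rng.binomial(1, e, n)                                    # randomised treatment
y = 1.0 + d * (x - 0.4) + rng.standard_normal(n)             # treatment helps when x > 0.4

# IPW score whose average over the treated-by-policy units estimates the welfare gain.
score = d * y / e - (1 - d) * y / (1 - e)

def ewm_welfare(c):
    return np.mean(score * (x >= c))

def smoothed_welfare(c, h=0.05):
    return np.mean(score / (1.0 + np.exp(-(x - c) / h)))     # logistic smoothing of the indicator

grid = np.linspace(0, 1, 201)
c_ewm = grid[np.argmax([ewm_welfare(c) for c in grid])]
c_swm = grid[np.argmax([smoothed_welfare(c) for c in grid])]
print(f"EWM threshold = {c_ewm:.2f}, smoothed threshold = {c_swm:.2f} (true kink at 0.40)")
```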
Testing Mechanisms
affects an outcome. We develop tests for the "sharp null of full mediation"
that a treatment $D$ affects an outcome $Y$ only through a particular mechanism
(or set of mechanisms) $M$. Our approach exploits connections between mediation
analysis and the econometric literature on testing instrument validity. We also
provide tools for quantifying the magnitude of alternative mechanisms when the
sharp null is rejected: we derive sharp lower bounds on the fraction of
individuals whose outcome is affected by the treatment despite having the same
value of $M$ under both treatments (“always-takers”), as well as sharp bounds
on the average effect of the treatment for such always-takers. An advantage of
our approach relative to existing tools for mediation analysis is that it does
not require stringent assumptions about how $M$ is assigned. We illustrate our
methodology in two empirical applications.
arXiv link: http://arxiv.org/abs/2404.11739v2
Weighted-Average Least Squares for Negative Binomial Regression
improving predictions and dealing with model uncertainty, especially in
Bayesian settings. Recently, frequentist model averaging methods such as
information theoretic and least squares model averaging have emerged. This work
focuses on the issue of covariate uncertainty where managing the computational
resources is key: The model space grows exponentially with the number of
covariates such that averaged models must often be approximated.
Weighted-average least squares (WALS), first introduced for (generalized)
linear models in the econometric literature, combines Bayesian and frequentist
aspects and additionally employs a semiorthogonal transformation of the
regressors to reduce the computational burden. This paper extends WALS for
generalized linear models to the negative binomial (NB) regression model for
overdispersed count data. A simulation experiment and an empirical application
using data on doctor visits were conducted to compare the predictive power of
WALS for NB regression to traditional estimators. The results show that WALS
for NB improves on the maximum likelihood estimator in sparse situations and is
competitive with lasso while being computationally more efficient.
arXiv link: http://arxiv.org/abs/2404.11324v1
Bayesian Markov-Switching Vector Autoregressive Process
Markov-Switching Vector Autoregressive (MS-VAR) process. In the case of the
Bayesian MS-VAR process, we provide closed-form density functions and
Monte-Carlo simulation algorithms, including the importance sampling method.
The Monte-Carlo simulation method departs from the previous simulation methods
because it removes the duplication in a regime vector.
arXiv link: http://arxiv.org/abs/2404.11235v3
Forecasting with panel data: Estimation uncertainty versus parameter heterogeneity
forecasting methods based on individual, pooling, fixed effects, and empirical
Bayes estimation, and propose optimal weights for forecast combination schemes.
We consider linear panel data models, allowing for weakly exogenous regressors
and correlated heterogeneity. We quantify the gains from exploiting panel data
and demonstrate how forecasting performance depends on the degree of parameter
heterogeneity, whether such heterogeneity is correlated with the regressors,
the goodness of fit of the model, and the dimensions of the data. Monte Carlo
simulations and empirical applications to house prices and CPI inflation show
that empirical Bayes and forecast combination methods perform best overall and
rarely produce the least accurate forecasts for individual series.
arXiv link: http://arxiv.org/abs/2404.11198v2
Estimation for conditional moment models based on martingale difference divergence
martingale difference divergence (MDD). Our MDD-based estimation method is
formed in the framework of a continuum of unconditional moment restrictions.
Unlike the existing estimation methods in this framework, the MDD-based
estimation method adopts a non-integrable weighting function, which could grab
more information from unconditional moment restrictions than the integrable
weighting function to enhance the estimation efficiency. Due to the nature of
shift-invariance in MDD, our MDD-based estimation method cannot identify the
intercept parameters. To overcome this identification issue, we further provide
a two-step estimation procedure for the model with intercept parameters. Under
regularity conditions, we establish the asymptotics of the proposed estimators,
which are not only easy-to-implement with analytic asymptotic variances, but
also applicable to time series data with an unspecified form of conditional
heteroskedasticity. Finally, we illustrate the usefulness of the proposed
estimators by simulations and two real examples.
arXiv link: http://arxiv.org/abs/2404.11092v1
Partial Identification of Structural Vector Autoregressions with Non-Centred Stochastic Volatility
stochastic volatility under Bayesian estimation. Three contributions emerge
from our exercise. First, we show that a non-centred parameterization of
stochastic volatility yields a marginal prior for the conditional variances of
structural shocks that is centred on homoskedasticity, with strong shrinkage
and heavy tails -- unlike the common centred parameterization. This feature
makes it well suited for assessing partial identification of any shock of
interest. Second, Monte Carlo experiments on small and large systems indicate
that the non-centred setup estimates structural parameters more precisely and
normalizes conditional variances efficiently. Third, revisiting prominent
fiscal structural vector autoregressions, we show how the non-centred approach
identifies tax shocks that are consistent with estimates reported in the
literature.
arXiv link: http://arxiv.org/abs/2404.11057v2
From Predictive Algorithms to Automatic Generation of Anomalies
take a familiar lesson: researchers often turn their intuitions into
theoretical insights by constructing "anomalies" -- specific examples
highlighting hypothesized flaws in a theory, such as the Allais paradox and the
Kahneman-Tversky choice experiments for expected utility. We develop procedures
that replace researchers' intuitions with predictive algorithms: given a
predictive algorithm and a theory, our procedures automatically generate
anomalies for that theory. We illustrate our procedures with a concrete
application: generating anomalies for expected utility theory. Based on a
neural network that accurately predicts lottery choices, our procedures recover
known anomalies for expected utility theory and discover new ones absent from
existing work. In incentivized experiments, subjects violate expected utility
theory on these algorithmically generated anomalies at rates similar to the
Allais paradox and common ratio effect.
arXiv link: http://arxiv.org/abs/2404.10111v2
Overfitting Reduction in Convex Regression
set. This method has played an important role in operations research,
economics, machine learning, and many other areas. However, it has been
empirically observed that convex regression produces inconsistent estimates of
convex functions and extremely large subgradients near the boundary as the
sample size increases. In this paper, we provide theoretical evidence of this
overfitting behavior. To eliminate this behavior, we propose two new estimators
by placing a bound on the subgradients of the convex function. We further show
that our proposed estimators can reduce overfitting by proving that they
converge to the underlying true convex function and that their subgradients
converge to the gradient of the underlying function, both uniformly over the
domain with probability one as the sample size increases to infinity. An
application to Finnish electricity distribution firms confirms the superior
performance of the proposed methods in predictive power over the existing
methods.
arXiv link: http://arxiv.org/abs/2404.09528v2
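A small cvxpy sketch of least-squares convex regression with an explicit bound on the norm of each subgradient, which is one way to operationalise "placing a bound on the subgradients"; the paper's two estimators may use a different constraint.

```python
# Sketch: convex regression as a quadratic programme, with an explicit bound L on the norm
# of each subgradient -- one way to operationalise "bounding the subgradients".
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
n, d, L = 60, 2, 5.0
X = rng.uniform(-1, 1, (n, d))
y = np.sum(X ** 2, axis=1) + 0.1 * rng.standard_normal(n)    # convex truth + noise

theta = cp.Variable(n)                                       # fitted values
xi = cp.Variable((n, d))                                     # subgradients at each observation

constraints = []
for i in range(n):
    for j in range(n):
        if i != j:                                           # convexity (shape) constraints
            constraints.append(theta[j] >= theta[i] + xi[i, :] @ (X[j] - X[i]))
    constraints.append(cp.norm(xi[i, :], 2) <= L)            # subgradient bound

prob = cp.Problem(cp.Minimize(cp.sum_squares(y - theta)), constraints)
prob.solve()
print("objective:", round(prob.value, 3), "max subgradient norm:",
      round(float(np.max(np.linalg.norm(xi.value, axis=1))), 3))
```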
The Role of Carbon Pricing in Food Inflation: Evidence from Canadian Provinces
carbon pricing, which includes carbon tax and cap-and-trade, is implemented by
many governments. However, the inflating food prices in carbon-pricing
countries, such as Canada, have led many to believe such policies harm food
affordability. This study aims to identify changes in food prices induced by
carbon pricing using the case of Canadian provinces. Using the staggered
difference-in-difference (DiD) approach, we find an overall deflationary effect
of carbon pricing on food prices (measured by monthly provincial food CPI). The
average reductions in food CPI compared to before carbon pricing are 2% and
4% within and beyond two years of implementation, respectively. We further find that the
deflationary effects are partially driven by lower consumption with no
significant change via farm input costs. Evidence in this paper suggests no
inflationary effect of carbon pricing in Canadian provinces, thus giving no
support to the growing voices against carbon pricing policies.
arXiv link: http://arxiv.org/abs/2404.09467v5
Julia as a universal platform for statistical software development
can transfer data between Stata and Julia, issue Julia commands to analyze and
plot, and pass results back to Stata. Julia's econometric ecosystem is not as
mature as Stata's or R's or Python's. But Julia is an excellent environment for
developing high-performance numerical applications, which can then be called
from many platforms. For example, the boottest program for wild bootstrap-based
inference (Roodman et al. 2019) and fwildclusterboot for R (Fischer and Roodman
2021) can use the same Julia back end. And the program reghdfejl mimics reghdfe
(Correia 2016) in fitting linear models with high-dimensional fixed effects
while calling a Julia package for tenfold acceleration on hard problems.
reghdfejl also supports nonlinear fixed-effect models that cannot otherwise be
fit in Stata--though preliminarily, as the Julia package for that purpose is
immature.
arXiv link: http://arxiv.org/abs/2404.09309v4
Identifying Causal Effects under Kink Setting: Theory and Evidence
a reduced-form manner under kinked settings when agents can manipulate their
choices around the threshold. The causal estimation using a bunching framework
was initially developed by Diamond and Persson (2017) under notched settings.
Many empirical applications of bunching designs involve kinked settings. We
propose a model-free causal estimator in kinked settings with sharp bunching
and then extend to the scenarios with diffuse bunching, misreporting,
optimization frictions, and heterogeneity. The estimation method is mostly
non-parametric and accounts for the interior response under kinked settings.
Applying the proposed approach, we estimate how medical subsidies affect
outpatient behaviors in China.
arXiv link: http://arxiv.org/abs/2404.09117v1
Multiply-Robust Causal Change Attribution
outcome variable. In the presence of multiple explanatory variables, how much
of the change can be explained by each possible cause? We develop a new
estimation strategy that, given a causal model, combines regression and
re-weighting methods to quantify the contribution of each causal mechanism. Our
proposed methodology is multiply robust, meaning that it still recovers the
target parameter under partial misspecification. We prove that our estimator is
consistent and asymptotically normal. Moreover, it can be incorporated into
existing frameworks for causal attribution, such as Shapley values, which will
inherit the consistency and large-sample distribution properties. Our method
demonstrates excellent performance in Monte Carlo simulations, and we show its
usefulness in an empirical application. Our method is implemented as part of
the Python library DoWhy (arXiv:2011.04216, arXiv:2206.06821).
arXiv link: http://arxiv.org/abs/2404.08839v4
Measuring the Quality of Answers in Political Q&As with Large Language Models
political question-and-answer sessions. We measure the quality of an answer
based on how easily and accurately it can be recognized in a random set of
candidate answers given the question's text. This measure reflects the answer's
relevance and depth of engagement with the question. Like semantic search, we
can implement this approach by training a language model on the corpus of
observed questions and answers without additional human-labeled data. We
showcase and validate our methodology within the context of the Question Period
in the Canadian House of Commons. Our analysis reveals that while some answers
have a weak semantic connection to questions, hinting at some evasion or
obfuscation, they are generally at least moderately relevant, far exceeding
what we would expect from random replies. We also find a meaningful correlation
between answer quality and the party affiliation of the members of Parliament
asking the questions.
arXiv link: http://arxiv.org/abs/2404.08816v5
Estimation and Inference for Three-Dimensional Panel Data Models
This study contributes to the relevant literature by introducing a novel
three-dimensional (3D) hierarchical panel data model, which integrates panel
regression with three sets of latent factor structures: one set of global
factors and two sets of local factors. Instead of aggregating latent factors
from various nodes, as seen in the literature of distributed principal
component analysis (PCA), we propose an estimation approach capable of
recovering the parameters of interest and disentangling latent factors at
different levels and across different dimensions. We establish an asymptotic
theory and provide a bootstrap procedure to obtain inference for the parameters
of interest while accommodating various types of cross-sectional dependence and
time series autocorrelation. Finally, we demonstrate the applicability of our
framework by examining productivity convergence in manufacturing industries
worldwide.
arXiv link: http://arxiv.org/abs/2404.08365v2
One Factor to Bind the Cross-Section of Returns
$r_{it}=h(f_{t}\lambda_{i})+\epsilon_{it}$. Despite its parsimony, this model
represents exactly any non-linear model with an arbitrary number of factors and
loadings -- a consequence of the Kolmogorov-Arnold representation theorem. It
features only one pricing component $h(f_{t}\lambda_{i})$, comprising a
nonparametric link function of the time-dependent factor and factor loading
that we jointly estimate with sieve-based estimators. Using 171 assets across
major classes, our model delivers superior cross-sectional performance with a
low-dimensional approximation of the link function. Most known finance and
macro factors become insignificant after controlling for our single factor.
arXiv link: http://arxiv.org/abs/2404.08129v1
Uniform Inference in High-Dimensional Threshold Regression Models
in threshold regression models, allowing for either cross-sectional or time
series data. We first establish oracle inequalities for prediction errors, and
L1 estimation errors for the Lasso estimator of the slope parameters and the
threshold parameter, accommodating heteroskedastic non-subgaussian error terms
and non-subgaussian covariates. Next, we derive the asymptotic distribution of
tests involving an increasing number of slope parameters by debiasing (or
desparsifying) the Lasso estimator in cases with no threshold effect and with a
fixed threshold effect. We show that the asymptotic distributions in both cases
are the same, allowing us to perform uniform inference without specifying
whether the model is a linear or threshold regression. Additionally, we extend
the theory to accommodate time series data under the near-epoch dependence
assumption. Finally, we identify statistically significant factors influencing
cross-country economic growth and quantify the effects of military news shocks
on US government spending and GDP, while also estimating a data-driven
threshold point in both applications.
arXiv link: http://arxiv.org/abs/2404.08105v3
Merger Analysis with Unobserved Prices
often unavailable. I characterize sufficient conditions for identifying the
unilateral effects of mergers without price data using the first-order approach
and merger simulation. Data on merging firms' revenues, margins, and revenue
diversion ratios are sufficient to identify their gross upward pricing pressure
indices and compensating marginal cost reductions. Standard discrete-continuous
demand assumptions facilitate the identification of revenue diversion ratios as
well as the feasibility of merger simulation in terms of percentage change in
price. I apply the framework to the Albertsons/Safeway (2015) and
Staples/Office Depot (2016) mergers.
arXiv link: http://arxiv.org/abs/2404.07684v6
Regression Discontinuity Design with Spillovers
linear-in-means spillovers occur between units that are close in their running
variable. We show that the RDD estimand depends on the ratio of two terms: (1)
the radius over which spillovers occur and (2) the choice of bandwidth used for
the local linear regression. RDD estimates the direct treatment effect when the radius
is of larger order than the bandwidth and the total treatment effect when the radius is
of smaller order than the bandwidth. When the two are of similar order, the RDD
estimand need not have a causal interpretation. To recover direct and spillover
effects in the intermediate regime, we propose to incorporate estimated
spillover terms into local linear regression. Our estimator is consistent and
asymptotically normal and we provide bias-aware confidence intervals for direct
treatment effects and spillovers. In the setting of Gonzalez (2021), we detect
endogenous spillovers in voter fraud during the 2009 Afghan Presidential
election. We also clarify when the donut-hole design addresses spillovers in
RDD.
arXiv link: http://arxiv.org/abs/2404.06471v2
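For reference, a standard sharp-RDD local linear estimate with a triangular kernel on synthetic data; the paper's augmentation adds estimated spillover terms to this regression, which is not reproduced here.

```python
# Sketch: a standard sharp-RDD local linear estimate with a triangular kernel.  The paper
# augments this regression with estimated spillover terms; that step is not reproduced here.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n, h = 3000, 0.25                                            # sample size and bandwidth
x = rng.uniform(-1, 1, n)                                    # running variable, cutoff at 0
d = (x >= 0).astype(float)
y = 0.5 * x + 1.0 * d + rng.standard_normal(n) * 0.5         # true direct effect = 1.0

w = np.clip(1 - np.abs(x) / h, 0, None)                      # triangular kernel weights
keep = w > 0
Xmat = sm.add_constant(np.column_stack([d[keep], x[keep], d[keep] * x[keep]]))
fit = sm.WLS(y[keep], Xmat, weights=w[keep]).fit()

print(f"local linear RDD estimate at the cutoff: {fit.params[1]:.3f} (se {fit.bse[1]:.3f})")
```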
Common Trends and Long-Run Identification in Nonlinear Structural VARs
capture important aspects of economic time series, the use of nonlinear SVARs
has to date been almost entirely confined to the modelling of stationary time
series, because of a lack of understanding as to how common stochastic trends
may be accommodated within nonlinear models. This has unfortunately
circumscribed the range of series to which such models can be applied -- and/or
required that these series be first transformed to stationarity, a potential
source of misspecification -- and prevented the use of long-run identifying
restrictions in these models. To address these problems, we develop a flexible
class of additively time-separable nonlinear SVARs, which subsume models with
threshold-type endogenous regime switching, both of the piecewise linear and
smooth transition varieties. We extend the Granger--Johansen representation
theorem to this class of models, obtaining conditions that specialise exactly
to the usual ones when the model is linear. We further show that, as a
corollary, these models are capable of supporting the same kinds of long-run
identifying restrictions as are available in linearly cointegrated SVARs.
arXiv link: http://arxiv.org/abs/2404.05349v2
Maximally Forward-Looking Core Inflation
measures. We create a new core inflation series that is explicitly designed to
succeed at that goal. Precisely, we introduce the Assemblage Regression, a
generalized nonnegative ridge regression problem that optimizes the price
index's subcomponent weights such that the aggregate is maximally predictive of
future headline inflation. Ordering subcomponents according to their rank in
each period turns the algorithm into one that learns supervised trimmed inflation
-- or, put differently, the maximally forward-looking summary statistic of the
realized price changes distribution. In an extensive out-of-sample forecasting
experiment for the US and the euro area, we find substantial improvements for
signaling medium-term inflation developments in both the pre- and post-Covid
years. Those coming from the supervised trimmed version are particularly
striking, and are attributable to a highly asymmetric trimming which contrasts
with conventional indicators. We also find that this metric was indicating
first upward pressures on inflation as early as mid-2020 and quickly captured
the turning point in 2022. We also consider extensions, like assembling
inflation from geographical regions, trimmed temporal aggregation, and building
core measures specialized for either upside or downside inflation risks.
arXiv link: http://arxiv.org/abs/2404.05209v1
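A minimal sketch of the building block described above: non-negative ridge weights on price-index subcomponents chosen so the weighted aggregate predicts future headline inflation. The data are synthetic, and the paper's full Assemblage Regression and its rank-ordered trimmed variant are richer than this.

```python
# Sketch: non-negative ridge weights on subcomponents so the aggregate predicts future headline
# inflation.  Synthetic data; the paper's generalized penalty and trimmed variant are not shown.
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(7)
T, K, lam, horizon = 300, 12, 1.0, 12
subcomp = rng.standard_normal((T, K)).cumsum(axis=0) * 0.1   # toy subcomponent inflation rates
headline = subcomp @ rng.dirichlet(np.ones(K))               # headline = some convex combination
target = np.roll(headline, -horizon)[:-horizon]              # future headline inflation
X = subcomp[:-horizon]

# Non-negative ridge: min ||target - Xw||^2 + lam * ||w||^2  s.t.  w >= 0,
# written as an ordinary bounded least-squares problem on an augmented system.
A = np.vstack([X, np.sqrt(lam) * np.eye(K)])
b = np.concatenate([target, np.zeros(K)])
w = lsq_linear(A, b, bounds=(0.0, np.inf)).x

core = subcomp @ w                                           # the resulting "core" aggregate
print("non-negative weights (rounded):", np.round(w, 3))
print("in-sample corr(core_t, headline_{t+h}):",
      round(np.corrcoef(core[:-horizon], target)[0, 1], 3))
```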
Estimating granular house price distributions in the Australian market using Gaussian mixtures
distribution at a fine regional scale using Gaussian mixtures. The means,
variances and weights of the mixture components are related to time, location
and dwelling type through a non-linear function trained by a deep functional
approximator. Price indices are derived as means, medians, quantiles or other
functions of the estimated distributions. Price densities for larger regions,
such as a city, are calculated via a weighted sum of the component density
functions. The method is applied to a data set covering all of Australia at a
fine spatial and temporal resolution. In addition to enabling a detailed
exploration of the data, the proposed index yields lower prediction errors in
the practical task of individual dwelling price projection from previous sales
values within the three major Australian cities. The estimated quantiles are
also found to be well calibrated empirically, capturing the complexity of house
price distributions.
arXiv link: http://arxiv.org/abs/2404.05178v1
Context-dependent Causality (the Non-Monotonic Case)
context-dependent causal inference in non-parametric triangular models with
non-separable disturbances. Departing from the common practice, our analysis
does not rely on the strict monotonicity assumption. Our key contribution lies
in leveraging diffusion models to formulate the structural equations as a
system evolving from noise accumulation to account for the influence of the
latent context (confounder) on the outcome. Our identifiability strategy
involves a system of Fredholm integral equations expressing the distributional
relationship between a latent context variable and a vector of observables.
These integral equations involve an unknown kernel and are governed by a set of
structural form functions, inducing a non-monotonic inverse problem. We prove
that if the kernel density can be represented as an infinite mixture of
Gaussians, then there exists a unique solution for the unknown function. This
is a significant result, as it shows that it is possible to solve a
non-monotonic inverse problem even when the kernel is unknown. On the
methodological front, we leverage a novel and enriched Contaminated
Generative Adversarial (Neural) Networks (CONGAN), which we provide as a
solution to the non-monotonic inverse problem.
arXiv link: http://arxiv.org/abs/2404.05021v1
Towards a generalized accessibility measure for transportation equity and efficiency
transportation planning to understand the impact of the transportation system
on influencing people's access to places. However, there is a considerable lack
of measurement standards and publicly available data. We propose a generalized
measure of locational accessibility that has a comprehensible form for
transportation planning analysis. This metric combines the cumulative
opportunities approach with gravity-based measures and is capable of catering
to multiple trip purposes, travel modes, cost thresholds, and scales of
analysis. Using data from multiple publicly available datasets, this metric is
computed by trip purpose and travel time threshold for all block groups in the
United States, and the data is made publicly accessible. Further, case studies
of three large metropolitan areas reveal substantial inefficiencies in
transportation infrastructure, with the most inefficiency observed in sprawling
and non-core urban areas, especially for bicycling. Subsequently, it is shown
that targeted investment in facilities can contribute to a more equitable
distribution of accessibility to essential shopping and service facilities. By
assigning greater weights to socioeconomically disadvantaged neighborhoods, the
proposed metric formally incorporates equity considerations into transportation
planning, contributing to a more equitable distribution of accessibility to
essential services and facilities.
arXiv link: http://arxiv.org/abs/2404.04985v1
CAVIAR: Categorical-Variable Embeddings for Accurate and Robust Inference
variables and outcomes. We introduce CAVIAR, a novel method for embedding
categorical variables that assume values in a high-dimensional ambient space
but are sampled from an underlying manifold. Our theoretical and numerical
analyses outline challenges posed by such categorical variables in causal
inference. Specifically, dynamically varying and sparse levels can lead to
violations of the Donsker conditions and a failure of the estimation
functionals to converge to a tight Gaussian process. Traditional approaches,
including the exclusion of rare categorical levels and principled variable
selection models like LASSO, fall short. CAVIAR embeds the data into a
lower-dimensional global coordinate system. The mapping can be derived from
both structured and unstructured data, and ensures stable and robust estimates
through dimensionality reduction. In a dataset of direct-to-consumer apparel
sales, we illustrate how high-dimensional categorical variables, such as zip
codes, can be succinctly represented, facilitating inference and analysis.
arXiv link: http://arxiv.org/abs/2404.04979v2
Neural Network Modeling for Forecasting Tourism Demand in Stopića Cave: A Serbian Cave Tourism Study
the classical Auto-regressive Integrated Moving Average (ARIMA) model, Machine
Learning (ML) method Support Vector Regression (SVR), and hybrid NeuralPropeth
method which combines classical and ML concepts. The most accurate predictions
were obtained with NeuralPropeth which includes the seasonal component and
growing trend of time-series. In addition, non-linearity is modeled by shallow
Neural Network (NN), and Google Trend is incorporated as an exogenous variable.
Modeling tourist demand is of great importance for management structures
and decision-makers due to its applicability in establishing sustainable
tourism utilization strategies in environmentally vulnerable destinations such
as caves. The data provided insights into the tourist demand in Stopića
cave and preliminary data for addressing the issues of carrying capacity within
the most visited cave in Serbia.
arXiv link: http://arxiv.org/abs/2404.04974v1
Stratifying on Treatment Status
treatment status. Standard estimators of the average treatment effect and the
local average treatment effect are inconsistent in this setting. We propose
consistent estimators and characterize their asymptotic distributions.
arXiv link: http://arxiv.org/abs/2404.04700v3
Absolute Technical Efficiency Indices
stochastic frontier analysis approach, which yields relative indices that do
not allow self-interpretations. In this paper, we introduce a single-step
estimation procedure for TEIs that eliminates the need to identify best
practices and avoids imposing restrictive hypotheses on the error term. The
resulting indices are absolute and allow for individual interpretation. In our
model, we estimate a distance function using the inverse coefficient of
resource utilization, rather than treating it as unobservable. We employ a
Tobit model with a translog distance function as our econometric framework.
Applying this model to a sample of 19 airline companies from 2012 to 2021, we
find that: (1) Absolute technical efficiency varies considerably between
companies with medium-haul European airlines being technically the most
efficient, while Asian airlines are the least efficient; (2) Our estimated TEIs
are consistent with the observed data with a decline in efficiency especially
during the Covid-19 crisis and Brexit period; (3) All airlines contained in our
sample would be able to increase their average technical efficiency by 0.209%
if they reduced their average kerosene consumption by 1%; (4) Total factor
productivity (TFP) growth slowed between 2013 and 2019 due to a decrease in
Disembodied Technical Change (DTC) and a small effect from Scale Economies
(SE). Toward the end of our study period, TFP growth seemed increasingly driven
by the SE effect, with a sharp decline in 2020 followed by an equally sharp
recovery in 2021 for most airlines.
arXiv link: http://arxiv.org/abs/2404.04590v1
Fast and simple inner-loop algorithms of static / dynamic BLP estimations
estimating static/dynamic BLP models. It provides the following ideas for
reducing the number of inner-loop iterations: (1) add a term relating to the
outside option share in the BLP contraction mapping; (2) analytically
represent the mean product utilities as a function of value functions and solve
for the value functions (for dynamic BLP); (3) combine an acceleration method for
fixed-point iterations, in particular Anderson acceleration. These ideas are
independent and easy to implement. This study shows the good performance of
these methods using numerical experiments.
arXiv link: http://arxiv.org/abs/2404.04494v5
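To fix ideas, the textbook BLP-style contraction for mean utilities on synthetic logit shares is sketched below, with comments marking where the abstract's proposed tweaks (an outside-option-share term, Anderson acceleration) would enter; their exact form is in the paper.

```python
# Sketch: the textbook contraction delta <- delta + log(s_obs) - log(s_model(delta)) for a
# simple logit share equation.  Comments mark where the paper's proposed tweaks would enter.
import numpy as np

rng = np.random.default_rng(8)
J = 10
delta_true = rng.normal(0, 1, J)

def shares(delta):
    e = np.exp(delta)
    return e / (1.0 + e.sum())                               # outside option has utility 0

s_obs = shares(delta_true)
s0_obs = 1.0 - s_obs.sum()                                   # observed outside option share

delta = np.zeros(J)
for it in range(1000):
    s_model = shares(delta)
    step = np.log(s_obs) - np.log(s_model)
    # Paper's idea (1): add a term based on the outside option share here (exact form in paper).
    # Paper's idea (3): wrap this fixed-point update in Anderson acceleration.
    delta = delta + step
    if np.max(np.abs(step)) < 1e-12:
        break

print(f"converged in {it + 1} iterations; max error {np.max(np.abs(delta - delta_true)):.2e}")
```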
Forecasting with Neuro-Dynamic Programming
gross domestic product (GDP) in the next period given a set of variables that
describes the current situation or state of the economy, including industrial
production, retail trade turnover or economic confidence. Neuro-dynamic
programming (NDP) provides tools to deal with forecasting and other sequential
problems with such high-dimensional states spaces. Whereas conventional
forecasting methods penalise the difference (or loss) between predicted and
actual outcomes, NDP favours the difference between temporally successive
predictions, following an interactive and trial-and-error approach. Past data
provides guidance for training the models, but in a different way from ordinary
least squares (OLS) and other supervised learning methods, signalling the
adjustment costs between sequential states. We found that it is possible to
train a GDP forecasting model on data from other countries that performs
better than models trained on past data from the tested country
(Portugal). In addition, we found that non-linear architectures to approximate
the value function of a sequential problem, namely neural networks, can perform
better than a simple linear architecture, lowering the out-of-sample mean
absolute forecast error (MAE) by 32% from an OLS model.
arXiv link: http://arxiv.org/abs/2404.03737v1
An early warning system for emerging markets
cascading information spillovers, surges, sudden stops and reversals. With this
in mind, we develop a new online early warning system (EWS) to detect what is
referred to as `concept drift' in machine learning, as a `regime shift' in
economics and as a `change-point' in statistics. The system explores
nonlinearities in financial information flows and remains robust to heavy tails
and dependence of extremes. The key component is the use of conditional
entropy, which captures shifts in various channels of information transmission,
not only in conditional mean or variance. We design a baseline method, and
adapt it to a modern high-dimensional setting through the use of random forests
and copulas. We show the relevance of each system component to the analysis of
emerging markets. The new approach detects significant shifts where
conventional methods fail. We explore when this happens using simulations and
we provide two illustrations when the methods generate meaningful warnings. The
ability to detect changes early helps improve resilience in emerging markets
against shocks and provides new economic and financial insights into their
operation.
arXiv link: http://arxiv.org/abs/2404.03319v2
Marginal Treatment Effects and Monotonicity
violations of Imbens and Angrist (1994) monotonicity? In this note, I present
weaker forms of monotonicity under which popular MTE-based estimands still
identify the parameters of interest.
arXiv link: http://arxiv.org/abs/2404.03235v1
Bayesian Bi-level Sparse Group Regressions for Macroeconomic Density Forecasting
forecasting in a high-dimensional setting where the underlying model exhibits a
known group structure. Our approach is general enough to encompass specific
forecasting models featuring either many covariates, or unknown nonlinearities,
or series sampled at different frequencies. By relying on the novel concept of
bi-level sparsity in time-series econometrics, we construct density forecasts
based on a prior that induces sparsity both at the group level and within
groups. We demonstrate the consistency of both posterior and predictive
distributions. We show that the posterior distribution contracts at the
minimax-optimal rate and, asymptotically, puts mass on a set that includes the
support of the model. Our theory allows for correlation between groups, while
predictors in the same group can be characterized by strong covariation as well
as common characteristics and patterns. Finite sample performance is
illustrated through comprehensive Monte Carlo experiments and a real-data
nowcasting exercise of the US GDP growth rate.
arXiv link: http://arxiv.org/abs/2404.02671v3
Moran's I 2-Stage Lasso: for Models with Spatial Correlation and Endogenous Variables
in the presence of spatial correlation based on Eigenvector Spatial Filtering.
The procedure, called Moran's $I$ 2-Stage Lasso (Mi-2SL), uses a two-stage
Lasso estimator where the Standardised Moran's I is used to set the Lasso
tuning parameter. Unlike existing spatial econometric methods, this has the key
benefit of not requiring the researcher to explicitly model the spatial
correlation process, which is useful when the aim is only to remove the
resulting bias when estimating the direct effect of
covariates. We show the conditions necessary for consistent and asymptotically
normal parameter estimation assuming the support (relevant) set of eigenvectors
is known. Our Monte Carlo simulation results also show that Mi-2SL performs
well against common alternatives in the presence of spatial correlation. Our
empirical application replicates Cadena and Kovak (2016) instrumental variables
estimates using Mi-2SL and shows that in that case, Mi-2SL can boost the
performance of the first stage.
arXiv link: http://arxiv.org/abs/2404.02584v1
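For illustration, the sketch below computes Moran's I of OLS residuals under a row-standardised k-nearest-neighbour weight matrix, with a permutation-based z-score; in Mi-2SL the standardised statistic is used to tune the Lasso, a step (along with the eigenvector filtering) not reproduced here.

```python
# Sketch: Moran's I of OLS residuals under row-standardised kNN spatial weights, with a
# permutation z-score.  The Lasso-tuning and eigenvector-filtering steps are not reproduced.
import numpy as np

def morans_i(z, W):
    z = z - z.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(9)
n = 100
coords = rng.uniform(0, 1, (n, 2))

# 5-nearest-neighbour weights, row-standardised.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)
W = np.zeros((n, n))
for i, nbrs in enumerate(np.argsort(dist, axis=1)[:, :5]):
    W[i, nbrs] = 1.0 / 5.0

x = rng.standard_normal(n)
y = 1.0 + 0.5 * x + rng.standard_normal(n)                   # no spatial correlation in this DGP
Z = np.column_stack([np.ones(n), x])
resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]

I_obs = morans_i(resid, W)
perm = np.array([morans_i(rng.permutation(resid), W) for _ in range(999)])
z_score = (I_obs - perm.mean()) / perm.std(ddof=1)
print(f"Moran's I = {I_obs:.3f}, permutation z-score = {z_score:.2f}")
```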
Improved Semi-Parametric Bounds for Tail Probability and Expected Loss: Theory and Applications
only the first and second moments of their distribution are available. The
sharp Chebyshev-type bound for the tail probability and Scarf bound for the
expected loss are widely used in this setting. We revisit the tail behavior of
such quantities with a focus on independence. Conventional primal-dual
approaches from optimization are ineffective in this setting. Instead, we use
probabilistic inequalities to derive new bounds and offer new insights. For
non-identical distributions attaining the tail probability bounds, we show that
the extreme values are equidistant regardless of the distributional
differences. For the bound on the expected loss, we show that the impact of
each random variable on the expected sum can be isolated using an extension of
the Korkine identity. We illustrate how these new results open up abundant
practical applications, including improved pricing of product bundles, more
precise option pricing, more efficient insurance design, and better inventory
management. For example, we establish a new solution to the optimal bundling
problem, yielding a 17% uplift in per-bundle profits, and a new solution to the
inventory problem, yielding a 5.6% cost reduction for a model with 20
retailers.
arXiv link: http://arxiv.org/abs/2404.02400v3
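The two classical moment-based bounds named in the abstract, for a single random variable with known mean and standard deviation, are shown below; the paper's sharper bounds, which exploit independence across several variables, are not reproduced.

```python
# The classical starting points: the sharp Chebyshev-type (Cantelli) tail bound and Scarf's
# bound on the expected loss, given only the mean and standard deviation of one variable.
import math

def cantelli_tail_bound(mu, sigma, t):
    """Sharp Chebyshev-type (Cantelli) bound on P(X >= t) given mean and std, for t > mu."""
    a = t - mu
    if a <= 0:
        return 1.0
    return sigma ** 2 / (sigma ** 2 + a ** 2)

def scarf_expected_loss_bound(mu, sigma, s):
    """Scarf's bound on the expected loss E[(X - s)^+] given mean and std."""
    return 0.5 * (math.sqrt(sigma ** 2 + (s - mu) ** 2) - (s - mu))

mu, sigma = 100.0, 20.0
print("P(X >= 140) <=", round(cantelli_tail_bound(mu, sigma, 140.0), 4))
print("E[(X - 120)^+] <=", round(scarf_expected_loss_bound(mu, sigma, 120.0), 3))
```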
Seemingly unrelated Bayesian additive regression trees for cost-effectiveness analyses in healthcare
Bayesian additive regression trees to be a highly-effective method for
nonparametric regression. Motivated by cost-effectiveness analyses in health
economics, where interest lies in jointly modelling the costs of healthcare
treatments and the associated health-related quality of life experienced by a
patient, we propose a multivariate extension of BART which is applicable in
regression analyses with several dependent outcome variables. Our framework
allows for continuous or binary outcomes and overcomes some key limitations of
existing multivariate BART models by allowing each individual response to be
associated with different ensembles of trees, while still handling dependencies
between the outcomes. In the case of continuous outcomes, our model is
essentially a nonparametric version of seemingly unrelated regression.
Likewise, our proposal for binary outcomes is a nonparametric generalisation of
the multivariate probit model. We give suggestions for easily interpretable
prior distributions, which allow specification of both informative and
uninformative priors. We provide detailed discussions of MCMC sampling methods
to conduct posterior inference. Our methods are implemented in the R package
"subart". We showcase their performance through extensive simulation
experiments and an application to an empirical case study from health
economics. By also accommodating propensity scores in a manner befitting a
causal analysis, we find substantial evidence for a novel trauma care
intervention's cost-effectiveness.
arXiv link: http://arxiv.org/abs/2404.02228v3
Robustly estimating heterogeneity in factorial data using Rashomon Partitions
statistical models to articulate how the outcome of interest varies with
combinations of observable covariates. Choosing a model that is too simple can
obfuscate important heterogeneity in outcomes between covariate groups, while
too much complexity risks identifying spurious patterns. In this paper, we
propose a novel Bayesian framework for model uncertainty called Rashomon
Partition Sets (RPSs). The RPS consists of all models that have posterior
density close to the maximum a posteriori (MAP) model. We construct the RPS by
enumeration, rather than sampling, which ensures that we explore all models
with high evidence in the data, even if they offer dramatically
different substantive explanations. We use an l0 prior, which allows
us to capture complex heterogeneity without imposing strong assumptions about
the associations between effects, showing this prior is minimax optimal from an
information-theoretic perspective. We characterize the approximation error of
(functions of) parameters computed conditional on being in the RPS relative to
the entire posterior. We propose an algorithm to enumerate the RPS from the
class of models that are interpretable and unique, then provide bounds on the
size of the RPS. We give simulation evidence along with three empirical
examples: price effects on charitable giving, heterogeneity in chromosomal
structure, and the introduction of microfinance.
arXiv link: http://arxiv.org/abs/2404.02141v4
The impact of geopolitical risk on the international agricultural market: Empirical analysis based on the GJR-GARCH-MIDAS model
outbreaks of geopolitical conflicts worldwide. Geopolitical risk has emerged as
a significant threat to regional and global peace, stability, and economic
prosperity, causing serious disruptions to the global food system and food
security. Focusing on the international food market, this paper builds
different dimensions of geopolitical risk measures based on the random matrix
theory and constructs single- and two-factor GJR-GARCH-MIDAS models with fixed
time span and rolling window, respectively, to investigate the impact of
geopolitical risk on food market volatility. The findings indicate that
modeling based on a rolling window performs better in describing the overall
volatility of the wheat, maize, soybean, and rice markets, and the two-factor
models generally exhibit stronger explanatory power in most cases. In terms of
short-term fluctuations, all four staple food markets demonstrate obvious
volatility clustering and high volatility persistence, without significant
asymmetry. Regarding long-term volatility, the realized volatility of wheat,
maize, and soybean significantly exacerbates their long-run market volatility.
Additionally, geopolitical risks of different dimensions show varying
directions and degrees of effects in explaining the long-term market volatility
of the four staple food commodities. This study contributes to the
understanding of the macro-drivers of food market fluctuations, provides useful
information for investment using agricultural futures, and offers valuable
insights into maintaining the stable operation of food markets and safeguarding
global food security.
arXiv link: http://arxiv.org/abs/2404.01641v1
Heterogeneous Treatment Effects and Causal Mechanisms
identification and estimation of causal effects. However, understanding which
mechanisms produce measured causal effects remains a challenge. A dominant
current approach to the quantitative evaluation of mechanisms relies on the
detection of heterogeneous treatment effects with respect to pre-treatment
covariates. This paper develops a framework to understand when the existence of
such heterogeneous treatment effects can support inferences about the
activation of a mechanism. We show first that this design cannot provide
evidence of mechanism activation without an additional, generally implicit,
assumption. Further, even when this assumption is satisfied, if a measured
outcome is produced by a non-linear transformation of a directly-affected
outcome of theoretical interest, heterogeneous treatment effects are not
informative of mechanism activation. We provide novel guidance for
interpretation and research design in light of these findings.
arXiv link: http://arxiv.org/abs/2404.01566v3
Estimating Heterogeneous Effects: Applications to Labor Economics
heterogeneous effects, a researcher compares various units. Examples of
research designs include children moving between different neighborhoods,
workers moving between firms, patients migrating from one city to another, and
banks offering loans to different firms. We present a unified framework for
these settings, based on a linear model with normal random coefficients and
normal errors. Using the model, we discuss how to recover the mean and
dispersion of effects, other features of their distribution, and to construct
predictors of the effects. We provide moment conditions on the model's
parameters, and outline various estimation strategies. A main objective of the
paper is to clarify some of the underlying assumptions by highlighting their
economic content, and to discuss and inform some of the key practical choices.
arXiv link: http://arxiv.org/abs/2404.01495v1
Convolution-t Distributions
convolutions of heterogeneous multivariate t-distributions. Unlike commonly
used heavy-tailed distributions, the multivariate convolution-t distributions
embody cluster structures with flexible nonlinear dependencies and
heterogeneous marginal distributions. Importantly, convolution-t distributions
have simple density functions that facilitate estimation and likelihood-based
inference. The characteristic features of convolution-t distributions are found
to be important in an empirical analysis of realized volatility measures and
help identify their underlying factor structure.
arXiv link: http://arxiv.org/abs/2404.00864v1
Estimating sample paths of Gauss-Markov processes from noisy data
Gauss-Markov process, given noisy observations of points on a sample path.
These moments depend on the process's mean and covariance functions, and on the
conditional moments of the sampled points. I study the Brownian motion and
bridge as special cases.
arXiv link: http://arxiv.org/abs/2404.00784v1
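A numpy sketch of the Brownian special case mentioned above: Gaussian conditioning gives the conditional mean and variance of a standard Brownian motion on a grid, given noisy observations of the path at a few times.

```python
# Sketch: conditional mean and variance of a standard Brownian motion path on a grid, given
# noisy observations at a few sample times -- the Brownian special case mentioned above.
import numpy as np

def bm_cov(s, t):
    return np.minimum(s[:, None], t[None, :])                # Cov(B_s, B_t) = min(s, t)

rng = np.random.default_rng(10)
obs_times = np.array([0.2, 0.5, 0.9])
noise_sd = 0.1

# Simulate one true path at the observation times and add measurement noise.
L = np.linalg.cholesky(bm_cov(obs_times, obs_times) + 1e-12 * np.eye(len(obs_times)))
path_at_obs = L @ rng.standard_normal(len(obs_times))
y = path_at_obs + noise_sd * rng.standard_normal(len(obs_times))

# Gaussian conditioning (zero prior mean): E[B_t | y] = K_{t,obs} (K_obs + sigma^2 I)^{-1} y.
grid = np.linspace(0.01, 1.0, 100)
K_go = bm_cov(grid, obs_times)
K_oo = bm_cov(obs_times, obs_times) + noise_sd ** 2 * np.eye(len(obs_times))
cond_mean = K_go @ np.linalg.solve(K_oo, y)
cond_var = grid - np.einsum("ij,jk,ik->i", K_go, np.linalg.inv(K_oo), K_go)

print("conditional mean at t=0.5:", round(cond_mean[np.argmin(np.abs(grid - 0.5))], 3))
print("max conditional sd on grid:", round(float(np.sqrt(cond_var.max())), 3))
```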
Policy Learning for Optimal Dynamic Treatment Regimes with Observational Data
assignments, in which individuals receive a sequence of interventions over
multiple stages. We study the statistical learning of optimal dynamic treatment
regimes (DTRs) that determine the optimal treatment assignment for each
individual at each stage based on their evolving history. We propose a novel,
doubly robust, classification-based method for learning the optimal DTR from
observational data under the sequential ignorability assumption. The method
proceeds via backward induction: at each stage, it constructs and maximizes an
augmented inverse probability weighting (AIPW) estimator of the policy value
function to learn the optimal stage-specific policy. We show that the resulting
DTR achieves an optimal convergence rate of $n^{-1/2}$ for welfare regret under
mild convergence conditions on estimators of the nuisance components.
arXiv link: http://arxiv.org/abs/2404.00221v7
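A one-stage sketch of the AIPW (doubly robust) policy-value score that the backward-induction procedure maximises at each stage; nuisance models are simple sklearn fits on synthetic data, and the paper's multi-stage construction and cross-fitting are not reproduced.

```python
# Sketch: single-stage AIPW (doubly robust) estimate of the value of a candidate 0/1 policy.
# Nuisances are simple sklearn fits on synthetic data; the multi-stage procedure is not shown.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(11)
n = 5000
x = rng.standard_normal((n, 2))
e_true = 1 / (1 + np.exp(-x[:, 0]))                          # treatment depends on covariates
d = rng.binomial(1, e_true)
y = x[:, 0] + d * (1.0 + x[:, 1]) + rng.standard_normal(n)   # treatment effect is 1 + x2

# Nuisance estimates: propensity score and outcome regressions under d=1 and d=0.
e_hat = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]
mu1 = LinearRegression().fit(x[d == 1], y[d == 1]).predict(x)
mu0 = LinearRegression().fit(x[d == 0], y[d == 0]).predict(x)

def aipw_value(policy):
    """AIPW estimate of E[Y(policy(X))] for a 0/1 policy evaluated at the sample covariates."""
    p = policy(x)
    score1 = mu1 + d * (y - mu1) / e_hat
    score0 = mu0 + (1 - d) * (y - mu0) / (1 - e_hat)
    return np.mean(p * score1 + (1 - p) * score0)

treat_if_x2_positive = lambda X: (X[:, 1] > 0).astype(float)
treat_everyone = lambda X: np.ones(len(X))
print("value of 'treat if x2 > 0':", round(aipw_value(treat_if_x2_positive), 3))
print("value of 'treat everyone':  ", round(aipw_value(treat_everyone), 3))
```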
Sequential Synthetic Difference in Differences
SDiD) estimator for event studies with staggered treatment adoption,
particularly when the parallel trends assumption fails. The method uses an
iterative imputation procedure on aggregated data, where estimates for
early-adopting cohorts are used to construct counterfactuals for later ones. We
prove the estimator is asymptotically equivalent to an infeasible oracle OLS
estimator within a linear model with interactive fixed effects. This key
theoretical result provides a foundation for standard inference by establishing
asymptotic normality and clarifying the estimator's efficiency. By offering a
robust and transparent method with formal statistical guarantees, Sequential
SDiD is a powerful alternative to conventional difference-in-differences
strategies.
arXiv link: http://arxiv.org/abs/2404.00164v2
Dynamic Analyses of Contagion Risk and Module Evolution on the SSE A-Shares Market Based on Minimum Information Entropy
exacerbating the abnormal market volatilities and risk contagion. Based on
daily stock returns in the Shanghai Stock Exchange (SSE) A-shares, this paper
divides the period between 2005 and 2018 into eight bull and bear market stages
to investigate interactive patterns in the Chinese financial market. We employ
the LASSO method to construct the stock network and further use the Map
Equation method to analyze the evolution of modules in the SSE A-shares market.
Empirical results show: (1) The connected effect is more significant in bear
markets than bull markets; (2) A system module can be found in the network
during the first four stages, and the industry aggregation effect leads to
module differentiation in the last four stages; (3) Some stocks have leading
effects on others throughout eight periods, and medium- and small-cap stocks
with poor financial conditions are more likely to become risk sources,
especially in bear markets. Our conclusions can help improve investment
strategies and inform regulatory policy.
arXiv link: http://arxiv.org/abs/2403.19439v1
Dynamic Correlation of Market Connectivity, Risk Spillover and Abnormal Volatility in Stock Price
capital markets and contributes to interior risk contagion and spillover
effects. We compare Shanghai Stock Exchange A-shares (SSE A-shares) during
tranquil periods with two stress episodes: the subprime mortgage crisis and the
high-leverage period in 2015. We use Pearson correlations of returns, the maximum strongly
connected subgraph, and $3\sigma$ principle to iteratively determine the
threshold value for building a dynamic correlation network of SSE A-shares.
Analyses are carried out based on the networking structure, intra-sector
connectivity, and node status, identifying several contributions. First,
compared with tranquil periods, the SSE A-shares network experiences a more
significant small-world and connective effect during the subprime mortgage
crisis and the high leverage period in 2015. Second, the finance, energy and
utilities sectors have a stronger intra-industry connectivity than other
sectors. Third, HUB nodes drive the growth of the SSE A-shares market during
bull periods, while stocks have a thick-tailed degree distribution in bear
periods and show distinct characteristics in terms of market value and financial condition.
Granger linear and non-linear causality networks are also considered for the
comparison purpose. Studies on the evolution of inter-cycle connectivity in the
SSE A-share market may help investors improve portfolios and develop more
robust risk management policies.
arXiv link: http://arxiv.org/abs/2403.19363v1
Distributional Treatment Effect with Latent Rank Invariance
impact: "is the treatment Pareto-improving?", "what is the proportion of people
who are better off under the treatment?", etc. However, even in the simple case
of a binary random treatment, existing analysis has been mostly limited to an
average treatment effect or a quantile treatment effect, due to the fundamental
limitation that we cannot simultaneously observe both treated potential outcome
and untreated potential outcome for a given unit. This paper assumes a
conditional independence assumption that the two potential outcomes are
independent of each other given a scalar latent variable. With a specific
example of strictly increasing conditional expectation, I label the latent
variable as 'latent rank' and motivate the identifying assumption as 'latent
rank invariance.' In implementation, I assume a finite support on the latent
variable and propose an estimation strategy based on a nonnegative matrix
factorization. A limiting distribution is derived for the distributional
treatment effect estimator, using Neyman orthogonality.
arXiv link: http://arxiv.org/abs/2403.18503v3
Statistical Inference of Optimal Allocations I: Regularities and their Implications
statistical optimal allocation problems. We derive Hadamard differentiability
of the value functions through analyzing the properties of the sorting operator
using tools from geometric measure theory. Building on our Hadamard
differentiability results, we apply the functional delta method to obtain the
asymptotic properties of the value function process for the binary constrained
optimal allocation problem and the plug-in ROC curve estimator. Moreover, the
convexity of the optimal allocation value functions facilitates demonstrating
the degeneracy of first order derivatives with respect to the policy. We then
present a double / debiased estimator for the value functions. Importantly, the
conditions that validate Hadamard differentiability justify the margin
assumption from the statistical classification literature for the fast
convergence rate of plug-in methods.
arXiv link: http://arxiv.org/abs/2403.18248v3
Deconvolution from two order statistics
ranking. This paper shows that the classical measurement error model with
independent and additive measurement errors is identified nonparametrically
using only two order statistics of repeated measurements. The identification
result confirms a hypothesis by Athey and Haile (2002) for a symmetric
ascending auction model with unobserved heterogeneity. Extensions allow for
heterogeneous measurement errors, broadening the applicability to additional
empirical settings, including asymmetric auctions and wage offer models. We
adapt an existing simulated sieve estimator and illustrate its performance in
finite samples.
arXiv link: http://arxiv.org/abs/2403.17777v1
The inclusive Synthetic Control Method
synthetic control methods that includes units in the donor pool potentially
affected, directly or indirectly, by an intervention. This method is ideal for
situations where including treated units in the donor pool is essential or
where donor units may experience spillover effects. The iSCM is straightforward
to implement with most synthetic control estimators. As an empirical
illustration, we re-estimate the causal effect of German reunification on GDP
per capita, accounting for spillover effects from West Germany to Austria.
arXiv link: http://arxiv.org/abs/2403.17624v2
Resistant Inference in Instrumental Variable Models
if the data is contaminated. For instance, one outlying observation can be
enough to change the outcome of a test. We develop a framework to construct
testing procedures that are robust to weak instruments, outliers and
heavy-tailed errors in the instrumental variable model. The framework is
constructed upon M-estimators. By deriving the influence functions of the
classical weak instrument robust tests, such as the Anderson-Rubin test, K-test
and the conditional likelihood ratio (CLR) test, we prove their unbounded
sensitivity to infinitesimal contamination. Therefore, we construct
contamination resistant/robust alternatives. In particular, we show how to
construct a robust CLR statistic based on Mallows type M-estimators and show
that its asymptotic distribution is the same as that of the (classical) CLR
statistic. The theoretical results are corroborated by a simulation study.
Finally, we revisit three empirical studies affected by outliers and
demonstrate how the new robust tests can be used in practice.
arXiv link: http://arxiv.org/abs/2403.16844v1
Privacy-Protected Spatial Autoregressive Model
effects. However, with an increasing emphasis on data privacy, data providers
often implement privacy protection measures that make classical SAR models
inapplicable. In this study, we introduce a privacy-protected SAR model with
noise-added response and covariates to meet privacy-protection requirements.
However, in this scenario, the traditional quasi-maximum likelihood estimator
becomes infeasible because the likelihood function cannot be directly
formulated. To address this issue, we first consider an explicit expression for
the likelihood function with only noise-added responses. Then, we develop
techniques to correct the biases for derivatives introduced by noise.
Correspondingly, a Newton-Raphson-type algorithm is proposed to obtain the
estimator, leading to a corrected likelihood estimator. To further enhance
computational efficiency, we introduce a corrected least squares estimator
based on the idea of bias correction. These two estimation methods ensure both
data security and the attainment of statistically valid estimators. Theoretical
analysis of both estimators is carefully conducted, statistical inference
methods and model extensions are discussed. The finite sample performances of
different methods are demonstrated through extensive simulations and the
analysis of a real dataset.
arXiv link: http://arxiv.org/abs/2403.16773v2
Quasi-randomization tests for network interference
the potential outcome of other units in the population. Testing for spillover
effects in this setting makes the null hypothesis non-sharp. An interesting
approach to tackling the non-sharp nature of the null hypothesis in this setup
is constructing conditional randomization tests such that the null is sharp on
the restricted population. In randomized experiments, conditional randomized
tests hold finite sample validity and are assumption-lean. In this paper, we
incorporate the network amongst the population as a random variable instead of
being fixed. We propose a new approach that builds a conditional
quasi-randomization test. To build the (non-sharp) null distribution of no
spillover effects, we use random graph null models. We show that our method is
exactly valid in finite samples under mild assumptions. Our method displays
enhanced power over state-of-the-art methods, with a substantial improvement in
cluster randomized trials. We illustrate our methodology to test for
interference in a weather insurance adoption experiment run in rural China.
arXiv link: http://arxiv.org/abs/2403.16673v3
Optimal testing in a class of nonregular models
models with parameter-dependent support. We consider both one-sided and
two-sided hypothesis testing and develop asymptotically uniformly most powerful
tests based on a limit experiment. Our two-sided test becomes asymptotically
uniformly most powerful without imposing further restrictions such as
unbiasedness, and can be inverted to construct a confidence set for the
nonregular parameter. Simulation results illustrate desirable finite sample
properties of the proposed tests.
arXiv link: http://arxiv.org/abs/2403.16413v2
The Informativeness of Combined Experimental and Observational Data under Dynamic Selection
on the Treated Survivors (ATETS; Vikstrom et al., 2018) in the absence of
long-term experimental data, utilizing available long-term observational data
instead. We establish two theoretical results. First, it is impossible to
obtain informative bounds for the ATETS with no model restriction and no
auxiliary data. Second, to overturn this negative result, we explore as a
promising avenue the recent econometric developments in combining experimental
and observational data (e.g., Athey et al., 2020, 2019); we indeed find that
exploiting short-term experimental data can be informative without imposing
classical model restrictions. Furthermore, building on Chesher and Rosen
(2017), we explore how to systematically derive sharp identification bounds,
exploiting both the novel data-combination principles and classical model
restrictions. Applying the proposed method, we explore what can be learned
about the long-run effects of job training programs on employment without
long-term experimental data.
arXiv link: http://arxiv.org/abs/2403.16177v1
Liquidity Jump, Liquidity Diffusion, and Treatment on Wash Trading of Crypto Assets
jump and liquidity diffusion. We show that liquidity diffusion has a higher
correlation with crypto wash trading than liquidity jump and demonstrate that
treatment on wash trading significantly reduces the level of liquidity
diffusion, but only marginally reduces that of liquidity jump. We confirm that
the autoregressive models are highly effective in modeling the
liquidity-adjusted return with and without the treatment on wash trading. We
argue that treatment on wash trading is unnecessary in modeling established
crypto assets that trade in unregulated but mainstream exchanges.
arXiv link: http://arxiv.org/abs/2404.07222v3
Debiased Machine Learning when Nuisance Parameters Appear in Indicator Functions
in indicator functions. An important example is maximized average welfare gain
under optimal treatment assignment rules. For asymptotically valid inference
for a parameter of interest, the current literature on debiased machine
learning relies on Gateaux differentiability of the functions inside moment
conditions, which does not hold when nuisance parameters appear in indicator
functions. In this paper, we propose smoothing the indicator functions, and
develop an asymptotic distribution theory for this class of models. The
asymptotic behavior of the proposed estimator exhibits a trade-off between bias
and variance due to smoothing. We study how a parameter which controls the
degree of smoothing can be chosen optimally to minimize an upper bound of the
asymptotic mean squared error. A Monte Carlo simulation supports the asymptotic
distribution theory, and an empirical example illustrates the implementation of
the method.
arXiv link: http://arxiv.org/abs/2403.15934v2
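To make the smoothing idea above concrete, a toy sketch (ours; the logistic CDF as the smoother and the bandwidth grid are our illustrative choices, not taken from the paper):
```python
# Minimal illustration (not the paper's estimator): smoothing an indicator
# 1{x*beta > 0} that appears inside a moment, to restore differentiability.
import numpy as np
from scipy.special import expit  # logistic CDF as a smooth surrogate

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
tau = 1.0 + 0.5 * x           # treatment effect, taken as known purely for illustration
beta = 0.7                    # candidate assignment-rule parameter

def welfare_smooth(beta, h):
    """Smoothed average welfare gain: mean of tau(X) * K((X*beta)/h)."""
    return np.mean(tau * expit(x * beta / h))

def welfare_sharp(beta):
    """Sharp counterpart with the raw indicator 1{X*beta > 0}."""
    return np.mean(tau * (x * beta > 0))

for h in [1.0, 0.3, 0.1, 0.03]:   # shrinking h trades bias against smoothness
    print(f"h={h:>4}: smoothed={welfare_smooth(beta, h):.4f}  sharp={welfare_sharp(beta):.4f}")
```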
Difference-in-Differences with Unpoolable Data
effects but is infeasible in settings where data are unpoolable due to privacy
concerns or legal restrictions on data sharing, particularly across
jurisdictions. In this study, we identify and relax the assumption of data
poolability in DID estimation. We propose an innovative approach to estimate
DID with unpoolable data (UN-DID) which can accommodate covariates, multiple
groups, and staggered adoption. Through analytical proofs and Monte Carlo
simulations, we show that UN-DID and conventional DID estimates of the average
treatment effect and standard errors are equal and unbiased in settings without
covariates. With covariates, both methods produce estimates that are unbiased,
equivalent, and converge to the true value. The estimates differ slightly but
the statistical inference and substantive conclusions remain the same. Two
empirical examples with real-world data further underscore UN-DID's utility.
The UN-DID method allows the estimation of cross-jurisdictional treatment
effects with unpoolable data, enabling better counterfactuals to be used and
new research questions to be answered.
arXiv link: http://arxiv.org/abs/2403.15910v3
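A toy sketch of the unpooled idea in the simplest 2x2 case (our own; the actual UN-DID estimator additionally handles covariates, multiple groups, and staggered adoption): each jurisdiction shares only its own group-by-period summaries, never microdata.
```python
# Toy illustration (simplest 2x2 case) of the unpoolable-data idea:
# each jurisdiction shares only group-by-period means, never microdata.
import numpy as np

rng = np.random.default_rng(2)

def local_summaries(y_pre, y_post):
    """Computed inside a jurisdiction; only these numbers leave the data silo."""
    return {"pre": y_pre.mean(), "post": y_post.mean(),
            "var_pre": y_pre.var(ddof=1) / len(y_pre),
            "var_post": y_post.var(ddof=1) / len(y_post)}

# Treated jurisdiction (receives the policy in the post period; true effect = 2).
treat = local_summaries(rng.normal(10, 1, 500), rng.normal(13, 1, 500))
# Control jurisdiction (no policy; contributes only its own trend).
ctrl = local_summaries(rng.normal(8, 1, 500), rng.normal(9, 1, 500))

did = (treat["post"] - treat["pre"]) - (ctrl["post"] - ctrl["pre"])
se = np.sqrt(treat["var_post"] + treat["var_pre"] + ctrl["var_post"] + ctrl["var_pre"])
print(f"UN-DID style estimate: {did:.2f} (se {se:.2f})")
```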
Tests for almost stochastic dominance
characterize both strict and almost stochastic dominance. Based on this index,
we derive an estimator for the minimum violation ratio (MVR), also known as the
critical parameter, of the almost stochastic ordering condition between two
variables. We determine the asymptotic properties of the empirical 2DSD index
and MVR for the most frequently used stochastic orders. We also provide
conditions under which the bootstrap estimators of these quantities are
strongly consistent. As an application, we develop consistent bootstrap testing
procedures for almost stochastic dominance. The performance of the tests is
checked via simulations and the analysis of real data.
arXiv link: http://arxiv.org/abs/2403.15258v1
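The 2DSD index and MVR estimator are the paper's own; as a rough reference point, a sketch of the familiar empirical violation ratio for almost first-order stochastic dominance (in the sense of Leshno and Levy, 2002), computed on a grid of empirical CDFs:
```python
# Empirical violation ratio for almost first-order stochastic dominance;
# a reference sketch, not the paper's 2DSD-based estimator.
import numpy as np

def afsd_violation_ratio(x, y, grid_size=2000):
    """Smallest eps such that x almost-FSD dominates y at level eps."""
    grid = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), grid_size)
    Fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    Fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    diff = Fx - Fy                           # > 0 where dominance of x over y is violated
    violation = np.clip(diff, 0, None).sum() # grid spacing cancels in the ratio
    total = np.abs(diff).sum()
    return violation / total if total > 0 else 0.0

rng = np.random.default_rng(9)
x = rng.normal(0.3, 1.0, 5000)               # slightly better on average
y = rng.normal(0.0, 1.2, 5000)
print(round(afsd_violation_ratio(x, y), 3))
```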
Modelling with Sensitive Variables
variables, or both represent sensitive data. We introduce a novel
discretization method that preserves data privacy when working with such
variables. A multiple discretization method is proposed that utilizes
information from the different discretization schemes. We show convergence in
distribution for the unobserved variable and derive the asymptotic properties
of the OLS estimator for linear models. The Monte Carlo simulation experiments
presented support our theoretical findings. Finally, we contrast our method
with a differential privacy method to estimate the Australian gender wage gap.
arXiv link: http://arxiv.org/abs/2403.15220v3
Fast TTC Computation
Trading Cycles (TTC) that delivers O(1) computational speed, that is, speed
independent of the number of agents and objects in the system. The proposed
methodology is well suited for complex large-dimensional problems like housing
choice. The methodology retains all the properties of TTC, namely,
Pareto-efficiency, individual rationality and strategy-proofness.
arXiv link: http://arxiv.org/abs/2403.15111v1
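The abstract does not spell out the O(1) construction; for orientation only, a compact sketch of the classical TTC cycle-trading step on a small housing market (each agent owns one object and has strict preferences):
```python
# Reference sketch of the classical Top Trading Cycles algorithm
# (the paper's O(1) variant is not reproduced here).
def top_trading_cycles(preferences, endowment):
    """preferences[i]: list of objects, most preferred first.
    endowment[i]: object initially owned by agent i.
    Returns a dict agent -> assigned object."""
    owner = {obj: i for i, obj in endowment.items()}
    remaining = set(preferences)
    assignment = {}
    while remaining:
        # Each remaining agent points at the owner of their favourite remaining object.
        points_to = {}
        for i in remaining:
            fav = next(o for o in preferences[i] if owner.get(o) in remaining)
            points_to[i] = owner[fav]
        # Walk the pointer graph from any agent until a cycle repeats (one must exist).
        start = next(iter(remaining))
        seen, cur = [], start
        while cur not in seen:
            seen.append(cur)
            cur = points_to[cur]
        cycle = seen[seen.index(cur):]
        # Everyone in the cycle trades: agent i receives the endowment of points_to[i].
        for i in cycle:
            assignment[i] = endowment[points_to[i]]
        remaining -= set(cycle)
    return assignment

prefs = {1: ["b", "a", "c"], 2: ["a", "c", "b"], 3: ["a", "b", "c"]}
endow = {1: "a", 2: "b", 3: "c"}
print(top_trading_cycles(prefs, endow))  # {1: 'b', 2: 'a', 3: 'c'}: agents 1 and 2 trade
```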
Estimating Causal Effects with Double Machine Learning -- A Method Evaluation
very active research area. In recent years, researchers have developed new
frameworks which use machine learning to relax classical assumptions necessary
for the estimation of causal effects. In this paper, we review one of the most
prominent methods - "double/debiased machine learning" (DML) - and empirically
evaluate it by comparing its performance on simulated data relative to more
traditional statistical methods, before applying it to real-world data. Our
findings indicate that the application of a suitably flexible machine learning
algorithm within DML improves the adjustment for various nonlinear confounding
relationships. This advantage enables a departure from traditional functional
form assumptions typically necessary in causal effect estimation. However, we
demonstrate that the method continues to critically depend on standard
assumptions about causal structure and identification. When estimating the
effects of air pollution on housing prices in our application, we find that DML
estimates are consistently larger than estimates of less flexible methods. From
our overall results, we provide actionable recommendations for specific choices
researchers must make when applying DML in practice.
arXiv link: http://arxiv.org/abs/2403.14385v2
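A minimal sketch of the DML/AIPW ATE estimator with 2-fold cross-fitting, using scikit-learn random forests on simulated data (illustrative only; this is not the paper's simulation design):
```python
# Minimal DML (AIPW score, 2-fold cross-fitting) sketch for the ATE.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import KFold

rng = np.random.default_rng(3)
n, p = 2000, 5
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-X[:, 0]))                                      # true propensity
D = rng.binomial(1, e)
Y = 2.0 * D + np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=n)   # true ATE = 2

psi = np.zeros(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    m1 = RandomForestRegressor(random_state=0).fit(X[train][D[train] == 1], Y[train][D[train] == 1])
    m0 = RandomForestRegressor(random_state=0).fit(X[train][D[train] == 0], Y[train][D[train] == 0])
    g = RandomForestClassifier(random_state=0).fit(X[train], D[train])
    e_hat = np.clip(g.predict_proba(X[test])[:, 1], 0.01, 0.99)
    mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
    # AIPW (doubly robust) score evaluated on the held-out fold
    psi[test] = (mu1 - mu0
                 + D[test] * (Y[test] - mu1) / e_hat
                 - (1 - D[test]) * (Y[test] - mu0) / (1 - e_hat))

ate, se = psi.mean(), psi.std(ddof=1) / np.sqrt(n)
print(f"DML ATE estimate: {ate:.2f} (se {se:.2f})")
```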
A Gaussian smooth transition vector autoregressive model: An application to the macroeconomic effects of severe weather shocks
Gaussian conditional distribution and transition weights that, for a $p$th
order model, depend on the full distribution of the preceding $p$ observations.
Specifically, the transition weight of each regime increases in its relative
weighted likelihood. This data-driven approach facilitates capturing complex
switching dynamics, enhancing the identification of gradual regime shifts. In
an empirical application to the macroeconomic effects of a severe weather
shock, we find that in monthly U.S. data from 1961:1 to 2022:3, the shock has
a stronger impact in the regime prevailing in the early part of the sample and in
certain crisis periods than in the regime dominating the latter part of the
sample. This suggests overall adaptation of the U.S. economy to severe weather
over time.
arXiv link: http://arxiv.org/abs/2403.14216v3
Fused LASSO as Non-Crossing Quantile Regression
quantile crossing due to data limitations. While existing literature addresses
this through post-processing of the fitted quantiles, these methods do not
correct the estimated coefficients. We advocate for imposing non-crossing
constraints during estimation and demonstrate their equivalence to fused LASSO
with quantile-specific shrinkage parameters. By re-examining Growth-at-Risk
through an interquantile shrinkage lens, we achieve improved left-tail
forecasts and better identification of variables that drive quantile variation.
We show that these improvements have ramifications for policy tools such as
Expected Shortfall and Quantile Local Projections.
arXiv link: http://arxiv.org/abs/2403.14036v3
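As an illustration of the estimation problem this equivalence points to (our formulation, written with cvxpy rather than the authors' code), joint quantile regression with an interquantile fused penalty and explicit non-crossing constraints:
```python
# Sketch (ours, not the authors' code): joint quantile regression with a fused
# penalty on adjacent quantile coefficients and explicit non-crossing constraints.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
n, p = 300, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + (0.5 + 0.3 * np.abs(X[:, 1])) * rng.normal(size=n)

taus = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
lam = 1.0                                    # interquantile shrinkage parameter
B = cp.Variable((p, len(taus)))              # one coefficient column per quantile

def pinball(r, tau):
    # check loss: rho_tau(u) = max(tau*u, (tau-1)*u)
    return cp.sum(cp.maximum(tau * r, (tau - 1) * r))

fit = sum(pinball(y - X @ B[:, k], t) for k, t in enumerate(taus))
fuse = sum(cp.norm1(B[:, k + 1] - B[:, k]) for k in range(len(taus) - 1))
noncross = [X @ B[:, k + 1] >= X @ B[:, k] for k in range(len(taus) - 1)]

cp.Problem(cp.Minimize(fit + lam * fuse), noncross).solve()
print(np.round(B.value, 3))
```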
Policy Relevant Treatment Effects with Multidimensional Unobserved Heterogeneity
treatment effects using instrumental variables. In this framework, the
treatment selection may depend on multidimensional unobserved heterogeneity. We
derive bilinear constraints on the target parameter by extracting information
from identifiable estimands. We apply a convex relaxation method to these
bilinear constraints and provide conservative yet computationally simple
bounds. Our convex-relaxation bounds extend and robustify the bounds by
Mogstad, Santos, and Torgovitsky (2018) which require the threshold-crossing
structure for the treatment: if this condition holds, our bounds are simplified
to theirs for a large class of target parameters; even if it does not, our
bounds include the true parameter value whereas theirs may not and are
sometimes empty. Linear shape restrictions can be easily incorporated to narrow
the proposed bounds. Numerical and simulation results illustrate the
informativeness of our convex-relaxation bounds.
arXiv link: http://arxiv.org/abs/2403.13738v2
Robust Inference in Locally Misspecified Bipartite Networks
networks under local misspecification. We focus on a class of dyadic network
models with misspecified conditional moment restrictions. The framework of
misspecification is local, as the effect of misspecification varies with the
sample size. We utilize this local asymptotic approach to construct a robust
estimator that is minimax optimal for the mean square error within a
neighborhood of misspecification. Additionally, we introduce bias-aware
confidence intervals that account for the effect of the local misspecification.
These confidence intervals have the correct asymptotic coverage for the true
parameter of interest under sparse network asymptotics. Monte Carlo experiments
demonstrate that the robust estimator performs well in finite samples and
sparse networks. As an empirical illustration, we study the formation of a
scientific collaboration network among economists.
arXiv link: http://arxiv.org/abs/2403.13725v1
Multifractal wavelet dynamic mode decomposition modeling for marketing time series
prices the most accessible, and our clients satisfied, thus ensuring our brand
has the widest distribution. This requires a sophisticated understanding of the
whole related network. Marketing data may exist in different forms, such as
qualitative and quantitative data. However, the literature contains a large body
of qualitative studies, while only a few adopt a quantitative point of view.
This is a major drawback, as it keeps marketing science focused on design even
though the market depends strongly on quantities such as money and time.
Indeed, marketing data may form time series such as brand sales
in specified periods, brand-related prices over specified periods, market
shares, etc. The purpose of the present work is to investigate some marketing
models based on time series for various brands. This paper combines dynamic
mode decomposition and wavelet decomposition to study marketing series for both
prices and volume sales, in order to explore the effect of the time scale on the
persistence of brand sales in the market and on the forecasting of
such persistence, according to the characteristics of the brand and the related
market competition or competitors. Our study is based on a sample of Saudi
brands during the period 22 November 2017 to 30 December 2021.
arXiv link: http://arxiv.org/abs/2403.13361v1
Composite likelihood estimation of stationary Gaussian processes with a view toward stochastic volatility
continuous-time stationary Gaussian processes. We derive the asymptotic theory
of the associated maximum composite likelihood estimator. We implement our
approach on a pair of models that has been proposed to describe the random
log-spot variance of financial asset returns. A simulation study shows that it
delivers good performance in these settings and improves upon a
method-of-moments estimation. In an application, we inspect the dynamic of an
intraday measure of spot variance computed with high-frequency data from the
cryptocurrency market. The empirical evidence supports a mechanism, where the
short- and long-term correlation structure of stochastic volatility are
decoupled in order to capture its properties at different time scales.
arXiv link: http://arxiv.org/abs/2403.12653v1
Inflation Target at Risk: A Time-varying Parameter Distributional Regression
dynamic and evolving characteristics of economic, social, and environmental
factors that consistently reshape the fundamental patterns and relationships
governing these variables. To better understand the distributional dynamics
beyond the central tendency, this paper introduces a novel semi-parametric
approach for constructing time-varying conditional distributions, relying on
the recent advances in distributional regression. We present an efficient
precision-based Markov Chain Monte Carlo algorithm that simultaneously
estimates all model parameters while explicitly enforcing the monotonicity
condition on the conditional distribution function. Our model is applied to
construct the forecasting distribution of inflation for the U.S., conditional
on a set of macroeconomic and financial indicators. The risks of future
inflation deviating excessively high or low from the desired range are
carefully evaluated. Moreover, we provide a thorough discussion about the
interplay between inflation and unemployment rates during the Global Financial
Crisis, COVID, and the third quarter of 2023.
arXiv link: http://arxiv.org/abs/2403.12456v1
Robust Estimation and Inference for Categorical Data
continuously distributed data, contamination in categorical data is largely
overlooked. This is regrettable because many datasets are categorical and
oftentimes suffer from contamination. Examples include inattentive responding
and bot responses in questionnaires or zero-inflated count data. We propose a
novel class of contamination-robust estimators of models for categorical data,
coined $C$-estimators (“$C$” for categorical). We show that the countable and
possibly finite sample space of categorical data results in non-standard
theoretical properties. Notably, in contrast to classic robustness theory,
$C$-estimators can be simultaneously robust and fully efficient at the
postulated model. In addition, a certain particularly robust specification
fails to be asymptotically Gaussian at the postulated model, but is
asymptotically Gaussian in the presence of contamination. We furthermore
propose a diagnostic test to identify categorical outliers and demonstrate the
enhanced robustness of $C$-estimators in a simulation study.
arXiv link: http://arxiv.org/abs/2403.11954v3
Identification of Information Structures in Bayesian Games
distribution in an incomplete information game infer the underlying information
structure? We investigate this issue in a general linear-quadratic-Gaussian
framework. A simple class of canonical information structures is offered and
proves rich enough to rationalize any possible equilibrium action distribution
that can arise under an arbitrary information structure. We show that the class
is parsimonious in the sense that the relevant parameters can be uniquely
pinned down by an observed equilibrium outcome, up to some qualifications. Our
result implies, for example, that the accuracy of each agent's signal about the
state is identified, as measured by how much observing the signal reduces the
state variance. Moreover, we show that a canonical information structure
characterizes the lower bound on the amount by which each agent's signal can
reduce the state variance, across all observationally equivalent information
structures. The lower bound is tight, for example, when the actual information
structure is uni-dimensional, or when there are no strategic interactions among
agents, but in general, there is a gap since agents' strategic motives confound
their private information about fundamental and strategic uncertainty.
arXiv link: http://arxiv.org/abs/2403.11333v1
Nonparametric Identification and Estimation with Non-Classical Errors-in-Variables
regression function when a covariate is mismeasured. The measurement error need
not be classical. Employing the small measurement error approximation, we
establish nonparametric identification under weak and easy-to-interpret
conditions on the instrumental variable. The paper also provides nonparametric
estimators of the regression function and derives their rates of convergence.
arXiv link: http://arxiv.org/abs/2403.11309v1
Comprehensive OOS Evaluation of Predictive Algorithms with Statistical Decision Theory
decision theory (SDT) should replace the current practice of K-fold and Common
Task Framework validation in machine learning (ML) research on prediction. SDT
provides a formal frequentist framework for performing comprehensive OOS
evaluation across all possible (1) training samples, (2) populations that may
generate training data, and (3) populations of prediction interest. Regarding
feature (3), we emphasize that SDT requires the practitioner to directly
confront the possibility that the future may not look like the past and to
account for a possible need to extrapolate from one population to another when
building a predictive algorithm. For specificity, we consider treatment choice
using conditional predictions with alternative restrictions on the state space
of possible populations that may generate training data. We discuss application
of SDT to the problem of predicting patient illness to inform clinical decision
making. SDT is simple in abstraction, but it is often computationally demanding
to implement. We call on ML researchers, econometricians, and statisticians to
expand the domain within which implementation of SDT is tractable.
arXiv link: http://arxiv.org/abs/2403.11016v3
Macroeconomic Spillovers of Weather Shocks across U.S. States
economic activity and cross-border spillovers that operate through economic
linkages between U.S. states. To this end, we use emergency declarations
triggered by natural disasters and estimate their effects using a monthly
Global Vector Autoregressive (GVAR) model for U.S. states. Impulse responses
highlight the nationwide effects of weather-related disasters that hit
individual regions. Taking into account economic linkages between states allows
capturing much stronger spillovers than those associated with mere spatial
proximity. The results underscore the importance of geographic heterogeneity
for impact evaluation and the critical role of supply-side propagation
mechanisms.
arXiv link: http://arxiv.org/abs/2403.10907v3
Limits of Approximating the Median Treatment Effect
inference. However, it does not necessarily capture the heterogeneity in the
data, and several approaches have been proposed to tackle the issue, including
estimating the Quantile Treatment Effects. In the finite population setting
containing $n$ individuals, with treatment and control values denoted by the
potential outcome vectors $a, b$, much of the prior work
focused on estimating median$(a) -$ median$(b)$, where
median($\mathbf x$) denotes the median value in the sorted ordering of all the
values in vector $\mathbf x$. It is known that estimating the difference of
medians is easier than the desired estimand of median$(a-b)$, called
the Median Treatment Effect (MTE). The fundamental problem of causal inference,
namely that for every individual $i$ we observe only one of the two potential
outcome values, either $a_i$ or $b_i$ but not both, makes estimating the MTE
particularly challenging. In this work, we argue that the MTE is not estimable
and detail a novel notion of approximation that relies on the sorted order of
the values in $a-b$. Next, we identify a quantity called variability
that exactly captures the complexity of MTE estimation. By drawing connections
to instance-optimality studied in theoretical computer science, we show that
every algorithm for estimating the MTE obtains an approximation error that is
no better than the error of an algorithm that computes variability. Finally, we
provide a simple linear time algorithm for computing the variability exactly.
Unlike much prior work, a particular highlight of our work is that we make no
assumptions about how the potential outcome vectors are generated or how they
are correlated, except that the potential outcome values are $k$-ary, i.e.,
take one of $k$ discrete values.
arXiv link: http://arxiv.org/abs/2403.10618v1
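A two-line numerical check of the distinction drawn above between the difference of medians and the median of differences:
```python
# Tiny numerical check: median(a) - median(b) need not equal median(a - b).
import numpy as np

a = np.array([1, 2, 9])      # treated potential outcomes
b = np.array([0, 5, 6])      # control potential outcomes
print(np.median(a) - np.median(b))   # 2 - 5 = -3
print(np.median(a - b))              # median of [1, -3, 3] = 1
```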
Testing Goodness-of-Fit for Conditional Distributions: A New Perspective based on Principal Component Analysis
conditional distributions. The proposed tests are based on a residual marked
empirical process, for which we develop a conditional Principal Component
Analysis. The obtained components provide a basis for various types of new
tests in addition to the omnibus one. Component tests based on each component
serve as experts in detecting certain directions. Smooth tests that
assemble a few components are also of great use in practice. To further improve
testing performance, we introduce a component selection approach, aiming to
identify the most contributory components. The finite sample performance of the
proposed tests is illustrated through Monte Carlo experiments.
arXiv link: http://arxiv.org/abs/2403.10352v2
A Big Data Approach to Understand Sub-national Determinants of FDI in Africa
corruption, trade openness, access to finance, and political instability.
Existing research mostly focuses on country-level data, with limited
exploration of firm-level data, especially in developing countries. Recognizing
this gap, recent calls for research emphasize the need for qualitative data
analysis to delve into FDI determinants, particularly at the regional level.
This paper proposes a novel methodology, based on text mining and social
network analysis, to get information from more than 167,000 online news
articles to quantify regional-level (sub-national) attributes affecting FDI
ownership in African companies. Our analysis extends information on obstacles
to industrial development as mapped by the World Bank Enterprise Surveys.
Findings suggest that regional (sub-national) structural and institutional
characteristics can play an important role in determining foreign ownership.
arXiv link: http://arxiv.org/abs/2403.10239v1
Invalid proxies and volatility changes
exogenous, permanent breaks that cause IRFs to change across volatility
regimes, even strong, exogenous external instruments yield inconsistent
estimates of the dynamic causal effects. However, if these volatility shifts
are properly incorporated into the analysis through (testable) "stability
restrictions", we demonstrate that the target IRFs are point-identified and can
be estimated consistently under a necessary and sufficient rank condition. If
the shifts in volatility are sufficiently informative, standard asymptotic
inference remains valid even with (i) local-to-zero covariance between the
proxies and the instrumented structural shocks, and (ii) potential failures of
instrument exogeneity. Intuitively, shifts in volatility act similarly to
strong instruments that are correlated with both the target and non-target
shocks. We illustrate the effectiveness of our approach by revisiting a seminal
fiscal proxy-SVAR for the US economy. We detect a sharp change in the size of
the tax multiplier when the narrative tax instrument is complemented with the
decline in unconditional volatility observed during the transition from the
Great Inflation to the Great Moderation. The narrative tax instrument
contributes to identifying the tax shock in both regimes, although our empirical
analysis raises concerns about its "statistical" validity.
arXiv link: http://arxiv.org/abs/2403.08753v3
Identifying Treatment and Spillover Effects Using Exposure Contrasts
statistics capturing treatment variation among neighboring units. This paper
studies the causal interpretation of nonparametric analogs of these estimands,
which we refer to as exposure contrasts. We demonstrate that their signs can be
inconsistent with those of the unit-level effects of interest even under
unconfounded assignment. We then provide interpretable restrictions under which
exposure contrasts are sign preserving and therefore have causal
interpretations. We discuss the implications of our results for
cluster-randomized trials, network experiments, and observational settings with
peer effects in selection into treatment.
arXiv link: http://arxiv.org/abs/2403.08183v3
Imputation of Counterfactual Outcomes when the Errors are Predictable
Imputation error can arise because of sampling uncertainty from estimating the
prediction model using the untreated observations, or from out-of-sample
information not captured by the model. While the literature has focused on
sampling uncertainty, it vanishes with the sample size. Often overlooked is the
possibility that the out-of-sample error can be informative about the missing
counterfactual outcome if it is mutually or serially correlated. Motivated by
the best linear unbiased predictor (BLUP) of Goldberger (1962) in a time
series setting, we propose an improved predictor of the potential outcome when
the errors are correlated. The proposed predictor is practical as it is not restricted
to linear models, can be used with consistent estimators already developed, and
improves mean-squared error for a large class of strong mixing error processes.
Ignoring predictability in the errors can distort conditional inference.
However, the precise impact will depend on the choice of estimator as well as
the realized values of the residuals.
arXiv link: http://arxiv.org/abs/2403.08130v2
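A stylized sketch of the idea under an assumed AR(1) error (the paper's predictor is more general and is not restricted to this case): the plug-in counterfactual is corrected with the predictable part of the last pre-treatment residual.
```python
# Stylized sketch under an AR(1) error: correct the plug-in counterfactual with
# the predictable part of the last pre-treatment residual (BLUP-style idea).
import numpy as np

rng = np.random.default_rng(5)
T, rho = 200, 0.8
x = rng.normal(size=T + 1)                       # periods 0..T; period T is the "treated" one
u = np.zeros(T + 1)
for t in range(1, T + 1):                        # AR(1) errors
    u[t] = rho * u[t - 1] + rng.normal(scale=0.5)
y = 1.0 + 2.0 * x + u                            # no treatment in this toy, so y[T] is the target

# Fit on the untreated periods 0..T-1, then predict the counterfactual at period T.
Xpre = np.column_stack([np.ones(T), x[:T]])
beta = np.linalg.lstsq(Xpre, y[:T], rcond=None)[0]
resid = y[:T] - Xpre @ beta
rho_hat = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)

plug_in = np.array([1.0, x[T]]) @ beta
corrected = plug_in + rho_hat * resid[-1]        # add the predictable part of the error
print(f"plug-in: {plug_in:.3f}   corrected: {corrected:.3f}   target: {y[T]:.3f}")
```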
Partial Identification of Individual-Level Parameters Using Aggregate Data in a Nonparametric Model
conditional mean outcomes when the researcher only has access to aggregate
data. Unlike the existing literature, I only allow for marginal, not joint,
distributions of covariates in my model of aggregate data. Bounds are obtained
by solving an optimization program and can easily accommodate additional
polyhedral shape restrictions. I provide an empirical illustration of the
method to Rhode Island standardized exam data.
arXiv link: http://arxiv.org/abs/2403.07236v7
Partially identified heteroskedastic SVARs
(SVARs) exploiting a break in the variances of the structural shocks.
Point-identification for this class of models relies on an eigen-decomposition
involving the covariance matrices of reduced-form errors and requires that all
the eigenvalues are distinct. This point-identification, however, fails in the
presence of multiplicity of eigenvalues. This occurs in an empirically relevant
scenario where, for instance, only a subset of structural shocks had the break
in their variances, or where a group of variables shows a variance shift of the
same amount. Together with zero or sign restrictions on the structural
parameters and impulse responses, we derive the identified sets for impulse
responses and show how to compute them. We perform inference on the impulse
response functions, building on the robust Bayesian approach developed for set
identified SVARs. To illustrate our proposal, we present an empirical example
based on the literature on the global crude oil market where the identification
is expected to fail due to multiplicity of eigenvalues.
arXiv link: http://arxiv.org/abs/2403.06879v2
Data-Driven Tuning Parameter Selection for High-Dimensional Vector Autoregressions
series models. The theoretical guarantees established for these estimators
typically require the penalty level to be chosen in a suitable fashion often
depending on unknown population quantities. Furthermore, the resulting
estimates and the number of variables retained in the model depend crucially on
the chosen penalty level. However, there is currently no theoretically founded
guidance for this choice in the context of high-dimensional time series.
Instead, one resorts to selecting the penalty level in an ad hoc manner using,
e.g., information criteria or cross-validation. We resolve this problem by
considering estimation of the perhaps most commonly employed multivariate time
series model, the linear vector autoregressive (VAR) model, and propose
versions of the Lasso, post-Lasso, and square-root Lasso estimators with
penalization chosen in a fully data-driven way. The theoretical guarantees that
we establish for the resulting estimation and prediction errors match those
currently available for methods based on infeasible choices of penalization. We
thus provide a first solution for choosing the penalization in high-dimensional
time series models.
arXiv link: http://arxiv.org/abs/2403.06657v2
Estimating Factor-Based Spot Volatility Matrices with Noisy and Asynchronous High-Frequency Data
satisfying a low-rank plus sparse structure from noisy and asynchronous
high-frequency data collected for an ultra-large number of assets. The noise
processes are allowed to be temporally correlated, heteroskedastic,
asymptotically vanishing and dependent on the efficient prices. We define a
kernel-weighted pre-averaging method to jointly tackle the microstructure noise
and asynchronicity issues, and we obtain uniformly consistent estimates for
latent prices. We impose a continuous-time factor model with time-varying
factor loadings on the price processes, and estimate the common factors and
loadings via a local principal component analysis. Assuming a uniform sparsity
condition on the idiosyncratic volatility structure, we combine the POET and
kernel-smoothing techniques to estimate the spot volatility matrices for both
the latent prices and idiosyncratic errors. Under some mild restrictions, the
estimated spot volatility matrices are shown to be uniformly consistent under
various matrix norms. We provide Monte-Carlo simulation and empirical studies
to examine the numerical performance of the developed estimation methodology.
arXiv link: http://arxiv.org/abs/2403.06246v1
Locally Regular and Efficient Tests in Non-Regular Semiparametric Models
non-regular. I show that C($\alpha$) style tests are locally regular under mild
conditions, including in cases where locally regular estimators do not exist,
such as models which are (semiparametrically) weakly identified. I characterise
the appropriate limit experiment in which to study local (asymptotic)
optimality of tests in the non-regular case and generalise classical power
bounds to this case. I give conditions under which these power bounds are
attained by the proposed C($\alpha$) style tests. The application of the theory
to a single index model and an instrumental variables model is worked out in
detail.
arXiv link: http://arxiv.org/abs/2403.05999v2
Estimating Causal Effects of Discrete and Continuous Treatments with Binary Instruments
causal effects of discrete and continuous treatments with binary instruments.
The basis of our approach is a local copula representation of the joint
distribution of the potential outcomes and unobservables determining treatment
assignment. This representation allows us to introduce an identifying
assumption, so-called copula invariance, that restricts the local dependence of
the copula with respect to the treatment propensity. We show that copula
invariance identifies treatment effects for the entire population and other
subpopulations such as the treated. The identification results are constructive
and lead to practical estimation and inference procedures based on distribution
regression. An application to estimating the effect of sleep on well-being
uncovers interesting patterns of heterogeneity.
arXiv link: http://arxiv.org/abs/2403.05850v2
Semiparametric Inference for Regression-Discontinuity Designs
estimated using local regression methods. Hahn:01 demonstrated that the
identification of the average treatment effect at the cutoff in RDDs relies on
the unconfoundedness assumption and that, without this assumption, only the
local average treatment effect at the cutoff can be identified. In this paper,
we propose a semiparametric framework tailored for identifying the average
treatment effect in RDDs, eliminating the need for the unconfoundedness
assumption. Our approach globally conceptualizes the identification as a
partially linear modeling problem, with the coefficient of a specified
polynomial function of propensity score in the linear component capturing the
average treatment effect. This identification result underpins our
semiparametric inference for RDDs, employing the $P$-spline method to
approximate the nonparametric function and establishing a procedure for
conducting inference within this framework. Through theoretical analysis, we
demonstrate that our global approach achieves a faster convergence rate
compared to the local method. Monte Carlo simulations further confirm that the
proposed method consistently outperforms alternatives across various scenarios.
Furthermore, applications to real-world datasets illustrate that our global
approach can provide more reliable inference for practical problems.
arXiv link: http://arxiv.org/abs/2403.05803v3
Non-robustness of diffusion estimates on networks with measurement error
information spread, and technology adoption. However, small amounts of
mismeasurement are extremely likely in the networks constructed to
operationalize these models. We show that estimates of diffusions are highly
non-robust to this measurement error. First, we show that even when measurement
error is vanishingly small, such that the share of missed links is close to
zero, forecasts about the extent of diffusion will greatly underestimate the
truth. Second, a small mismeasurement in the identity of the initial seed
generates a large shift in the location of the expected diffusion path. We show
that both of these results still hold when the vanishing measurement error is
only local in nature. Such non-robustness in forecasting exists even under
conditions where the basic reproductive number is consistently estimable.
Possible solutions, such as estimating the measurement error or implementing
widespread detection efforts, still face difficulties because the number of
missed links is so small. Finally, we conduct Monte Carlo simulations on
simulated networks and on real networks from three settings: travel data from the
COVID-19 pandemic in the western US, a mobile phone marketing campaign in rural
India, and an insurance experiment in China.
arXiv link: http://arxiv.org/abs/2403.05704v4
Nonparametric Regression under Cluster Sampling
regression in the presence of cluster dependence. We examine nonparametric
density estimation, Nadaraya-Watson kernel regression, and local linear
estimation. Our theory accommodates growing and heterogeneous cluster sizes. We
derive asymptotic conditional bias and variance, establish uniform consistency,
and prove asymptotic normality. Our findings reveal that under heterogeneous
cluster sizes, the asymptotic variance includes a new term reflecting
within-cluster dependence, which is overlooked when cluster sizes are presumed
to be bounded. We propose valid approaches for bandwidth selection and
inference, introduce estimators of the asymptotic variance, and demonstrate
their consistency. In simulations, we verify the effectiveness of the
cluster-robust bandwidth selection and show that the derived cluster-robust
confidence interval improves the coverage ratio. We illustrate the application
of these methods using a policy-targeting dataset in development economics.
arXiv link: http://arxiv.org/abs/2403.04766v2
A Logarithmic Mean Divisia Index Decomposition of CO$_2$ Emissions from Energy Use in Romania
challenges that have led to an extended debate about climate change. The growing
use of fossil fuels to sustain economic progress while simultaneously reducing
carbon emissions has become a substantial global challenge. This paper examines
the driving factors of CO$_2$ emissions from the energy sector in Romania over
the period 2008-2022 using the logarithmic mean Divisia index (LMDI) method. The
decomposition takes into account five items: CO$_2$ emissions, primary energy
resources, energy consumption, gross domestic product and population, from which
the contributions of carbon intensity, the energy mix, generating efficiency,
the economy, and population are calculated. The results indicate that the
generating efficiency effect (-90968.57) is the largest inhibiting factor, while
the economic effect (69084.04) is the largest positive factor driving up
CO$_2$ emissions.
arXiv link: http://arxiv.org/abs/2403.04354v1
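For reference, a generic LMDI-I decomposition helper for a multiplicative identity (a sketch, not the paper's code; the Kaya-style factors and numbers below are hypothetical):
```python
# Generic additive LMDI-I decomposition for an identity C = f1 * f2 * ... * fK.
# Sketch only; the sample factors and numbers are hypothetical.
import numpy as np

def log_mean(a, b):
    """Logarithmic mean L(a, b) = (a - b) / (ln a - ln b), with L(a, a) = a."""
    return a if np.isclose(a, b) else (a - b) / (np.log(a) - np.log(b))

def lmdi(factors_0, factors_T):
    """factors_*: dict name -> value of each multiplicative factor at time 0 / T.
    Returns the additive contribution of each factor to C_T - C_0."""
    c0 = np.prod(list(factors_0.values()))
    cT = np.prod(list(factors_T.values()))
    L = log_mean(cT, c0)
    return {k: L * np.log(factors_T[k] / factors_0[k]) for k in factors_0}

# Hypothetical Kaya-style identity: CO2 = (CO2/E) * (E/GDP) * (GDP/P) * P
f0 = {"carbon_intensity": 2.0, "energy_intensity": 0.5, "gdp_per_capita": 10.0, "population": 20.0}
fT = {"carbon_intensity": 1.8, "energy_intensity": 0.4, "gdp_per_capita": 14.0, "population": 19.0}

contrib = lmdi(f0, fT)
print({k: round(v, 2) for k, v in contrib.items()})
# The contributions sum exactly to the total change C_T - C_0:
print(round(sum(contrib.values()), 2),
      "=", round(np.prod(list(fT.values())) - np.prod(list(f0.values())), 2))
```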
A dual approach to nonparametric characterization for random utility models
which turns out to be a dual representation of the characterization by Kitamura
and Stoye (2018, ECMA). For a given family of budgets and its "patch"
representation à la Kitamura and Stoye, we construct a matrix $\Xi$ of which
each row vector indicates the structure of possible revealed preference
relations in each subfamily of budgets. Then, it is shown that a stochastic
demand system on the patches of budget lines, say $\pi$, is consistent with a
RUM, if and only if $\Xi\pi \geq 1$, where the RHS is the vector of
$1$'s. In addition to providing a concise quantifier-free characterization,
especially when $\pi$ is inconsistent with RUMs, the vector $\Xi\pi$ also
contains information concerning (1) sub-families of budgets in which cyclical
choices must occur with positive probabilities, and (2) the maximal possible
weights on rational choice patterns in a population. The notion of Chvátal
rank of polytopes and the duality theorem in linear programming play key roles
in obtaining these results.
arXiv link: http://arxiv.org/abs/2403.04328v3
Regularized DeepIV with Model Selection
(IV) regressions. While recent advancements in machine learning have introduced
flexible methods for IV estimation, they often encounter one or more of the
following limitations: (1) restricting the IV regression to be uniquely
identified; (2) requiring minimax computation oracle, which is highly unstable
in practice; (3) absence of model selection procedure. In this paper, we
present the first method and analysis that can avoid all three limitations,
while still enabling general function approximation. Specifically, we propose a
minimax-oracle-free method called Regularized DeepIV (RDIV) regression that can
converge to the least-norm IV solution. Our method consists of two stages:
first, we learn the conditional distribution of covariates, and by utilizing
the learned distribution, we learn the estimator by minimizing a
Tikhonov-regularized loss function. We further show that our method allows
model selection procedures that can achieve the oracle rates in the
misspecified regime. When extended to an iterative estimator, our method
matches the current state-of-the-art convergence rate. Our method is a Tikhonov
regularized variant of the popular DeepIV method with a non-parametric MLE
first-stage estimator, and our results provide the first rigorous guarantees
for this empirically used method, showcasing the importance of regularization
which was absent from the original work.
arXiv link: http://arxiv.org/abs/2403.04236v1
Extracting Mechanisms from Heterogeneous Effects: An Identification Strategy for Mediation Analysis
empirical phenomena. Causal mediation analysis offers statistical techniques to
quantify the mediation effects. However, current methods often require multiple
ignorability assumptions or sophisticated research designs. In this paper, we
introduce a novel identification strategy that enables the simultaneous
identification and estimation of treatment and mediation effects. By combining
explicit and implicit mediation analysis, this strategy exploits heterogeneous
treatment effects through a new decomposition of total treatment effects. Monte
Carlo simulations demonstrate that the method is more accurate and precise
across various scenarios. To illustrate the efficiency and efficacy of our
method, we apply it to estimate the causal mediation effects in two studies
with distinct data structures, focusing on common pool resource governance and
voting information. Additionally, we have developed statistical software to
facilitate the implementation of our method.
arXiv link: http://arxiv.org/abs/2403.04131v5
Active Adaptive Experimental Design for Treatment Effect Estimation with Covariate Choices
treatment effects (ATEs). In each round of our adaptive experiment, an
experimenter sequentially samples an experimental unit, assigns a treatment,
and observes the corresponding outcome immediately. At the end of the
experiment, the experimenter estimates an ATE using the gathered samples. The
objective is to estimate the ATE with a smaller asymptotic variance. Existing
studies have designed experiments that adaptively optimize the propensity score
(treatment-assignment probability). As a generalization of such an approach, we
propose optimizing the covariate density as well as the propensity score.
First, we derive the efficient covariate density and propensity score that
minimize the semiparametric efficiency bound and find that optimizing both
covariate density and propensity score minimizes the semiparametric efficiency
bound more effectively than optimizing only the propensity score. Next, we
design an adaptive experiment using the efficient covariate density and
propensity score sequentially estimated during the experiment. Lastly, we
propose an ATE estimator whose asymptotic variance aligns with the minimized
semiparametric efficiency bound.
arXiv link: http://arxiv.org/abs/2403.03589v2
Demystifying and avoiding the OLS "weighting problem": Unmodeled heterogeneity and straightforward solutions
on treatment (D) and covariates (X). Even without unobserved confounding, the
coefficient on D yields a conditional-variance-weighted average of strata-wise
effects, not the average treatment effect. Scholars have proposed
characterizing the severity of these weights, evaluating resulting biases, or
changing investigators' target estimand to the conditional-variance-weighted
effect. We aim to demystify these weights, clarifying how they arise, what they
represent, and how to avoid them. Specifically, these weights reflect
misspecification bias from unmodeled treatment-effect heterogeneity. Rather
than diagnosing or tolerating them, we recommend avoiding the issue altogether,
by relaxing the standard regression assumption of "single linearity" to one of
"separate linearity" (of each potential outcome in the covariates),
accommodating heterogeneity. Numerous methods--including regression imputation
(g-computation), interacted regression, and mean balancing weights--satisfy
this assumption. In many settings, the efficiency cost to avoiding this
weighting problem altogether will be modest and worthwhile.
arXiv link: http://arxiv.org/abs/2403.03299v4
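A small simulation of the point above (our own illustration): with heterogeneous effects and covariate-dependent propensities, the coefficient on D in a single additive OLS is a conditional-variance-weighted average of strata effects, while regression imputation under separate linearity recovers the ATE.
```python
# Illustration: additive OLS coefficient on D vs. regression imputation
# ("separate linearity" / g-computation) under heterogeneous effects.
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
x = rng.binomial(1, 0.5, size=n)                 # one binary covariate (two strata)
p_d = np.where(x == 1, 0.5, 0.1)                 # very different propensities by stratum
d = rng.binomial(1, p_d)
tau = np.where(x == 1, 1.0, 3.0)                 # heterogeneous effects; ATE = 2
y = tau * d + x + rng.normal(size=n)

# (a) single additive OLS of y on d and x
Z = np.column_stack([np.ones(n), d, x])
coef_d = np.linalg.lstsq(Z, y, rcond=None)[0][1]

# (b) regression imputation: fit E[y | x] separately by treatment arm, average the difference
def fit_predict(mask):
    Zm = np.column_stack([np.ones(mask.sum()), x[mask]])
    b = np.linalg.lstsq(Zm, y[mask], rcond=None)[0]
    return np.column_stack([np.ones(n), x]) @ b

ate_gcomp = np.mean(fit_predict(d == 1) - fit_predict(d == 0))
print(f"additive OLS coefficient on D: {coef_d:.2f}   g-computation ATE: {ate_gcomp:.2f}   truth: 2.00")
```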
Triple/Debiased Lasso for Statistical Inference of Conditional Average Treatment Effects
Conditional Average Treatment Effects (CATEs), which have garnered attention as
a metric representing individualized causal effects. In our data-generating
process, we assume linear models for the outcomes associated with binary
treatments and define the CATE as a difference between the expected outcomes of
these linear models. This study allows the linear models to be
high-dimensional, and our interest lies in consistent estimation and
statistical inference for the CATE. In high-dimensional linear regression, one
typical approach is to assume sparsity. However, in our study, we do not assume
sparsity directly. Instead, we consider sparsity only in the difference of the
linear models. We first use a doubly robust estimator to approximate this
difference and then regress the difference on covariates with Lasso
regularization. Although this regression estimator is consistent for the CATE,
we further reduce the bias using the techniques in double/debiased machine
learning (DML) and debiased Lasso, leading to root-$n$ consistency and valid
confidence intervals. We refer to the debiased estimator as the triple/debiased
Lasso (TDL), applying both DML and debiased Lasso techniques. We confirm the
soundness of our proposed method through simulation studies.
arXiv link: http://arxiv.org/abs/2403.03240v1
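A simplified sketch of the first two steps described above, i.e. doubly robust pseudo-outcomes followed by a Lasso on covariates (cross-fitting and the final debiased-Lasso inference step are omitted here):
```python
# Simplified sketch: build doubly robust pseudo-outcomes for the treatment
# effect, then Lasso-regress them on covariates. Cross-fitting and the final
# debiased-Lasso inference step of the paper are omitted.
import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegression, RidgeCV

rng = np.random.default_rng(7)
n, p = 1000, 50
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-X[:, 0]))
D = rng.binomial(1, e)
cate = 1.0 + 2.0 * X[:, 1]                      # sparse CATE: depends on X[:, 1] only
Y = cate * D + X[:, :5] @ np.ones(5) + rng.normal(size=n)

# Nuisance estimates (in-sample, for brevity)
e_hat = np.clip(LogisticRegression(max_iter=1000).fit(X, D).predict_proba(X)[:, 1], 0.05, 0.95)
mu1 = RidgeCV().fit(X[D == 1], Y[D == 1]).predict(X)
mu0 = RidgeCV().fit(X[D == 0], Y[D == 0]).predict(X)

# Doubly robust pseudo-outcome whose conditional mean is the CATE
psi = mu1 - mu0 + D * (Y - mu1) / e_hat - (1 - D) * (Y - mu0) / (1 - e_hat)

cate_fit = LassoCV(cv=5).fit(X, psi)
print("indices of largest |coefficients|:", np.argsort(-np.abs(cate_fit.coef_))[:3])
```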
Matrix-based Prediction Approach for Intraday Instantaneous Volatility Vector
instantaneous volatility based on Ito semimartingale models using
high-frequency financial data. Several studies have highlighted stylized
volatility time series features, such as interday auto-regressive dynamics and
the intraday U-shaped pattern. To accommodate these volatility features, we
propose an interday-by-intraday instantaneous volatility matrix process that
can be decomposed into low-rank conditional expected instantaneous volatility
and noise matrices. To predict the low-rank conditional expected instantaneous
volatility matrix, we propose the Two-sIde Projected-PCA (TIP-PCA) procedure.
We establish asymptotic properties of the proposed estimators and conduct a
simulation study to assess the finite sample performance of the proposed
prediction method. Finally, we apply the TIP-PCA method to an out-of-sample
instantaneous volatility vector prediction study using high-frequency data from
the S&P 500 index and 11 sector index funds.
arXiv link: http://arxiv.org/abs/2403.02591v3
Applied Causal Inference Powered by ML and AI
inference. The book presents ideas from classical structural equation models
(SEMs) and their modern AI equivalent, directed acyclical graphs (DAGs) and
structural causal models (SCMs), and covers Double/Debiased Machine Learning
methods to do inference in such models using modern predictive tools.
arXiv link: http://arxiv.org/abs/2403.02467v1
Improved Tests for Mediation
difficult - even asymptotically - by the influence of nuisance parameters.
Classical tests such as likelihood ratio (LR) and Wald (Sobel) tests have very
poor power properties in parts of the parameter space, and many attempts have
been made to produce improved tests, with limited success. In this paper we
show that augmenting the critical region of the LR test can produce a test with
much improved behavior everywhere. In fact, we first show that there exists a
test of this type that is (asymptotically) exact for certain test levels
$\alpha$, including the common choices $\alpha = .01, .05, .10$. The critical
region of this exact test has some undesirable properties. We go on to show
that there is a very simple class of augmented LR critical regions which
provides tests that are nearly exact, and avoid the issues inherent in the
exact test. We suggest an optimal and coherent member of this class, and provide the table needed to implement the test and to report p-values if desired.
Simulation confirms validity with non-Gaussian disturbances, under
heteroskedasticity, and in a nonlinear (logit) model. A short application of
the method to an entrepreneurial attitudes study is included for illustration.
arXiv link: http://arxiv.org/abs/2403.02144v1
Calibrating doubly-robust estimators with unbalanced treatment assignment
estimator (Chernozhukov et al., 2018), are increasingly popular for the
estimation of the average treatment effect (ATE). However, datasets often
exhibit unbalanced treatment assignments where only a few observations are
treated, leading to unstable propensity score estimations. We propose a simple
extension of the DML estimator which undersamples data for propensity score
modeling and calibrates scores to match the original distribution. The paper
provides theoretical results showing that the estimator retains the DML
estimator's asymptotic properties. A simulation study illustrates the finite
sample performance of the estimator.
arXiv link: http://arxiv.org/abs/2403.01585v2
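One common way to implement the "undersample, then calibrate" idea above is the standard prior-correction of fitted probabilities by the known undersampling rate. The sketch below uses that correction with a logistic propensity model on simulated data; it is an assumption-laden stand-in and need not match the authors' exact calibration step or the full DML pipeline.

```python
# Sketch of undersample-and-recalibrate: fit the propensity model on a sample
# in which controls are kept with probability r, then map fitted probabilities
# back to the original scale via the standard prior-correction formula.  This
# is one common calibration, not necessarily the paper's exact procedure.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 50_000
X = rng.standard_normal((n, 5))
p_true = 1 / (1 + np.exp(-(-4.0 + X[:, 0])))    # roughly 2% treated on average
D = rng.binomial(1, p_true)

r = 0.05                                        # keep 5% of controls
keep = (D == 1) | (rng.uniform(size=n) < r)
clf = LogisticRegression(max_iter=1000).fit(X[keep], D[keep])

p_sub = clf.predict_proba(X)[:, 1]              # probabilities on the subsample scale
# Prior correction: control odds were inflated by 1/r, so deflate them back
odds = p_sub / (1 - p_sub) * r
p_cal = odds / (1 + odds)

print("mean true propensity      :", round(p_true.mean(), 4))
print("mean uncalibrated estimate:", round(p_sub.mean(), 4))
print("mean calibrated estimate  :", round(p_cal.mean(), 4))
```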
Minimax-Regret Sample Selection in Randomized Experiments
subpopulations that may have differential benefits from the treatment being
evaluated. We consider the problem of sample selection, i.e., whom to enroll in
a randomized trial, so as to optimize welfare in a heterogeneous population.
We formalize this problem within the minimax-regret framework, and derive
optimal sample-selection schemes under a variety of conditions. Using data from
a COVID-19 vaccine trial, we also highlight how different objectives and
decision rules can lead to meaningfully different guidance regarding optimal
sample allocation.
arXiv link: http://arxiv.org/abs/2403.01386v2
High-Dimensional Tail Index Regression: with An Application to Text Analyses of Viral Posts in Social Media
credits (e.g., "likes") of viral social media posts, we introduce a
high-dimensional tail index regression model and propose methods for estimation
and inference of its parameters. First, we present a regularized estimator,
establish its consistency, and derive its convergence rate. Second, we
introduce a debiasing technique for the regularized estimator to facilitate
inference and prove its asymptotic normality. Third, we extend our approach to
handle large-scale online streaming data using stochastic gradient descent.
Simulation studies corroborate our theoretical findings. We apply these methods
to the text analysis of viral posts on X (formerly Twitter) related to LGBTQ+
topics.
arXiv link: http://arxiv.org/abs/2403.01318v2
Electric vehicle pricing and battery costs: A misaligned assumption
internal combustion engine vehicles (ICEVs), EV adoption is challenged by
higher up-front procurement prices. Existing discourse attributes this price
differential to high battery costs and reasons that lowering these costs will
reduce EV upfront price differentials. However, other factors beyond battery
price may influence prices. Leveraging data for over 400 EV models and trims
sold in the United States between 2011 and 2023, we scrutinize these factors. We
find that contrary to existing discourse, EV MSRP has increased over time
despite declining EV battery costs. We attribute this increase to the growing
accommodation of attributes that strongly influence EV prices but have long
been underappreciated in mainstream discourse. Furthermore, and relevant to
decarbonization efforts, we observe that continued reductions in pack-level
battery costs are unlikely to deliver price parity between EVs and ICEVs. Were
pack-level battery costs reduced to zero, EV MSRP would decrease by $4,025, an estimate that is insufficient to offset observed price differences between
EVs and ICEVs. These findings warrant attention as decarbonization efforts
increasingly emphasize EVs as a pathway for complying with domestic and
international climate agreements.
arXiv link: http://arxiv.org/abs/2403.00458v2
Inference for Interval-Identified Parameters Selected from an Estimated Set
average partial effects and welfare is particularly common when using
observational data and experimental data with imperfect compliance due to the
endogeneity of individuals' treatment uptake. In this setting, the researcher
is typically interested in a treatment or policy that is either selected from
the estimated set of best-performers or arises from a data-dependent selection
rule. In this paper, we develop new inference tools for interval-identified
parameters chosen via these forms of selection. We develop three types of
confidence intervals for data-dependent and interval-identified parameters,
discuss how they apply to several examples of interest and prove their uniform
asymptotic validity under weak assumptions.
arXiv link: http://arxiv.org/abs/2403.00422v2
Set-Valued Control Functions
causal effects of interest. While powerful, it requires a strong invertibility
assumption in the selection process, which limits its applicability. This paper
expands the scope of the nonparametric control function approach by allowing
the control function to be set-valued and derives sharp bounds on structural
parameters. The proposed generalization accommodates a wide range of selection
processes involving discrete endogenous variables, random coefficients,
treatment selections with interference, and dynamic treatment selections. The
framework also applies to partially observed or identified controls that are
directly motivated from economic models.
arXiv link: http://arxiv.org/abs/2403.00347v3
Testing Information Ordering for Strategic Agents
players. Specifying a priori an information structure is often difficult for
empirical researchers. We develop a test of information ordering that allows
researchers to examine if the true information structure is at least as
informative as a proposed baseline. We construct a computationally tractable
test statistic by utilizing the notion of Bayes Correlated Equilibrium (BCE) to
translate the ordering of information structures into an ordering of functions.
We apply our test to examine whether hubs provide informational advantages to
certain airlines in addition to market power.
arXiv link: http://arxiv.org/abs/2402.19425v1
An Empirical Analysis of Scam Tokens on Ethereum Blockchain
total revenue generated by counterfeit tokens on Uniswap. It offers a detailed
overview of the counterfeit token fraud process, along with a systematic
summary of characteristics associated with such fraudulent activities observed
in Uniswap. The study primarily examines the relationship between revenue from
counterfeit token scams and their defining characteristics, and analyzes the
influence of market economic factors such as return on market capitalization
and price return on Ethereum. Key findings include a significant increase in
overall transactions of counterfeit tokens on their first day of fraud, and a
rise in upfront fraud costs leading to corresponding increases in revenue.
Furthermore, a negative correlation is identified between the total revenue of
counterfeit tokens and the volatility of Ethereum market capitalization return,
while price return volatility on Ethereum is found to have a positive impact on
counterfeit token revenue, albeit requiring further investigation for a
comprehensive understanding. Additionally, the number of subscribers for the
real token correlates positively with the realized volume of scam tokens,
indicating that a larger community following the legitimate token may
inadvertently contribute to the visibility and success of counterfeit tokens.
Conversely, the number of Telegram subscribers exhibits a negative impact on
the realized volume of scam tokens, suggesting that a higher level of scrutiny
or awareness within Telegram communities may act as a deterrent to fraudulent
activities. Finally, the timing of when the scam token is introduced on the
Ethereum blockchain may have a negative impact on its success. Notably, the
cumulative amount scammed by only 42 counterfeit tokens was almost 11,214 Ether.
arXiv link: http://arxiv.org/abs/2402.19399v3
Extremal quantiles of intermediate orders under two-way clustering
We demonstrate that the limiting distribution of the unconditional intermediate
order quantiles in the tails converges to a Gaussian distribution. This is
remarkable as two-way cluster dependence entails potential non-Gaussianity in
general, but extremal quantiles do not suffer from this issue. Building upon
this result, we extend our analysis to extremal quantile regressions of
intermediate order.
arXiv link: http://arxiv.org/abs/2402.19268v2
Unveiling the Potential of Robustness in Selecting Conditional Average Treatment Effect Estimators
interest in estimating the Conditional Average Treatment Effect (CATE). Various
types of CATE estimators have been developed with advancements in machine
learning and causal inference. However, selecting the desirable CATE estimator
through a conventional model validation procedure remains impractical due to
the absence of counterfactual outcomes in observational data. Existing
approaches for CATE estimator selection, such as plug-in and pseudo-outcome
metrics, face two challenges. First, they must determine the metric form and
the underlying machine learning models for fitting nuisance parameters (e.g.,
outcome function, propensity function, and plug-in learner). Second, they lack
a specific focus on selecting a robust CATE estimator. To address these
challenges, this paper introduces a Distributionally Robust Metric (DRM) for
CATE estimator selection. The proposed DRM is nuisance-free, eliminating the
need to fit models for nuisance parameters, and it effectively prioritizes the
selection of a distributionally robust CATE estimator. The experimental results
validate the effectiveness of the DRM method in selecting CATE estimators that
are robust to the distribution shift incurred by covariate shift and hidden
confounders.
arXiv link: http://arxiv.org/abs/2402.18392v2
Quasi-Bayesian Estimation and Inference with Control Functions
nonparametric estimation with Bayesian inference in a two-stage process.
Applied to an endogenous discrete choice model, the approach first uses kernel
or sieve estimators to estimate the control function nonparametrically,
followed by Bayesian methods to estimate the structural parameters. This
combination leverages the advantages of both frequentist tractability for
nonparametric estimation and Bayesian computational efficiency for complicated
structural models. We analyze the asymptotic properties of the resulting
quasi-posterior distribution, finding that its mean provides a consistent
estimator for the parameters of interest, although its quantiles do not yield
valid confidence intervals. However, bootstrapping the quasi-posterior mean
accounts for the estimation uncertainty from the first stage, thereby producing
asymptotically valid confidence intervals.
arXiv link: http://arxiv.org/abs/2402.17374v2
Treatment effects without multicollinearity? Temporal order and the Gram-Schmidt process in causal inference
estimate orthogonal and economically interpretable regression coefficients. We
establish new finite sample properties for the Gram-Schmidt orthogonalization
process. Coefficients are unbiased and stable with lower standard errors than
those from Ordinary Least Squares. We provide conditions under which
coefficients represent average total treatment effects on the treated and
extend the model to groups of ordered and simultaneous regressors. Finally, we
reanalyze two studies that controlled for temporally ordered and collinear
characteristics, including race, education, and income. The new approach
expands Bohren et al.'s decomposition of systemic discrimination into
channel-specific effects and improves significance levels.
arXiv link: http://arxiv.org/abs/2402.17103v2
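A minimal sketch of the mechanics above: orthogonalize temporally ordered regressors with a QR (Gram-Schmidt) decomposition and regress the outcome on the orthogonalized columns. Variable names and the data-generating process are illustrative, and the conditions the paper gives for a treatment-effect interpretation are not verified here.

```python
# Orthogonalize ordered regressors via QR (Gram-Schmidt) and regress the
# outcome on the orthogonalized columns.  Column order encodes the assumed
# temporal ordering.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 5_000
race = rng.binomial(1, 0.4, n).astype(float)          # earliest characteristic
educ = 0.8 * race + rng.standard_normal(n)            # influenced by race
income = 0.5 * race + 0.7 * educ + rng.standard_normal(n)
y = 1.0 * race + 0.5 * educ + 0.3 * income + rng.standard_normal(n)

X = np.column_stack([race, educ, income])             # temporal order matters
Xc = X - X.mean(axis=0)
Q, R = np.linalg.qr(Xc)                               # Gram-Schmidt in matrix form

res = sm.OLS(y, sm.add_constant(Q)).fit()
coefs = res.params[1:] / np.diag(R)                   # back to original regressor units
print(np.round(coefs, 3))
# coefs[k] is the regression coefficient of y on the part of regressor k that is
# orthogonal to the temporally earlier regressors only; under the paper's
# conditions this supports the ordered, total-effect-style interpretation.
```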
Towards Generalizing Inferences from Trials to Target Populations
valid estimates with minimal assumptions, serving as a cornerstone for
researchers dedicated to advancing causal inference methods. However, extending
these findings beyond the experimental cohort to achieve externally valid
estimates is crucial for broader scientific inquiry. This paper delves into the
forefront of addressing these external validity challenges, encapsulating the
essence of a multidisciplinary workshop held at the Institute for Computational
and Experimental Research in Mathematics (ICERM), Brown University, in Fall
2023. The workshop congregated experts from diverse fields including social
science, medicine, public health, statistics, computer science, and education,
to tackle the unique obstacles each discipline faces in extrapolating
experimental findings. Our study presents three key contributions: we integrate
ongoing efforts, highlighting methodological synergies across fields; provide
an exhaustive review of generalizability and transportability based on the
workshop's discourse; and identify persistent hurdles while suggesting avenues
for future research. By doing so, this paper aims to enhance the collective
understanding of the generalizability and transportability of causal effects,
fostering cross-disciplinary collaboration and offering valuable insights for
researchers working on refining and applying causal inference methods.
arXiv link: http://arxiv.org/abs/2402.17042v2
Fast Algorithms for Quantile Regression with Selection
Regression with Selection (QRS). The estimation of the parameters that model
self-selection requires the estimation of the entire quantile process several
times. Moreover, closed-form expressions of the asymptotic variance are too
cumbersome, making the bootstrap more convenient for performing inference. Taking
advantage of recent advancements in the estimation of quantile regression,
along with some specific characteristics of the QRS estimation problem, I
propose streamlined algorithms for the QRS estimator. These algorithms
significantly reduce computation time through preprocessing techniques and
quantile grid reduction for the estimation of the copula and slope parameters.
I demonstrate the optimization enhancements with simulations. Lastly, I show how preprocessing methods can improve the precision of the estimates without sacrificing computational efficiency. Hence, they constitute a practical solution for estimators with non-differentiable and non-convex criterion functions, such as those based on copulas.
arXiv link: http://arxiv.org/abs/2402.16693v1
Information-Enriched Selection of Stationary and Non-Stationary Autoregressions using the Adaptive Lasso
non-stationary regressor in the consistent and oracle-efficient estimation of
autoregressive models using the adaptive Lasso. The enhanced weight builds on a
statistic that exploits distinct orders in probability of the OLS estimator in
time series regressions when the degree of integration differs. We provide
theoretical results on the benefit of our approach for detecting stationarity
when a tuning criterion selects the $\ell_1$ penalty parameter. Monte Carlo
evidence shows that our proposal is superior to using OLS-based weights, as
suggested by Kock [Econom. Theory, 32, 2016, 243-259]. We apply the modified
estimator to model selection for German inflation rates after the introduction
of the Euro. The results indicate that energy commodity price inflation and
headline inflation are best described by stationary autoregressions.
arXiv link: http://arxiv.org/abs/2402.16580v2
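For orientation, the sketch below implements a standard adaptive Lasso for an AR(p) model with OLS-based penalty weights, i.e., the benchmark the paper improves upon. The paper's information-enriched weight for the potentially non-stationary lag is not reproduced, and the cross-validation below ignores serial dependence for simplicity.

```python
# Standard adaptive Lasso for an AR(p) with OLS-based weights (benchmark only;
# the paper's enhanced weight is not implemented here).
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)
T, p = 500, 6
# Simulate a stationary AR(2): y_t = 0.5 y_{t-1} + 0.3 y_{t-2} + eps_t
y = np.zeros(T + p)
for t in range(2, T + p):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.standard_normal()
y = y[p:]

Y = y[p:]                                                             # targets
X = np.column_stack([y[p - j:len(y) - j] for j in range(1, p + 1)])   # lags 1..p

beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
w = np.abs(beta_ols)                  # adaptive weights: penalty ~ 1 / |b_OLS|
X_scaled = X * w                      # reparametrize so a plain Lasso applies
lasso = LassoCV(cv=5).fit(X_scaled, Y)
beta_adaptive = lasso.coef_ * w

print("selected lags:", (np.flatnonzero(np.abs(beta_adaptive) > 1e-8) + 1).tolist())
print("coefficients :", np.round(beta_adaptive, 3))
```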
Estimating Stochastic Block Models in the Presence of Covariates
connection between two nodes, often referred to as the edge probability,
depends on the unobserved communities each of these nodes belongs to. We
consider a flexible framework in which each edge probability, together with the probability of community assignment, is also impacted by observed covariates.
We propose a computationally tractable two-step procedure to estimate the
conditional edge probabilities as well as the community assignment
probabilities. The first step relies on a spectral clustering algorithm applied
to a localized adjacency matrix of the network. In the second step, k-nearest
neighbor regression estimates are computed on the extracted communities. We
study the statistical properties of these estimators by providing
non-asymptotic bounds.
arXiv link: http://arxiv.org/abs/2402.16322v1
Inference for Regression with Variables Generated by AI or Machine Learning
estimate latent variables of economic interest, then plug-in the estimates as
covariates in a regression. We show both theoretically and empirically that
naively treating AI/ML-generated variables as "data" leads to biased estimates
and invalid inference. To restore valid inference, we propose two methods: (1)
an explicit bias correction with bias-corrected confidence intervals, and (2)
joint estimation of the regression parameters and latent variables. We
illustrate these ideas through applications involving label imputation,
dimensionality reduction, and index construction via classification and
aggregation.
arXiv link: http://arxiv.org/abs/2402.15585v5
A Combinatorial Central Limit Theorem for Stratified Randomization
randomization, which holds under a Lindeberg-type condition. The theorem allows
for an arbitrary number or sizes of strata, with the sole requirement being
that each stratum contains at least two units. This flexibility accommodates
both a growing number of large and small strata simultaneously, while imposing
minimal conditions. We then apply this result to derive the asymptotic
distributions of two test statistics proposed for instrumental variables
settings in the presence of potentially many strata of unrestricted sizes.
arXiv link: http://arxiv.org/abs/2402.14764v2
Functional Spatial Autoregressive Models
dependent variable is a function that may exhibit functional autocorrelation
with the outcome functions of nearby units. This model can be characterized as
a simultaneous integral equation system, which, in general, does not
necessarily have a unique solution. To address this issue, we provide a simple
condition on the magnitude of the spatial interaction to ensure the uniqueness
in data realization. For estimation, to account for the endogeneity caused by
the spatial interaction, we propose a regularized two-stage least squares
estimator based on a basis approximation for the functional parameter. The
asymptotic properties of the estimator including the consistency and asymptotic
normality are investigated under certain conditions. Additionally, we propose a
simple Wald-type test for detecting the presence of spatial effects. As an
empirical illustration, we apply the proposed model and method to analyze age
distributions in Japanese cities.
arXiv link: http://arxiv.org/abs/2402.14763v3
Interference Produces False-Positive Pricing Experiments
randomizing at the article-level, i.e. by changing prices of different products
to identify treatment effects. Due to customers' cross-price substitution
behavior, such experiments suffer from interference bias: the observed
difference between treatment groups in the experiment is typically
significantly larger than the global effect that could be expected after a
roll-out decision of the tested pricing policy. We show in simulations that
such bias can be as large as 100%, and report experimental data implying bias
of similar magnitude. Finally, we discuss approaches for de-biased pricing
experiments, suggesting observational methods as a potentially attractive
alternative to clustering.
arXiv link: http://arxiv.org/abs/2402.14538v1
Enhancing Rolling Horizon Production Planning Through Stochastic Optimization Evaluated by Means of Simulation
arising from fluctuating demand forecasts. Therefore, this article focuses on
the integration of updated customer demand into the rolling horizon planning
cycle. We use scenario-based stochastic programming to solve capacitated lot
sizing problems under stochastic demand in a rolling horizon environment. This
environment is replicated using a discrete event simulation-optimization
framework, where the optimization problem is periodically solved, leveraging
the latest demand information to continually adjust the production plan. We
evaluate the stochastic optimization approach and compare its performance to
solving a deterministic lot sizing model, using expected demand figures as
input, as well as to standard Material Requirements Planning (MRP). In the
simulation study, we analyze three different customer behaviors related to
forecasting, along with four levels of shop load, within a multi-item and
multi-stage production system. We test a range of significant parameter values
for the three planning methods and compute the overall costs to benchmark them.
The results show that the production plans obtained by MRP are outperformed by
deterministic and stochastic optimization. Particularly, when facing tight
resource restrictions and rising uncertainty in customer demand, stochastic optimization becomes preferable to deterministic optimization.
arXiv link: http://arxiv.org/abs/2402.14506v2
Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation
inference with application to numerous disciplines. While many estimation
strategies have been proposed in the literature, the statistical optimality of
these methods has remained an open area of investigation, especially in
regimes where these methods do not achieve parametric rates. In this paper, we
adopt the recently introduced structure-agnostic framework of statistical lower
bounds, which poses no structural properties on the nuisance functions other
than access to black-box estimators that achieve some statistical estimation
rate. This framework is particularly appealing when one is only willing to
consider estimation strategies that use non-parametric regression and
classification oracles as black-box sub-processes. Within this framework, we
prove the statistical optimality of the celebrated and widely used doubly
robust estimators for both the Average Treatment Effect (ATE) and the Average
Treatment Effect on the Treated (ATT), as well as weighted variants of the
former, which arise in policy evaluation.
arXiv link: http://arxiv.org/abs/2402.14264v4
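The estimator whose optimality is studied above is the familiar doubly robust (AIPW) score with cross-fitted, black-box nuisance learners. The sketch below spells it out with two-fold cross-fitting; learner choices and the data-generating process are illustrative assumptions, not the paper's setup.

```python
# Doubly robust (AIPW) ATE estimator with two-fold cross-fitting and generic
# black-box nuisance learners.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(6)
n, p = 4_000, 10
X = rng.standard_normal((n, p))
e = 1 / (1 + np.exp(-X[:, 0]))
D = rng.binomial(1, e)
Y = D * (1 + X[:, 1]) + X[:, 0] + rng.standard_normal(n)   # true ATE = 1

scores = np.zeros(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    mu1 = GradientBoostingRegressor().fit(X[train][D[train] == 1], Y[train][D[train] == 1])
    mu0 = GradientBoostingRegressor().fit(X[train][D[train] == 0], Y[train][D[train] == 0])
    ps = GradientBoostingClassifier().fit(X[train], D[train])

    m1, m0 = mu1.predict(X[test]), mu0.predict(X[test])
    e_hat = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)
    scores[test] = (m1 - m0
                    + D[test] * (Y[test] - m1) / e_hat
                    - (1 - D[test]) * (Y[test] - m0) / (1 - e_hat))

ate = scores.mean()
se = scores.std(ddof=1) / np.sqrt(n)
print(f"AIPW ATE: {ate:.3f}  (95% CI {ate - 1.96*se:.3f}, {ate + 1.96*se:.3f})")
```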
The impact of Facebook-Cambridge Analytica data scandal on the USA tech stock market: An event study based on clustering method
scandal, with a particular focus on the Facebook data leakage scandal and its
associated events within the U.S. tech industry and two additional relevant
groups. We employ various metrics including daily spread, volatility,
volume-weighted return, and CAPM-beta for the pre-analysis clustering, and
subsequently utilize CAR (Cumulative Abnormal Return) to evaluate the impact on
firms grouped within these clusters. From a broader industry viewpoint,
significant positive CAARs are observed across U.S. sample firms over the three
days post-scandal announcement, indicating no adverse impact on the tech sector
overall. Conversely, after Facebook's first post-scandal quarterly earnings report, a notable negative effect appeared despite the positive reported performance. The clustering principle should aid in identifying directly related companies and thus reduce the influence of randomness. This was indeed achieved for the key event, "The Effect of Congressional Hearing on Certain Clusters across U.S. Tech Stock Market," which was identified as delayed and significantly negative. We therefore recommend applying the clustering method when conducting this or similar event studies.
arXiv link: http://arxiv.org/abs/2402.14206v1
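For readers unfamiliar with the event-study machinery referenced above (abnormal returns, CAR/CAAR), the sketch below shows the standard market-model calculation on simulated returns: estimate alpha and beta on a pre-event window, then cumulate abnormal returns over a short post-event window. All numbers are synthetic; the clustering step of the paper is not shown.

```python
# Standard market-model event-study mechanics: estimate alpha/beta on an
# estimation window, compute abnormal returns and the CAR over an event window.
import numpy as np

rng = np.random.default_rng(7)
T_est, T_evt = 120, 3                      # estimation window, event window
r_mkt = rng.normal(0.0003, 0.01, T_est + T_evt)
r_stk = 0.0001 + 1.2 * r_mkt + rng.normal(0, 0.015, T_est + T_evt)
r_stk[T_est:] -= 0.02                      # inject a post-event drop

# Market model fit on the estimation window
X = np.column_stack([np.ones(T_est), r_mkt[:T_est]])
alpha, beta = np.linalg.lstsq(X, r_stk[:T_est], rcond=None)[0]
resid_sd = np.std(r_stk[:T_est] - X @ np.array([alpha, beta]), ddof=2)

# Abnormal returns and CAR over the event window
ar = r_stk[T_est:] - (alpha + beta * r_mkt[T_est:])
car = ar.sum()
t_stat = car / (resid_sd * np.sqrt(T_evt))   # simple CAR t-statistic
print(f"CAR over {T_evt} days: {car:.4f}  (t = {t_stat:.2f})")
```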
Breaking the HISCO Barrier: Automatic Occupational Standardization with OccCANINE
occupational descriptions into the HISCO classification system. The manual work
involved in processing and classifying occupational descriptions is
error-prone, tedious, and time-consuming. We finetune a preexisting language
model (CANINE) to do this automatically, thereby performing in seconds and
minutes what previously took days and weeks. The model is trained on 14 million
pairs of occupational descriptions and HISCO codes in 13 different languages
contributed by 22 different sources. Our approach is shown to have accuracy,
recall, and precision above 90 percent. Our tool breaks the metaphorical HISCO
barrier and makes this data readily available for analysis of occupational
structures with broad applicability in economics, economic history, and various
related disciplines.
arXiv link: http://arxiv.org/abs/2402.13604v2
Vulnerability Webs: Systemic Risk in Software Networks
vulnerability risks through dependencies with substantial economic impact, as
seen in the Crowdstrike and HeartBleed incidents. We analyze 52,897
dependencies across 16,102 Python repositories using a strategic network
formation model incorporating observable and unobservable heterogeneity.
Through variational approximation of conditional distributions, we demonstrate
that dependency creation generates negative externalities. Vulnerability
propagation, modeled as a contagion process, shows that popular protection
heuristics are ineffective. AI-assisted coding, on the other hand, offers an
effective alternative by enabling dependency replacement with in-house code.
arXiv link: http://arxiv.org/abs/2402.13375v3
Bridging Methodologies: Angrist and Imbens' Contributions to Causal Identification
interpretation of Instrumental Variable estimates (a widespread methodology in
economics) through the lens of potential outcomes (a classical framework to
formalize causality in statistics). Bridging a gap between those two strands of
literature, they stress the importance of treatment effect heterogeneity and
show that, under defendable assumptions in various applications, this method
recovers an average causal effect for a specific subpopulation of individuals
whose treatment is affected by the instrument. They were awarded the Nobel
Prize primarily for this Local Average Treatment Effect (LATE). The first part
of this article presents that methodological contribution in depth: the
origination in earlier applied articles, the different identification results
and extensions, and related debates on the relevance of LATEs for public policy
decisions. The second part reviews the main contributions of the authors beyond
the LATE. J. Angrist has pursued the search for informative and varied
empirical research designs in several fields, particularly in education. G.
Imbens has complemented the toolbox for treatment effect estimation in many
ways, notably through propensity score reweighting, matching, and, more
recently, adapting machine learning procedures.
arXiv link: http://arxiv.org/abs/2402.13023v1
Extending the Scope of Inference About Predictive Ability to Machine Learning Methods
dramatically over the past two decades, but uncertainty quantification for
predictive comparisons remains elusive. This paper addresses this gap by
extending the classic inference theory for predictive ability in time series to
modern machine learners, such as the Lasso or Deep Learning. We investigate
under which conditions such extensions are possible. For standard out-of-sample
asymptotic inference to be valid with machine learning, two key properties must
hold: (i) a zero-mean condition for the score of the prediction loss function
and (ii) a "fast rate" of convergence for the machine learner. Absent any of
these conditions, the estimation risk may be unbounded, and inferences invalid
and very sensitive to sample splitting. For accurate inferences, we recommend
an 80%-20% training-test splitting rule. We illustrate the wide applicability
of our results with three applications: high-dimensional time series
regressions with the Lasso, Deep learning for binary outcomes, and a new
out-of-sample test for the Martingale Difference Hypothesis (MDH). The
theoretical results are supported by extensive Monte Carlo simulations and an
empirical application evaluating the MDH of some major exchange rates at daily
and higher frequencies.
arXiv link: http://arxiv.org/abs/2402.12838v3
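A stripped-down version of the recommended workflow above: an 80%-20% chronological training-test split, a Lasso forecaster against a historical-mean benchmark, and a Diebold-Mariano-style t-test on the squared-error loss differential. The data are simulated, and the HAC correction one would use for serially correlated or multi-step losses is omitted for brevity.

```python
# 80/20 split, Lasso vs. mean benchmark, and a DM-type test on loss differentials.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(8)
T, p = 600, 50
X = rng.standard_normal((T, p))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.standard_normal(T)

split = int(0.8 * T)                                   # 80/20 split
X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]

lasso = LassoCV(cv=5).fit(X_tr, y_tr)
f_lasso = lasso.predict(X_te)
f_bench = np.full_like(y_te, y_tr.mean())              # historical-mean benchmark

loss_diff = (y_te - f_bench) ** 2 - (y_te - f_lasso) ** 2
dm = loss_diff.mean() / (loss_diff.std(ddof=1) / np.sqrt(len(loss_diff)))
print(f"mean loss differential: {loss_diff.mean():.3f}, DM-type t-stat: {dm:.2f}")
```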
Inference on LATEs with covariates
covariate-specific local average treatment effects (LATEs) from a saturated
specification, without making parametric assumptions on how available
covariates enter the model. In practice, TSLS is severely biased as saturation
leads to a large number of control dummies and an equally large number of,
arguably weak, instruments. This paper derives asymptotically valid tests and
confidence intervals for the weighted average of LATEs that is targeted, yet
missed by saturated TSLS. The proposed inference procedure is robust to
unobserved treatment effect heterogeneity, covariates with rich support, and
weak identification. We find LATEs statistically significantly different from
zero in applications in criminology, finance, health, and education.
arXiv link: http://arxiv.org/abs/2402.12607v2
Non-linear Triple Changes Estimator for Targeted Policies
assumption of 'parallel trends,' which does not hold in many practical
applications. To address this issue, the econometrics literature has turned to
the triple difference estimator. Both DiD and triple difference are limited to
assessing average effects exclusively. An alternative avenue is offered by the
changes-in-changes (CiC) estimator, which provides an estimate of the entire
counterfactual distribution at the cost of relying on (stronger) distributional
assumptions. In this work, we extend the triple difference estimator to
accommodate the CiC framework, presenting the `triple changes estimator' and
its identification assumptions, thereby expanding the scope of the CiC
paradigm. Subsequently, we empirically evaluate the proposed framework and
apply it to a study examining the impact of Medicaid expansion on children's
preventive care.
arXiv link: http://arxiv.org/abs/2402.12583v1
Credible causal inference beyond toy models
extra-statistical assumptions that have (sometimes) testable implications.
Well-known sets of assumptions that are sufficient to justify the causal
interpretation of certain estimators are called identification strategies.
These templates for causal analysis, however, do not perfectly map into
empirical research practice. Researchers are often left with the dilemma of either abstracting away from their particular setting to fit the templates,
risking erroneous inferences, or avoiding situations in which the templates
cannot be applied, missing valuable opportunities for conducting empirical
analysis. In this article, I show how directed acyclic graphs (DAGs) can help
researchers to conduct empirical research and assess the quality of evidence
without excessively relying on research templates. First, I offer a concise
introduction to causal inference frameworks. Then I survey the arguments in the
methodological literature in favor of using research templates, while either
avoiding or limiting the use of causal graphical models. Third, I discuss the
problems with the template model, arguing for a more flexible approach to DAGs
that helps illuminate common problems in empirical settings and improve the
credibility of causal claims. I demonstrate this approach in a series of worked
examples, showing the gap between identification strategies as invoked by
researchers and their actual applications. Finally, I conclude by highlighting the
benefits that routinely incorporating causal graphical models in our scientific
discussions would have in terms of transparency, testability, and generativity.
arXiv link: http://arxiv.org/abs/2402.11659v1
Doubly Robust Inference in Causal Latent Factor Models
unobserved confounding in modern data-rich environments featuring large numbers
of units and outcomes. The proposed estimator is doubly robust, combining
outcome imputation, inverse probability weighting, and a novel cross-fitting
procedure for matrix completion. We derive finite-sample and asymptotic
guarantees, and show that the error of the new estimator converges to a
mean-zero Gaussian distribution at a parametric rate. Simulation results
demonstrate the relevance of the formal properties of the estimators analyzed
in this article.
arXiv link: http://arxiv.org/abs/2402.11652v3
Maximal Inequalities for Empirical Processes under General Mixing Conditions with an Application to Strong Approximations
of functions for a general class of mixing stochastic processes with arbitrary
mixing rates. Regardless of the speed of mixing, the bound consists of a concentration rate and a novel measure of complexity. The speed of mixing, however, affects the former quantity, implying a phase transition. Fast mixing leads to the standard root-n concentration rate, while slow mixing leads to a slower concentration rate whose speed depends on the mixing structure. Our
findings are applied to derive strong approximation results for a general class
of mixing processes with arbitrary mixing rates.
arXiv link: http://arxiv.org/abs/2402.11394v2
Functional Partial Least-Squares: Adaptive Estimation and Inference
Hilbert space-valued predictor, a canonical example of an ill-posed inverse
problem. We show that the functional partial least squares (PLS) estimator
attains nearly minimax-optimal convergence rates over a class of ellipsoids and
propose an adaptive early stopping procedure for selecting the number of PLS
components. In addition, we develop a new test that can detect local alternatives converging at the parametric rate and that can be inverted to construct confidence
sets. Simulation results demonstrate that the estimator performs favorably
relative to several existing methods and the proposed test exhibits good power
properties. We apply our methodology to evaluate the nonlinear effects of
temperature on corn and soybean yields.
arXiv link: http://arxiv.org/abs/2402.11134v2
Manipulation Test for Multidimensional RDD
discontinuity design (RDD) relies on assumptions that imply the continuity of
the density of the assignment (running) variable. The test for this implication
is commonly referred to as the manipulation test and is regularly reported in
applied research to strengthen the design's validity. The multidimensional RDD
(MRDD) extends the RDD to contexts where treatment assignment depends on
several running variables. This paper introduces a manipulation test for the
MRDD. First, it develops a theoretical model for causal inference with the
MRDD, used to derive a testable implication on the conditional marginal
densities of the running variables. Then, it constructs the test for the
implication based on a quadratic form of a vector of statistics separately
computed for each marginal density. Finally, the proposed test is compared with
alternative procedures commonly employed in applied research.
arXiv link: http://arxiv.org/abs/2402.10836v2
Optimizing Adaptive Experiments: A Unified Approach to Regret Minimization and Best-Arm Identification
priorities: maximizing total welfare (or `reward') through effective treatment
assignment and swiftly concluding experiments to implement population-wide
treatments. Current literature addresses these priorities separately, with
regret minimization studies focusing on the former and best-arm identification
research on the latter. This paper bridges this divide by proposing a unified
model that simultaneously accounts for within-experiment performance and
post-experiment outcomes. We provide a sharp theory of optimal performance in
large populations that not only unifies canonical results in the literature but
also uncovers novel insights. Our theory reveals that familiar algorithms, such
as the recently proposed top-two Thompson sampling algorithm, can optimize a
broad class of objectives if a single scalar parameter is appropriately
adjusted. In addition, we demonstrate that substantial reductions in experiment
duration can often be achieved with minimal impact on both within-experiment
and post-experiment regret.
arXiv link: http://arxiv.org/abs/2402.10592v2
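The "single scalar parameter" referenced above is easiest to see in code. The sketch below runs a minimal top-two Thompson sampling loop for Bernoulli arms, where beta controls how often the current leader is played versus a challenger; arm means, horizon, and the resampling cap are illustrative choices, not the paper's calibration.

```python
# Minimal top-two Thompson sampling for Bernoulli arms; beta is the tuning knob.
import numpy as np

rng = np.random.default_rng(9)
true_means = np.array([0.30, 0.50, 0.55])     # illustrative arm means
K, horizon, beta = len(true_means), 5_000, 0.5
successes, failures = np.ones(K), np.ones(K)  # Beta(1, 1) priors

for _ in range(horizon):
    leader = int(np.argmax(rng.beta(successes, failures)))
    arm = leader
    if rng.uniform() >= beta:
        # challenger: resample until a different arm wins (capped for safety)
        for _ in range(100):
            cand = int(np.argmax(rng.beta(successes, failures)))
            if cand != leader:
                arm = cand
                break
    reward = rng.binomial(1, true_means[arm])
    successes[arm] += reward
    failures[arm] += 1 - reward

post_mean = successes / (successes + failures)
print("posterior means:", np.round(post_mean, 3))
print("recommended arm:", int(np.argmax(post_mean)))
```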
Nowcasting with Mixed Frequency Data Using Gaussian Processes
regressions. This involves handling frequency mismatches and specifying
functional relationships between many predictors and the dependent variable. We
use Gaussian processes (GPs) and compress the input space with structured and
unstructured MIDAS variants. This yields several versions of GP-MIDAS with
distinct properties and implications, which we evaluate in short-horizon nowcasting and forecasting exercises with both simulated data and data on quarterly US
output growth and inflation in the GDP deflator. It turns out that our proposed
framework leverages macroeconomic Big Data in a computationally efficient way
and offers gains in predictive accuracy compared to other machine learning
approaches along several dimensions.
arXiv link: http://arxiv.org/abs/2402.10574v2
mshw, a forecasting library to predict short-term electricity demand based on multiple seasonal Holt-Winters
forecasting of electricity demand. Current electricity systems largely require demand forecasting so that the electricity market can set electricity prices and schedule production units. The companies that are part of the electrical system use proprietary software to obtain predictions based on time series and prediction tools, whether statistical or based on artificial intelligence; the most common form of prediction relies on hybrid models that combine both technologies. In any case, such software has a complicated structure and a large number of associated variables, and it requires a heavy computational load to make predictions, yet the predictions it offers are often not much better than those of simple models. In this
paper we present a MATLAB toolbox created for the prediction of electrical
demand. The toolbox implements multiple seasonal Holt-Winters exponential
smoothing models and neural network models. The models used include the use of
discrete interval mobile seasonalities (DIMS) to improve forecasting on special
days. Additionally, results of its application to several European electricity systems are shown. The use of
this library opens a new avenue of research for the use of models with discrete
and complex seasonalities in other fields of application.
arXiv link: http://arxiv.org/abs/2402.10982v1
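The toolbox above is written in MATLAB; as a rough Python analogue of its core model, the sketch below fits an additive Holt-Winters model with a single daily (24-hour) seasonality to simulated hourly demand using statsmodels. The toolbox's multiple simultaneous seasonalities, DIMS, and neural network models are not reproduced here.

```python
# Rough analogue only: single-seasonal additive Holt-Winters on simulated
# hourly demand (the mshw toolbox handles multiple seasonalities and DIMS).
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(10)
hours = np.arange(24 * 60)                                     # 60 days of hourly data
demand = (100
          + 10 * np.sin(2 * np.pi * hours / 24)                # daily cycle
          + 5 * np.sin(2 * np.pi * hours / (24 * 7))           # weekly cycle (unmodeled)
          + rng.normal(0, 2, hours.size))

model = ExponentialSmoothing(demand, trend="add",
                             seasonal="add", seasonal_periods=24)
fit = model.fit()
forecast = fit.forecast(24)                                    # next-day forecast
print("next 6 hourly forecasts:", np.round(forecast[:6], 1))
```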
When Can We Use Two-Way Fixed-Effects (TWFE): A Comparison of TWFE and Novel Dynamic Difference-in-Differences Estimators
scrutiny lately. Recent literature has revealed potential shortcomings of TWFE
when the treatment effects are heterogeneous. Scholars have developed new
advanced dynamic Difference-in-Differences (DiD) estimators to tackle these
potential shortcomings. However, confusion remains in applied research as to
when the conventional TWFE is biased and what issues the novel estimators can
and cannot address. In this study, we first provide an intuitive explanation of
the problems of TWFE and elucidate the key features of the novel alternative
DiD estimators. We then systematically demonstrate the conditions under which
the conventional TWFE is inconsistent. We employ Monte Carlo simulations to
assess the performance of dynamic DiD estimators under violations of key
assumptions, as is likely in applied cases. While the new dynamic DiD
estimators offer notable advantages in capturing heterogeneous treatment
effects, we show that the conventional TWFE performs generally well if the
model specifies an event-time function. All estimators are equally sensitive to
violations of the parallel trends assumption, anticipation effects or
violations of time-varying exogeneity. Despite their advantages, the new
dynamic DiD estimators tackle a very specific problem and they do not serve as
a universal remedy for violations of the most critical assumptions. We finally
derive, based on our simulations, recommendations for how and when to use TWFE
and the new DiD estimators in applied research.
arXiv link: http://arxiv.org/abs/2402.09928v3
Spatial Data Analysis
spatial econometrics, offering a comprehensive overview of techniques and
methodologies for analysing spatial data in the social sciences. Spatial
econometrics addresses the unique challenges posed by spatially dependent
observations, where spatial relationships among data points can be of
substantive interest or can significantly impact statistical analyses. The
chapter begins by exploring the fundamental concepts of spatial dependence and
spatial autocorrelation, and highlighting their implications for traditional
econometric models. It then introduces a range of spatial econometric models,
particularly spatial lag, spatial error, spatial lag of X, and spatial Durbin
models, illustrating how these models accommodate spatial relationships and
yield accurate and insightful results about the underlying spatial processes.
The chapter provides an intuitive guide on how to interpret those different
models. A practical example on London house prices demonstrates the application
of spatial econometrics, emphasising its relevance in uncovering hidden spatial
patterns, addressing endogeneity, and providing robust estimates in the
presence of spatial dependence.
arXiv link: http://arxiv.org/abs/2402.09895v2
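As a small hands-on companion to the chapter's starting point, the sketch below computes Moran's I, the basic spatial autocorrelation diagnostic, with a row-standardized k-nearest-neighbour weight matrix. Coordinates, the outcome, and the choice of k are simulated assumptions, not taken from the chapter's London house-price example.

```python
# Moran's I with a row-standardized kNN spatial weight matrix (simulated data).
import numpy as np

rng = np.random.default_rng(11)
n, k = 300, 5
coords = rng.uniform(0, 10, size=(n, 2))                 # e.g. locations of houses
# outcome with a smooth spatial trend, so Moran's I should be clearly positive
y = coords[:, 0] + coords[:, 1] + rng.normal(0, 1.0, n)

# Row-standardized kNN weight matrix
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)
W = np.zeros((n, n))
nearest = np.argsort(dist, axis=1)[:, :k]
for i in range(n):
    W[i, nearest[i]] = 1.0 / k

# Moran's I = (n / S0) * (z' W z) / (z' z), with S0 the sum of all weights
z = y - y.mean()
S0 = W.sum()
moran_I = (n / S0) * (z @ W @ z) / (z @ z)
expected_I = -1.0 / (n - 1)                               # value under no autocorrelation
print(f"Moran's I = {moran_I:.3f} (expected under independence: {expected_I:.3f})")
```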
Identification with Posterior-Separable Information Costs
observationally equivalent to a state-dependent stochastic choice model subject
to attention costs. I demonstrate that additive separability of unobservable
heterogeneity, together with an independence assumption, suffice for the
empirical model to admit a representative agent. Using conditional
probabilities, I show how to identify: how covariates affect the desirability
of goods, (a measure of) welfare, factual changes in welfare, and bounds on
counterfactual market shares.
arXiv link: http://arxiv.org/abs/2402.09789v1
Quantile Granger Causality in the Presence of Instability
unstable environments, for a fixed quantile or over a continuum of quantile
levels. Our proposed test statistics are consistent against fixed alternatives,
they have nontrivial power against local alternatives, and they are pivotal in
certain important special cases. In addition, we show the validity of a
bootstrap procedure when asymptotic distributions depend on nuisance
parameters. Monte Carlo simulations reveal that the proposed test statistics
have correct empirical size and high power, even in the absence of structural
breaks. Moreover, a procedure providing additional insight into the timing of
Granger causal regimes based on our new tests is proposed. Finally, an
empirical application in energy economics highlights the applicability of our
method as the new tests provide stronger evidence of Granger causality.
arXiv link: http://arxiv.org/abs/2402.09744v2
Cross-Temporal Forecast Reconciliation at Digital Platforms with Machine Learning
requires high-dimensional accurate forecast streams at different levels of
cross-sectional (e.g., geographical regions) and temporal aggregation (e.g.,
minutes to days). It also necessitates coherent forecasts across all levels of
the hierarchy to ensure aligned decision making across different planning units
such as pricing, product, controlling and strategy. Given that platform data
streams feature complex characteristics and interdependencies, we introduce a
non-linear hierarchical forecast reconciliation method that produces
cross-temporal reconciled forecasts in a direct and automated way through the
use of popular machine learning methods. The method is sufficiently fast to
allow forecast-based high-frequency decision making that platforms require. We
empirically test our framework on unique, large-scale streaming datasets from a
leading on-demand delivery platform in Europe and a bicycle sharing system in
New York City.
arXiv link: http://arxiv.org/abs/2402.09033v2
Local-Polynomial Estimation for Multivariate Regression Discontinuity Designs
regression discontinuity designs in which treatment is assigned by crossing a
boundary in the space of running variables. The dominant approach uses the
Euclidean distance from a boundary point as the scalar running variable; hence,
multivariate designs are handled as univariate designs. However, the bandwidth selection with the distance running variable is suboptimal and inefficient for the underlying multivariate problem. We instead treat multivariate designs as genuinely multivariate and develop a novel asymptotic normality result for
multivariate local-polynomial estimators. Our estimator is asymptotically valid
and can capture heterogeneous treatment effects over the boundary. We
demonstrate the effectiveness of our estimator through numerical simulations.
Our empirical illustration of a Colombian scholarship study reveals a richer
heterogeneity of the treatment effect that is hidden in the original estimates.
arXiv link: http://arxiv.org/abs/2402.08941v2
Inference for an Algorithmic Fairness-Accuracy Frontier
Yet, their predictive ability frequently exhibits systematic variation across
population subgroups. To assess the trade-off between fairness and accuracy
using finite data, we propose a debiased machine learning estimator for the
fairness-accuracy frontier introduced by Liang, Lu, Mu, and Okumura (2024). We
derive its asymptotic distribution and propose inference methods to test key
hypotheses in the fairness literature, such as (i) whether excluding group
identity from use in training the algorithm is optimal and (ii) whether there
are less discriminatory alternatives to a given algorithm. In addition, we
construct an estimator for the distance between a given algorithm and the
fairest point on the frontier, and characterize its asymptotic distribution.
Using Monte Carlo simulations, we evaluate the finite-sample performance of our
inference methods. We apply our framework to re-evaluate algorithms used in
hospital care management and show that our approach yields alternative
algorithms that lie on the fairness-accuracy frontier, offering improvements
along both dimensions.
arXiv link: http://arxiv.org/abs/2402.08879v2
Heterogeneity, Uncertainty and Learning: Semiparametric Identification and Estimation
which continuous outcomes depend on three types of unobservables: known
heterogeneity, initially unknown heterogeneity that may be revealed over time,
and transitory uncertainty. We consider a common environment where the
researcher only has access to a short panel on choices and realized outcomes.
We establish identification of the outcome equation parameters and the
distribution of the unobservables, under the standard assumption that unknown
heterogeneity and uncertainty are normally distributed. We also show that,
absent known heterogeneity, the model is identified without making any
distributional assumption. We then derive the asymptotic properties of a sieve
MLE estimator for the model parameters, and devise a tractable profile
likelihood-based estimation procedure. Our estimator exhibits good
finite-sample properties. Finally, we illustrate our approach with an
application to ability learning in the context of occupational choice. Our
results point to substantial ability learning based on realized wages.
arXiv link: http://arxiv.org/abs/2402.08575v2
Finding Moving-Band Statistical Arbitrages via Convex-Concave Optimization
more assets than just the traditional pair. We formulate the problem as seeking
a portfolio with the highest volatility, subject to its price remaining in a
band and a leverage limit. This optimization problem is not convex, but can be
approximately solved using the convex-concave procedure, a specific sequential
convex programming method. We show how the method generalizes to finding
moving-band statistical arbitrages, where the price band midpoint varies over
time.
arXiv link: http://arxiv.org/abs/2402.08108v1
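A simplified sketch of the formulation above: choose weights that maximize in-sample portfolio variance subject to the portfolio price staying in a band and an L1 leverage limit, solved by the convex-concave procedure (each step maximizes the linearization of the convex objective at the previous iterate). It requires cvxpy, uses simulated prices, and keeps the band midpoint fixed, so the moving-band extension is not shown.

```python
# Convex-concave procedure for a fixed-band statistical arbitrage sketch.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(12)
T, n_assets = 250, 4
prices = 100 + np.cumsum(rng.normal(0, 1, size=(T, n_assets)), axis=0)
returns = np.diff(prices, axis=0)
Sigma = np.cov(returns.T)                       # in-sample covariance of price changes

midpoint, band, leverage = 0.0, 5.0, 2.0
w_k = rng.normal(0, 0.1, n_assets)              # initial point

for _ in range(20):                             # convex-concave iterations
    w = cp.Variable(n_assets)
    # linearize w' Sigma w around w_k: gradient is 2 Sigma w_k
    objective = cp.Maximize(2 * (Sigma @ w_k) @ w)
    constraints = [cp.abs(prices @ w - midpoint) <= band,
                   cp.norm1(w) <= leverage]
    cp.Problem(objective, constraints).solve()
    if w.value is None or np.linalg.norm(w.value - w_k) < 1e-6:
        break
    w_k = w.value

vol = float(np.sqrt(w_k @ Sigma @ w_k))
print("portfolio weights:", np.round(w_k, 3))
print("in-sample daily price volatility:", round(vol, 3))
```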
On Bayesian Filtering for Markov Regime Switching Models
macroeconomic models using Bayesian filtering, with a specific focus on the
state-space formulation of Dynamic Stochastic General Equilibrium (DSGE) models
with multiple regimes. We outline the theoretical foundations of model
estimation, provide the details of two families of powerful multiple-regime
filters, IMM and GPB, and construct corresponding multiple-regime smoothers. A
simulation exercise, based on a prototypical New Keynesian DSGE model, is used
to demonstrate the computational robustness of the proposed filters and
smoothers and evaluate their accuracy and speed for a selection of filters from
each family. We show that the canonical IMM filter is faster and is no less,
and often more, accurate than its competitors within IMM and GPB families, the
latter including the commonly used Kim and Nelson (1999) filter. Using it with
the matching smoother improves the precision in recovering unobserved variables
by about 25 percent. Furthermore, applying it to the U.S. 1947-2023
macroeconomic time series, we successfully identify significant past policy
shifts including those related to the post-Covid-19 period. Our results
demonstrate the practical applicability and potential of the proposed routines
in macroeconomic analysis.
arXiv link: http://arxiv.org/abs/2402.08051v1
Local Projections Inference with High-Dimensional Covariates without Sparsity
estimating future responses to current shocks, robust to high-dimensional
controls without relying on sparsity assumptions. The approach is applicable to
various settings, including impulse response analysis and
difference-in-differences (DiD) estimation. While methods like LASSO exist,
they often assume most parameters are exactly zero, limiting their
effectiveness in dense data generation processes. I propose a novel technique
incorporating high-dimensional covariates in local projections using the
Orthogonal Greedy Algorithm with a high-dimensional AIC (OGA+HDAIC) model
selection method. This approach offers robustness in both sparse and dense
scenarios, improved interpretability, and more reliable causal inference in
local projections. Simulation studies show superior performance in dense and
persistent scenarios compared to conventional LP and LASSO-based approaches. In
an empirical application to Acemoglu, Naidu, Restrepo, and Robinson (2019), I
demonstrate efficiency gains and robustness to a large set of controls.
Additionally, I examine the effect of subjective beliefs on economic
aggregates, demonstrating robustness to various model specifications. A novel
state-dependent analysis reveals that inflation behaves more in line with
rational expectations in good states, but exhibits more subjective, pessimistic
dynamics in bad states.
arXiv link: http://arxiv.org/abs/2402.07743v2
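A minimal sketch of the Orthogonal Greedy Algorithm used above: at each step, add the regressor most correlated with the current residual and refit least squares on the selected set. The paper's high-dimensional AIC (HDAIC) stopping rule is replaced by a plain AIC-style criterion here, and the data are simulated, so this is only an approximation of the OGA+HDAIC procedure.

```python
# Orthogonal Greedy Algorithm with a simple AIC-style stopping criterion
# (stand-in for the paper's HDAIC).
import numpy as np

rng = np.random.default_rng(13)
n, p = 400, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[[0, 3, 7]] = [1.0, -0.8, 0.6]
y = X @ beta + rng.standard_normal(n)

Xs = (X - X.mean(0)) / X.std(0)
yc = y - y.mean()

active, resid = [], yc.copy()
best_aic, best_active = np.inf, []
for step in range(30):                               # cap the number of steps
    corr = np.abs(Xs.T @ resid)
    corr[active] = -np.inf                           # do not reselect
    active.append(int(np.argmax(corr)))
    coef, *_ = np.linalg.lstsq(Xs[:, active], yc, rcond=None)
    resid = yc - Xs[:, active] @ coef
    aic = n * np.log(resid @ resid / n) + 2 * len(active)   # simple AIC
    if aic < best_aic:
        best_aic, best_active = aic, list(active)

print("selected regressors:", sorted(best_active))
```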
A step towards the integration of machine learning and classic model-based survey methods
official statistics, is still very limited. Therefore, we propose a predictor
supported by these algorithms, which can be used to predict any population or
subpopulation characteristics. Machine learning methods have already been shown
to be very powerful in identifying and modelling complex and nonlinear
relationships between the variables, which means they have very good properties
in case of strong departures from the classic assumptions. Therefore, we
analyse the performance of our proposal under a different set-up, which, in our
opinion, is of greater importance in real-life surveys. We study only small
departures from the assumed model to show that our proposal is a good
alternative, even in comparison with optimal methods under the model. Moreover,
we propose a method for the ex ante accuracy estimation of machine learning predictors, which makes it possible to compare their accuracy with that of classic methods. The solution to this problem is indicated in the literature as one of
the key issues in integrating these approaches. The simulation studies are
based on a real, longitudinal dataset, where the prediction of subpopulation
characteristics is considered.
arXiv link: http://arxiv.org/abs/2402.07521v2
Interference Among First-Price Pacing Equilibria: A Bias and Variance Analysis
decisions on new feature roll-outs. For online marketplaces (such as
advertising markets), standard approaches to A/B testing may lead to biased
results when buyers operate under a budget constraint, as budget consumption in
one arm of the experiment impacts performance of the other arm. To counteract
this interference, one can use a budget-split design where the budget
constraint operates on a per-arm basis and each arm receives an equal fraction
of the budget, leading to "budget-controlled A/B testing." Despite clear advantages of budget-controlled A/B testing, performance degrades when budgets are split too thinly, limiting the overall throughput of such systems. In this
paper, we propose a parallel budget-controlled A/B testing design where we use
market segmentation to identify submarkets in the larger market, and we run
parallel experiments on each submarket.
Our contributions are as follows: First, we introduce and demonstrate the
effectiveness of the parallel budget-controlled A/B test design with submarkets
in a large online marketplace environment. Second, we formally define market
interference in first-price auction markets using the first price pacing
equilibrium (FPPE) framework. Third, we propose a debiased surrogate that
eliminates the first-order bias of FPPE, drawing upon the principles of
sensitivity analysis in mathematical programs. Fourth, we derive a plug-in
estimator for the surrogate and establish its asymptotic normality. Fifth, we
provide an estimation procedure for submarket parallel budget-controlled A/B
tests. Finally, we present numerical examples on semi-synthetic data,
confirming that the debiasing technique achieves the desired coverage
properties.
arXiv link: http://arxiv.org/abs/2402.07322v3
Research on the multi-stage impact of digital economy on rural revitalization in Hainan Province based on GPM model
implementation of the rural revitalization strategy. Based on this, this study
takes Hainan Province as the research object to deeply explore the impact of
digital economic development on rural revitalization. The study collected panel
data from 2003 to 2022 to construct an evaluation index system for the digital
economy and rural revitalization and used panel regression analysis and other
methods to explore the promotion effect of the digital economy on rural
revitalization. Research results show that the digital economy has a
significant positive impact on rural revitalization, and this impact increases
as the level of fiscal expenditure increases. The issuance of digital RMB has
further exerted a regulatory effect and promoted the development of the digital
economy and the process of rural revitalization. At the same time, the
establishment of the Hainan Free Trade Port has also played a positive role in
promoting the development of the digital economy and rural revitalization. In
the prediction of the optimal strategy for rural revitalization based on the
development levels of the primary, secondary, and tertiary industries (Rate1,
Rate2, and Rate3), it was found that Rate1 can encourage Hainan Province to implement digital economic innovation and Rate3 to implement promotion behaviors, while increasing Rate2 supports sustainable development: when Rate3 promotes Rate2's digital economic innovation behavior, Rate2's production behavior is standardized to the greatest extent, the application of the digital economy to the rural revitalization industry is accelerated, and the technological advancement of enterprises is promoted.
arXiv link: http://arxiv.org/abs/2402.07170v1
High Dimensional Factor Analysis with Weak Factors
dimensional approximate factor models with weak factors in that the factor
loading ($\Lambda^0$) scales sublinearly in the number $N$ of
cross-section units, i.e., $\Lambda^{0\top} \Lambda^0
/ N^\alpha$ is positive definite in the limit for some $\alpha \in (0,1)$.
While the consistency and asymptotic normality of these estimates are by now
well known when the factors are strong, i.e., $\alpha=1$, the statistical
properties for weak factors remain less explored. Here, we show that the PC
estimator maintains consistency and asymptotic normality for any $\alpha\in(0,1)$, provided suitable conditions regarding the dependence structure in the noise are met. This complements the earlier result by Onatski (2012) that the PC estimator is inconsistent when $\alpha=0$, and the more recent work by Bai and Ng (2023), who established the asymptotic normality of the PC estimator when $\alpha \in (1/2,1)$. Our proof strategy integrates the traditional eigendecomposition-based approach for factor models with a leave-one-out analysis similar in spirit to those used in matrix completion and other settings. This combination allows us to handle factors weaker than the former approach permits while relaxing the incoherence and independence assumptions often associated with the latter.
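As a point of reference for the PC estimator discussed above, the following minimal Python sketch extracts factors and loadings from a simulated weak-factor panel via an eigendecomposition; the data-generating process, the value of $\alpha$, and all variable names are illustrative assumptions rather than the paper's setup.
    import numpy as np

    rng = np.random.default_rng(0)
    N, T, r, alpha = 200, 100, 2, 0.7       # panel dimensions, number of factors, factor strength (assumed)

    # Simulate a weak-factor panel: loadings scaled so that Lambda' Lambda grows like N^alpha.
    F = rng.standard_normal((T, r))
    Lam = rng.standard_normal((N, r)) * np.sqrt(N ** (alpha - 1.0))
    X = F @ Lam.T + rng.standard_normal((T, N))     # T x N data with idiosyncratic noise

    # Principal components estimator: eigendecomposition of X X' / (N T), normalization F'F/T = I.
    eigval, eigvec = np.linalg.eigh(X @ X.T / (N * T))
    F_hat = np.sqrt(T) * eigvec[:, -r:][:, ::-1]    # estimated factors (largest r eigenvalues)
    Lam_hat = X.T @ F_hat / T                       # estimated loadings
    print(F_hat.shape, Lam_hat.shape)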
arXiv link: http://arxiv.org/abs/2402.05789v1
Difference-in-Differences Estimators with Continuous Treatments and no Stayers
include prices, taxes or temperatures. Empirical researchers have usually
relied on two-way fixed effect regressions to estimate treatment effects in
such cases. However, such estimators are not robust to heterogeneous treatment
effects in general; they also rely on the linearity of treatment effects. We
propose estimators for continuous treatments that do not impose those
restrictions, and that can be used when there are no stayers: the treatment of
all units changes from one period to the next. We start by extending the
nonparametric results of de Chaisemartin et al. (2023) to cases without
stayers. We also present a parametric estimator, and use it to revisit
Desch\^enes and Greenstone (2012).
arXiv link: http://arxiv.org/abs/2402.05432v1
Selective linear segmentation for detecting relevant parameter changes
We propose a method to uncover which model parameters truly vary when a change-point is detected. Given a set of breakpoints, we use a penalized likelihood approach to select the best set of parameters that change over time
and we prove that the penalty function leads to a consistent selection of the
true model. Estimation is carried out via the deterministic annealing
expectation-maximization algorithm. Our method accounts for model selection
uncertainty and associates a probability to all the possible time-varying
parameter specifications. Monte Carlo simulations highlight that the method
works well for many time series models, including heteroskedastic processes. For a sample of 14 hedge fund (HF) strategies, using an asset-based style pricing
model, we shed light on the promising ability of our method to detect the
time-varying dynamics of risk exposures as well as to forecast HF returns.
arXiv link: http://arxiv.org/abs/2402.05329v1
Inference for Two-Stage Extremum Estimators
focusing on extremum estimators in the second stage. We accommodate a broad
range of first-stage estimators, including extremum estimators,
high-dimensional estimators, and other types of estimators such as Bayesian
estimators. The key contribution of our approach lies in its ability to
estimate the asymptotic distribution of two-stage estimators, even when the
distributions of both the first- and second-stage estimators are non-normal and
when the second-stage estimator's bias, scaled by the square root of the sample
size, does not vanish asymptotically. This enables reliable inference in
situations where standard methods fail. Additionally, we propose a debiased
estimator, based on the mean of the estimated distribution function, which
exhibits improved finite sample properties. Unlike resampling methods, our
approach avoids the need for multiple calculations of the two-stage estimator.
We illustrate the effectiveness of our method in an empirical application on
peer effects in adolescent fast-food consumption, where we address the issue of
biased instrumental variable estimates resulting from many weak instruments.
arXiv link: http://arxiv.org/abs/2402.05030v2
What drives the European carbon market? Macroeconomic factors and forecasts
widely used policy measure to achieve the target of net-zero emissions by 2050.
This paper tackles the issue of producing point, direction-of-change, and
density forecasts for the monthly real price of carbon within the EU Emissions
Trading Scheme (EU ETS). We aim to uncover supply- and demand-side forces that
can contribute to improving the prediction accuracy of models at short- and
medium-term horizons. We show that a simple Bayesian Vector Autoregressive
(BVAR) model, augmented with either one or two factors capturing a set of
predictors affecting the price of carbon, provides substantial accuracy gains
over a wide set of benchmark forecasts, including survey expectations and
forecasts made available by data providers. We extend the study to verified
emissions and demonstrate that, in this case, adding stochastic volatility can
further improve the forecasting performance of a single-factor BVAR model. We
rely on emissions and price forecasts to build market monitoring tools that
track demand and price pressure in the EU ETS market. Our results are relevant
for policymakers and market practitioners interested in monitoring the carbon
market dynamics.
arXiv link: http://arxiv.org/abs/2402.04828v2
Hyperparameter Tuning for Causal Inference with Double Machine Learning: A Simulation Study
of modern machine learning (ML) methods in predictive tasks. While there is an
extensive literature on tuning ML learners for prediction, there is little guidance on tuning ML learners for causal machine learning and on how to select among different ML learners. In this paper, we empirically assess the
relationship between the predictive performance of ML methods and the resulting
causal estimation based on the Double Machine Learning (DML) approach by
Chernozhukov et al. (2018). DML relies on estimating so-called nuisance
parameters by treating them as supervised learning problems and using them as
plug-in estimates to solve for the (causal) parameter. We conduct an extensive
simulation study using data from the 2019 Atlantic Causal Inference Conference
Data Challenge. We provide empirical insights on the role of hyperparameter
tuning and other practical decisions for causal estimation with DML. First, we
assess the importance of data splitting schemes for tuning ML learners within
Double Machine Learning. Second, we investigate how the choice of ML methods
and hyperparameters, including recent AutoML frameworks, impacts the estimation
performance for a causal parameter of interest. Third, we assess to what extent
the choice of a particular causal model, as characterized by incorporated
parametric assumptions, can be based on predictive performance metrics.
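To fix ideas, here is a minimal sketch of the partially linear DML estimator with cross-fitting; it uses random forests from scikit-learn as one example of a tunable nuisance learner, and the data-generating process and tuning values are illustrative assumptions, not the settings of the simulation study.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(1)
    n = 2000
    X = rng.standard_normal((n, 5))
    D = X[:, 0] + rng.standard_normal(n)                    # treatment, confounded by X
    Y = 0.5 * D + X[:, 0] ** 2 + rng.standard_normal(n)     # true causal parameter is 0.5

    # Cross-fitted residuals of Y and D on X (nuisance functions treated as supervised learning problems).
    res_Y, res_D = np.zeros(n), np.zeros(n)
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        res_Y[test] = Y[test] - RandomForestRegressor(n_estimators=200).fit(X[train], Y[train]).predict(X[test])
        res_D[test] = D[test] - RandomForestRegressor(n_estimators=200).fit(X[train], D[train]).predict(X[test])

    # Final stage: residual-on-residual regression delivers the causal parameter.
    theta_hat = np.sum(res_D * res_Y) / np.sum(res_D ** 2)
    print(round(theta_hat, 3))                              # should be close to the true value 0.5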
arXiv link: http://arxiv.org/abs/2402.04674v1
Fast Online Changepoint Detection
model. We propose a class of heavily weighted statistics based on the CUSUM
process of the regression residuals, which are specifically designed to ensure
timely detection of breaks occurring early on during the monitoring horizon. We
subsequently propose a class of composite statistics, constructed using different weighting schemes; the decision rule to mark a changepoint is based on
the largest statistic across the various weights, thus effectively working like
a veto-based voting mechanism, which ensures fast detection irrespective of the
location of the changepoint. Our theory is derived under a very general form of
weak dependence, thus being able to apply our tests to virtually all time
series encountered in economics, medicine, and other applied sciences. Monte
Carlo simulations show that our methodologies are able to control the
procedure-wise Type I Error, and have short detection delays in the presence of
breaks.
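The following Python sketch conveys the flavour of a weighted-CUSUM monitoring scheme with a composite (maximum-over-weights) decision rule; the residual process, weighting exponents, and critical value are illustrative assumptions and do not reproduce the paper's statistics or boundary functions.
    import numpy as np

    rng = np.random.default_rng(2)
    T0, T1 = 200, 300                      # training sample and monitoring horizon (assumed)
    eps = rng.standard_normal(T0 + T1)     # stand-in for regression residuals
    eps[T0 + 50:] += 1.0                   # a break occurring early in the monitoring horizon

    sigma = eps[:T0].std(ddof=1)           # scale estimated on the training sample
    gammas = [0.0, 0.25, 0.45]             # illustrative weighting exponents

    for t in range(1, T1 + 1):
        cusum = abs(eps[T0:T0 + t].sum()) / (sigma * np.sqrt(T0))
        # Composite detector: largest weighted statistic across the weighting schemes.
        stat = max(cusum / ((1 + t / T0) * (t / (t + T0)) ** g) for g in gammas)
        if stat > 3.0:                     # purely illustrative critical value
            print("break flagged at monitoring time", t)
            break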
arXiv link: http://arxiv.org/abs/2402.04433v1
Monthly GDP nowcasting with Machine Learning and Unstructured Data
"nowcasting" models offer a distinct advantage for informed decision-making in
both public and private sectors. This study introduces ML-based GDP growth
projection models for monthly rates in Peru, integrating structured
macroeconomic indicators with high-frequency unstructured sentiment variables.
Analyzing data from January 2007 to May 2023, encompassing 91 leading economic
indicators, the study evaluates six ML algorithms to identify optimal
predictors. Findings highlight the superior predictive capability of ML models
using unstructured data, particularly Gradient Boosting Machine, LASSO, and
Elastic Net, exhibiting a 20% to 25% reduction in prediction errors compared to
traditional AR and Dynamic Factor Models (DFM). This enhanced performance is
attributed to the ability of ML models to better handle data in high-uncertainty periods, such as economic crises.
arXiv link: http://arxiv.org/abs/2402.04165v1
Data-driven Policy Learning for Continuous Treatments
observational data. Continuous treatments present more significant challenges
than discrete ones because population welfare may need nonparametric
estimation, and policy space may be infinite-dimensional and may satisfy shape
restrictions. We propose to approximate the policy space with a sequence of
finite-dimensional spaces and, for any given policy, obtain the empirical
welfare by applying the kernel method. We consider two cases: known and unknown
propensity scores. In the latter case, we allow for machine learning of the
propensity score and modify the empirical welfare to account for the effect of
machine learning. The learned policy maximizes the empirical welfare or the
modified empirical welfare over the approximating space. In both cases, we
modify the penalty algorithm proposed in Mbakop and Tabord-Meehan (2021) to
data-automate the tuning parameters (i.e., bandwidth and dimension of the
approximating space) and establish an oracle inequality for the welfare regret.
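A stripped-down version of kernel-based empirical welfare maximization with a known treatment density is sketched below; the outcome model, bandwidth, Gaussian kernel, and one-dimensional linear policy class are all illustrative assumptions, and the paper's penalty-based tuning of the bandwidth and of the approximating dimension is not implemented.
    import numpy as np

    rng = np.random.default_rng(3)
    n, h = 1000, 0.3                                   # sample size and kernel bandwidth (assumed)
    X = rng.uniform(-1, 1, n)
    D = rng.normal(0.0, 1.0, n)                        # continuous treatment with known density
    Y = 1.0 - (D - X) ** 2 + rng.standard_normal(n)    # outcome is best when the dose equals X

    def gauss(u):
        return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

    def empirical_welfare(beta):
        # Kernel-weighted welfare of the linear policy d = beta * x under a known propensity density.
        dose = beta * X
        w = gauss((D - dose) / h) / (h * gauss(D))
        return np.mean(w * Y)

    # Grid search over a one-dimensional approximating policy class.
    grid = np.linspace(-2, 2, 81)
    best = grid[np.argmax([empirical_welfare(b) for b in grid])]
    print("estimated policy slope:", round(best, 2))   # the welfare-maximizing slope in this design is 1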
arXiv link: http://arxiv.org/abs/2402.02535v2
Decomposing Global Bank Network Connectedness: What is Common, Idiosyncratic and When?
connectedness in both the time and frequency domains. By employing a factor
model with sparse VAR idiosyncratic components, we decompose system-wide
connectedness (SWC) into two key drivers: (i) common component shocks and (ii)
idiosyncratic shocks. We also provide bootstrap confidence bands for all SWC
measures. Furthermore, spectral density estimation allows us to disentangle SWC
into short-, medium-, and long-term frequency responses to these shocks. We
apply our methodology to two datasets of daily stock price volatilities for
over 90 global banks, spanning the periods 2003-2013 and 2014-2023. Our
empirical analysis reveals that SWC spikes during global crises, primarily
driven by common component shocks and their short-term effects. Conversely, in
normal times, SWC is largely influenced by idiosyncratic shocks and medium-term
dynamics.
arXiv link: http://arxiv.org/abs/2402.02482v2
Bootstrapping Fisher Market Equilibrium and First-Price Pacing Equilibrium
which also has applications in fair and efficient resource allocation.
First-price pacing equilibrium (FPPE) is a model capturing budget-management
mechanisms in first-price auctions. In certain practical settings such as
advertising auctions, there is an interest in performing statistical inference
over these models. A popular methodology for general statistical inference is
the bootstrap procedure. Yet, for LFM and FPPE there is no existing theory for
the valid application of bootstrap procedures. In this paper, we introduce and
devise several statistically valid bootstrap inference procedures for LFM and
FPPE. The most challenging part is to bootstrap general FPPE, which reduces to
bootstrapping constrained M-estimators, a largely unexplored problem. We devise
a bootstrap procedure for FPPE under mild degeneracy conditions by using the
powerful tool of epi-convergence theory. Experiments with synthetic and
semi-real data verify our theory.
arXiv link: http://arxiv.org/abs/2402.02303v6
One-inflated zero-truncated Poisson and negative binomial regression models
zero-truncated negative binomial (ZTNB) model. We find it should seldom be
used. Instead, we recommend the one-inflated zero-truncated negative binomial
(OIZTNB) model developed here. Zero-truncated count data often contain an
excess of 1s, leading to bias and inconsistency in the ZTNB model. The
importance of the OIZTNB model is apparent given the obvious presence of
one-inflation in four datasets that have traditionally championed the standard
ZTNB. We provide estimation, marginal effects, and a suite of accompanying
tools in the R package oneinfl, available on CRAN.
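For intuition, a minimal sketch of the one-inflated zero-truncated Poisson likelihood (the Poisson counterpart of the OIZTNB, without regressors) is given below; it is a conceptual Python illustration, not the interface of the oneinfl R package, and the toy data and starting values are assumptions.
    import numpy as np
    from scipy.stats import poisson
    from scipy.optimize import minimize

    def oiztp_logpmf(y, lam, w):
        # With probability w the count is inflated to 1; otherwise it is zero-truncated Poisson(lam).
        trunc = poisson.pmf(y, lam) / (1.0 - np.exp(-lam))     # zero-truncated pmf, valid for y >= 1
        return np.log(w * (y == 1) + (1.0 - w) * trunc)

    def neg_loglik(params, y):
        lam = np.exp(params[0])                                # keep lam > 0
        w = 1.0 / (1.0 + np.exp(-params[1]))                   # keep w in (0, 1)
        return -np.sum(oiztp_logpmf(y, lam, w))

    rng = np.random.default_rng(4)
    y = np.where(rng.uniform(size=500) < 0.3, 1,
                 rng.poisson(2.5, size=500) + 1)               # toy positive counts with an excess of ones
    fit = minimize(neg_loglik, x0=np.zeros(2), args=(y,))
    print("lambda:", np.exp(fit.x[0]), "omega:", 1.0 / (1.0 + np.exp(-fit.x[1])))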
arXiv link: http://arxiv.org/abs/2402.02272v2
The general solution to an autoregressive law of motion
autoregressive law of motion in a finite-dimensional complex vector space.
Every solution is shown to be the sum of three parts, each corresponding to a
directed flow of time. One part flows forward from the arbitrarily distant
past; one flows backward from the arbitrarily distant future; and one flows
outward from time zero. The three parts are obtained by applying three
complementary spectral projections to the solution, these corresponding to a
separation of the eigenvalues of the autoregressive operator according to
whether they are inside, outside or on the unit circle. We provide a
finite-dimensional parametrization of the set of all solutions.
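The separation of eigenvalues by their position relative to the unit circle can be illustrated with spectral projections of a diagonalizable operator, as in the Python sketch below; the matrix, the tolerance, and the conventional pairing of eigenvalue location with direction of time flow indicated in the comments are assumptions of the sketch.
    import numpy as np

    A = np.array([[0.5, 1.0, 0.0],
                  [0.0, 1.0, 0.3],
                  [0.0, 0.0, 1.6]], dtype=complex)      # illustrative autoregressive operator

    eigval, V = np.linalg.eig(A)
    Vinv = np.linalg.inv(V)
    tol = 1e-8

    def projection(mask):
        # Spectral projection onto the eigenspaces selected by mask (A assumed diagonalizable).
        return V @ np.diag(mask.astype(complex)) @ Vinv

    P_inside = projection(np.abs(eigval) < 1 - tol)             # inside the unit circle (forward flow from the past)
    P_outside = projection(np.abs(eigval) > 1 + tol)            # outside the unit circle (backward flow from the future)
    P_unit = projection(np.abs(np.abs(eigval) - 1) <= tol)      # on the unit circle (flow outward from time zero)

    # The three projections are complementary: they sum to the identity.
    print(np.allclose(P_inside + P_outside + P_unit, np.eye(3)))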
arXiv link: http://arxiv.org/abs/2402.01966v2
Sparse spanning portfolios and under-diversification with second-order stochastic dominance
constraints on portfolios improves the investment opportunity set for
risk-averse investors. We formulate a new estimation procedure for sparse
second-order stochastic spanning based on a greedy algorithm and Linear
Programming. We show the optimal recovery of the sparse solution asymptotically
whether spanning holds or not. From large equity datasets, we estimate the
expected utility loss due to possible under-diversification, and find that
there is no benefit from expanding a sparse opportunity set beyond 45 assets.
The optimal sparse portfolio invests in 10 industry sectors and cuts tail risk
when compared to a sparse mean-variance portfolio. On a rolling-window basis,
the number of assets shrinks to 25 assets in crisis periods, while standard
factor models cannot explain the performance of the sparse portfolios.
arXiv link: http://arxiv.org/abs/2402.01951v2
Data-driven model selection within the matrix completion method for causal panel data models
regulate the rank of the underlying factor model using nuclear norm
minimization. This convex optimization problem enables concurrent
regularization of a potentially high-dimensional set of covariates to shrink
the model size. For valid finite sample inference, we adopt a permutation-based
approach and prove its validity for any treatment assignment mechanism.
Simulations illustrate the consistency of the proposed estimator in parameter
estimation and variable selection. An application to public health policies in
Germany demonstrates the data-driven model selection feature on empirical data
and finds no effect of travel restrictions on the containment of severe
Covid-19 infections.
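A bare-bones soft-impute iteration conveys how nuclear norm minimization regulates the rank of a panel with missing (treated) entries; the simulated panel, regularization level, and number of iterations are illustrative assumptions, and the covariate shrinkage and permutation-based inference of the proposed estimator are not reproduced here.
    import numpy as np

    rng = np.random.default_rng(5)
    N, T, r = 60, 40, 3
    L_true = rng.standard_normal((N, r)) @ rng.standard_normal((r, T))   # low-rank panel of potential outcomes
    Y = L_true + 0.5 * rng.standard_normal((N, T))
    observed = rng.uniform(size=(N, T)) > 0.2           # mask: False marks missing/treated entries

    def svt(M, tau):
        # Singular value soft-thresholding: the proximal operator of the nuclear norm.
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    # Soft-impute: fill missing entries with the current low-rank fit, then shrink the singular values.
    L_hat, tau = np.zeros((N, T)), 2.0                  # illustrative regularization level
    for _ in range(200):
        L_hat = svt(np.where(observed, Y, L_hat), tau)
    print("rank of the estimate:", np.linalg.matrix_rank(L_hat, tol=1e-6))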
arXiv link: http://arxiv.org/abs/2402.01069v1
DoubleMLDeep: Estimation of Causal Effects with Multimodal Data
images, in causal inference and treatment effect estimation. We propose a
neural network architecture that is adapted to the double machine learning
(DML) framework, specifically the partially linear model. An additional
contribution of our paper is a new method to generate a semi-synthetic dataset
which can be used to evaluate the performance of causal effect estimation in
the presence of text and images as confounders. The proposed methods and
architectures are evaluated on the semi-synthetic dataset and compared to
standard approaches, highlighting the potential benefit of using text and
images directly in causal studies. Our findings have implications for
researchers and practitioners in economics, marketing, finance, medicine and
data science in general who are interested in estimating causal quantities
using non-traditional data.
arXiv link: http://arxiv.org/abs/2402.01785v1
The prices of renewable commodities: A robust stationarity analysis
the shocks affecting the prices of renewable commodities, which have potential
implications on stabilization policies and economic forecasting, among other
areas. A robust methodology is employed that enables the determination of the
potential presence and number of instant/gradual structural changes in the
series, stationarity testing conditional on the number of changes detected, and
the detection of change points. This procedure is applied to the annual real
prices of eighteen renewable commodities over the period of 1900-2018. Results
indicate that most of the series display non-linear features, including
quadratic patterns and regime transitions that often coincide with well-known
political and economic episodes. The conclusions of stationarity testing
suggest that roughly half of the series are integrated. Stationarity fails to
be rejected for grains, whereas most livestock and textile commodities do
reject stationarity. Evidence is mixed in all soft commodities and tropical
crops, where stationarity can be rejected in approximately half of the cases.
The implication would be that for these commodities, stabilization schemes
would not be recommended.
arXiv link: http://arxiv.org/abs/2402.01005v1
EU-28's progress towards the 2020 renewable energy share. A club convergence analysis
common goal of 20% in the renewable energy share indicator by year 2020. The
potential presence of clubs of convergence towards different steady state
equilibria is also analyzed from both the standpoints of global convergence to
the 20% goal and specific convergence to the various targets assigned to Member
States. Two clubs of convergence are detected in the former case, each
corresponding to different RES targets. A probit model is also fitted with the
aim of better understanding the determinants of club membership, that seemingly
include real GDP per capita, expenditure on environmental protection, energy
dependence, and nuclear capacity, with all of them having statistically
significant effects. Finally, convergence is also analyzed separately for the
transport, heating and cooling, and electricity sectors.
arXiv link: http://arxiv.org/abs/2402.00788v1
Arellano-Bond LASSO Estimator for Dynamic Linear Panel Models
models, widely used in practice. However, the estimator is severely biased when
the data's time series dimension $T$ is long due to the large degree of
overidentification. We show that weak dependence along the panel's time series
dimension naturally implies approximate sparsity of the most informative moment
conditions, motivating the following approach to remove the bias: First, apply
LASSO to the cross-section data at each time period to construct most
informative (and cross-fitted) instruments, using lagged values of suitable
covariates. This step relies on approximate sparsity to select the most
informative instruments. Second, apply a linear instrumental variable estimator
after first differencing the dynamic structural equation using the constructed
instruments. Under weak time series dependence, we show the new estimator is
consistent and asymptotically normal under much weaker conditions on $T$'s
growth than the Arellano-Bond estimator. Our theory covers models with high
dimensional covariates, including multiple lags of the dependent variable,
common in modern applications. We illustrate our approach by applying it to
weekly county-level panel data from the United States to study opening K-12
schools and other mitigation policies' short and long-term effects on
COVID-19's spread.
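A heavily simplified sketch of the two-step idea for a pure AR(1) panel is shown below: period-by-period Lasso fits of the endogenous first-differenced lag on lagged levels serve as constructed instruments, followed by a pooled just-identified IV step. The cross-fitting, high-dimensional covariates, and inference theory of the actual estimator are omitted, and the simulated design and LassoCV tuning are assumptions of the sketch.
    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(6)
    N, T, rho = 500, 12, 0.6
    alpha = rng.standard_normal(N)                     # unit fixed effects
    y = np.zeros((N, T))
    y[:, 0] = alpha + rng.standard_normal(N)
    for t in range(1, T):
        y[:, t] = rho * y[:, t - 1] + alpha + rng.standard_normal(N)

    dy = np.diff(y, axis=1)                            # first differences remove the fixed effects

    num = den = 0.0
    for t in range(3, T):                              # need at least two lagged levels as instruments
        Z = y[:, :t - 1]                               # lagged levels y_{i,0}, ..., y_{i,t-2}
        endo = dy[:, t - 2]                            # endogenous regressor Delta y_{i,t-1}
        z_hat = LassoCV(cv=5).fit(Z, endo).predict(Z)  # Lasso-constructed instrument (no cross-fitting here)
        num += np.sum(z_hat * dy[:, t - 1])            # instrument times Delta y_{i,t}
        den += np.sum(z_hat * endo)
    print("IV estimate of rho:", round(num / den, 3))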
arXiv link: http://arxiv.org/abs/2402.00584v4
Stochastic convergence in per capita CO$_2$ emissions. An approach from nonlinear stationarity analysis
28 OECD countries for the 1901-2009 period. The analysis is carried out at two
aggregation levels, first for the whole set of countries (joint analysis) and
then separately for developed and developing states (group analysis). A
powerful time series methodology, adapted to a nonlinear framework that allows
for quadratic trends with possibly smooth transitions between regimes, is
applied. This approach provides more robust conclusions in convergence path
analysis, enabling (a) robust detection of the presence, and if so, the number
of changes in the level and/or slope of the trend of the series, (b) inferences
on stationarity of relative per capita CO$_2$ emissions, conditionally on the
presence of breaks and smooth transitions between regimes, and (c) estimation
of change locations in the convergence paths. Finally, as stochastic
convergence is attained when both stationarity around a trend and
$\beta$-convergence hold, the linear approach proposed by Tomljanovich and
Vogelsang (2002) is extended in order to allow for more general quadratic
models. Overall, joint analysis finds some evidence of stochastic convergence
in per capita CO$_2$ emissions. Some dispersion in terms of $\beta$-convergence
is detected by group analysis, particularly among developed countries. This is
in accordance with per capita GDP not being the sole determinant of convergence
in emissions, with factors such as the search for more efficient technologies, fossil fuel substitution, innovation, and possibly the outsourcing of industries also playing a crucial role.
arXiv link: http://arxiv.org/abs/2402.00567v1
Finite- and Large-Sample Inference for Ranks using Multinomial Data with an Application to Ranking Political Parties
revealed through data on choices. A prominent example is the ranking of
political candidates or parties using the estimated share of support each one
receives in surveys or polls about political attitudes. Since these rankings
are computed using estimates of the share of support rather than the true share
of support, there may be considerable uncertainty concerning the true ranking
of the political candidates or parties. In this paper, we consider the problem
of accounting for such uncertainty by constructing confidence sets for the rank
of each category. We consider both the problem of constructing marginal
confidence sets for the rank of a particular category as well as simultaneous
confidence sets for the ranks of all categories. A distinguishing feature of
our analysis is that we exploit the multinomial structure of the data to
develop confidence sets that are valid in finite samples. We additionally
develop confidence sets using the bootstrap that are valid only approximately
in large samples. We use our methodology to rank political parties in Australia
using data from the 2019 Australian Election Survey. We find that our
finite-sample confidence sets are informative across the entire ranking of
political parties, even in Australian territories with few survey respondents
and/or with parties that are chosen by only a small share of the survey
respondents. In contrast, the bootstrap-based confidence sets may sometimes be
considerably less informative. These findings motivate us to compare these
methods in an empirically-driven simulation study, in which we conclude that
our finite-sample confidence sets often perform better than their large-sample,
bootstrap-based counterparts, especially in settings that resemble our
empirical application.
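The bootstrap benchmark mentioned above can be sketched in a few lines for marginal rank intervals; the party shares, sample size, and percentile construction below are made-up illustrations, and the finite-sample confidence sets that exploit the multinomial structure are not implemented here.
    import numpy as np

    rng = np.random.default_rng(7)
    shares = np.array([0.33, 0.31, 0.18, 0.10, 0.08])      # hypothetical support shares
    n, B = 400, 2000                                       # hypothetical survey size, bootstrap draws
    counts = rng.multinomial(n, shares)

    def ranks(p):
        # Rank 1 is the category with the largest estimated share (ties ignored for simplicity).
        return np.argsort(np.argsort(-p)) + 1

    p_hat = counts / n
    boot_ranks = np.array([ranks(rng.multinomial(n, p_hat) / n) for _ in range(B)])
    sorted_ranks = np.sort(boot_ranks, axis=0)
    lower, upper = sorted_ranks[int(0.025 * B)], sorted_ranks[int(0.975 * B) - 1]

    for j in range(len(shares)):
        print(f"category {j}: rank {ranks(p_hat)[j]}, 95% bootstrap interval [{lower[j]}, {upper[j]}]")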
arXiv link: http://arxiv.org/abs/2402.00192v1
The Mixed Aggregate Preference Logit Model: A Machine Learning Approach to Modeling Unobserved Heterogeneity in Discrete Choice Analysis
"maple") model, a novel class of discrete choice models that leverages machine learning to model unobserved heterogeneity in discrete choice analysis. The traditional mixed logit model (also known as "random parameters logit")
parameterizes preference heterogeneity through assumptions about
feature-specific heterogeneity distributions. These parameters are also
typically assumed to be linearly added in a random utility (or random regret)
model. MAPL models relax these assumptions by instead directly relating model
inputs to parameters of alternative-specific distributions of aggregate
preference heterogeneity, with no feature-level assumptions required. MAPL
models eliminate the need to make any assumption about the functional form of
the latent decision model, freeing modelers from potential misspecification
errors. In a simulation experiment, we demonstrate that a single MAPL model
specification is capable of correctly modeling multiple different
data-generating processes with different forms of utility and heterogeneity
specifications. MAPL models advance machine-learning-based choice models by
accounting for unobserved heterogeneity. Further, MAPL models can be leveraged
by traditional choice modelers as a diagnostic tool for identifying utility and
heterogeneity misspecification.
arXiv link: http://arxiv.org/abs/2402.00184v2
The Fourier-Malliavin Volatility (FMVol) MATLAB library
library for MATLAB. This library includes functions that implement Fourier-Malliavin estimators (see Malliavin and Mancino (2002, 2009)) of the volatility
and co-volatility of continuous stochastic volatility processes and
second-order quantities, like the quarticity (the squared volatility), the
volatility of volatility and the leverage (the covariance between changes in
the process and changes in its volatility). The Fourier-Malliavin method is
fully non-parametric, does not require equally-spaced observations and is
robust to measurement errors, or noise, without any preliminary bias correction
or pre-treatment of the observations. Further, in its multivariate version, it
is intrinsically robust to irregular and asynchronous sampling. Although
originally introduced for a specific application in financial econometrics,
namely the estimation of asset volatilities, the Fourier-Malliavin method is a
general method that can be applied whenever one is interested in reconstructing
the latent volatility and second-order quantities of a continuous stochastic
volatility process from discrete observations.
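A bare-bones Python transcription of the zero-th Fourier coefficient estimator of integrated variance for a single, irregularly observed log-price is given below; the simulated path, cutting frequency, and normalization follow one common presentation of the Fourier-Malliavin estimator and may differ in detail from the conventions of the MATLAB library.
    import numpy as np

    rng = np.random.default_rng(8)
    n, sigma = 2000, 0.8
    t = np.sort(rng.uniform(0.0, 2.0 * np.pi, n))     # irregular observation times rescaled to [0, 2*pi]
    p = np.cumsum(sigma * np.sqrt(np.diff(t, prepend=0.0)) * rng.standard_normal(n))   # log-price path

    dp = np.diff(p)                                   # price increments
    M = int(np.sqrt(n))                               # cutting frequency (a common rule of thumb)

    # Fourier coefficients of dp and the zero-th Fourier coefficient of volatility.
    k = np.arange(-M, M + 1)
    c = (np.exp(-1j * np.outer(k, t[:-1])) @ dp) / (2.0 * np.pi)
    ivar_hat = (2.0 * np.pi) ** 2 / (2 * M + 1) * np.sum(c * c[::-1]).real

    print("estimated integrated variance:", round(ivar_hat, 3))
    print("quadratic variation of the path:", round(sigma ** 2 * (t[-1] - t[0]), 3))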
arXiv link: http://arxiv.org/abs/2402.00172v1
Regularizing Fairness in Optimal Policy Learning with Distributional Targets
relative effectiveness of treatments, and (ii) chooses an implementation
mechanism that implies an “optimal” predicted outcome distribution according
to some target functional. Nevertheless, a fairness-aware decision maker may
not be satisfied with achieving said optimality at the cost of being "unfair"
against a subgroup of the population, in the sense that the outcome
distribution in that subgroup deviates too strongly from the overall optimal
outcome distribution. We study a framework that allows the decision maker to
regularize such deviations, while allowing for a wide range of target
functionals and fairness measures to be employed. We establish regret and
consistency guarantees for empirical success policies with (possibly)
data-driven preference parameters, and provide numerical results. Furthermore,
we briefly illustrate the methods in two empirical settings.
arXiv link: http://arxiv.org/abs/2401.17909v2
Marginal treatment effects in the absence of instrumental variables
treatment effect (MTE) without imposing the instrumental variable (IV)
assumptions of independence, exclusion, and separability (or monotonicity).
Under a new definition of the MTE based on reduced-form treatment error that is
statistically independent of the covariates, we find that the relationship
between the MTE and standard treatment parameters holds in the absence of IVs.
We provide a set of sufficient conditions ensuring the identification of the
defined MTE in an environment of essential heterogeneity. The key conditions
include a linear restriction on potential outcome regression functions, a
nonlinear restriction on the propensity score, and a conditional mean
independence restriction that will lead to additive separability. We prove this
identification using the notion of semiparametric identification based on
functional form, and we provide an empirical application to the Head Start program to illustrate the usefulness of the proposed method in analyzing heterogeneous causal effects when IVs are elusive.
arXiv link: http://arxiv.org/abs/2401.17595v2
Partial Identification of Binary Choice Models with Misreported Outcomes
with misreported dependent variables. We propose two distinct approaches by
exploiting different instrumental variables respectively. In the first
approach, the instrument is assumed to only affect the true dependent variable
but not misreporting probabilities. The second approach uses an instrument that
influences misreporting probabilities monotonically while having no effect on
the true dependent variable. Moreover, we derive identification results under
additional restrictions on misreporting, including bounded/monotone
misreporting probabilities. We use simulations to demonstrate the robust
performance of our approaches, and apply the method to study educational
attainment.
arXiv link: http://arxiv.org/abs/2401.17137v1
Congestion Pricing for Efficiency and Equity: Theory and Applications to the San Francisco Bay Area
congestion, raises concerns about widening socioeconomic disparities due to its
disproportionate impact on low-income travelers. We address this concern by
proposing a new class of congestion pricing schemes that not only minimize
total travel time, but also incorporate an equity objective, reducing
disparities in the relative change in travel costs across populations with
different incomes, following the implementation of tolls. Our analysis builds
on a congestion game model with heterogeneous traveler populations. We present
four pricing schemes that account for practical considerations, such as the
ability to charge differentiated tolls to various traveler populations and the
option to toll all or only a subset of edges in the network. We evaluate our
pricing schemes in the calibrated freeway network of the San Francisco Bay
Area. We demonstrate that the proposed congestion pricing schemes improve both
the total travel time and the equity objective compared to the current pricing
scheme.
Our results further show that pricing schemes charging differentiated prices
to traveler populations with varying value-of-time lead to a more equitable
distribution of travel costs compared to those that charge a homogeneous price
to all.
arXiv link: http://arxiv.org/abs/2401.16844v2
Graph Neural Networks: Theory for Estimation with Application on Network Heterogeneity
and estimating network heterogeneity. Network heterogeneity is characterized by
variations in a unit's decisions or outcomes that depend not only on its own
attributes but also on the conditions of its surrounding neighborhood. We
delineate the convergence rate of the graph neural networks estimator, as well
as its applicability in semiparametric causal inference with heterogeneous
treatment effects. The finite-sample performance of our estimator is evaluated
through Monte Carlo simulations. In an empirical setting related to
microfinance program participation, we apply the new estimator to examine the
average treatment effects and outcomes of counterfactual policies, and to
propose an enhanced strategy for selecting the initial recipients of program
information in social networks.
arXiv link: http://arxiv.org/abs/2401.16275v1
Comparing MCMC algorithms in Stochastic Volatility Models using Simulation Based Calibration
competing Markov chain Monte Carlo algorithms for estimating the posterior
distribution of a stochastic volatility model. In particular, the bespoke
'off-set mixture approximation' algorithm proposed by Kim, Shephard, and Chib
(1998) is explored together with a Hamiltonian Monte Carlo algorithm
implemented through Stan. The SBC analysis involves a simulation study to
assess whether each sampling algorithm has the capacity to produce valid
inference for the correctly specified model, while also characterising
statistical efficiency through the effective sample size. Results show that
Stan's No-U-Turn sampler, an implementation of Hamiltonian Monte Carlo,
produces a well-calibrated posterior estimate while the celebrated off-set
mixture approach is less efficient and poorly calibrated, though model
parameterisation also plays a role. Limitations and restrictions of generality
are discussed.
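For readers new to SBC, the sketch below runs the rank-statistic recipe on a toy conjugate normal-mean model, with the exact posterior sampler standing in for the MCMC algorithms compared in the paper; the model, prior, and number of simulations are assumptions of the sketch, and under a correctly calibrated sampler the ranks are uniformly distributed.
    import numpy as np

    rng = np.random.default_rng(9)
    n_sims, n_draws, n_obs = 1000, 99, 20
    prior_mu, prior_sd, data_sd = 0.0, 1.0, 1.0
    ranks = np.empty(n_sims, dtype=int)

    for s in range(n_sims):
        theta = rng.normal(prior_mu, prior_sd)                # draw a "true" parameter from the prior
        y = rng.normal(theta, data_sd, size=n_obs)            # simulate data given that parameter
        # Exact conjugate posterior, standing in for the sampler under test.
        post_var = 1.0 / (1.0 / prior_sd ** 2 + n_obs / data_sd ** 2)
        post_mean = post_var * (prior_mu / prior_sd ** 2 + y.sum() / data_sd ** 2)
        draws = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)
        ranks[s] = np.sum(draws < theta)                      # SBC rank statistic in {0, ..., n_draws}

    # Under correct calibration the ranks are (discretely) uniform; inspect a coarse histogram.
    hist, _ = np.histogram(ranks, bins=10, range=(-0.5, n_draws + 0.5))
    print(hist)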
arXiv link: http://arxiv.org/abs/2402.12384v1
Testing the Exogeneity of Instrumental Variables and Regressors in Linear Regression Models Using Copulas
variables in linear regression models. We show that the exogeneity of
instrumental variables is equivalent to the exogeneity of their standard normal
transformations with the same CDF value. Then, we establish a Wald test for the
exogeneity of the instrumental variables. We demonstrate the performance of our
test using simulation studies. Our simulations show that if the instruments are
actually endogenous, our test rejects the exogeneity hypothesis approximately
93% of the time at the 5% significance level. Conversely, when instruments are
truly exogenous, it dismisses the exogeneity assumption less than 30% of the
time on average for data with 200 observations and less than 2% of the time for
data with 1,000 observations. Our results demonstrate our test's effectiveness,
offering significant value to applied econometricians.
arXiv link: http://arxiv.org/abs/2401.15253v1
csranks: An R Package for Estimation and Inference Involving Ranks
involving ranks. First, we review methods for the construction of confidence
sets for ranks, namely marginal and simultaneous confidence sets as well as
confidence sets for the identities of the tau-best. Second, we review methods
for estimation and inference in regressions involving ranks. Third, we describe
the implementation of these methods in csranks and illustrate their usefulness
in two examples: one about the quantification of uncertainty in the PISA
ranking of countries and one about the measurement of intergenerational
mobility using rank-rank regressions.
arXiv link: http://arxiv.org/abs/2401.15205v1
High-dimensional forecasting with known knowns and known unknowns
brief review of the general issues, this paper considers ways of using
high-dimensional data in forecasting. We consider selecting variables from a
known active set, known knowns, using Lasso and OCMT, and approximating
unobserved latent factors, known unknowns, by various means. This combines both
sparse and dense approaches. We demonstrate the various issues involved in
variable selection in a high-dimensional setting with an application to
forecasting UK inflation at different horizons over the period 2020q1-2023q1.
This application shows both the power of parsimonious models and the importance
of allowing for global variables.
arXiv link: http://arxiv.org/abs/2401.14582v2
Structural Periodic Vector Autoregressions
seasonal adjustment techniques before it is used for structural inference, this
may distort valuable information in the data. As an alternative method to
commonly used structural vector autoregressions (SVARs) for seasonally adjusted
data, we propose to model potential periodicity in seasonally unadjusted (raw)
data directly by structural periodic vector autoregressions (SPVARs). This
approach not only allows for periodically time-varying intercepts, but also for periodic autoregressive parameters and innovation variances. As this
larger flexibility leads to an increased number of parameters, we propose
linearly constrained estimation techniques. Moreover, based on SPVARs, we
provide two novel identification schemes and propose a general framework for
impulse response analyses that allows for direct consideration of seasonal
patterns. We provide asymptotic theory for SPVAR estimators and impulse
responses under flexible linear restrictions and introduce a test for
seasonality in impulse responses. For the construction of confidence intervals,
we discuss several residual-based (seasonal) bootstrap methods and prove their
bootstrap consistency under different assumptions. A real data application
shows that useful information about the periodic structure in the data may be
lost when relying on common seasonal adjustment methods.
arXiv link: http://arxiv.org/abs/2401.14545v2
Identification of Nonseparable Models with Endogenous Control Variables
models with the presence of potentially endogenous control variables. We show
that, provided the treatment variable and the controls are measurably separated, the usual conditional independence condition or the availability of an excluded instrument suffices for identification.
arXiv link: http://arxiv.org/abs/2401.14395v1
Entrywise Inference for Missing Panel Data: A Simple and Instance-Optimal Approach
by units and columns indexed by time. We consider inferential questions
associated with the missing data version of panel data induced by staggered
adoption. We propose a computationally efficient procedure for estimation,
involving only simple matrix algebra and singular value decomposition, and
prove non-asymptotic and high-probability bounds on its error in estimating
each missing entry. By controlling proximity to a suitably scaled Gaussian
variable, we develop and analyze a data-driven procedure for constructing
entrywise confidence intervals with pre-specified coverage. Despite its
simplicity, our procedure turns out to be instance-optimal: we prove that the
width of our confidence intervals matches a non-asymptotic instance-wise lower
bound derived via a Bayesian Cram\'{e}r-Rao argument. We illustrate the
sharpness of our theoretical characterization on a variety of numerical
examples. Our analysis is based on a general inferential toolbox for SVD-based
algorithms applied to the matrix denoising model, which might be of independent
interest.
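A one-shot truncated-SVD sketch gives the flavour of estimating missing panel entries with simple matrix algebra; the random (rather than staggered) missingness pattern, the uniform inverse-probability rescaling, and the known rank are simplifying assumptions, and the entrywise confidence intervals of the proposed procedure are not reproduced.
    import numpy as np

    rng = np.random.default_rng(10)
    N, T, r = 80, 50, 2
    M = rng.standard_normal((N, r)) @ rng.standard_normal((r, T))    # low-rank signal matrix
    Y = M + 0.3 * rng.standard_normal((N, T))
    obs = rng.uniform(size=(N, T)) > 0.3                             # observation pattern (random, for illustration)

    # Zero-fill the missing entries, rescale by the observed fraction, truncate the SVD at rank r.
    p_hat = obs.mean()
    U, s, Vt = np.linalg.svd(np.where(obs, Y, 0.0) / p_hat, full_matrices=False)
    M_hat = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

    miss = ~obs
    print("RMSE on missing entries:", round(float(np.sqrt(np.mean((M_hat - M)[miss] ** 2))), 3))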
arXiv link: http://arxiv.org/abs/2401.13665v2
New accessibility measures based on unconventional big data sources
related to the accessibility to medical infrastructures. The increasing
availability of data automatically collected through unconventional sources
(such as webscraping, crowdsourcing or internet of things) recently opened
previously unconceivable opportunities to researchers interested in measuring
accessibility and to use it as a tool for real-time monitoring, surveillance
and health policies definition. This paper contributes to this strand of
literature by proposing new accessibility measures that can be continuously fed
by automatic data collection. We present new measures of accessibility and we
illustrate their use to study the territorial impact of supply-side shocks of
health facilities. We also illustrate the potential of our proposal with a case
study based on a huge set of data (related to the Emergency Departments in
Milan, Italy) that were webscraped for the purpose of this paper every 5 minutes from November 2021 to March 2022, amounting to approximately 5 million
observations.
arXiv link: http://arxiv.org/abs/2401.13370v1
Realized Stochastic Volatility Model with Skew-t Distributions for Improved Volatility and Quantile Forecasting
evaluating financial tail risks such as value-at-risk and expected shortfall.
This study proposes an extension of the traditional stochastic volatility
model, termed the realized stochastic volatility model, that incorporates
realized volatility as an efficient proxy for latent volatility. To better
capture the stylized features of financial return distributions, particularly
skewness and heavy tails, we introduce three variants of skewed
t-distributions, two of which incorporate skew-normal components to flexibly
model asymmetry. The models are estimated using a Bayesian Markov chain Monte
Carlo approach and applied to daily returns and realized volatilities from
major U.S. and Japanese stock indices. Empirical results demonstrate that
incorporating both realized volatility and flexible return distributions
substantially improves the accuracy of volatility and tail risk forecasts.
arXiv link: http://arxiv.org/abs/2401.13179v3
Inference under partial identification with minimax test statistics
of statistics based on an outer minimization of an inner maximization. Such
test statistics, which arise frequently in moment models, are of special
interest in providing hypothesis tests under partial identification. Under
general conditions, we provide an asymptotic characterization of such test
statistics using the minimax theorem, and a means of computing critical values
using the bootstrap. Making some light regularity assumptions, our results
augment several asymptotic approximations that have been provided for partially
identified hypothesis tests, and extend them by mitigating their dependence on
local linear approximations of the parameter space. These asymptotic results
are generally simple to state and straightforward to compute (especially adversarially).
arXiv link: http://arxiv.org/abs/2401.13057v2
Interpreting Event-Studies from Recent Difference-in-Differences Methods
recent difference-in-differences methods. I show that even when specialized to
the case of non-staggered treatment timing, the default plots produced by
software for three of the most popular recent methods (de Chaisemartin and
D'Haultfoeuille, 2020; Callaway and Sant'Anna, 2021; Borusyak, Jaravel and
Spiess, 2024) do not match those of traditional two-way fixed effects (TWFE)
event-studies: the new methods may show a kink or jump at the time of treatment
even when the TWFE event-study shows a straight line. This difference stems
from the fact that the new methods construct the pre-treatment coefficients
asymmetrically from the post-treatment coefficients. As a result, visual
heuristics for analyzing TWFE event-study plots should not be immediately
applied to those from these methods. I conclude with practical recommendations
for constructing and interpreting event-study plots when using these methods.
arXiv link: http://arxiv.org/abs/2401.12309v1
Temporal Aggregation for the Synthetic Control Method
impact of a treatment on a single unit with panel data. Two challenges arise
with higher frequency data (e.g., monthly versus yearly): (1) achieving
excellent pre-treatment fit is typically more challenging; and (2) overfitting
to noise is more likely. Aggregating data over time can mitigate these problems
but can also destroy important signal. In this paper, we bound the bias for SCM
with disaggregated and aggregated outcomes and give conditions under which
aggregating tightens the bounds. We then propose finding weights that balance
both disaggregated and aggregated series.
arXiv link: http://arxiv.org/abs/2401.12084v2
A Bracketing Relationship for Long-Term Policy Evaluation with Combined Experimental and Observational Data
credible long-term policy evaluation. The literature offers two key but
non-nested assumptions, namely the latent unconfoundedness (LU; Athey et al.,
2020) and equi-confounding bias (ECB; Ghassami et al., 2022) conditions, to
correct observational selection. Committing to the wrong assumption leads to
biased estimation. To mitigate such risks, we provide a novel bracketing
relationship (cf. Angrist and Pischke, 2009) repurposed for the setting with
data combination: the LU-based estimand and the ECB-based estimand serve as the
lower and upper bounds, respectively, with the true causal effect lying in
between if either assumption holds. For researchers further seeking point
estimates, our Lalonde-style exercise suggests the conservatively more robust
LU-based lower bounds align closely with the hold-out experimental estimates
for educational policy evaluation. We investigate the economic substance of
these findings through the lens of a nonparametric class of selection
mechanisms and sensitivity analysis. We uncover as key the sub-martingale
property and sufficient-statistics role (Chetty, 2009) of the potential
outcomes of student test scores (Chetty et al., 2011, 2014).
arXiv link: http://arxiv.org/abs/2401.12050v1
Local Identification in Instrumental Variable Multivariate Quantile Regression Models
and Hansen (2005), a one-dimensional unobserved rank variable monotonically
determines a single potential outcome. Even when multiple outcomes are
simultaneously of interest, it is common to apply the IVQR model to each of
them separately. This practice implicitly assumes that the rank variable of
each regression model affects only the corresponding outcome and does not
affect the other outcomes. In reality, however, it is often the case that all
rank variables together determine the outcomes, which leads to a systematic
correlation between the outcomes. To deal with this, we propose a nonlinear IV
model that allows for multivariate unobserved heterogeneity, each of which is
considered as a rank variable for an observed outcome. We show that the
structural function of our model is locally identified under the assumption
that the IV and the treatment variable are sufficiently positively correlated.
arXiv link: http://arxiv.org/abs/2401.11422v3
Estimation with Pairwise Observations
regression model. The procedure is not driven by the optimisation of any
objective function; rather, it is a simple weighted average of slopes from observation pairs. The paper shows that such an estimator is consistent for
carefully selected weights. Other properties, such as asymptotic distributions,
have also been derived to facilitate valid statistical inference. Unlike
traditional methods, such as Least Squares and Maximum Likelihood, among
others, the estimated residual of this estimator is not by construction
orthogonal to the explanatory variables of the model. This property allows a
wide range of practical applications, such as the testing of endogeneity, i.e.,
the correlation between the explanatory variables and the disturbance terms.
arXiv link: http://arxiv.org/abs/2401.11229v2
Information Based Inference in Models with Set-Valued Predictions and Misspecification
identified parameters in incomplete models that is valid both when the model is
correctly specified and when it is misspecified. Key features of the method
are: (i) it is based on minimizing a suitably defined Kullback-Leibler
information criterion that accounts for incompleteness of the model and
delivers a non-empty pseudo-true set; (ii) it is computationally tractable;
(iii) its implementation is the same for both correctly and incorrectly
specified models; (iv) it exploits all information provided by variation in
discrete and continuous covariates; (v) it relies on Rao's score statistic,
which is shown to be asymptotically pivotal.
arXiv link: http://arxiv.org/abs/2401.11046v1
When the Universe is Too Big: Bounding Consideration Probabilities for Plackett-Luce Rankings
items by making repeated choices from a universe of items. But in many cases
the universe is too big for people to plausibly consider all options. In the
choice literature, this issue has been addressed by supposing that individuals
first sample a small consideration set and then choose among the considered
items. However, inferring unobserved consideration sets (or item consideration
probabilities) in this "consider then choose" setting poses significant
challenges, because even simple models of consideration with strong
independence assumptions are not identifiable, even if item utilities are
known. We apply the consider-then-choose framework to top-$k$ rankings, where
we assume rankings are constructed according to a Plackett-Luce model after
sampling a consideration set. While item consideration probabilities remain
non-identified in this setting, we prove that we can infer bounds on the
relative values of consideration probabilities. Additionally, given a condition
on the expected consideration set size and known item utilities, we derive
absolute upper and lower bounds on item consideration probabilities. We also
provide algorithms to tighten those bounds on consideration probabilities by
propagating inferred constraints. Thus, we show that we can learn useful
information about consideration probabilities despite not being able to
identify them precisely. We demonstrate our methods on a ranking dataset from a
psychology experiment with two different ranking tasks (one with fixed
consideration sets and one with unknown consideration sets). This combination
of data allows us to estimate utilities and then learn about unknown
consideration probabilities using our bounds.
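As background, the probability of an observed top-$k$ ranking under a Plackett-Luce model restricted to a sampled consideration set can be computed sequentially, as in the sketch below; the item utilities and the consideration set are made up, and the paper's bounds on consideration probabilities are not implemented.
    import numpy as np

    def plackett_luce_topk_prob(ranking, utilities, consideration_set):
        # Probability of `ranking` (best item first) when choices are made sequentially,
        # with Plackett-Luce weights, among the items remaining in the consideration set.
        remaining = set(consideration_set)
        prob = 1.0
        for item in ranking:
            total = sum(np.exp(utilities[j]) for j in remaining)
            prob *= np.exp(utilities[item]) / total
            remaining.remove(item)
        return prob

    utilities = {0: 1.2, 1: 0.7, 2: 0.0, 3: -0.5, 4: -1.0}    # hypothetical item utilities
    print(plackett_luce_topk_prob((1, 0, 3), utilities, consideration_set={0, 1, 3, 4}))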
arXiv link: http://arxiv.org/abs/2401.11016v2
Nowcasting economic activity in European regions using a mixed-frequency dynamic factor model
planning, implementing and evaluating locally targeted economic policies.
However, European regional accounts for output are published at an annual
frequency and with a two-year delay. To obtain robust and more timely measures
in a computationally efficient manner, we propose a mixed-frequency dynamic
factor model that accounts for national information to produce high-frequency
estimates of the regional gross value added (GVA). We show that our model
produces reliable nowcasts of GVA in 162 regions across 12 European countries.
arXiv link: http://arxiv.org/abs/2401.10054v1
A Quantile Nelson-Siegel model
perspective. Building on the dynamic Nelson-Siegel model of Diebold et al.
(2006), we extend its traditional mean-based approach to a quantile regression
setting, enabling the estimation of yield curve factors - level, slope, and
curvature - at specific quantiles of the conditional distribution. A key
advantage of our framework is its ability to characterize the entire
conditional distribution of the yield curve across maturities and over time. In
an empirical analysis of the U.S. term structure of interest rates, our method
demonstrates superior out-of-sample forecasting performance, particularly in
capturing the tails of the yield distribution - an aspect increasingly
emphasized in the recent literature on distributional forecasting. In addition
to its forecasting advantages, our approach reveals rich distributional
features beyond the mean. In particular, we find that the dynamic changes in
these distributional features differ markedly between the Great Recession and
the COVID-19 pandemic period, highlighting a fundamental shift in how interest
rate markets respond to distinct economic shocks.
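A static, pooled illustration of the idea is sketched below: yields are regressed on the standard Nelson-Siegel loadings by quantile regression (statsmodels) to obtain level, slope, and curvature factors at chosen quantiles. The decay parameter, the simulated yields, and the pooled cross-section are simplifications and do not reproduce the paper's dynamic model.
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.regression.quantile_regression import QuantReg

    lam = 0.0609                                             # commonly used Nelson-Siegel decay parameter
    tau = np.array([3, 6, 12, 24, 36, 60, 84, 120], dtype=float)   # maturities in months

    # Standard Nelson-Siegel loadings for level (constant), slope, and curvature.
    slope = (1 - np.exp(-lam * tau)) / (lam * tau)
    curv = slope - np.exp(-lam * tau)
    X = sm.add_constant(np.column_stack([slope, curv]))

    # Hypothetical yield curves with asymmetric measurement noise.
    rng = np.random.default_rng(11)
    n_days = 500
    Y = 4.0 - 2.0 * slope + 1.0 * curv + rng.gamma(shape=2.0, scale=0.15, size=(n_days, len(tau)))

    y_pool, X_pool = Y.ravel(), np.tile(X, (n_days, 1))
    for q in (0.1, 0.5, 0.9):
        res = QuantReg(y_pool, X_pool).fit(q=q)
        print(q, np.round(res.params, 2))                    # [level, slope, curvature] at quantile q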
arXiv link: http://arxiv.org/abs/2401.09874v2
Assessing the impact of forced and voluntary behavioral changes on economic-epidemiological co-dynamics: A comparative case study between Belgium and Sweden during the 2020 COVID-19 pandemic
population behavior to prevent their healthcare systems from collapsing. Sweden
adopted a strategy centered on voluntary sanitary recommendations while Belgium
resorted to mandatory measures. Their consequences on pandemic progression and
associated economic impacts remain insufficiently understood. This study
leverages the divergent policies of Belgium and Sweden during the COVID-19
pandemic to relax the unrealistic -- but persistently used -- assumption that
social contacts are not influenced by an epidemic's dynamics. We develop an
epidemiological-economic co-simulation model where pandemic-induced behavioral
changes are a superposition of voluntary actions driven by fear, prosocial
behavior or social pressure, and compulsory compliance with government
directives. Our findings emphasize the importance of early responses, which
reduce the stringency of measures necessary to safeguard healthcare systems and
minimize ensuing economic damage. Voluntary behavioral changes lead to a
pattern of recurring epidemics, which should be regarded as the natural
long-term course of pandemics. Governments should be cautious about prolonging lockdowns longer than necessary, because this leads to higher economic damage and a potentially higher second surge when measures are released. Our model can aid
policymakers in the selection of an appropriate long-term strategy that
minimizes economic damage.
arXiv link: http://arxiv.org/abs/2401.08442v1
Causal Machine Learning for Moderation Effects
(treatments) on average and for subgroups. The causal machine learning
literature has recently provided tools for estimating group average treatment
effects (GATE) to better describe treatment heterogeneity. This paper addresses
the challenge of interpreting such differences in treatment effects between
groups while accounting for variations in other covariates. We propose a new
parameter, the balanced group average treatment effect (BGATE), which measures
a GATE with a specific distribution of a priori-determined covariates. By
taking the difference between two BGATEs, we can analyze heterogeneity more
meaningfully than by comparing two GATEs, as we can separate the difference due
to the different distributions of other variables and the difference due to the
variable of interest. The main estimation strategy for this parameter is based
on double/debiased machine learning for discrete treatments in an
unconfoundedness setting, and the estimator is shown to be
$\sqrt{N}$-consistent and asymptotically normal under standard conditions. We
propose two additional estimation strategies: automatic debiased machine
learning and a specific reweighting procedure. Last, we demonstrate the
usefulness of these parameters in a small-scale simulation study and in an
empirical example.
arXiv link: http://arxiv.org/abs/2401.08290v3
A Note on Uncertainty Quantification for Maximum Likelihood Parameters Estimated with Heuristic Based Optimization Algorithms
researcher inference. Heuristic-based algorithms are able to "break free" of
these local optima to eventually converge to the true global optimum. However,
given that they do not provide the gradient/Hessian needed to approximate the
covariance matrix and that the significantly longer computational time they
require for convergence likely precludes resampling procedures for inference,
researchers often are unable to quantify uncertainty in the estimates they
derive with these methods. This note presents a simple and relatively fast
two-step procedure to estimate the covariance matrix for parameters estimated
with these algorithms. This procedure relies on automatic differentiation, a
computational means of calculating derivatives that is popular in machine
learning applications. A brief empirical example demonstrates the advantages of
this procedure relative to bootstrapping and shows the similarity in standard
error estimates between this procedure and that which would normally accompany
maximum likelihood estimation with a gradient-based algorithm.
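A minimal sketch of the two-step procedure on a toy likelihood is given below, with scipy's differential evolution standing in for the heuristic optimizer and jax supplying the automatic-differentiation Hessian whose inverse serves as the covariance estimate; the model, bounds, and optimizer settings are assumptions of the sketch.
    import numpy as np
    import jax.numpy as jnp
    from jax import hessian
    from scipy.optimize import differential_evolution

    rng = np.random.default_rng(12)
    y = jnp.asarray(rng.normal(1.5, 2.0, size=500))    # toy data: normal with unknown mean and scale

    def negloglik(params):
        mu, log_sigma = params[0], params[1]
        sigma = jnp.exp(log_sigma)
        return 0.5 * jnp.sum(((y - mu) / sigma) ** 2) + y.size * log_sigma

    # Step 1: a heuristic, gradient-free optimizer locates the maximum likelihood estimate.
    fit = differential_evolution(lambda p: float(negloglik(jnp.asarray(p))),
                                 bounds=[(-10, 10), (-5, 5)], seed=0, maxiter=100)

    # Step 2: automatic differentiation gives the Hessian; its inverse estimates the covariance matrix.
    H = np.asarray(hessian(negloglik)(jnp.asarray(fit.x)))
    cov = np.linalg.inv(H)
    print("estimates:", fit.x, "standard errors:", np.sqrt(np.diag(cov)))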
arXiv link: http://arxiv.org/abs/2401.07176v1
Inference for Synthetic Controls via Refined Placebo Tests
unit and a small number of control units. A common inferential task in this
setting is to test null hypotheses regarding the average treatment effect on
the treated. Inference procedures that are justified asymptotically are often
unsatisfactory due to (1) small sample sizes that render large-sample
approximation fragile and (2) simplification of the estimation procedure that
is implemented in practice. An alternative is permutation inference, which is
related to a common diagnostic called the placebo test. It has provable Type-I
error guarantees in finite samples without simplification of the method, when
the treatment is uniformly assigned. Despite this robustness, the placebo test
suffers from low resolution since the null distribution is constructed from
only $N$ reference estimates, where $N$ is the sample size. This creates a
barrier for statistical inference at a common level like $\alpha = 0.05$,
especially when $N$ is small. We propose a novel leave-two-out procedure that
bypasses this issue, while still maintaining the same finite-sample Type-I
error guarantee under uniform assignment for a wide range of $N$. Unlike the
placebo test whose Type-I error always equals the theoretical upper bound, our
procedure often achieves a lower unconditional Type-I error than theory
suggests; this enables useful inference in the challenging regime when $\alpha
< 1/N$. Empirically, our procedure achieves a higher power when the effect size
is reasonably large and a comparable power otherwise. We generalize our
procedure to non-uniform assignments and show how to conduct sensitivity
analysis. From a methodological perspective, our procedure can be viewed as a
new type of randomization inference different from permutation or rank-based
inference, which is particularly effective in small samples.
arXiv link: http://arxiv.org/abs/2401.07152v3
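For context, the standard placebo test that the paper refines can be sketched in a few lines; the leave-two-out procedure itself is not reproduced here. The statistic function and the toy data below are hypothetical placeholders.

```python
# Standard placebo (permutation) test for synthetic control settings.
# `sc_statistic` is a hypothetical user-supplied function returning a test statistic
# (e.g. a post/pre RMSPE ratio) when unit j is treated and the others serve as donors.
import numpy as np

def placebo_pvalue(panel, treated, sc_statistic):
    """Permutation p-value under uniform treatment assignment."""
    stats = np.array([sc_statistic(panel, j) for j in range(panel.shape[0])])
    # share of units whose statistic is at least as extreme as the treated unit's
    return np.mean(stats >= stats[treated])

# Example with a toy statistic on simulated data (illustrative only).
rng = np.random.default_rng(2)
panel = rng.normal(size=(20, 30))              # N = 20 units, T = 30 periods
toy_stat = lambda Y, j: np.abs(Y[j, 20:].mean() - np.delete(Y, j, axis=0)[:, 20:].mean())
print("placebo p-value:", placebo_pvalue(panel, treated=0, sc_statistic=toy_stat))
```

Because this p-value can only move in steps of 1/N, inference at a conventional level such as 0.05 is impossible when N < 20, which is exactly the resolution problem the refined leave-two-out procedure is designed to overcome.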
Bubble Modeling and Tagging: A Stochastic Nonlinear Autoregression Approach
when a bubble is formed. The economic or financial bubble, especially its
dynamics, is an intriguing topic that has been attracting longstanding
attention. To illustrate the dynamics of the local explosion itself, the paper
presents a novel, simple, yet useful time series model, called the stochastic
nonlinear autoregressive model, which is always strictly stationary and
geometrically ergodic and can create long swings or persistence observed in
many macroeconomic variables. When a nonlinear autoregressive coefficient is
outside of a certain range, the model has periodically explosive behaviors and
can then be used to portray the bubble dynamics. Further, the quasi-maximum
likelihood estimation (QMLE) of our model is considered, and its strong
consistency and asymptotic normality are established under minimal assumptions
on innovation. A new model diagnostic checking statistic is developed for model
fitting adequacy. In addition, two methods for bubble tagging are proposed, one
from the residual perspective and the other from the null-state perspective.
Monte Carlo simulation studies are conducted to assess the performances of the
QMLE and the two bubble tagging methods in finite samples. Finally, the
usefulness of the model is illustrated by an empirical application to the
monthly Hang Seng Index.
arXiv link: http://arxiv.org/abs/2401.07038v2
Deep Learning With DAGs
variables or events. Although directed acyclic graphs (DAGs) are increasingly
used to represent these theories, their full potential has not yet been
realized in practice. As non-parametric causal models, DAGs require no
assumptions about the functional form of the hypothesized relationships.
Nevertheless, to simplify the task of empirical evaluation, researchers tend to
invoke such assumptions anyway, even though they are typically arbitrary and do
not reflect any theoretical content or prior knowledge. Moreover, functional
form assumptions can engender bias, whenever they fail to accurately capture
the complexity of the causal system under investigation. In this article, we
introduce causal-graphical normalizing flows (cGNFs), a novel approach to
causal inference that leverages deep neural networks to empirically evaluate
theories represented as DAGs. Unlike conventional approaches, cGNFs model the
full joint distribution of the data according to a DAG supplied by the analyst,
without relying on stringent assumptions about functional form. In this way,
the method allows for flexible, semi-parametric estimation of any causal
estimand that can be identified from the DAG, including total effects,
conditional effects, direct and indirect effects, and path-specific effects. We
illustrate the method with a reanalysis of Blau and Duncan's (1967) model of
status attainment and Zhou's (2019) model of conditional versus controlled
mobility. To facilitate adoption, we provide open-source software together with
a series of online tutorials for implementing cGNFs. The article concludes with
a discussion of current limitations and directions for future development.
arXiv link: http://arxiv.org/abs/2401.06864v1
Robust Analysis of Short Panels
probability distributions one may wish to place minimal restrictions. Leading
examples in panel data models are individual-specific variables sometimes
treated as "fixed effects" and, in dynamic models, initial conditions. This
paper presents a generally applicable method for characterizing sharp
identified sets when models place no restrictions on the probability
distribution of certain latent variables and no restrictions on their
covariation with other variables. In our analysis, latent variables on which
restrictions are undesirable are removed, leading to econometric analysis that is
robust to misspecification of the restrictions on their distributions that are
commonplace in the applied panel data literature. Endogenous explanatory
variables are easily accommodated. Examples of application to some static and
dynamic binary, ordered and multiple discrete choice and censored panel data
models are presented.
arXiv link: http://arxiv.org/abs/2401.06611v1
Exposure effects are not automatically useful for policymaking
opportunity to share our perspective as social scientists. In his article,
Savje recommends misspecified exposure effects as a way to avoid strong
assumptions about interference when analyzing the results of an experiment. In
this invited discussion, we highlight a limitation of Savje's recommendation:
exposure effects are not generally useful for evaluating social policies
without the strong assumptions that Savje seeks to avoid.
arXiv link: http://arxiv.org/abs/2401.06264v2
Covariance Function Estimation for High-Dimensional Functional Time Series with Dual Factor Structures
high-dimensional functional time series. In this model, a high-dimensional
fully functional factor parametrisation is imposed on the observed functional
processes, whereas a low-dimensional version (via series approximation) is
assumed for the latent functional factors. We extend the classic principal
component analysis technique for the estimation of a low-rank structure to the
estimation of a large covariance matrix of random functions that satisfies a
notion of (approximate) functional "low-rank plus sparse" structure; and
generalise the matrix shrinkage method to functional shrinkage in order to
estimate the sparse structure of functional idiosyncratic components. Under
appropriate regularity conditions, we derive the large sample theory of the
developed estimators, including the consistency of the estimated factors and
functional factor loadings and the convergence rates of the estimated matrices
of covariance functions measured by various (functional) matrix norms.
Consistent selection of the number of factors and a data-driven rule to choose
the shrinkage parameter are discussed. Simulation and empirical studies are
provided to demonstrate the finite-sample performance of the developed model
and estimation methodology.
arXiv link: http://arxiv.org/abs/2401.05784v2
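A finite-dimensional analogue helps to fix ideas. The sketch below is not the paper's functional estimator; it applies the generic "low-rank plus sparse" recipe to an ordinary covariance matrix, using PCA for the factor part and entrywise soft-thresholding for the idiosyncratic part. The number of factors and the threshold are illustrative choices.

```python
# Low-rank (factor) plus sparse (thresholded idiosyncratic) covariance estimation.
import numpy as np

def low_rank_plus_sparse_cov(X, n_factors, threshold):
    """X: T x p data matrix (rows = time). Returns an estimated p x p covariance."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]                     # sample covariance
    eigval, eigvec = np.linalg.eigh(S)             # eigenvalues in ascending order
    top = eigvec[:, -n_factors:] * np.sqrt(eigval[-n_factors:])
    low_rank = top @ top.T                         # factor (low-rank) component
    resid = S - low_rank
    off = resid - np.diag(np.diag(resid))
    sparse = np.diag(np.diag(resid)) + np.sign(off) * np.maximum(np.abs(off) - threshold, 0.0)
    return low_rank + sparse

rng = np.random.default_rng(3)
T, p, k = 200, 50, 3
F = rng.normal(size=(T, k))                        # latent factors
L = rng.normal(size=(p, k))                        # loadings
X = F @ L.T + rng.normal(scale=0.5, size=(T, p))   # factor model with idiosyncratic noise
Sigma_hat = low_rank_plus_sparse_cov(X, n_factors=k, threshold=0.05)
print(Sigma_hat.shape)
```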
On Efficient Inference of Causal Effects with Multiple Mediators
effects involving multiple interacting mediators. Most existing works either
impose a linear model assumption among the mediators or are restricted to
handle conditionally independent mediators given the exposure. To overcome
these limitations, we define causal and individual mediation effects in a
general setting, and employ a semiparametric framework to develop quadruply
robust estimators for these causal effects. We further establish the asymptotic
normality of the proposed estimators and prove their local semiparametric
efficiencies. The proposed method is empirically validated via simulated and
real datasets concerning psychiatric disorders in trauma survivors.
arXiv link: http://arxiv.org/abs/2401.05517v1
A Deep Learning Representation of Spatial Interaction Model for Resilient Spatial Planning of Community Business Clusters
complex and context-aware interactions between business clusters and trade
areas. To address the limitation, we propose a SIM-GAT model to predict
spatiotemporal visitation flows between community business clusters and their
trade areas. The model innovatively represents the integrated system of
business clusters, trade areas, and transportation infrastructure within an
urban region using a connected graph. Then, a graph-based deep learning model,
i.e., Graph AttenTion network (GAT), is used to capture the complexity and
interdependencies of business clusters. We developed this model with data
collected from the Miami metropolitan area in Florida. We then demonstrated its
effectiveness in capturing varying attractiveness of business clusters to
different residential neighborhoods and across scenarios with an eXplainable AI
approach. We contribute a novel method supplementing conventional SIMs to
predict and analyze the dynamics of inter-connected community business
clusters. The analysis results can inform data-evidenced and place-specific
planning strategies helping community business clusters better accommodate
their customers across scenarios, and hence improve the resilience of community
businesses.
arXiv link: http://arxiv.org/abs/2401.04849v1
IV Estimation of Panel Data Tobit Models with Normal Errors
in a censored regression model with normal errors. This paper demonstrates that
a similar approach can be used to construct moment conditions for
fixed-effects versions of the model considered by Amemiya. This result
suggests estimators for models that have not previously been considered.
arXiv link: http://arxiv.org/abs/2401.04803v1
Robust Bayesian Method for Refutable Models
by some data distributions. The econometrician starts with a refutable
structural assumption which can be written as the intersection of several
assumptions. To avoid the assumption being refuted, the econometrician first takes
a stance on which assumption $j$ will be relaxed and considers a function $m_j$
that measures the deviation from the assumption $j$. She then specifies a set
of prior beliefs $\Pi_s$ whose elements share the same marginal distribution
$\pi_{m_j}$ which measures the likelihood of deviations from assumption $j$.
Compared to the standard Bayesian method that specifies a single prior, the
robust Bayesian method allows the econometrician to take a stance only on the
likelihood of violation of assumption $j$ while leaving other features of the
model unspecified. We show that many frequentist approaches to relax refutable
assumptions are equivalent to particular choices of robust Bayesian prior sets,
and thus we give a Bayesian interpretation to the frequentist methods. We use
the local average treatment effect (LATE) in the potential outcome framework
as the leading illustrating example.
arXiv link: http://arxiv.org/abs/2401.04512v3
Teacher bias or measurement error?
educational trajectories. Previous studies have shown that students of low
socioeconomic status (SES) receive worse subjective evaluations than their high
SES peers, even when they score similarly on objective standardized tests. This
is often interpreted as evidence of teacher bias. Measurement error in test
scores challenges this interpretation. We discuss how both classical and
non-classical measurement error in test scores generate a biased coefficient of
the conditional SES gap, and consider three empirical strategies to address
this bias. Using administrative data from the Netherlands, where secondary
school track recommendations are pivotal teacher judgments, we find that
measurement error explains 35 to 43% of the conditional SES gap in track
recommendations.
arXiv link: http://arxiv.org/abs/2401.04200v4
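The mechanism can be seen in a small simulation with a hypothetical data-generating process: when the control is a noisy measure of ability, part of the SES-ability association is attributed to SES, so the conditional SES gap is overstated even though teachers respond to ability alone.

```python
# Classical measurement error in the test-score control inflates the conditional SES gap.
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
ses = rng.integers(0, 2, n)                               # 1 = high SES
ability = rng.normal(0.5 * ses, 1.0)                      # high-SES students score higher on average
recommendation = 1.0 * ability + rng.normal(0, 0.5, n)    # teachers respond to ability only
test_score = ability + rng.normal(0, 0.8, n)              # classical measurement error

def ses_gap(control):
    # OLS of the recommendation on SES and the control; return the SES coefficient
    X = np.column_stack([np.ones(n), ses, control])
    beta = np.linalg.lstsq(X, recommendation, rcond=None)[0]
    return beta[1]

print("gap controlling for true ability:", round(ses_gap(ability), 3))     # ~ 0
print("gap controlling for noisy score: ", round(ses_gap(test_score), 3))  # spuriously > 0
```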
Robust Estimation in Network Vector Autoregression with Nonstationary Regressors
autoregressive model with nonstationary regressors. In particular, network
dependence is characterized by a nonstochastic adjacency matrix. The
information set includes a stationary regressand and a node-specific vector of
nonstationary regressors, both observed at the same equally spaced time
frequencies. Our proposed econometric specification corresponds to the NVAR
model under time series nonstationarity which relies on the local-to-unity
parametrization for capturing the unknown form of persistence of these
node-specific regressors. Robust econometric estimation is achieved using an
IVX-type estimator and the asymptotic theory analysis for the augmented vector
of regressors is studied based on a double asymptotic regime where both the
network size and the time dimension tend to infinity.
arXiv link: http://arxiv.org/abs/2401.04050v1
Identification with possibly invalid IVs
quasi-instrumental variables (quasi-IVs). A quasi-IV is a relevant but possibly
invalid IV because it is not exogenous or not excluded. We show that a variety
of models with discrete or continuous endogenous treatment which are usually
identified with an IV - quantile models with rank invariance, additive models
with homogeneous treatment effects, and local average treatment effect models -
can be identified under the joint relevance of two complementary quasi-IVs
instead. To achieve identification, we complement one excluded but possibly
endogenous quasi-IV (e.g., "relevant proxies" such as lagged treatment choice)
with one exogenous (conditional on the excluded quasi-IV) but possibly included
quasi-IV (e.g., random assignment or exogenous market shocks). Our approach
also holds if any of the two quasi-IVs turns out to be a valid IV. In practice,
being able to address endogeneity with complementary quasi-IVs instead of IVs
is convenient since there are many applications where quasi-IVs are more
readily available. Difference-in-differences is a notable example: time is an
exogenous quasi-IV while the group assignment acts as a complementary excluded
quasi-IV.
arXiv link: http://arxiv.org/abs/2401.03990v4
Adaptive Experimental Design for Policy Learning
aiming to design an adaptive experiment to identify the best treatment arm
conditioned on contextual information (covariates). We consider a
decision-maker who assigns treatment arms to experimental units during an
experiment and recommends the estimated best treatment arm based on the
contexts at the end of the experiment. The decision-maker uses a policy for
recommendations, which is a function that provides the estimated best treatment
arm given the contexts. In our evaluation, we focus on the worst-case expected
regret, a relative measure between the expected outcomes of an optimal policy
and our proposed policy. We derive a lower bound for the expected simple regret
and then propose a strategy called Adaptive Sampling-Policy Learning (PLAS). We
prove that this strategy is minimax rate-optimal in the sense that its leading
factor in the regret upper bound matches the lower bound as the number of
experimental units increases.
arXiv link: http://arxiv.org/abs/2401.03756v4
Counterfactuals in factor models
values of a (possibly continuous) treatment, are linked through common factors.
The factors can be estimated using a panel of regressors. We propose a
procedure to estimate time-specific and unit-specific average marginal effects
in this context. Our approach can be used either with high-dimensional time
series or with large panels. It allows for treatment effects heterogeneous
across time and units and is straightforward to implement since it only relies
on principal components analysis and elementary computations. We derive the
asymptotic distribution of our estimator of the average marginal effect and
highlight its solid finite sample performance through a simulation exercise.
The approach can also be used to estimate average counterfactuals or adapted to
an instrumental variables setting and we discuss these extensions. Finally, we
illustrate our novel methodology through an empirical application on income
inequality.
arXiv link: http://arxiv.org/abs/2401.03293v1
Roughness Signature Functions
which was used to measure the activity of a semimartingale, this paper
introduces the roughness signature function. The paper illustrates how it can
be used to determine whether a discretely observed process is generated by a
continuous process that is rougher than a Brownian motion, a pure-jump process,
or a combination of the two. Further, if a continuous rough process is present,
the function gives an estimate of the roughness index. This is done through an
extensive simulation study, where we find that the roughness signature function
works as expected on rough processes. We further derive some asymptotic
properties of this new signature function. The function is applied empirically
to three different volatility measures for the S&P500 index. The three measures
are realized volatility, the VIX, and the option-extracted volatility estimator
of Todorov (2019). The realized volatility and option-extracted volatility show
signs of roughness, with the option-extracted volatility appearing smoother
than the realized volatility, while the VIX appears to be driven by a
continuous martingale with jumps.
arXiv link: http://arxiv.org/abs/2401.02819v1
Efficient Computation of Confidence Sets Using Classification on Equidistributed Grids
of the true parameters. Confidence sets (CS) of the true parameters are derived
by inverting these tests. However, they often lack analytical expressions,
necessitating a grid search to obtain the CS numerically by retaining the grid
points that pass the test. When the statistic is not asymptotically pivotal,
constructing the critical value for each grid point in the parameter space adds
to the computational burden. In this paper, we convert the computational issue
into a classification problem by using a support vector machine (SVM)
classifier. Its decision function provides a faster and more systematic way of
dividing the parameter space into two regions: inside vs. outside of the
confidence set. We label those points in the CS as 1 and those outside as -1.
Researchers can train the SVM classifier on a grid of manageable size and use
it to determine whether points on denser grids are in the CS or not. We
establish certain conditions for the grid so that there is a tuning that allows
us to asymptotically reproduce the test in the CS. This means that in the
limit, a point is classified as belonging to the confidence set if and only if
it is labeled as 1 by the SVM.
arXiv link: http://arxiv.org/abs/2401.01804v2
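The classification step described above is straightforward to sketch. In the toy example below the "test" is a placeholder ellipse rather than a real test inversion, and the SVM settings are arbitrary; the point is only the workflow of labelling a coarse grid and then classifying a denser one with the fitted decision function.

```python
# Label a coarse parameter grid by the test outcome, train an SVM, classify a dense grid.
import numpy as np
from sklearn.svm import SVC

def passes_test(theta):                 # placeholder for "test not rejected at theta"
    return (theta[0] ** 2 + 2 * theta[1] ** 2) <= 1.0

# Coarse training grid with +1 / -1 labels.
grid = np.linspace(-2, 2, 25)
coarse = np.array([(a, b) for a in grid for b in grid])
labels = np.array([1 if passes_test(t) else -1 for t in coarse])

clf = SVC(kernel="rbf", C=100.0, gamma="scale").fit(coarse, labels)

# Classify a much denser grid using the fitted decision function.
dense_axis = np.linspace(-2, 2, 400)
dense = np.array([(a, b) for a in dense_axis for b in dense_axis])
in_cs = clf.predict(dense) == 1
print(f"{in_cs.sum()} of {len(dense)} dense grid points classified as inside the CS")
```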
Model Averaging and Double Machine Learning
stacking, a model averaging method for combining multiple candidate learners,
to estimate structural parameters. In addition to conventional stacking, we
consider two stacking variants available for DDML: short-stacking exploits the
cross-fitting step of DDML to substantially reduce the computational burden and
pooled stacking enforces common stacking weights over cross-fitting folds.
Using calibrated simulation studies and two applications estimating gender gaps
in citations and wages, we show that DDML with stacking is more robust to
partially unknown functional forms than common alternative approaches based on
single pre-selected learners. We provide Stata and R software implementing our
proposals.
arXiv link: http://arxiv.org/abs/2401.01645v2
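For readers unfamiliar with the combination, the sketch below shows conventional stacking inside a cross-fitted DML estimator for a partially linear model; the short-stacking and pooled-stacking variants, and the authors' Stata and R packages, are not reproduced here. The data-generating process and learner choices are purely illustrative.

```python
# Cross-fitted DML for Y = theta*D + g(X) + u with a stacked ensemble as nuisance learner.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(5)
n, p, theta = 2000, 10, 0.5
X = rng.normal(size=(n, p))
D = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(size=n)
Y = theta * D + np.cos(X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=n)

def make_stack():
    return StackingRegressor(
        estimators=[("lasso", LassoCV()), ("rf", RandomForestRegressor(n_estimators=100))],
        final_estimator=LinearRegression())

res_Y, res_D = np.empty(n), np.empty(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    res_Y[test] = Y[test] - make_stack().fit(X[train], Y[train]).predict(X[test])
    res_D[test] = D[test] - make_stack().fit(X[train], D[train]).predict(X[test])

theta_hat = (res_D @ res_Y) / (res_D @ res_D)   # residual-on-residual regression
print("theta_hat:", round(theta_hat, 3))
```

With cross-fitting, the final residual-on-residual regression recovers the structural coefficient even though both nuisance functions are nonlinear and approximated by an ensemble.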
Classification and Treatment Learning with Constraints via Composite Heaviside Optimization: a Progressive MIP Method
a progressive (mixed) integer programming (PIP) method for solving multi-class
classification and multi-action treatment problems with constraints. A
Heaviside composite function is a composite of a Heaviside function (i.e., the
indicator function of either the open $( \, 0,\infty )$ or closed $[ \,
0,\infty \, )$ interval) with a possibly nondifferentiable function.
Modeling-wise, we show how Heaviside composite optimization provides a unified
formulation for learning the optimal multi-class classification and
multi-action treatment rules, subject to rule-dependent constraints stipulating
a variety of domain restrictions. A Heaviside composite function has an
equivalent discrete formulation, and the resulting optimization problem can in
principle be solved by integer programming (IP) methods. Nevertheless, for
constrained learning problems with large data sets, a straightforward
application of off-the-shelf IP solvers is usually ineffective in achieving
global optimality. To alleviate such a computational burden, our major
contribution is the proposal of the PIP method by leveraging the effectiveness
of state-of-the-art IP solvers for problems of modest sizes. We provide the
theoretical advantage of the PIP method with the connection to continuous
optimization and show that the computed solution is locally optimal for a broad
class of Heaviside composite optimization problems. The numerical performance
of the PIP method is demonstrated by extensive computational experimentation.
arXiv link: http://arxiv.org/abs/2401.01565v2
Robust Inference for Multiple Predictive Regressions with an Application on Bond Risk Premia
multiple predictors that could be highly persistent. Our method improves the
popular extended instrumental variable (IVX) testing (Phillips and Lee, 2013;
Kostakis et al., 2015) in that, besides addressing the two bias effects found
in Hosseinkouchack and Demetrescu (2021), we find and deal with the
variance-enlargement effect. We show that two types of higher-order terms
induce these distortion effects in the test statistic, leading to significant
over-rejection for one-sided tests and tests in multiple predictive
regressions. Our improved IVX-based test includes three steps to tackle all the
issues above regarding finite-sample bias and variance terms. Thus, the test
statistic performs well in size control, while its power is comparable with that
of the original IVX. Monte Carlo simulations and an empirical
study on the predictability of bond risk premia are provided to demonstrate the
effectiveness of the newly proposed approach.
arXiv link: http://arxiv.org/abs/2401.01064v1
Changes-in-Changes for Ordered Choice Models: Too Many "False Zeros"?
ordered outcomes, building upon elements from a continuous Changes-in-Changes
model. We focus on outcomes derived from self-reported survey data eliciting
socially undesirable, illegal, or stigmatized behaviors like tax evasion or
substance abuse, where too many "false zeros", or more broadly, underreporting
are likely. We start by providing a characterization for parallel trends within
a general threshold-crossing model. We then propose a partial and point
identification framework for different distributional treatment effects when
the outcome is subject to underreporting. Applying our methodology, we
investigate the impact of recreational marijuana legalization for adults in
several U.S. states on the short-term consumption behavior of 8th-grade
high-school students. The results indicate small, but significant increases in
consumption probabilities at each level. These effects are further amplified
upon accounting for misreporting.
arXiv link: http://arxiv.org/abs/2401.00618v3
How industrial clusters influence the growth of the regional GDP: A spatial-approach
from German NUTS 3 regions. Our goal is to gain a deeper understanding of the
significance and interdependence of industry clusters in shaping the dynamics
of GDP. To achieve a more nuanced spatial differentiation, we introduce
indicator matrices for each industry sector which allows for extending the
spatial Durbin model to a new version of it. This approach is essential due to
both the economic importance of these sectors and the potential issue of
omitted variables. Failing to account for industry sectors can lead to omitted
variable bias and estimation problems. To assess the effects of the major
industry sectors, we incorporate eight distinct branches of industry into our
analysis. According to prevailing economic theory, these clusters should have a
positive impact on the regions they are associated with. Our findings indeed
reveal highly significant impacts, which can be either positive or negative, of
specific sectors on local GDP growth. Spatially, we observe that direct and
indirect effects can exhibit opposite signs, indicative of heightened
competitiveness within and between industry sectors. Therefore, we recommend
that industry sectors should be taken into consideration when conducting
spatial analysis of GDP. Doing so allows for a more comprehensive understanding
of the economic dynamics at play.
arXiv link: http://arxiv.org/abs/2401.10261v1
Identification of Nonlinear Dynamic Panels under Partial Stationarity
nonlinear panel data models, including binary choice, ordered response, and
other types of limited dependent variable models. Our approach accommodates
dynamic models with any number of lagged dependent variables as well as other
types of endogenous covariates. Our identification strategy relies on a partial
stationarity condition, which allows for not only an unknown distribution of
errors, but also temporal dependencies in errors. We derive partial
identification results under flexible model specifications and establish
sharpness of our identified set in the binary choice setting. We demonstrate
the robust finite-sample performance of our approach using Monte Carlo
simulations, and apply the approach to an empirical analysis of income categories
using various ordered choice models.
arXiv link: http://arxiv.org/abs/2401.00264v4
Forecasting CPI inflation under economic policy and geopolitical uncertainties
for both academics and policymakers at the central banks. This study introduces
a filtered ensemble wavelet neural network (FEWNet) to forecast CPI inflation,
which is tested on BRIC countries. FEWNet breaks down inflation data into high
and low-frequency components using wavelets and utilizes them along with other
economic factors (economic policy uncertainty and geopolitical risk) to produce
forecasts. All the wavelet-transformed series and filtered exogenous variables
are fed into downstream autoregressive neural networks to make the final
ensemble forecast. Theoretically, we show that FEWNet reduces the empirical
risk compared to fully connected autoregressive neural networks. FEWNet is more
accurate than other forecasting methods and can also estimate the uncertainty
in its predictions due to its capacity to effectively capture non-linearities
and long-range dependencies in the data through its adaptable architecture.
This makes FEWNet a valuable tool for central banks to manage inflation.
arXiv link: http://arxiv.org/abs/2401.00249v2
Robust Inference in Panel Data Models: Some Effects of Heteroskedasticity and Leveraged Data in Small Samples
estimators of the variance become inefficient and statistical inference
conducted with invalid standard errors leads to misleading rejection rates.
Despite a vast cross-sectional literature on the downward bias of robust
standard errors, the problem is not extensively covered in the panel data
framework. We investigate the consequences of the simultaneous presence of
small sample size, heteroskedasticity and data points that exhibit extreme
values in the covariates ('good leverage points') on the statistical inference.
Focusing on one-way linear panel data models, we examine asymptotic and finite
sample properties of a battery of heteroskedasticity-consistent estimators
using Monte Carlo simulations. We also propose a hybrid estimator of the
variance-covariance matrix. Results show that conventional standard errors are
always dominated by more conservative estimators of the variance, especially in
small samples. In addition, all types of HC standard errors have excellent
performance in terms of the size and power of tests under homoskedasticity.
arXiv link: http://arxiv.org/abs/2312.17676v1
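The estimators under comparison are easy to state in code. The sketch below computes HC0, HC1 and HC3 standard errors for a simple regression with heteroskedastic errors and a few high-leverage observations; it uses the standard cross-sectional formulas for illustration and is not the paper's simulation design or its proposed hybrid estimator.

```python
# Heteroskedasticity-consistent (sandwich) standard errors: HC0, HC1, HC3.
import numpy as np

def hc_standard_errors(X, y):
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta                                  # residuals
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)       # leverage values
    out = {}
    for name, w in {"HC0": u ** 2,
                    "HC1": u ** 2 * n / (n - k),
                    "HC3": (u / (1 - h)) ** 2}.items():
        meat = (X * w[:, None]).T @ X
        out[name] = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, out

rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n); x[:5] += 8                    # a few 'good leverage' points
y = 1.0 + 2.0 * x + rng.normal(scale=1 + np.abs(x))   # heteroskedastic errors
X = np.column_stack([np.ones(n), x])
beta, se = hc_standard_errors(X, y)
print(beta, {name: v.round(3) for name, v in se.items()})
```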
Decision Theory for Treatment Choice Problems with Partial Identification
choice problems with partial identification. We show that, in a general class
of problems with Gaussian likelihood, all decision rules are admissible; it is
maximin-welfare optimal to ignore all data; and, for severe enough partial
identification, there are infinitely many minimax-regret optimal decision
rules, all of which sometimes randomize the policy recommendation. We uniquely
characterize the minimax-regret optimal rule that least frequently randomizes,
and show that, in some cases, it can outperform other minimax-regret optimal
rules in terms of what we term profiled regret. We analyze the implications of
our results in the aggregation of experimental estimates for policy adoption,
extrapolation of Local Average Treatment Effects, and policy making in the
presence of omitted variable bias.
arXiv link: http://arxiv.org/abs/2312.17623v3
Bayesian Analysis of High Dimensional Vector Error Correction Model
cointegration relationships amongst multivariate non-stationary time series. In
this paper, we focus on the high-dimensional setting and seek a
sample-size-efficient methodology to determine the level of cointegration. Our
investigation centres on a Bayesian approach to analyse the cointegration
matrix, henceforth determining the cointegration rank. We design two algorithms
and implement them on simulated examples, yielding promising results
particularly when dealing with a high number of variables and a relatively low
number of observations. Furthermore, we extend this methodology to empirically
investigate the constituents of the S&P 500 index, where low-volatility
portfolios can be found during both in-sample training and out-of-sample
testing periods.
arXiv link: http://arxiv.org/abs/2312.17061v2
Development of Choice Model for Brand Evaluation
how personal preferences of decision makers (customers) for products influence
demand at the level of the individual. The contemporary choice theory is built
upon the characteristics of the decision maker, alternatives available for the
choice of the decision maker, the attributes of the available alternatives and
decision rules that the decision maker uses to make a choice. The choice set in
our research is represented by six major brands (products) of laundry
detergents in the Japanese market. We use the panel data of the purchases of 98
households to which we apply the hierarchical probit model, facilitated by a
Markov Chain Monte Carlo simulation (MCMC) in order to evaluate the brand
values of six brands. The applied model also allows us to evaluate the tangible
and intangible brand values. These evaluated metrics help us to assess the
brands based on their tangible and intangible characteristics. Moreover,
consumer choice modeling also provides a framework for assessing the
environmental performance of laundry detergent brands as the model uses the
information on components (physical attributes) of laundry detergents.
arXiv link: http://arxiv.org/abs/2312.16927v1
Modeling Systemic Risk: A Time-Varying Nonparametric Causal Inference Framework
(TV-DIG) framework to estimate the evolving causal structure in time series
networks, thereby addressing the limitations of traditional econometric models
in capturing high-dimensional, nonlinear, and time-varying interconnections
among series. This framework employs an information-theoretic measure rooted in
a generalized version of Granger-causality, which is applicable to both linear
and nonlinear dynamics. Our framework offers advancements in measuring systemic
risk and establishes meaningful connections with established econometric
models, including vector autoregression and switching models. We evaluate the
efficacy of our proposed model through simulation experiments and empirical
analysis, reporting promising results in recovering simulated time-varying
networks with nonlinear and multivariate structures. We apply this framework to
identify and monitor the evolution of interconnectedness and systemic risk
among major assets and industrial sectors within the financial network. We
focus on cryptocurrencies' potential systemic risks to financial stability,
including spillover effects on other sectors during crises like the COVID-19
pandemic and the Federal Reserve's 2020 emergency response. Our findings
reveal significant, previously underrecognized pre-2020 influences of
cryptocurrencies on certain financial sectors, highlighting their potential
systemic risks and offering a systematic approach in tracking evolving
cross-sector interactions within financial networks.
arXiv link: http://arxiv.org/abs/2312.16707v1
Best-of-Both-Worlds Linear Contextual Bandits
an instance of the multi-armed bandit problem, under an adversarial corruption.
At each round, a decision-maker observes an independent and identically
distributed context and then selects an arm based on the context and past
observations. After selecting an arm, the decision-maker incurs a loss
corresponding to the selected arm. The decision-maker aims to minimize the
cumulative loss over the trial. The goal of this study is to develop a strategy
that is effective in both stochastic and adversarial environments, with
theoretical guarantees. We first formulate the problem by introducing a novel
setting of bandits with adversarial corruption, referred to as the contextual
adversarial regime with a self-bounding constraint. We assume linear models for
the relationship between the loss and the context. Then, we propose a strategy
that extends the RealLinExp3 by Neu & Olkhovskaya (2020) and the
Follow-The-Regularized-Leader (FTRL). The regret of our proposed algorithm is
shown to be upper-bounded by $O\left(\min\left\{\frac{(\log(T))^3}{\Delta_{*}}
+ \sqrt{\frac{C(\log(T))^3}{\Delta_{*}}},\
\sqrt{T}(\log(T))^2\right\}\right)$, where $T \in \mathbb{N}$ is the number of
rounds, $\Delta_{*} > 0$ is the constant minimum gap between the best and
suboptimal arms for any context, and $C \in [0, T]$ is an adversarial corruption
parameter. This regret upper bound implies
$O\left(\frac{(\log(T))^3}{\Delta_{*}}\right)$ in a stochastic environment and
$O\left(\sqrt{T}(\log(T))^2\right)$ in an adversarial environment. We refer
to our strategy as the Best-of-Both-Worlds (BoBW) RealFTRL, due to its
theoretical guarantees in both stochastic and adversarial regimes.
arXiv link: http://arxiv.org/abs/2312.16489v1
Incentive-Aware Synthetic Control: Accurate Counterfactual Estimation via Incentivized Exploration
approach used to estimate the treatment effect on the treated in a panel data
setting. We shed light on a frequently overlooked but ubiquitous assumption
made in SCMs of "overlap": a treated unit can be written as some combination --
typically, convex or linear combination -- of the units that remain under
control. We show that if units select their own interventions, and there is
sufficiently large heterogeneity between units that prefer different
interventions, overlap will not hold. We address this issue by proposing a
framework which incentivizes units with different preferences to take
interventions they would not normally consider. Specifically, leveraging tools
from information design and online learning, we propose a SCM that incentivizes
exploration in panel data settings by providing incentive-compatible
intervention recommendations to units. We establish that this estimator obtains
valid counterfactual estimates without the need for an a priori overlap
assumption. We extend our results to the setting of synthetic interventions,
where the goal is to produce counterfactual outcomes under all interventions,
not just control. Finally, we provide two hypothesis tests for determining
whether unit overlap holds for a given panel dataset.
arXiv link: http://arxiv.org/abs/2312.16307v2
Direct Multi-Step Forecast based Comparison of Nested Models via an Encompassing Test
forecasts obtained from a pair of nested models that is based on the forecast
encompassing principle. Our proposed approach relies on an alternative way of
testing the population moment restriction implied by the forecast encompassing
principle, one that links the forecast errors from the two competing models in a
particular way. Its key advantage is that it is able to bypass the variance
degeneracy problem afflicting model based forecast comparisons across nested
models. It results in a test statistic whose limiting distribution is standard
normal and which is particularly simple to construct and can accommodate both
single period and longer-horizon prediction comparisons. Inferences are also
shown to be robust to different predictor types, including stationary,
highly-persistent and purely deterministic processes. Finally, we illustrate
the use of our proposed approach through an empirical application that explores
the role of global inflation in enhancing individual country specific inflation
forecasts.
arXiv link: http://arxiv.org/abs/2312.16099v1
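As background, the basic forecast-encompassing moment can be checked with a simple regression of one model's forecast errors on the difference between the two error series. The sketch below is this textbook version with simulated errors and a conventional standard error; it is not the authors' statistic, which is designed to survive the variance degeneracy arising with nested models and to handle multi-step horizons (where a HAC variance would also be needed).

```python
# Textbook forecast-encompassing check: model 1 encompasses model 2 if lambda = 0 in
# e1_t = lambda * (e1_t - e2_t) + error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
T = 300
common = rng.normal(size=T)
e1 = common + rng.normal(scale=0.5, size=T)        # forecast errors of the smaller (nesting) model
e2 = 0.6 * common + rng.normal(scale=0.5, size=T)  # forecast errors of the larger model

d = e1 - e2
lam = (d @ e1) / (d @ d)                           # OLS slope without intercept
resid = e1 - lam * d
se = np.sqrt((resid @ resid) / (T - 1) / (d @ d))  # conventional (non-HAC) standard error
t_stat = lam / se
p_val = 2 * (1 - stats.t.cdf(abs(t_stat), df=T - 1))
print(f"lambda = {lam:.3f}, t = {t_stat:.2f}, p = {p_val:.3f}")
```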
Pricing with Contextual Elasticity and Heteroscedastic Valuation
whether to purchase a product based on its features and price. We introduce a
novel approach to modeling a customer's expected demand by incorporating
feature-based price elasticity, which can be equivalently represented as a
valuation with heteroscedastic noise. To solve the problem, we propose a
computationally efficient algorithm called "Pricing with Perturbation (PwP)",
which enjoys an $O(\sqrt{dT}\log T)$ regret while allowing arbitrary
adversarial input context sequences. We also prove a matching lower bound at
$\Omega(\sqrt{dT})$ to show the optimality regarding $d$ and $T$ (up to $\log
T$ factors). Our results shed light on the relationship between contextual
elasticity and heteroscedastic valuation, providing insights for effective and
practical pricing strategies.
arXiv link: http://arxiv.org/abs/2312.15999v1
Negative Control Falsification Tests for Instrumental Variable Designs
two types of falsification tests. We characterize these tests as conditional
independence tests between negative control variables -- proxies for unobserved
variables posing a threat to the identification -- and the IV or the outcome.
We describe the conditions that variables must satisfy in order to serve as
negative controls. We show that these falsification tests examine not only
independence and the exclusion restriction, but also functional form
assumptions. Our analysis reveals that conventional applications of these tests
may flag problems even in valid IV designs. We offer implementation guidance to
address these issues.
arXiv link: http://arxiv.org/abs/2312.15624v3
Zero-Inflated Bandits
which can significantly hinder learning efficiency. Leveraging problem-specific
structures for careful distribution modeling is recognized as essential for
improving estimation efficiency in statistics. However, this approach remains
under-explored in the context of bandits. To address this gap, we initiate the
study of zero-inflated bandits, where the reward is modeled using a classic
semi-parametric distribution known as the zero-inflated distribution. We
develop algorithms based on the Upper Confidence Bound and Thompson Sampling
frameworks for this specific structure. The superior empirical performance of
these methods is demonstrated through extensive numerical studies.
arXiv link: http://arxiv.org/abs/2312.15595v3
The Challenge of Using LLMs to Simulate Human Behavior: A Causal Inference Perspective
human behavior. We identify a fundamental challenge in using them to simulate
experiments: when LLM-simulated subjects are blind to the experimental design
(as is standard practice with human subjects), variations in treatment
systematically affect unspecified variables that should remain constant,
violating the unconfoundedness assumption. Using demand estimation as a context
and an actual experiment as a benchmark, we show this can lead to implausible
results. While confounding may in principle be addressed by controlling for
covariates, this can compromise ecological validity in the context of LLM
simulations: controlled covariates become artificially salient in the simulated
decision process, which introduces focalism. This trade-off between
unconfoundedness and ecological validity is usually absent in traditional
experimental design and represents a unique challenge in LLM simulations. We
formalize this challenge theoretically, showing it stems from ambiguous
prompting strategies, and hence cannot be fully addressed by improving training
data or by fine-tuning. Alternative approaches that unblind the experimental
design to the LLM show promise. Our findings suggest that effectively
leveraging LLMs for experimental simulations requires fundamentally rethinking
established experimental design practices rather than simply adapting protocols
developed for human subjects.
arXiv link: http://arxiv.org/abs/2312.15524v2
Variable Selection in High Dimensional Linear Regressions with Parameter Instability
instability. It distinguishes between signal and pseudo-signal variables that
are correlated with the target variable, and noise variables that are not, and
investigates the asymptotic properties of the One Covariate at a Time Multiple
Testing (OCMT) method proposed by Chudik et al. (2018) under parameter
instability. It is established that OCMT continues to asymptotically select
an approximating model that includes all the signals and none of the noise
variables. Properties of post selection regressions are also investigated, and
in-sample fit of the selected regression is shown to have the oracle property.
The theoretical results support the use of unweighted observations at the
selection stage of OCMT, whilst applying down-weighting of observations only at
the forecasting stage. Monte Carlo and empirical applications show that OCMT
without down-weighting at the selection stage yields smaller mean squared
forecast errors compared to Lasso, Adaptive Lasso, and boosting.
arXiv link: http://arxiv.org/abs/2312.15494v2
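The one-covariate-at-a-time logic is easy to sketch. The code below runs a single selection stage with a multiple-testing threshold that grows with the number of candidate regressors; the threshold rule and the omission of down-weighting and of further OCMT stages are simplifications, so this is an illustration of the spirit of the method rather than Chudik et al.'s procedure.

```python
# One-covariate-at-a-time selection: keep regressors whose marginal t-statistic
# exceeds a threshold that accounts for the number of candidates.
import numpy as np
from scipy import stats

def ocmt_select(y, X, p_level=0.01, delta=1.0):
    T, K = X.shape
    critical = stats.norm.ppf(1 - p_level / (2 * K ** delta))   # multiple-testing threshold
    selected = []
    for k in range(K):
        Z = np.column_stack([np.ones(T), X[:, k]])
        beta = np.linalg.lstsq(Z, y, rcond=None)[0]
        u = y - Z @ beta
        var_b = (u @ u) / (T - 2) * np.linalg.inv(Z.T @ Z)[1, 1]
        if abs(beta[1]) / np.sqrt(var_b) > critical:
            selected.append(k)
    return selected

rng = np.random.default_rng(8)
T, K = 300, 40
X = rng.normal(size=(T, K))
y = 1.0 * X[:, 0] - 0.8 * X[:, 3] + rng.normal(size=T)
print("selected columns:", ocmt_select(y, X))                   # expect [0, 3]
```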
Stochastic Equilibrium the Lucas Critique and Keynesian Economics
regarding New Keynesian Dynamic Stochastic General Equilibrium. I develop a
formal concept of stochastic equilibrium. I prove uniqueness and necessity,
when agents are patient, with general application. Existence depends on
appropriately specified eigenvalue conditions. Otherwise, no solution of any
kind exists. I construct the equilibrium with Calvo pricing. I provide novel
comparative statics with the non-stochastic model of mathematical significance.
I uncover a bifurcation between neighbouring stochastic systems and
approximations taken from the Zero Inflation Non-Stochastic Steady State
(ZINSS). The correct Phillips curve agrees with the zero limit from the trend
inflation framework. It contains a large lagged inflation coefficient and a
small response to expected inflation. Price dispersion can be first or second
order depending on how shocks are scaled. The response to the output gap is always
muted and is zero at standard parameters. A neutrality result is presented to
explain why and align Calvo with Taylor pricing. Present and lagged demand
shocks enter the Phillips curve so there is no Divine Coincidence and the
system is identified from structural shocks alone. The lagged inflation slope
is increasing in the inflation response, embodying substantive policy
trade-offs. The Taylor principle is reversed, inactive settings are necessary,
pointing towards inertial policy. The observational equivalence idea of the
Lucas critique is disproven. The bifurcation results from the breakdown of the
constraints implied by lagged nominal rigidity, associated with cross-equation
cancellation possible only at ZINSS. There is a dual relationship between
restrictions on the econometrician and constraints on repricing firms. Thus, if
the model is correct, goodness of fit will jump.
arXiv link: http://arxiv.org/abs/2312.16214v4
Functional CLTs for subordinated Lévy models in physics, finance, and econometrics
statistical mechanics, econometrics, mathematical finance, and insurance
mathematics, where (possibly subordinated) L\'evy noise arises as a scaling
limit of some form of continuous-time random walk (CTRW). For each application,
it is natural to rely on weak convergence results for stochastic integrals on
Skorokhod space in Skorokhod's J1 or M1 topologies. As compared to earlier and
entirely separate works, we are able to give a more streamlined account while
also allowing for greater generality and providing important new insights. For
each application, we first elucidate how the fundamental conclusions for J1
convergent CTRWs emerge as special cases of the same general principles, and we
then illustrate how the specific settings give rise to different results for
strictly M1 convergent CTRWs.
arXiv link: http://arxiv.org/abs/2312.15119v2
Exploring Distributions of House Prices and House Price Indices
distribution. Specifically, we analyze sale prices in the 1970-2010 window of
over 116,000 single-family homes in Hamilton County, Ohio, including the
Cincinnati metro area of about 2.2 million people. We also analyze the HPI,
published by the Federal Housing Finance Agency (FHFA), for nearly 18,000 US ZIP
codes covering a period of over 40 years starting in the 1980s. If HP can be viewed as a
first derivative of income, HPI can be viewed as its second derivative. We use
generalized beta (GB) family of functions to fit distributions of HP and HPI
since GB naturally arises from the models of economic exchange described by
stochastic differential equations. Our main finding is that HP and multi-year
HPI exhibit a negative Dragon King (nDK) behavior, wherein the power-law
distribution tail gives way to an abrupt decay to a finite upper limit value,
which is similar to our recent findings for realized volatility of S&P500
index in the US stock market. This type of tail behavior is best fitted by a
modified GB (mGB) distribution. Tails of single-year HPI appear to show more
consistency with power-law behavior, which is better described by a GB Prime
(GB2) distribution. We supplement full distribution fits by mGB and GB2 with
direct linear fits (LF) of the tails. Our numerical procedure relies on
evaluation of confidence intervals (CI) of the fits, as well as of p-values
that give the likelihood that data come from the fitted distributions.
arXiv link: http://arxiv.org/abs/2312.14325v1
RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation
personalized pricing, promotions, and product recommendation algorithms that
can leverage rich customer data to learn and earn. Systematic benchmarking and
evaluation of these causal learning systems remains a critical challenge, due
to the lack of suitable datasets and simulation environments. In this work, we
propose a multi-stage model for simulating customer shopping behavior that
captures important sources of heterogeneity, including price sensitivity and
past experiences. We embedded this model into a working simulation environment
-- RetailSynth. RetailSynth was carefully calibrated on publicly available
grocery data to create realistic synthetic shopping transactions. Multiple
pricing policies were implemented within the simulator and analyzed for impact
on revenue, category penetration, and customer retention. Applied researchers
can use RetailSynth to validate causal demand models for multi-category retail
and to incorporate realistic price sensitivity into emerging benchmarking
suites for personalized pricing, promotions, and product recommendations.
arXiv link: http://arxiv.org/abs/2312.14095v1
Binary Endogenous Treatment in Stochastic Frontier Models with an Application to Soil Conservation in El Salvador
Sustainable Development Goals set by the United Nations. To this end, many
international organizations have funded training and technology transfer
programs that aim to promote productivity and income growth, fight poverty and
enhance food security among smallholder farmers in developing countries.
Stochastic production frontier analysis can be a useful tool when evaluating
the effectiveness of these programs. However, accounting for treatment
endogeneity, often intrinsic to these interventions, has only recently received
attention in the stochastic frontier literature. In this work, we extend
the classical maximum likelihood estimation of stochastic production frontier
models by allowing both the production frontier and inefficiency to depend on a
potentially endogenous binary treatment. We use instrumental variables to
define an assignment mechanism for the treatment, and we explicitly model the
density of the first and second-stage composite error terms. We provide
empirical evidence of the importance of controlling for endogeneity in this
setting using farm-level data from a soil conservation program in El Salvador.
arXiv link: http://arxiv.org/abs/2312.13939v1
Principal Component Copulas for Capital Modelling and Systemic Risk
(PCCs). This class combines the strong points of copula-based techniques with
principal component analysis (PCA), which results in flexibility when modelling
tail dependence along the most important directions in high-dimensional data.
We obtain theoretical results for PCCs that are important for practical
applications. In particular, we derive tractable expressions for the
high-dimensional copula density, which can be represented in terms of
characteristic functions. We also develop algorithms to perform Maximum
Likelihood and Generalized Method of Moment estimation in high-dimensions and
show very good performance in simulation experiments. Finally, we apply the
copula to the international stock market to study systemic risk. We find that
PCCs lead to excellent performance on measures of systemic risk due to their
ability to distinguish between parallel and orthogonal movements in the global
market, which have a different impact on systemic risk and diversification. As
a result, we consider the PCC promising for capital models, which financial
institutions use to protect themselves against systemic risk.
arXiv link: http://arxiv.org/abs/2312.13195v3
Noisy Measurements Are Important, the Design of Census Products Is Much More Important
data users." This commentary explains why the 2020 Census Noisy Measurement
Files (NMFs) are not the best focus for that plea. The August 2021 letter from
62 prominent researchers asking for production of the direct output of the
differential privacy system deployed for the 2020 Census signaled the
engagement of the scholarly community in the design of decennial census data
products. NMFs, the raw statistics produced by the 2020 Census Disclosure
Avoidance System before any post-processing, are one component of that
design: the query strategy output. The more important component is the query
workload output: the statistics released to the public. Optimizing the query
workload (the Redistricting Data (P.L. 94-171) Summary File, specifically) could
allow the privacy-loss budget to be more effectively managed. There could be
fewer noisy measurements, no post-processing bias, and direct estimates of the
uncertainty from disclosure avoidance for each published statistic.
arXiv link: http://arxiv.org/abs/2312.14191v2
Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances
for two-armed Gaussian bandits. In BAI, given multiple arms, we aim to find the
best arm, an arm with the highest expected reward, through an adaptive
experiment. Kaufmann et al. (2016) develops a lower bound for the probability
of misidentifying the best arm. They also propose a strategy, assuming that the
variances of rewards are known, and show that it is asymptotically optimal in
the sense that its probability of misidentification matches the lower bound as
the budget approaches infinity. However, an asymptotically optimal strategy is
unknown when the variances are unknown. For this open issue, we propose a
strategy that estimates variances during an adaptive experiment and draws arms
with a ratio of the estimated standard deviations. We refer to this strategy as
the Neyman Allocation (NA)-Augmented Inverse Probability weighting (AIPW)
strategy. We then demonstrate that this strategy is asymptotically optimal by
showing that its probability of misidentification matches the lower bound when
the budget approaches infinity, and the gap between the expected rewards of two
arms approaches zero (small-gap regime). Our results suggest that under the
worst-case scenario characterized by the small-gap regime, our strategy, which
employs estimated variance, is asymptotically optimal even when the variances
are unknown.
arXiv link: http://arxiv.org/abs/2312.12741v2
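The allocation rule at the heart of the strategy is simple to illustrate. The sketch below draws arms in proportion to their currently estimated standard deviations after a short uniform warm-up and then recommends the arm with the larger sample mean; the AIPW estimator used by the actual NA-AIPW strategy is omitted, and all numbers are hypothetical.

```python
# Simplified Neyman-allocation experiment for two-armed best arm identification.
import numpy as np

rng = np.random.default_rng(9)
means, sds = np.array([0.50, 0.52]), np.array([1.0, 2.0])   # hypothetical two-armed bandit
budget, burn_in = 5000, 50
rewards = [[], []]

for t in range(budget):
    if t < 2 * burn_in:                                      # short uniform warm-up
        arm = t % 2
    else:
        sd_hat = np.array([np.std(rewards[0]), np.std(rewards[1])])
        probs = sd_hat / sd_hat.sum()                        # Neyman allocation ratio
        arm = rng.choice(2, p=probs)
    rewards[arm].append(rng.normal(means[arm], sds[arm]))

estimates = [np.mean(r) for r in rewards]
print("pulls per arm:", [len(r) for r in rewards])
print("recommended best arm:", int(np.argmax(estimates)))   # sample-mean recommendation
```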
Real-time monitoring with RCA models
WLS residuals for the online detection of changepoints in a Random Coefficient
Autoregressive model, using both the standard CUSUM and the Page-CUSUM process.
We derive the asymptotics under the null of no changepoint for all possible
weighing schemes, including the case of the standardised CUSUM, for which we
derive a Darling-Erdos-type limit theorem; our results guarantee the
procedure-wise size control under both an open-ended and a closed-ended
monitoring. In addition to considering the standard RCA model with no
covariates, we also extend our results to the case of exogenous regressors. Our
results can be applied irrespective of (and with no prior knowledge required as
to) whether the observations are stationary or not, and irrespective of whether
they change into a stationary or nonstationary regime. Hence, our methodology
is particularly suited to detect the onset, or the collapse, of a bubble or an
epidemic. Our simulations show that our procedures, especially when
standardising the CUSUM process, can ensure very good size control and short
detection delays. We complement our theory by studying the online detection of
breaks in epidemiological and housing prices series.
arXiv link: http://arxiv.org/abs/2312.11710v1
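For intuition, an open-ended CUSUM monitor can be sketched in a few lines. The detector below tracks deviations from the training-sample mean and flags the first time a scaled CUSUM crosses an ad hoc linear boundary; it is a generic illustration of the monitoring logic, not the paper's RCA-specific WLS-residual statistic or its Darling-Erdos-type boundary.

```python
# Generic open-ended CUSUM monitoring after a stable training period.
import numpy as np

def cusum_monitor(training, stream, threshold=3.0):
    """Return the first monitoring period at which the scaled CUSUM exceeds the boundary."""
    mu, sigma, m = training.mean(), training.std(ddof=1), len(training)
    cumsum = 0.0
    for k, x in enumerate(stream, start=1):
        cumsum += x - mu                            # deviation relative to the training mean
        stat = abs(cumsum) / (sigma * np.sqrt(m))
        if stat > threshold * (1 + k / m):          # simple, ad hoc linear boundary
            return k                                # changepoint flagged at monitoring time k
    return None

rng = np.random.default_rng(10)
training = rng.normal(0, 1, 200)                    # stable historical period
stream = np.concatenate([rng.normal(0, 1, 100),     # no change...
                         rng.normal(1.5, 1, 100)])  # ...then a level shift
print("detection time after monitoring starts:", cusum_monitor(training, stream))
```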
A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census
Census of Population and Housing can be accurately reconstructed from the
published tabular summaries. Ninety-seven million person records (every
resident in 70% of all census blocks) are exactly reconstructed with provable
certainty using only public information. We further show that a hypothetical
attacker using our methods can reidentify with 95% accuracy population unique
individuals who are perfectly reconstructed and not in the modal race and
ethnicity category in their census block (3.4 million persons)--a result that
is only possible because their confidential records were used in the published
tabulations. Finally, we show that the methods used for the 2020 Census, based
on a differential privacy framework, provide better protection against this
type of attack, with better published data accuracy, than feasible
alternatives.
arXiv link: http://arxiv.org/abs/2312.11283v3
Predicting Financial Literacy via Semi-supervised Learning
income, and understanding digital currencies has been added to the modern
definition. FL can be predicted by exploiting unlabelled recorded data in
financial networks via semi-supervised learning (SSL). Measuring and predicting
FL has not been widely studied, resulting in limited understanding of customer
financial engagement consequences. Previous studies have shown that low FL
increases the risk of social harm. Therefore, it is important to accurately
estimate FL to allocate specific intervention programs to less financially
literate groups. This will not only increase company profitability, but will
also reduce government spending. Some studies considered predicting FL in
classification tasks, whereas others developed FL definitions and impacts. The
current paper investigated mechanisms to learn customer FL level from their
financial data using sampling by synthetic minority over-sampling techniques
for regression with Gaussian noise (SMOGN). We propose the SMOGN-COREG model
for semi-supervised regression, applying SMOGN to deal with unbalanced datasets
and a nonparametric multi-learner co-regression (COREG) algorithm for labeling.
We compared the SMOGN-COREG model with six well-known regressors on five
datasets to evaluate the proposed model's effectiveness on unbalanced and
unlabelled financial data. Experimental results confirmed that the proposed
method outperformed the comparator models for unbalanced and unlabelled
financial data. Therefore, SMOGN-COREG is a step towards using unlabelled data
to estimate FL level.
arXiv link: http://arxiv.org/abs/2312.10984v1
Some Finite-Sample Results on the Hausman Test
approach in linear instrumental variable models is a variant of the Hausman
test. Moreover, we find that the test statistics used in these tests can be
numerically ordered, indicating their relative power properties in finite
samples.
arXiv link: http://arxiv.org/abs/2312.10558v1
The Dynamic Triple Gamma Prior as a Shrinkage Process Prior for Time-Varying Parameter Models
assume constant innovation variances across time points, inducing sparsity by
shrinking these variances toward zero. However, this assumption falls short
when states exhibit large jumps or structural changes, as often seen in
empirical time series analysis. To address this, we propose the dynamic triple
gamma prior -- a stochastic process that induces time-dependent shrinkage by
modeling dependence among innovations while retaining a well-known triple gamma
marginal distribution. This framework encompasses various special and limiting
cases, including the horseshoe shrinkage prior, making it highly flexible. We
derive key properties of the dynamic triple gamma that highlight its dynamic
shrinkage behavior and develop an efficient Markov chain Monte Carlo algorithm
for posterior sampling. The proposed approach is evaluated through sparse
covariance modeling and forecasting of the returns of the EURO STOXX 50 index,
demonstrating favorable forecasting performance.
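In the generic TVP setting this refers to (notation illustrative rather than the paper's), each state follows a random walk whose innovation variance is the shrinkage target:

\[
\beta_{j,t} = \beta_{j,t-1} + \omega_{j,t}, \qquad \omega_{j,t} \sim N(0, \theta_{j,t}),
\]

where conventional shrinkage process priors set $\theta_{j,t} \equiv \theta_j$ and shrink $\theta_j$ toward zero, whereas the dynamic triple gamma lets $\theta_{j,t}$ vary over time with dependence across $t$ while preserving a triple gamma marginal distribution.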
arXiv link: http://arxiv.org/abs/2312.10487v2
Logit-based alternatives to two-stage least squares
as alternatives to the traditionally used 2SLS estimator in the model where
both the endogenous treatment variable and the corresponding instrument are
binary. Our novel estimators are as easy to compute as the 2SLS estimator but
have an advantage over the 2SLS estimator in terms of causal interpretability.
In particular, in certain cases where the probability limits of both our
estimators and the 2SLS estimator take the form of weighted-average treatment
effects, our estimators are guaranteed to yield non-negative weights whereas
the 2SLS estimator is not.
arXiv link: http://arxiv.org/abs/2312.10333v1
Double Machine Learning for Static Panel Models with Fixed Effects
which make use of the predictive power of machine learning algorithms. In this
paper, we develop novel double machine learning (DML) procedures for panel data
in which these algorithms are used to approximate high-dimensional and
nonlinear nuisance functions of the covariates. Our new procedures are
extensions of the well-known correlated random effects, within-group and
first-difference estimators from linear to nonlinear panel models,
specifically, Robinson (1988)'s partially linear regression model with fixed
effects and unspecified nonlinear confounding. Our simulation study assesses
the performance of these procedures using different machine learning
algorithms. We use our procedures to re-estimate the impact of minimum wage on
voting behaviour in the UK. From our results, we recommend the use of
first-differencing because it imposes the fewest constraints on the
distribution of the fixed effects, and an ensemble learning strategy to ensure
optimum estimator accuracy.
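A stylised sketch of the first-difference variant is given below: fixed effects are removed by differencing, the nuisance functions are cross-fitted with random forests on the differenced covariates (a simplification of the paper's setup), and the coefficient comes from a residual-on-residual regression. The function and variable names are illustrative.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold

    def dml_first_difference(y, d, X, ids):
        """Partially linear DML on first-differenced panel data: difference
        out the fixed effects, cross-fit ML nuisances for the differenced
        outcome and treatment, then regress residual on residual."""
        dy, dd, dX = [], [], []
        for i in np.unique(ids):                      # difference within unit
            m = ids == i
            dy.append(np.diff(y[m]))
            dd.append(np.diff(d[m]))
            dX.append(np.diff(X[m], axis=0))
        dy, dd, dX = np.concatenate(dy), np.concatenate(dd), np.vstack(dX)

        ry, rd = np.zeros_like(dy), np.zeros_like(dd)
        for train, test in KFold(5, shuffle=True, random_state=0).split(dX):
            ry[test] = dy[test] - RandomForestRegressor(random_state=0).fit(
                dX[train], dy[train]).predict(dX[test])
            rd[test] = dd[test] - RandomForestRegressor(random_state=0).fit(
                dX[train], dd[train]).predict(dX[test])
        return (rd @ ry) / (rd @ rd)   # partialled-out treatment coefficient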
arXiv link: http://arxiv.org/abs/2312.08174v5
Individual Updating of Subjective Probability of Homicide Victimization: a "Natural Experiment'' on Risk Communication
victimization risk after an informational shock by developing two econometric
models that accommodate both optimal revision of prior expectations and the
behaviour of skeptical Bayesian agents who disregard new information. We apply
our models to a unique household dataset (N = 4,030) that consists of
socioeconomic and victimization expectation
variables in Brazil, coupled with an informational “natural experiment”
brought by the sample design methodology, which randomized interviewers to
interviewees. The higher individuals set their priors about their own
subjective homicide victimization risk, the more likely they are to revise
their initial perceptions. Conditional on updating, we find that older
respondents and females are
more reluctant to change priors and choose the new response level. In addition,
even though the respondents' level of education is not significant, the
interviewers' level of education has a key role in changing and updating
decisions. The results show that our econometric approach fits the available
empirical evidence reasonably well, stressing the salient role that
heterogeneity in the individual characteristics of interviewees and
interviewers plays in belief updating and in its absence, namely skepticism.
Furthermore, we can rationalize skeptics through an argument about
informational quality and credibility.
arXiv link: http://arxiv.org/abs/2312.08171v1
Efficiency of QMLE for dynamic panel data models with interactive effects
in the presence of an increasing number of incidental parameters. We formulate
the dynamic panel as a simultaneous equations system, and derive the efficiency
bound under the normality assumption. We then show that the Gaussian
quasi-maximum likelihood estimator (QMLE) applied to the system achieves the
normality efficiency bound without the normality assumption. We also compare
the QMLE with the fixed effects approach.
arXiv link: http://arxiv.org/abs/2312.07881v3
On Rosenbaum's Rank-based Matching Estimator
the distances between component-wise ranks, instead of the original data
values, to measure covariate similarity when constructing matching estimators
of average treatment effects. While the intuitive benefits of using covariate
ranks for matching estimation are apparent, there is no theoretical
understanding of such procedures in the literature. We fill this gap by
demonstrating that Rosenbaum's rank-based matching estimator, when coupled with
a regression adjustment, enjoys the properties of double robustness and
semiparametric efficiency without the need to enforce restrictive covariate
moment assumptions. Our theoretical findings further emphasize the statistical
virtues of employing ranks for estimation and inference, more broadly aligning
with the insights put forth by Peter Bickel in his 2004 Rietz lecture (Bickel,
2004).
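A minimal sketch of the idea, assuming 1-nearest-neighbour matching on normalised component-wise ranks with a linear regression adjustment fitted on the controls; it is illustrative rather than a reproduction of Rosenbaum's procedure or the paper's exact estimator.

    import numpy as np
    from scipy.stats import rankdata
    from sklearn.linear_model import LinearRegression
    from sklearn.neighbors import NearestNeighbors

    def rank_matching_att(y, d, X):
        """ATT via 1-NN matching on component-wise covariate ranks, with a
        linear regression (bias-correction) adjustment fit on the controls."""
        R = np.column_stack([rankdata(X[:, j]) for j in range(X.shape[1])])
        R = R / len(y)                              # normalise ranks to [0, 1]
        Rt, Rc = R[d == 1], R[d == 0]
        yt, yc = y[d == 1], y[d == 0]
        idx = NearestNeighbors(n_neighbors=1).fit(Rc).kneighbors(
            Rt, return_distance=False).ravel()
        mu0 = LinearRegression().fit(Rc, yc)        # outcome model on controls
        # bias-corrected matching estimator of the ATT
        return np.mean(yt - yc[idx] - (mu0.predict(Rt) - mu0.predict(Rc[idx])))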
arXiv link: http://arxiv.org/abs/2312.07683v2
Estimating Counterfactual Matrix Means with Short Panel Data
counterfactual outcomes under a low-rank factor model with short panel data and
general outcome missingness patterns. Applications include event studies and
studies of outcomes of "matches" between agents of two types, e.g. workers and
firms, typically conducted under less-flexible Two-Way-Fixed-Effects (TWFE)
models of outcomes. Given an infinite population of units and a finite number
of outcomes, we show our approach identifies all counterfactual outcome means,
including those not estimable by existing methods, if a particular graph
constructed based on overlaps in observed outcomes between subpopulations is
connected. Our analogous, computationally efficient estimation procedure yields
consistent, asymptotically normal estimates of counterfactual outcome means
under fixed-$T$ (number of outcomes), large-$N$ (sample size) asymptotics. In a
semi-synthetic simulation study based on matched employer-employee data, our
estimator has lower bias and only slightly higher variance than a
TWFE-model-based estimator when estimating average log-wages.
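One plausible reading of the connectivity condition can be checked mechanically: treat each subpopulation as a node, connect two nodes whenever their sets of observed outcomes overlap, and test whether the resulting graph is connected. The sketch below does this with networkx; the dictionary of observed outcome sets is hypothetical.

    import networkx as nx

    def identification_graph(observed_sets):
        """observed_sets maps subpopulation label -> set of observed outcome
        indices. Returns the overlap graph; the paper's result requires this
        graph to be connected for all counterfactual means to be identified."""
        G = nx.Graph()
        G.add_nodes_from(observed_sets)
        labels = list(observed_sets)
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                if observed_sets[a] & observed_sets[b]:   # overlapping outcomes
                    G.add_edge(a, b)
        return G

    obs = {"cohort_A": {0, 1}, "cohort_B": {1, 2}, "cohort_C": {2, 3}}
    print(nx.is_connected(identification_graph(obs)))     # True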
arXiv link: http://arxiv.org/abs/2312.07520v2
Structural Analysis of Vector Autoregressive Models
Vector Autoregressive models for the teaching of a course on Applied
Macroeconometrics with Advanced Topics.
arXiv link: http://arxiv.org/abs/2312.06402v9
Trends in Temperature Data: Micro-foundations of Their Nature
of order 1, I(1), or is a stationary process around a trend function is crucial
for detection, attribution, impact and forecasting studies of climate change.
In this paper, we investigate the nature of trends in GAT building on the
analysis of individual temperature grids. Our 'micro-founded' evidence suggests
that GAT is stationary around a non-linear deterministic trend in the form of a
linear function with a one-period structural break. This break can be
attributed to a combination of individual grid breaks and the standard
aggregation method under acceleration in global warming. We illustrate our
findings using simulations.
arXiv link: http://arxiv.org/abs/2312.06379v1
Fused Extended Two-Way Fixed Effects for Difference-in-Differences With Staggered Adoptions
difference-in-differences under staggered adoptions, Wooldridge (2021) proposed
the extended two-way fixed effects estimator, which adds many parameters.
However, this reduces efficiency. Restricting some of these parameters to be
equal (for example, subsequent treatment effects within a cohort) helps, but ad
hoc restrictions may reintroduce bias. We propose a machine learning estimator
with a single tuning parameter, fused extended two-way fixed effects (FETWFE),
that enables automatic data-driven selection of these restrictions. We prove
that under an appropriate sparsity assumption FETWFE identifies the correct
restrictions with probability tending to one, which improves efficiency. We
also prove the consistency, oracle property, and asymptotic normality of FETWFE
for several classes of heterogeneous marginal treatment effect estimators under
either conditional or marginal parallel trends, and we prove the same results
for conditional average treatment effects under conditional parallel trends. We
provide an R package implementing fused extended two-way fixed effects, and we
demonstrate FETWFE in simulation studies and an empirical application.
arXiv link: http://arxiv.org/abs/2312.05985v4
Dynamic Spatiotemporal ARCH Models: Small and Large Sample Results
conditional heteroscedasticity (ARCH) model. The log-volatility term in this
model can depend on (i) the spatial lag of the log-squared outcome variable,
(ii) the time-lag of the log-squared outcome variable, (iii) the spatiotemporal
lag of the log-squared outcome variable, (iv) exogenous variables, and (v) the
unobserved heterogeneity across regions and time, i.e., the regional and time
fixed effects. We examine the small and large sample properties of two
quasi-maximum likelihood estimators and a generalized method of moments
estimator for this model. We first summarize the theoretical properties of
these estimators and then compare their finite sample properties through Monte
Carlo simulations.
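Collecting the five components listed above, an illustrative log-volatility equation (the paper's exact notation and normalisations may differ) is

\[
\log h_{it} = \rho \sum_{j} w_{ij} \log y_{jt}^{2} + \gamma \log y_{i,t-1}^{2} + \delta \sum_{j} w_{ij} \log y_{j,t-1}^{2} + x_{it}'\beta + \mu_i + \alpha_t,
\]

where $w_{ij}$ are spatial weights, $x_{it}$ collects the exogenous variables, and $\mu_i$ and $\alpha_t$ are the regional and time fixed effects.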
arXiv link: http://arxiv.org/abs/2312.05898v1
Causal inference and policy evaluation without a control group
causal effects cannot be applied. To fill this gap, we propose the Machine
Learning Control Method, a new approach for causal panel analysis that
estimates causal parameters without relying on untreated units. We formalize
identification within the potential outcomes framework and then provide
estimation based on machine learning algorithms. To illustrate the practical
relevance of our method, we present simulation evidence, a replication study,
and an empirical application on the impact of the COVID-19 crisis on
educational inequality. We implement the proposed approach in the companion R
package MachineControl.
arXiv link: http://arxiv.org/abs/2312.05858v2
Influence Analysis with Panel Data
variables (i.e., vertical outliers, leveraged data) has the potential to
severely bias regression coefficients and/or standard errors. This is common
with short panel data because the researcher cannot appeal to asymptotic
theory. Examples include cross-country studies, cell-group analyses, and field or
laboratory experimental studies, where the researcher is forced to use few
cross-sectional observations repeated over time due to the structure of the
data or research design. Available diagnostic tools may fail to properly detect
these anomalies, because they are not designed for panel data. In this paper,
we formalise statistical measures for panel data models with fixed effects to
quantify the degree of leverage and outlyingness of units, and the joint and
conditional influences of pairs of units. We first develop a method to visually
detect anomalous units in a panel data set, and identify their type. Second, we
investigate the effect of these units on LS estimates, and on other units'
influence on the estimated parameters. To illustrate and validate the proposed
method, we use a synthetic data set contaminated with different types of
anomalous units. We also provide an empirical example.
arXiv link: http://arxiv.org/abs/2312.05700v1
Economic Forecasts Using Many Noises
truly lack predictive power? Economists typically conduct variable selection to
eliminate noises from predictors. Yet, we prove a compelling result that in
most economic forecasts, the inclusion of noises in predictions yields greater
benefits than its exclusion. Furthermore, if the total number of predictors is
not sufficiently large, intentionally adding more noises yields superior
forecast performance, outperforming benchmark predictors relying on dimension
reduction. The intuition lies in economic predictive signals being densely
distributed among regression coefficients, maintaining modest forecast bias
while diversifying away overall variance, even when a significant proportion of
predictors constitute pure noises. One of our empirical demonstrations shows
that intentionally adding 300 to 6,000 pure noise predictors to the Welch and
Goyal (2008) dataset achieves a noteworthy 10% out-of-sample $R^2$ in
forecasting the annual U.S. equity premium. The performance surpasses the
majority of sophisticated machine learning models.
arXiv link: http://arxiv.org/abs/2312.05593v2
GCov-Based Portmanteau Test
residuals of dynamic models based on portmanteau statistics involving nonlinear
autocovariances. A new test with an asymptotic $\chi^2$ distribution is
introduced for testing nonlinear serial dependence (NLSD) in time series. This
test is inspired by the Generalized Covariance (GCov) residual-based
specification test, recently proposed as a diagnostic tool for semi-parametric
dynamic models with i.i.d. non-Gaussian errors. It has a $\chi^2$ distribution
when the model is correctly specified and estimated by the GCov estimator. We
derive new asymptotic results under local alternatives for testing hypotheses
on the parameters of a semi-parametric model. We extend it by introducing a
GCov bootstrap test for residual diagnostics, which is also available for
models estimated by a different method, such as the maximum likelihood
estimator under a parametric assumption on the error distribution. A
simulation study shows that the tests perform well in
applications to mixed causal-noncausal autoregressive models. The GCov
specification test is used to assess the fit of a mixed causal-noncausal model
of aluminum prices with locally explosive patterns, i.e. bubbles and spikes
between 2005 and 2024.
arXiv link: http://arxiv.org/abs/2312.05373v2
Occasionally Misspecified
turn out to be heavily misspecified for some observations. This can happen
because of unmodelled idiosyncratic events, such as an abrupt but short-lived
change in policy. These outliers can significantly alter estimates and
inferences. A robust estimation is desirable to limit their influence. For
skewed data, this induces another bias which can also invalidate the estimation
and inferences. This paper proposes a robust GMM estimator with a simple bias
correction that does not degrade robustness significantly. The paper provides
finite-sample robustness bounds, and asymptotic uniform equivalence with an
oracle that discards all outliers. Consistency and asymptotic normality ensue
from that result. An application to the "Price-Puzzle," which finds inflation
increases when monetary policy tightens, illustrates the concerns and the
method. The proposed estimator finds the intuitive result: tighter monetary
policy leads to a decline in inflation.
arXiv link: http://arxiv.org/abs/2312.05342v1
Probabilistic Scenario-Based Assessment of National Food Security Risks with Application to Egypt and Ethiopia
national level, employing a probabilistic scenario-based framework that
integrates both Shared Socioeconomic Pathways (SSP) and Representative
Concentration Pathways (RCP). This innovative method allows each scenario,
encompassing socio-economic and climate factors, to be treated as a model
capable of generating diverse trajectories. This approach offers a more dynamic
understanding of food security risks under varying future conditions. The paper
details the methodologies employed, showcasing their applicability through a
focused analysis of food security challenges in Egypt and Ethiopia, and
underscores the importance of considering a spectrum of socio-economic and
climatic factors in national food security assessments.
arXiv link: http://arxiv.org/abs/2312.04428v2
Alternative models for FX, arbitrage opportunities and efficient pricing of double barrier options in Lévy models
no-touch options in the Heston model and pure jump KoBoL model calibrated to
the same set of the empirical data, and discuss the potential for arbitrage
opportunities if the correct model is a pure jump model. We explain and
demonstrate with numerical examples that accurate and fast calculations of
prices of double barrier options in jump models are extremely difficult using
the numerical methods available in the literature. We develop a new efficient
method (the GWR-SINH method) based on the Gaver-Wynn-Rho acceleration applied to
the Bromwich integral; the SINH-acceleration and a simplified trapezoid rule are
used to evaluate perpetual double barrier options for each value of the
spectral parameter in the GWR algorithm. The Matlab program, running on a Mac
with moderate specifications, achieves a precision of the order of E-5 or
better in several dozen milliseconds; a precision of E-07 is achievable in
about 0.1 sec. We outline the extension of the GWR-SINH method to
regime-switching models and models with stochastic parameters and stochastic
interest rates.
arXiv link: http://arxiv.org/abs/2312.03915v1
A Theory Guide to Using Control Functions to Instrument Hazard Models
models, allowing the inclusion of endogenous (e.g., mismeasured) regressors.
Simple discrete-data hazard models can be expressed as binary choice panel data
models, and the widespread Prentice and Gloeckler (1978) discrete-data
proportional hazards model can specifically be expressed as a complementary
log-log model with time fixed effects. This allows me to recast it as GMM
estimation and its instrumented version as sequential GMM estimation in a
Z-estimation (non-classical GMM) framework; this framework can then be
leveraged to establish asymptotic properties and sufficient conditions. Whilst
this paper focuses on the Prentice and Gloeckler (1978) model, the methods and
discussion developed here can be applied more generally to other hazard models
and binary choice models. I also introduce my Stata command for estimating a
complementary log-log model instrumented via control functions (available as
ivcloglog on SSC), which allows practitioners to easily instrument the Prentice
and Gloeckler (1978) model.
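A hedged two-step sketch of the control-function idea in Python with statsmodels (not the ivcloglog command itself): regress the endogenous covariate on the instrument and exogenous variables, then include the first-stage residual in a complementary log-log GLM with period dummies. The column names are hypothetical, the CLogLog link name assumes a recent statsmodels release, and the second-stage standard errors would need correction for the generated regressor, for example by bootstrapping.

    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    def cloglog_control_function(df):
        """Two-step control function for a discrete-time hazard (cloglog) model.
        df columns (hypothetical): 'event' (0/1 hazard indicator), 'x_endog',
        'z' (instrument), 'w' (exogenous covariate), 'period' (time dummies)."""
        # Step 1: first-stage regression of the endogenous regressor
        first = smf.ols("x_endog ~ z + w + C(period)", data=df).fit()
        df = df.assign(v_hat=first.resid)
        # Step 2: cloglog hazard model including the control function v_hat
        link = sm.families.links.CLogLog()
        second = smf.glm("event ~ x_endog + w + v_hat + C(period)", data=df,
                         family=sm.families.Binomial(link=link)).fit()
        return second   # note: SEs should be corrected (e.g., bootstrapped)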
arXiv link: http://arxiv.org/abs/2312.03165v1
Almost Dominance: Inference and Application
almost dominances: almost Lorenz dominance, almost inverse stochastic
dominance, and almost stochastic dominance. We first generalize almost Lorenz
dominance to almost upward and downward Lorenz dominances. We then provide a
bootstrap inference procedure for the Lorenz dominance coefficients, which
measure the degrees of almost Lorenz dominance. Furthermore, we propose almost
upward and downward inverse stochastic dominances and provide inference on the
inverse stochastic dominance coefficients. We also show that our results can
easily be extended to almost stochastic dominance. Simulation studies
demonstrate the finite sample properties of the proposed estimators and the
bootstrap confidence intervals. This framework can be applied to economic
analysis, particularly in the areas of social welfare, inequality, and decision
making under uncertainty. As an empirical example, we apply the methods to the
inequality growth in the United Kingdom and find evidence for almost upward
inverse stochastic dominance.
arXiv link: http://arxiv.org/abs/2312.02288v2
Bayesian Nonlinear Regression using Sums of Simple Functions
to large datasets arising in macroeconomics. Our framework sums over many
simple two-component location mixtures. The transition between components is
determined by a logistic function that depends on a single threshold variable
and two hyperparameters. Each of these individual models only accounts for a
minor portion of the variation in the endogenous variables. But many of them
are capable of capturing arbitrary nonlinear conditional mean relations.
Conjugate priors enable fast and efficient inference. In simulations, we show
that our approach produces accurate point and density forecasts. In a real-data
exercise, we forecast US macroeconomic aggregates and consider the nonlinear
effects of financial shocks in a large-scale nonlinear VAR.
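An illustrative version of the conditional mean implied by this description (the paper's exact parameterisation may differ) is

\[
E[y_t \mid x_t] = \sum_{m=1}^{M} \Big[ \alpha_m + \beta_m \, F\big(\gamma_m (x_{j_m,t} - c_m)\big) \Big], \qquad F(u) = \frac{1}{1 + e^{-u}},
\]

so that each term switches between the two locations $\alpha_m$ and $\alpha_m + \beta_m$ according to a logistic function of a single threshold variable $x_{j_m,t}$, governed by the threshold $c_m$ and slope $\gamma_m$.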
arXiv link: http://arxiv.org/abs/2312.01881v1
A Method of Moments Approach to Asymptotically Unbiased Synthetic Controls
outcome variable and covariates in pre-treatment time periods, but it has been
shown by Ferman and Pinto (2019) that this approach does not provide asymptotic
unbiasedness when the fit is imperfect and the number of controls is fixed.
Many related panel methods have a similar limitation when the number of units
is fixed. I introduce and evaluate a new method in which the Synthetic Control
is constructed using a Generalized Method of Moments approach where units not being
included in the Synthetic Control are used as instruments. I show that a
Synthetic Control Estimator of this form will be asymptotically unbiased as the
number of pre-treatment time periods goes to infinity, even when pre-treatment
fit is imperfect and the number of units is fixed. Furthermore, if both the
number of pre-treatment and post-treatment time periods go to infinity, then
averages of treatment effects can be consistently estimated. I conduct
simulations and an empirical application to compare the performance of this
method with existing approaches in the literature.
arXiv link: http://arxiv.org/abs/2312.01209v2
Inference on many jumps in nonparametric panel regression models
regression contexts, with a particular focus on panel data where data
generation processes vary across units, and error terms may display complex
dependency structures. In our setting the threshold effect depends on one
specific covariate, and we permit the true nonparametric regression to vary
based on additional (latent) variables. We propose two uniform testing
procedures: one to assess the existence of change-points and another to
evaluate the uniformity of such effects across units. Our approach involves
deriving a straightforward analytical expression to approximate the
variance-covariance structure of change-point effects under general dependency
conditions. Notably, when Gaussian approximations are made to these test
statistics, the intricate dependency structures within the data can be safely
disregarded owing to the localized nature of the statistics. This finding bears
significant implications for obtaining critical values. Through extensive
simulations, we demonstrate that our tests exhibit excellent control over size
and reasonable power performance in finite samples, irrespective of strong
cross-sectional and weak serial dependency within the data. Furthermore,
applying our tests to two datasets reveals the existence of significant
nonsmooth effects in both cases.
arXiv link: http://arxiv.org/abs/2312.01162v3
Identification and Inference for Synthetic Controls with Confounding
unobserved confounding. We model outcome variables through a factor model with
random factors and loadings. Such factors and loadings may act as unobserved
confounders: when the treatment is implemented depends on time-varying factors,
and who receives the treatment depends on unit-level confounders. We study the
identification of treatment effects and illustrate the presence of a trade-off
between time and unit-level confounding. We provide asymptotic results for
inference for several Synthetic Control estimators and show that different
sources of randomness should be considered for inference, depending on the
nature of confounding. We conclude with a comparison of Synthetic Control
estimators with alternatives for factor models.
arXiv link: http://arxiv.org/abs/2312.00955v1
Inference on common trends in functional time series
series in a Hilbert space. We develop statistical inference on the number of
common stochastic trends embedded in the time series, i.e., the dimension of
the nonstationary subspace. We also consider tests of hypotheses on the
nonstationary and stationary subspaces themselves. The Hilbert space can be of
an arbitrarily large dimension, and our methods remain asymptotically valid
even when the time series of interest takes values in a subspace of possibly
unknown dimension. This has wide applicability in practice; for example, to the
case of cointegrated vector time series that are either high-dimensional or of
finite dimension, to high-dimensional factor models that include a finite
number of nonstationary factors, to cointegrated curve-valued (or
function-valued) time series, and to nonstationary dynamic functional factor
models. We include two empirical illustrations to the term structure of
interest rates and labor market indices, respectively.
arXiv link: http://arxiv.org/abs/2312.00590v4
GMM-lev estimation and individual heterogeneity: Monte Carlo evidence and empirical applications
effects (CRE) approach within the generalised method of moments (GMM),
specifically applied to level equations, GMM-lev. It has the advantage of
estimating the effect of measurable time-invariant covariates using all
available information. This is not possible with GMM-dif, applied to the
equations of each period transformed into first differences, while GMM-sys uses
little information as it adds the equation in levels for only one period. The
GMM-lev, because it implies a two-component error term containing both
individual heterogeneity and the idiosyncratic shock, exposes the explanatory
variables to possible double endogeneity. For example, the estimation of
actual persistence could suffer
from bias if instruments were correlated with the unit-specific error
component. The CRE-GMM deals with double endogeneity, captures initial
conditions, and enhances inference. Monte Carlo simulations for different panel
types and under different double endogeneity assumptions show the advantage of
our approach. The empirical applications on production and R&D contribute to
clarify the advantages of using CRE-GMM.
arXiv link: http://arxiv.org/abs/2312.00399v2
Stochastic volatility models with skewness selection
time-varying skewness without imposing it. While dynamic asymmetry may capture
the likely direction of future asset returns, it comes at the risk of leading
to overparameterization. Our proposed approach mitigates this concern by
leveraging sparsity-inducing priors to automatically select the skewness
parameter as dynamic, static, or zero in a data-driven framework. We
consider two empirical applications. First, in a bond yield application,
dynamic skewness captures interest rate cycles of monetary easing and
tightening, partially explained by central banks' mandates. In a currency
modeling application, our model indicates no skewness in the carry factor after
accounting for stochastic volatility, which supports the idea that carry
crashes result from volatility surges rather than dynamic skewness.
arXiv link: http://arxiv.org/abs/2312.00282v1
Bootstrap Inference on Partially Linear Binary Choice Model
structural equations where nonlinearity may appear due to diminishing marginal
returns, different life cycle regimes, or hectic physical phenomena. The
inference procedure for this model based on the analytic asymptotic
approximation could be unreliable in finite samples if the sample size is not
sufficiently large. This paper proposes a bootstrap inference approach for the
model. Monte Carlo simulations show that the proposed inference method performs
well in finite samples compared to the procedure based on the asymptotic
approximation.
arXiv link: http://arxiv.org/abs/2311.18759v1
Identification in Endogenous Sequential Treatment Regimes
effects in settings where individuals self-select into treatment sequences. I
propose an identification strategy which relies on a dynamic version of
standard Instrumental Variables (IV) assumptions and builds on a dynamic
version of the Marginal Treatment Effects (MTE) as the fundamental building
block for treatment effects. The main contribution of the paper is to relax
assumptions on the support of the observed variables and on unobservable gains
of treatment that are present in the dynamic treatment effects literature.
A close-to-application Monte Carlo simulation study illustrates the desirable
finite-sample performance of a sieve estimator for MTEs and Average Treatment
Effects (ATEs).
arXiv link: http://arxiv.org/abs/2311.18555v1
Extrapolating Away from the Cutoff in Regression Discontinuity Designs
at the cutoff under mild continuity assumptions, but they fail to identify
treatment effects away from the cutoff without additional assumptions. The
fundamental challenge of identifying treatment effects away from the cutoff is
that the counterfactual outcome under the alternative treatment status is never
observed. This paper aims to provide a methodological blueprint to identify
treatment effects away from the cutoff in various empirical settings by
offering a non-exhaustive list of assumptions on the counterfactual outcome.
Instead of assuming the exact evolution of the counterfactual outcome, this
paper bounds its variation using the data and sensitivity parameters. The
proposed assumptions are weaker than those introduced previously in the
literature, resulting in partially identified treatment effects that are less
susceptible to assumption violations. This approach accommodates both single
cutoff and multi-cutoff designs. The specific choice of the extrapolation
assumption depends on the institutional background of each empirical
application. Additionally, researchers are recommended to conduct sensitivity
analysis on the chosen parameter and assess resulting shifts in conclusions.
The paper compares the proposed identification results with results using
previous methods via an empirical application and simulated data. It
demonstrates that set identification yields a more credible conclusion about
the sign of the treatment effect.
arXiv link: http://arxiv.org/abs/2311.18136v1
On the Limits of Regression Adjustment
Pre-Experiment Data (CUPED), is an important technique in internet
experimentation. It decreases the variance of effect size estimates, often
cutting confidence interval widths in half or more while never making them
worse. It does so by carefully regressing the goal metric against
pre-experiment features to reduce the variance. The tremendous gains of
regression adjustment raise the question: How much better can we do by
engineering better features from pre-experiment data, for example by using
machine learning techniques or synthetic controls? Could we even reduce the
variance in our effect sizes arbitrarily close to zero with the right
predictors? Unfortunately, our answer is negative. A simple form of regression
adjustment, which uses just the pre-experiment values of the goal metric,
captures most of the benefit. Specifically, under a mild assumption that
observations closer in time are easier to predict than ones further away in
time, we upper bound the potential gains of more sophisticated feature
engineering, with respect to the gains of this simple form of regression
adjustment. The maximum reduction in variance is 50% in Theorem 1, or
equivalently, the confidence interval width can be reduced by at most an
additional 29%.
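The "simple form of regression adjustment" referred to above is the textbook CUPED adjustment of the goal metric by its own pre-experiment value, which takes only a few lines; the sketch below is generic and does not reproduce the paper's bounds.

    import numpy as np

    def cuped_adjust(y, x):
        """CUPED-style adjustment of an in-experiment metric y using its
        pre-experiment counterpart x: y_adj = y - theta * (x - mean(x)),
        with theta = cov(y, x) / var(x)."""
        theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
        return y - theta * (x - x.mean())

    rng = np.random.default_rng(1)
    x = rng.normal(size=10_000)                  # pre-experiment metric
    y = 0.7 * x + rng.normal(size=10_000)        # in-experiment metric
    y_adj = cuped_adjust(y, x)
    print(np.var(y_adj, ddof=1) / np.var(y, ddof=1))   # variance reduction factor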
arXiv link: http://arxiv.org/abs/2311.17858v1
Identifying Causal Effects of Discrete, Ordered and Continuous Treatments using Multiple Instrumental Variables
due to endogeneity. This paper provides new identification results for causal
effects of discrete, ordered and continuous treatments using multiple binary
instruments. The key contribution is the identification of a new causal
parameter that has a straightforward interpretation with a positive weighting
scheme and is applicable in many settings due to a mild monotonicity
assumption. This paper further leverages recent advances in causal machine
learning for both estimation and the detection of local violations of the
underlying monotonicity assumption. The methodology is applied to estimate the
returns to education and assess the impact of having an additional child on
female labor market outcomes.
arXiv link: http://arxiv.org/abs/2311.17575v3
Optimal Categorical Instrumental Variables
settings with potentially few observations per category. The proposed
categorical instrumental variable estimator (CIV) leverages a regularization
assumption that implies existence of a latent categorical variable with fixed
finite support achieving the same first stage fit as the observed instrument.
In asymptotic regimes that allow the number of observations per category to
grow at an arbitrarily small polynomial rate with the sample size, I show that when
the cardinality of the support of the optimal instrument is known, CIV is
root-n asymptotically normal, achieves the same asymptotic variance as the
oracle IV estimator that presumes knowledge of the optimal instrument, and is
semiparametrically efficient under homoskedasticity. Under-specifying the
number of support points reduces efficiency but maintains asymptotic normality.
In an application that leverages judge fixed effects as instruments, CIV
compares favorably to commonly used jackknife-based instrumental variable
estimators.
arXiv link: http://arxiv.org/abs/2311.17021v2
On the adaptation of causal forests to manifold data
world's ills" (Bickel, 2010). But how exactly do they achieve this? Focused on
the recently introduced causal forests (Athey and Imbens, 2016; Wager and
Athey, 2018), this manuscript aims to contribute to an ongoing research trend
towards answering this question, proving that causal forests can adapt to the
unknown covariate manifold structure. In particular, our analysis shows that a
causal forest estimator can achieve the optimal rate of convergence for
estimating the conditional average treatment effect, with the covariate
dimension automatically replaced by the manifold dimension. These findings
align with analogous observations in the realm of deep learning and resonate
with the insights presented in Peter Bickel's 2004 Rietz lecture.
arXiv link: http://arxiv.org/abs/2311.16486v2
Inference for Low-rank Models without Estimating the Rank
low-rank matrices. While most existing inference methods would require
consistent estimation of the true rank, our procedure is robust to rank
misspecification, making it a promising approach in applications where rank
estimation can be unreliable. We estimate the low-rank spaces using
pre-specified weighting matrices, known as diversified projections. A novel
statistical insight is that, unlike the usual statistical wisdom that
overfitting mainly introduces additional variances, the over-estimated low-rank
space also gives rise to a non-negligible bias due to an implicit ridge-type
regularization. We develop a new inference procedure and show that the central
limit theorem holds as long as the pre-specified rank is no smaller than the
true rank. In one of our applications, we study multiple testing with
incomplete data in the presence of confounding factors and show that our method
remains valid as long as the number of controlled confounding factors is at
least as large as the true number, even when no confounding factors are
present.
arXiv link: http://arxiv.org/abs/2311.16440v2
From Reactive to Proactive Volatility Modeling with Hemisphere Neural Networks
forecasting through a novel neural network architecture with dedicated mean and
variance hemispheres. Our architecture features several key ingredients making
MLE work in this context. First, the hemispheres share a common core at the
entrance of the network, which accommodates various forms of time variation
in the error variance. Second, we introduce a volatility emphasis constraint
that breaks mean/variance indeterminacy in this class of overparametrized
nonlinear models. Third, we conduct a blocked out-of-bag reality check to curb
overfitting in both conditional moments. Fourth, the algorithm utilizes
standard deep learning software and thus handles large data sets - both
computationally and statistically. Ergo, our Hemisphere Neural Network (HNN)
provides proactive volatility forecasts based on leading indicators when it
can, and reactive volatility based on the magnitude of previous prediction
errors when it must. We evaluate point and density forecasts with an extensive
out-of-sample experiment and benchmark against a suite of models ranging from
classics to more modern machine learning-based offerings. In all cases, HNN
fares well by consistently providing accurate mean/variance forecasts for all
targets and horizons. Studying the resulting volatility paths reveals its
versatility, while probabilistic forecasting evaluation metrics showcase its
enviable reliability. Finally, we also demonstrate how this machinery can be
merged with other structured deep learning models by revisiting Goulet Coulombe
(2022)'s Neural Phillips Curve.
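A minimal two-headed mean/variance network trained with the Gaussian negative log-likelihood can be sketched in PyTorch as follows; the actual HNN architecture, its volatility emphasis constraint, and the blocked out-of-bag check are not reproduced here, and all names are illustrative.

    import torch
    import torch.nn as nn

    class TwoHeadedNet(nn.Module):
        """Shared trunk with separate mean and variance heads ('hemispheres')."""
        def __init__(self, n_inputs, n_hidden=32):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.ReLU())
            self.mean_head = nn.Linear(n_hidden, 1)
            self.var_head = nn.Linear(n_hidden, 1)

        def forward(self, x):
            h = self.trunk(x)
            mu = self.mean_head(h)
            sigma2 = nn.functional.softplus(self.var_head(h)) + 1e-6
            return mu, sigma2

    def gaussian_nll(mu, sigma2, y):
        # per-observation Gaussian negative log-likelihood (up to a constant)
        return (0.5 * torch.log(sigma2) + 0.5 * (y - mu) ** 2 / sigma2).mean()

    X, y = torch.randn(500, 10), torch.randn(500, 1)
    model = TwoHeadedNet(10)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        mu, sigma2 = model(X)
        loss = gaussian_nll(mu, sigma2, y)
        loss.backward()
        opt.step()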
arXiv link: http://arxiv.org/abs/2311.16333v2
Using Multiple Outcomes to Improve the Synthetic Control Method
analyses typically proceed by estimating separate weights for each outcome. In
this paper, we instead propose estimating a common set of weights across
outcomes, by balancing either a vector of all outcomes or an index or average
of them. Under a low-rank factor model, we show that these approaches lead to
lower bias bounds than separate weights, and that averaging leads to further
gains when the number of outcomes grows. We illustrate this via a re-analysis
of the impact of the Flint water crisis on educational outcomes.
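A minimal sketch of the common-weights idea, assuming a simplex-constrained least-squares fit to either the stacked or the averaged pre-treatment outcome paths; the paper's exact balancing criterion and bias bounds are not reproduced, and the argument names are hypothetical.

    import numpy as np
    from scipy.optimize import minimize

    def common_sc_weights(Y1_list, Y0_list, average=False):
        """Y1_list[k]: (T0,) pre-treatment path of outcome k for the treated unit.
        Y0_list[k]: (T0, J) pre-treatment paths of outcome k for the J donors.
        Returns one weight vector on the simplex shared across all outcomes."""
        if average:   # balance the across-outcome average instead of the stack
            Y1, Y0 = np.mean(Y1_list, axis=0), np.mean(Y0_list, axis=0)
        else:         # stack all outcomes into one long target vector
            Y1, Y0 = np.concatenate(Y1_list), np.vstack(Y0_list)
        J = Y0.shape[1]
        obj = lambda w: np.sum((Y1 - Y0 @ w) ** 2)
        res = minimize(obj, np.full(J, 1.0 / J), method="SLSQP",
                       bounds=[(0.0, 1.0)] * J,
                       constraints=[{"type": "eq",
                                     "fun": lambda w: w.sum() - 1.0}])
        return res.x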
arXiv link: http://arxiv.org/abs/2311.16260v3
Robust Conditional Wald Inference for Over-Identified IV
commonly report the 2SLS estimate along with the robust standard error and seek
to conduct inference with these quantities. If errors are homoskedastic, one
can control the degree of inferential distortion using the first-stage F
critical values from Stock and Yogo (2005), or use the robust-to-weak
instruments Conditional Wald critical values of Moreira (2003). If errors are
non-homoskedastic, these methods do not apply. We derive the generalization of
Conditional Wald critical values that is robust to non-homoskedastic errors
(e.g., heteroskedasticity or clustered variance structures), which can also be
applied to nonlinear weakly-identified models (e.g. weakly-identified GMM).
arXiv link: http://arxiv.org/abs/2311.15952v1
Valid Wald Inference with Many Weak Instruments
an environment with many weak instrumental variables (MWIV). It is observed
that the t statistic of the jackknife instrumental variable estimator (JIVE)
has an asymptotic distribution that is identical to the two-stage least squares
(TSLS) t statistic in the just-identified environment. Consequently, test
procedures that were valid for TSLS t are also valid for the JIVE t. Two such
procedures, i.e., VtF and conditional Wald, are adapted directly. By exploiting
a feature of MWIV environments, a third, more powerful, one-sided VtF-based
test procedure can be obtained.
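For reference, the JIVE1 point estimator whose t statistic is studied here can be computed from leave-one-out first-stage fitted values, as in the sketch below (one endogenous regressor, exogenous controls assumed partialled out beforehand); the VtF and conditional Wald critical values themselves are not reproduced.

    import numpy as np

    def jive1(y, x, Z):
        """JIVE1 estimator with one endogenous regressor x and instrument
        matrix Z. Leave-one-out fitted values avoid own-observation bias."""
        P = Z @ np.linalg.solve(Z.T @ Z, Z.T)          # projection (hat) matrix
        h = np.diag(P)
        x_hat_loo = (P @ x - h * x) / (1.0 - h)        # leave-one-out fitted values
        return (x_hat_loo @ y) / (x_hat_loo @ x)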
arXiv link: http://arxiv.org/abs/2311.15932v1
Policy Learning with Distributional Welfare
distributional welfare. Most literature on treatment choice has considered
utilitarian welfare based on the conditional average treatment effect (ATE).
While average welfare is intuitive, it may yield undesirable allocations
especially when individuals are heterogeneous (e.g., with outliers) - the very
reason individualized treatments were introduced in the first place. This
observation motivates us to propose an optimal policy that allocates the
treatment based on the conditional quantile of individual treatment effects
(QoTE). Depending on the choice of the quantile probability, this criterion can
accommodate a policymaker who is either prudent or negligent. The challenge of
identifying the QoTE lies in its requirement for knowledge of the joint
distribution of the counterfactual outcomes, which is not generally
point-identified. We introduce minimax policies that are robust to this model
uncertainty. A range of identifying assumptions can be used to yield more
informative policies. For both stochastic and deterministic policies, we
establish the asymptotic bound on the regret of implementing the proposed
policies. The framework can be generalized to any setting where welfare is
defined as a functional of the joint distribution of the potential outcomes.
arXiv link: http://arxiv.org/abs/2311.15878v4
On Quantile Treatment Effects, Rank Similarity, and Variation of Instrumental Variables
counterfactual distributions serves as an identifying condition for treatment
effects when the treatment is endogenous, and shows that this condition holds
in a range of nonparametric models for treatment effects. To this end, we first
provide a novel characterization of the prevalent assumption restricting
treatment heterogeneity in the literature, namely rank similarity. Our
characterization demonstrates the stringency of this assumption and allows us
to relax it in an economically meaningful way, resulting in our identifying
condition. It also justifies the quest of richer exogenous variations in the
data (e.g., multi-valued or multiple instrumental variables) in exchange for
weaker identifying conditions. The primary goal of this investigation is to
provide empirical researchers with tools that are robust and easy to implement
but still yield tight policy evaluations.
arXiv link: http://arxiv.org/abs/2311.15871v1
(Frisch-Waugh-Lovell)': On the Estimation of Regression Models by Row
independently in a row-wise fashion. We document a simple procedure which
allows for a wide class of econometric estimators to be implemented
cumulatively, where, in the limit, estimators can be produced without ever
storing more than a single line of data in a computer's memory. This result is
useful in understanding the mechanics of many common regression models. These
procedures can be used to speed up the computation of estimates computed via
OLS, IV, Ridge regression, LASSO, Elastic Net, and Non-linear models including
probit and logit, with all common modes of inference. This has implications for
estimation and inference with `big data', where memory constraints may imply
that working with all data at once is particularly costly. We additionally show
that even with moderately sized datasets, this method can reduce computation
time compared with traditional estimation routines.
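The cumulative logic is easiest to see for OLS, where only the Gram matrix X'X and the cross product X'y need to be carried from row to row; the sketch below illustrates this reading and omits the accumulators needed for inference (e.g., robust variance pieces). Class and variable names are illustrative.

    import numpy as np

    class StreamingOLS:
        """Accumulate X'X and X'y one row at a time; the coefficient vector
        can be recovered at any point without revisiting earlier rows."""
        def __init__(self, k):
            self.XtX = np.zeros((k, k))
            self.Xty = np.zeros(k)

        def update(self, x_row, y_i):
            x_row = np.asarray(x_row, dtype=float)
            self.XtX += np.outer(x_row, x_row)
            self.Xty += x_row * y_i
            return self

        def coef(self):
            return np.linalg.solve(self.XtX, self.Xty)

    rng = np.random.default_rng(2)
    X, beta = rng.normal(size=(1_000, 3)), np.array([1.0, -2.0, 0.5])
    y = X @ beta + rng.normal(size=1_000)
    model = StreamingOLS(k=3)
    for x_row, y_i in zip(X, y):        # one "line of data" at a time
        model.update(x_row, y_i)
    print(model.coef())                 # close to beta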
arXiv link: http://arxiv.org/abs/2311.15829v1
Causal Models for Longitudinal and Panel Data: A Survey
recent literature has focused on credibly estimating causal effects of binary
interventions in settings with longitudinal data, emphasizing practical advice
for empirical researchers. It pays particular attention to heterogeneity in the
causal effects, often in situations where few units are treated and with
particular structures on the assignment pattern. The literature has extended
earlier work on difference-in-differences or two-way-fixed-effect estimators.
It has more generally incorporated factor models or interactive fixed effects.
It has also developed novel methods using synthetic control approaches.
arXiv link: http://arxiv.org/abs/2311.15458v3
An Identification and Dimensionality Robust Test for Instrumental Variables Models
identification-robust test for the structural parameter in a heteroskedastic
instrumental variables model. While my analysis allows the number of
instruments to be much larger than the sample size, it does not require many
instruments, making my test applicable in settings that have not been well
studied. Instead, the proposed test statistic has a limiting chi-squared
distribution so long as an auxiliary parameter can be consistently estimated.
This is possible using machine learning methods even when the number of
instruments is much larger than the sample size. To improve power, a simple
combination with the sup-score statistic of Belloni et al. (2012) is proposed.
I point out that first-stage F-statistics calculated on LASSO selected
variables may be misleading indicators of identification strength and
demonstrate favorable performance of my proposed methods in both empirical data
and simulation study.
arXiv link: http://arxiv.org/abs/2311.14892v2
A Review of Cross-Sectional Matrix Exponential Spatial Models
conventional spatial autoregressive model in spatial econometrics but offer
analytical, computational, and interpretive advantages. This paper provides a
comprehensive review of the literature on the estimation, inference, and model
selection approaches for the cross-sectional matrix exponential spatial models.
We discuss summary measures for the marginal effects of regressors and detail
the matrix-vector product method for efficient estimation. Our aim is not only
to summarize the main findings from the spatial econometric literature but also
to make them more accessible to applied researchers. Additionally, we
contribute to the literature by introducing some new results. We propose an
M-estimation approach for models with heteroskedastic error terms and
demonstrate that the resulting M-estimator is consistent and has an asymptotic
normal distribution. We also consider some new results for model selection
exercises. In a Monte Carlo study, we examine the finite sample properties of
various estimators from the literature alongside the M-estimator.
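For reference, the cross-sectional matrix exponential spatial specification reviewed here is commonly written (in notation that may differ from the paper's) as

\[
e^{\alpha W} y = X\beta + \varepsilon, \qquad e^{\alpha W} = \sum_{k=0}^{\infty} \frac{\alpha^{k} W^{k}}{k!},
\]

so the reduced form $y = e^{-\alpha W}(X\beta + \varepsilon)$ is always well defined, since the matrix exponential is invertible with $(e^{\alpha W})^{-1} = e^{-\alpha W}$; this is one source of the analytical and computational advantages over the spatial autoregressive model.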
arXiv link: http://arxiv.org/abs/2311.14813v1
Reproducible Aggregation of Sample-Split Statistics
simplification comes at the cost of the introduction of randomness not native
to the data. We propose a simple procedure for sequentially aggregating
statistics constructed with multiple splits of the same sample. The user
specifies a bound and a nominal error rate. If the procedure is implemented
twice on the same data, the nominal error rate approximates the chance that the
results differ by more than the bound. We illustrate the application of the
procedure to several widely applied econometric methods.
arXiv link: http://arxiv.org/abs/2311.14204v3
Measurement Error and Counterfactuals in Quantitative Trade and Spatial Models
current state of the world and the model parameters. Common practice treats the
current state of the world as perfectly observed, but there is good reason to
believe that it is measured with error. This paper provides tools for
quantifying uncertainty about counterfactuals when the current state of the
world is measured with error. I recommend an empirical Bayes approach to
uncertainty quantification, and show that it is both practical and
theoretically justified. I apply the proposed method to the settings in Adao,
Costinot, and Donaldson (2017) and Allen and Arkolakis (2022) and find
non-trivial uncertainty about counterfactuals.
arXiv link: http://arxiv.org/abs/2311.14032v4
Was Javert right to be suspicious? Marginal Treatment Effects with Duration Outcomes
functions when the outcome is right-censored. Our method requires a
conditionally exogenous instrument and random censoring. We propose
asymptotically consistent semi-parametric estimators and valid inferential
procedures for the target functions. To illustrate, we evaluate the effect of
alternative sentences (fines and community service vs. no punishment) on
recidivism in Brazil. Our results highlight substantial treatment effect
heterogeneity: we find that people whom most judges would punish take longer to
recidivate, while people who would be punished only by strict judges recidivate
at an earlier date than if they were not punished.
arXiv link: http://arxiv.org/abs/2311.13969v5
Large-Sample Properties of the Synthetic Control Method under Selection on Unobservables
units. We assume the treatment assignment is based on unobserved heterogeneity
and pre-treatment information, allowing for both strictly and sequentially
exogenous assignment processes. We show that the critical property that
determines the behavior of the SC method is the ability of input features to
approximate the unobserved heterogeneity. Our results imply that the SC method
delivers asymptotically normal estimators for a large class of linear panel
data models as long as the number of pre-treatment periods is sufficiently
large, making it a natural alternative to the Difference-in-Differences.
arXiv link: http://arxiv.org/abs/2311.13575v2
Regressions under Adverse Conditions
variable to covariates, under the "adverse condition" that a distress variable
falls in its tail. This allows us to tailor classical mean regressions to
adverse scenarios, which receive increasing interest in economics, finance,
and many other fields. In the terminology of the systemic risk literature, our method can
be interpreted as a regression for the Marginal Expected Shortfall. We propose
a two-step procedure to estimate the new models, show consistency and
asymptotic normality of the estimator, and propose feasible inference under
weak conditions that allow for cross-sectional and time series applications.
Simulations verify the accuracy of the asymptotic approximations of the
two-step estimator. Two empirical applications show that our regressions under
adverse conditions are a valuable tool in such diverse fields as the study of
the relation between systemic risk and asset price bubbles, and dissecting
macroeconomic growth vulnerabilities into individual components.
arXiv link: http://arxiv.org/abs/2311.13327v3
Predictive Density Combination Using a Tree-Based Synthesis Function
predictive distributions based on agent/expert opinion analysis theory and
encompasses a range of existing density forecast pooling methods. The key
ingredient in BPS is a “synthesis” function. This is typically specified
parametrically as a dynamic linear regression. In this paper, we develop a
nonparametric treatment of the synthesis function using regression trees. We
show the advantages of our tree-based approach in two macroeconomic forecasting
applications. The first uses density forecasts for GDP growth from the euro
area's Survey of Professional Forecasters. The second combines density
forecasts of US inflation produced by many regression models involving
different predictors. Both applications demonstrate the benefits -- in terms of
improved forecast accuracy and interpretability -- of modeling the synthesis
function nonparametrically.
arXiv link: http://arxiv.org/abs/2311.12671v1
Learning Causal Representations from General Environments: Identifiability and Intrinsic Ambiguity
latent variables and their causal relationships in the form of a causal graph
from low-level observed data (such as text and images), assuming access to
observations generated from multiple environments. Prior results on the
identifiability of causal representations typically assume access to
single-node interventions which is rather unrealistic in practice, since the
latent variables are unknown in the first place. In this work, we provide the
first identifiability results based on data that stem from general
environments. We show that for linear causal models, while the causal graph can
be fully recovered, the latent variables are only identified up to the
surrounded-node ambiguity (SNA) of Varici et al. (2023). We provide a
counterpart of our guarantee, showing that SNA is basically unavoidable in our
setting. We also propose an algorithm, LiNGCReL, which provably
recovers the ground-truth model up to SNA, and we demonstrate its effectiveness
via numerical experiments. Finally, we consider general non-parametric causal
models and show that the same identification barrier holds when assuming access
to groups of soft single-node interventions.
arXiv link: http://arxiv.org/abs/2311.12267v2
Adaptive Bayesian Learning with Action and State-Dependent Signal Variance
incorporating action and state-dependent signal variances into decision-making
models. This framework is pivotal in understanding complex data-feedback loops
and decision-making processes in various economic systems. Through a series of
examples, we demonstrate the versatility of this approach in different
contexts, ranging from simple Bayesian updating in stable environments to
complex models involving social learning and state-dependent uncertainties. The
paper uniquely contributes to the understanding of the nuanced interplay
between data, actions, outcomes, and the inherent uncertainty in economic
models.
arXiv link: http://arxiv.org/abs/2311.12878v2
Theory coherent shrinkage of Time-Varying Parameters in VARs
Time-Varying Parameter VARs (TVP-VARs). The prior centers the time-varying
parameters on a path implied a priori by an underlying economic theory, chosen
to describe the dynamics of the macroeconomic variables in the system.
Leveraging information from conventional economic theory using this prior
significantly improves inference precision and forecast accuracy compared to
the standard TVP-VAR. In an application, I use this prior to incorporate
information from a New Keynesian model that includes both the Zero Lower Bound
(ZLB) and forward guidance into a medium-scale TVP-VAR model. This approach
leads to more precise estimates of the impulse response functions, revealing a
distinct propagation of risk premium shocks inside and outside the ZLB in US
data.
arXiv link: http://arxiv.org/abs/2311.11858v2
Modeling economies of scope in joint production: Convex regression of input distance function
a radial convex nonparametric least squares (CNLS) approach to estimate the
input distance function with multiple outputs. We document the correct input
distance function transformation and prove that the necessary orthogonality
conditions can be satisfied in radial CNLS. A Monte Carlo study is performed to
compare the finite sample performance of radial CNLS and other deterministic
and stochastic frontier approaches in terms of the input distance function
estimation. We apply our novel approach to the Finnish electricity distribution
network regulation and empirically confirm that the input isoquants become more
curved. In addition, we introduce the weight restriction to radial CNLS to
mitigate the potential overfitting and increase the out-of-sample performance
in energy regulation.
arXiv link: http://arxiv.org/abs/2311.11637v1
High-Throughput Asset Pricing
constructed from accounting ratios, past returns, and ticker symbols. This
“high-throughput asset pricing” matches the out-of-sample performance of top
journals while eliminating look-ahead bias. Naively mining for the largest
Sharpe ratios leads to similar performance, consistent with our theoretical
results, though EB uniquely provides unbiased predictions with transparent
intuition. Predictability is concentrated in accounting strategies, small
stocks, and pre-2004 periods, consistent with limited attention theories.
Multiple testing methods popular in finance fail to identify most out-of-sample
performers. High-throughput methods provide a rigorous, unbiased framework for
understanding asset prices.
arXiv link: http://arxiv.org/abs/2311.10685v3
Inference in Auctions with Many Bidders Using Transaction Prices
with many bidders, using an asymptotic framework where the number of bidders
increases while the number of auctions remains fixed. Relevant applications
include online, treasury, spectrum, and art auctions. Our approach enables
asymptotically exact inference on key features such as the winner's expected
utility, the seller's expected revenue, and the tail of the valuation
distribution using only transaction price data. Our simulations demonstrate the
accuracy of the methods in finite samples. We apply our methods to Hong Kong
vehicle license auctions, focusing on high-priced, single-letter plates.
arXiv link: http://arxiv.org/abs/2311.09972v3
Estimating Functionals of the Joint Distribution of Potential Outcomes with Optimal Transport
potential outcomes. Such parameters are especially relevant in policy
evaluation settings, where noncompliance is common and accommodated through the
model of Imbens & Angrist (1994). This paper shows that the sharp identified
set for these parameters is an interval with endpoints characterized by the
value of optimal transport problems. Sample analogue estimators are proposed
based on the dual problem of optimal transport. These estimators are root-n
consistent and converge in distribution under mild assumptions. Inference
procedures based on the bootstrap are straightforward and computationally
convenient. The ideas and estimators are demonstrated in an application
revisiting the National Supported Work Demonstration job training program. I
find suggestive evidence that workers who would see below average earnings
without treatment tend to see above average benefits from treatment.
arXiv link: http://arxiv.org/abs/2311.09435v1
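The abstract above characterizes the identified set through the value of optimal transport problems. A toy illustration of that principle (a sketch, not the paper's dual-based estimator), assuming discretized marginals and scipy, computes bounds on P(Y1 > Y0) as two small linear programs over couplings:

```python
import numpy as np
from scipy.optimize import linprog

y0 = np.array([0.0, 1.0, 2.0]); p0 = np.array([0.5, 0.3, 0.2])   # marginal of Y(0)
y1 = np.array([0.0, 1.5, 3.0]); p1 = np.array([0.4, 0.4, 0.2])   # marginal of Y(1)

# functional of the joint distribution: g(y0, y1) = 1{Y1 > Y0}, the share who benefit
G = (y1[None, :] > y0[:, None]).astype(float)
c = G.ravel()

# a coupling pi must have row sums p0 and column sums p1
n0, n1 = len(y0), len(y1)
A_eq = np.zeros((n0 + n1, n0 * n1))
for i in range(n0):
    A_eq[i, i * n1:(i + 1) * n1] = 1.0            # row-sum constraints
for j in range(n1):
    A_eq[n0 + j, j::n1] = 1.0                     # column-sum constraints
b_eq = np.concatenate([p0, p1])

lower = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None)).fun
upper = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None)).fun
print(f"identified set for P(Y1 > Y0): [{lower:.3f}, {upper:.3f}]")
```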
Incorporating Preferences Into Treatment Assignment Problems
using stated preferences for treatments. If individuals know in advance how the
assignment will be individualized based on their stated preferences, they may
state false preferences. We derive an individualized treatment rule (ITR) that
maximizes welfare when individuals strategically state their preferences. We
also show that the optimal ITR is strategy-proof, that is, individuals do not
have a strong incentive to lie even if they know the optimal ITR a priori.
Constructing the optimal ITR requires information on the distribution of true
preferences and the average treatment effect conditioned on true preferences.
In practice, the information must be identified and estimated from the data. As
true preferences are hidden information, the identification is not
straightforward. We discuss two experimental designs that allow the
identification: strictly strategy-proof randomized controlled trials and doubly
randomized preference trials. Under the presumption that data comes from one of
these experiments, we develop data-dependent procedures for determining the ITR,
that is, statistical treatment rules (STRs). The maximum regret of the proposed
STRs converges to zero at a rate of the square root of the sample size. An
empirical application demonstrates our proposed STRs.
arXiv link: http://arxiv.org/abs/2311.08963v1
Locally Asymptotically Minimax Statistical Treatment Rules Under Partial Identification
a treatment assignment rule deployed in a future population from available
data. With the true knowledge of the data generating process, the average
treatment effect (ATE) is the key quantity characterizing the optimal treatment
rule. Unfortunately, the ATE is often not point identified but partially
identified. Presuming the partial identification of the ATE, this study
conducts a local asymptotic analysis and develops the locally asymptotically
minimax (LAM) STR. The analysis assumes only directional, rather than full,
differentiability of the boundary functions of the identification region of
the ATE. Accordingly, the study shows that the LAM STR
differs from the plug-in STR. A simulation study also demonstrates that the LAM
STR outperforms the plug-in STR.
arXiv link: http://arxiv.org/abs/2311.08958v1
Estimating Conditional Value-at-Risk with Nonstationary Quantile Predictive Regression Models
instrumentation approach in quantile predictive regressions when both generated
covariates and persistent predictors are used. The generated covariates are
obtained from an auxiliary quantile predictive regression model and the
statistical problem of interest is the robust estimation and inference of the
parameters that correspond to the primary quantile predictive regression in
which this generated covariate is added to the set of nonstationary regressors.
We find that the proposed doubly IVX-corrected estimator is robust to the
degree of persistence of the predictors, regardless of the presence of a
generated regressor obtained from the first-stage procedure. The asymptotic properties of
the two-stage IVX estimator such as mixed Gaussianity are established while the
asymptotic covariance matrix is adjusted to account for the first-step
estimation error.
arXiv link: http://arxiv.org/abs/2311.08218v6
Optimal Estimation of Large-Dimensional Nonlinear Factor Models
models. The key challenge is that the observed variables are possibly nonlinear
functions of some latent variables where the functional forms are left
unspecified. A local principal component analysis method is proposed to
estimate the factor structure and recover information on latent variables and
latent functions, which combines $K$-nearest neighbors matching and principal
component analysis. Large-sample properties are established, including a sharp
bound on the matching discrepancy of nearest neighbors, sup-norm error bounds
for estimated local factors and factor loadings, and the uniform convergence
rate of the factor structure estimator. Under mild conditions our estimator of
the latent factor structure can achieve the optimal rate of uniform convergence
for nonparametric regression. The method is illustrated with a Monte Carlo
experiment and an empirical application studying the effect of tax cuts on
economic growth.
arXiv link: http://arxiv.org/abs/2311.07243v1
High Dimensional Binary Choice Model with Unknown Heteroskedasticity or Instrumental Variables
choice models. We consider a semiparametric model that places no distributional
assumptions on the error term, allows for heteroskedastic errors, and permits
endogenous regressors. Our approaches extend the special regressor estimator
originally proposed by Lewbel (2000). This estimator becomes impractical in
high-dimensional settings due to the curse of dimensionality associated with
high-dimensional conditional density estimation. To overcome this challenge, we
introduce an innovative data-driven dimension reduction method for
nonparametric kernel estimators, which constitutes the main contribution of
this work. The method combines distance covariance-based screening with
cross-validation (CV) procedures, making special regressor estimation feasible
in high dimensions. Using this new feasible conditional density estimator, we
address variable and moment (instrumental variable) selection problems for
these models. We apply penalized least squares (LS) and generalized method of
moments (GMM) estimators with an L1 penalty. A comprehensive analysis of the
oracle and asymptotic properties of these estimators is provided. Finally,
through Monte Carlo simulations and an empirical study on the migration
intentions of rural Chinese residents, we demonstrate the effectiveness of our
proposed methods in finite sample settings.
arXiv link: http://arxiv.org/abs/2311.07067v2
Design-based Estimation Theory for Complex Experiments
experiments with complex experimental designs, including cases with
interference between units. We develop a design-based estimation theory for
arbitrary experimental designs. Our theory facilitates the analysis of many
design-estimator pairs that researchers commonly employ in practice and provides
procedures to consistently estimate asymptotic variance bounds. We propose new
classes of estimators with favorable asymptotic properties from a design-based
point of view. In addition, we propose a scalar measure of experimental
complexity which can be linked to the design-based variance of the estimators.
We demonstrate the performance of our estimators using simulated datasets based
on an actual network experiment studying the effect of social networks on
insurance adoptions.
arXiv link: http://arxiv.org/abs/2311.06891v2
Quasi-Bayes in Latent Variable Models
of economic behavior. This paper introduces a quasi-Bayes approach to
nonparametrically estimate a large class of latent variable models. As an
application, we model U.S. individual log earnings from the Panel Study of
Income Dynamics (PSID) as the sum of latent permanent and transitory
components. Simulations illustrate the favorable performance of quasi-Bayes
estimators relative to common alternatives.
arXiv link: http://arxiv.org/abs/2311.06831v3
¿Cuánto es demasiada inflación? Una clasificación de regímenes inflacionarios (How much inflation is too much? A classification of inflationary regimes)
mostly been based on arbitrary characterizations, subject to value judgments by
researchers. The objective of this study is to propose a new methodological
approach that reduces subjectivity and improves accuracy in the construction of
such regimes. The method is built upon a combination of clustering techniques
and classification trees, which allows for a periodization of Argentina's
inflationary history over the period 1943-2022. Additionally, two
procedures are introduced to smooth out the classification over time: a measure
of temporal contiguity of observations and a rolling method based on the simple
majority rule. The obtained regimes are compared against the existing
literature on the inflation-relative price variability relationship, revealing
a better performance of the proposed regimes.
arXiv link: http://arxiv.org/abs/2401.02428v1
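A minimal sketch of the two ingredients described above, clustering followed by rolling simple-majority smoothing, on synthetic monthly inflation data; sklearn/pandas, the features, and the window length are illustrative assumptions, not the paper's specification.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# synthetic monthly inflation: a low-inflation block followed by a high-inflation block
infl = np.concatenate([rng.normal(0.5, 0.3, 240), rng.normal(6.0, 2.0, 240)])
feats = pd.DataFrame({"level": infl})
feats["vol"] = feats["level"].rolling(12, min_periods=1).std().fillna(0.0)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)

# rolling simple-majority smoothing of the regime labels over time
smooth = (
    pd.Series(labels)
    .rolling(window=13, center=True, min_periods=1)
    .apply(lambda w: w.value_counts().idxmax())
    .astype(int)
)
print(smooth.value_counts())
```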
Time-Varying Identification of Monetary Policy Shocks
autoregression with data-driven time-varying identification. The model selects
alternative exclusion restrictions over time and, as a condition for the
search, allows identification to be verified through heteroskedasticity within each
regime. Based on four alternative monetary policy rules, we show that a monthly
six-variable system supports time variation in US monetary policy shock
identification. In the sample-dominating first regime, systematic monetary
policy follows a Taylor rule extended by the term spread, effectively curbing
inflation. The second regime, which occurs after 2000 and gains persistence
after the global financial and COVID crises, is characterized by a
money-augmented Taylor rule. This regime's unconventional monetary policy
provides economic stimulus, features the liquidity effect, and is complemented
by a pure term spread shock. Absent the specific monetary policy of the second
regime, inflation would be over one percentage point higher on average after
2008.
arXiv link: http://arxiv.org/abs/2311.05883v4
Business Policy Experiments using Fractional Factorial Designs: Consumer Retention on DoorDash
and lower the cost of learning through experimentation by factorizing business
policies and employing fractional factorial experimental designs for their
evaluation. We illustrate how this method integrates with advances in the
estimation of heterogeneous treatment effects, elaborating on its advantages
and foundational assumptions. We empirically demonstrate the implementation and
benefits of our approach and assess its validity in evaluating consumer
promotion policies at DoorDash, which is one of the largest delivery platforms
in the US. Our approach discovers a policy with 5% incremental profit at 67%
lower implementation cost.
arXiv link: http://arxiv.org/abs/2311.14698v2
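The factorization idea above can be illustrated with a textbook two-level fractional factorial: with four binary policy factors, aliasing the fourth factor with the three-way interaction halves the number of cells to test. The factors below are hypothetical, not DoorDash's actual policy levers.

```python
import itertools
import numpy as np

full = np.array(list(itertools.product([-1, 1], repeat=3)))  # factors A, B, C
D = full[:, 0] * full[:, 1] * full[:, 2]                     # defining relation D = ABC
design = np.column_stack([full, D])

print("8 runs instead of 16 for four two-level factors (columns A, B, C, D):")
print(design)   # each row is one experimental cell (policy variant)
```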
Debiased Fixed Effects Estimation of Binary Logit Models with Three-Dimensional Panel Data
leads to unreliable inference due to the incidental parameter problem. We study
the case of three-dimensional panel data, where the model includes three sets
of additive and overlapping unobserved effects. This encompasses models for
network panel data, where senders and receivers maintain bilateral
relationships over time, and fixed effects account for unobserved heterogeneity
at the sender-time, receiver-time, and sender-receiver levels. In an asymptotic
framework, where all three panel dimensions grow large at constant relative
rates, we characterize the leading bias of the naive estimator. The inference
problem we identify is particularly severe, as it is not possible to balance
the order of the bias and the standard deviation. As a consequence, the naive
estimator has a degenerate asymptotic distribution, which exacerbates the
inference problem relative to other fixed effects estimators studied in the
literature. To resolve the inference problem, we derive explicit expressions to
debias the fixed effects estimator.
arXiv link: http://arxiv.org/abs/2311.04073v1
Optimal Estimation Methodologies for Panel Data Regression Models
for panel data regression models. In particular, we present current
methodological developments for modeling stationary panel data as well as
robust methods for estimation and inference in nonstationary panel data
regression models. Some applications from the network econometrics and high
dimensional statistics literature are also discussed within a stationary time
series environment.
arXiv link: http://arxiv.org/abs/2311.03471v3
Estimation and Inference for a Class of Generalized Hierarchical Models
parameters and function involved in a class of generalized hierarchical models.
Such models are of great interest in the literature of neural networks (such as
Bauer and Kohler, 2019). We propose a rectified linear unit (ReLU) based deep
neural network (DNN) approach, and contribute to the design of DNN by i)
providing more transparency for practical implementation, ii) defining
different types of sparsity, iii) showing the differentiability, iv) pointing
out the set of effective parameters, and v) offering a new variant of the ReLU
activation function. Asymptotic properties are established
accordingly, and a feasible procedure for the purpose of inference is also
proposed. We conduct extensive numerical studies to examine the finite-sample
performance of the estimation methods, and we also evaluate the empirical
relevance and applicability of the proposed models and estimation methods to
real data.
arXiv link: http://arxiv.org/abs/2311.02789v5
Individualized Policy Evaluation and Learning under Clustered Network Interference
much of the prior work assumes that the treatment assignment of one unit does
not affect the outcome of another unit. Unfortunately, ignoring interference
can lead to biased policy evaluation and ineffective learned policies. For
example, treating influential individuals who have many friends can generate
positive spillover effects, thereby improving the overall performance of an
individualized treatment rule (ITR). We consider the problem of evaluating and
learning an optimal ITR under clustered network interference (also known as
partial interference), where clusters of units are sampled from a population
and units may influence one another within each cluster. Unlike previous
methods that impose strong restrictions on spillover effects, such as anonymous
interference, the proposed methodology only assumes a semiparametric structural
model, where each unit's outcome is an additive function of individual
treatments within the cluster. Under this model, we propose an estimator that
can be used to evaluate the empirical performance of an ITR. We show that this
estimator is substantially more efficient than the standard inverse probability
weighting estimator, which does not impose any assumption about spillover
effects. We derive the finite-sample regret bound for a learned ITR, showing
that the use of our efficient evaluation estimator leads to the improved
performance of learned policies. We consider both experimental and
observational studies, and for the latter, we develop a doubly robust estimator
that is semiparametrically efficient and yields an optimal regret bound.
Finally, we conduct simulation and empirical studies to illustrate the
advantages of the proposed methodology.
arXiv link: http://arxiv.org/abs/2311.02467v3
The Fragility of Sparsity
which rely on the assumption of sparsity are fragile in two ways. First, we
document that different choices of the regressor matrix that do not impact
ordinary least squares (OLS) estimates, such as the choice of baseline category
with categorical controls, can move sparsity-based estimates by two standard
errors or more. Second, we develop two tests of the sparsity assumption based
on comparing sparsity-based estimators with OLS. The tests tend to reject the
sparsity assumption in all three applications. Unless the number of regressors
is comparable to or exceeds the sample size, OLS yields more robust inference
at little efficiency cost.
arXiv link: http://arxiv.org/abs/2311.02299v4
Pooled Bewley Estimator of Long Run Relationships in Dynamic Heterogenous Panels
Bewley, a novel pooled Bewley (PB) estimator of long-run coefficients for
dynamic panels with heterogeneous short-run dynamics is proposed. The PB
estimator is directly comparable to the widely used Pooled Mean Group (PMG)
estimator, and is shown to be consistent and asymptotically normal. Monte Carlo
simulations show good small sample performance of PB compared to the existing
estimators in the literature, namely PMG, panel dynamic OLS (PDOLS), and panel
fully-modified OLS (FMOLS). Application of two bias-correction methods and a
bootstrapping of critical values to conduct inference robust to cross-sectional
dependence of errors are also considered. The utility of the PB estimator is
illustrated in an empirical application to the aggregate consumption function.
arXiv link: http://arxiv.org/abs/2311.02196v1
The learning effects of subsidies to bundled goods: a semiparametric approach
learning about the quality of one of the constituent goods? This paper provides
theoretical support and empirical evidence on this mechanism. Theoretically, we
introduce a model where an agent learns about the quality of an innovation
through repeated consumption. We then assess the predictions of our theory in a
randomised experiment in a ridesharing platform. The experiment subsidised car
trips integrating with a train or metro station, which we interpret as a
bundle. Given the heavy-tailed nature of our data, we propose a semiparametric
specification for treatment effects that enables the construction of more
efficient estimators. We then introduce an efficient estimator for our
specification by relying on L-moments. Our results indicate that a ten-weekday
50% discount on integrated trips leads to a large contemporaneous increase in
the demand for integration, and, consistent with our model, persistent changes
in the mean and dispersion of nonintegrated app rides. These effects last for
over four months. A calibration of our theoretical model suggests that around
40% of the contemporaneous increase in integrated rides may be attributable to
increased incentives to learning. Our results have nontrivial policy
implications for the design of public transit systems.
arXiv link: http://arxiv.org/abs/2311.01217v4
Data-driven fixed-point tuning for truncated realized variations
semimartingales in the presence of jumps require specification of tuning
parameters for their use in practice. In much of the available theory, tuning
parameters are assumed to be deterministic and their values are specified only
up to asymptotic constraints. However, in empirical work and in simulation
studies, they are typically chosen to be random and data-dependent, with
explicit choices often relying entirely on heuristics. In this paper, we
consider novel data-driven tuning procedures for the truncated realized
variations of a semimartingale with jumps based on a type of random fixed-point
iteration. Being effectively automated, our approach alleviates the need for
delicate decision-making regarding tuning parameters in practice and can be
implemented using information regarding sampling frequency alone. We
demonstrate our methods can lead to asymptotically efficient estimation of
integrated volatility and exhibit superior finite-sample performance compared
to popular alternatives in the literature.
arXiv link: http://arxiv.org/abs/2311.00905v3
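A hedged sketch of the kind of fixed-point threshold tuning described above for truncated realized variations; the constant, the truncation exponent, and the simulated price path are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
n, dt = 23_400, 1.0 / 23_400                # one "trading day" of intraday returns
sigma = 0.2 / np.sqrt(252)                  # constant daily spot volatility (toy)
dx = sigma * np.sqrt(dt) * rng.standard_normal(n)
dx += rng.binomial(1, 5e-4, n) * rng.normal(0, 5 * sigma * np.sqrt(dt), n)  # rare jumps

def trv(returns, u):
    """Truncated realized variance with threshold u."""
    return np.sum(returns**2 * (np.abs(returns) <= u))

# fixed-point iteration: threshold proportional to the current volatility estimate
u = np.inf                                  # start from plain realized variance
for _ in range(20):
    u_new = 4.0 * np.sqrt(trv(dx, u)) * dt**0.49   # c * sigma_hat * Delta_n^omega (illustrative)
    if np.isclose(u_new, u):
        break
    u = u_new

print("tuned threshold:", u, " truncated RV:", trv(dx, u), " true IV:", sigma**2)
```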
On Gaussian Process Priors in Conditional Moment Restriction Models
for an unknown function that is identified by a nonparametric conditional
moment restriction. We derive contraction rates for a class of Gaussian process
priors. Furthermore, we provide conditions under which a Bernstein von Mises
theorem holds for the quasi-posterior distribution. As a consequence, we show
that optimally weighted quasi-Bayes credible sets have exact asymptotic
frequentist coverage.
arXiv link: http://arxiv.org/abs/2311.00662v2
Personalized Assignment to One of Many Treatment Arms via Regularized and Clustered Joint Assignment Forests
from a randomized controlled trial. Standard methods that estimate
heterogeneous treatment effects separately for each arm may perform poorly in
this case due to excess variance. We instead propose methods that pool
information across treatment arms: First, we consider a regularized
forest-based assignment algorithm based on greedy recursive partitioning that
shrinks effect estimates across arms. Second, we augment our algorithm by a
clustering scheme that combines treatment arms with consistently similar
outcomes. In a simulation study, we compare the performance of these approaches
to predicting arm-wise outcomes separately, and document gains of directly
optimizing the treatment assignment with regularization and clustering. In a
theoretical model, we illustrate how a high number of treatment arms makes
finding the best arm hard, while we can achieve sizable utility gains from
personalization by regularized optimization.
arXiv link: http://arxiv.org/abs/2311.00577v1
Robustify and Tighten the Lee Bounds: A Sample Selection Model under Stochastic Monotonicity and Symmetry Assumptions
popular tool for estimating a treatment effect. However, the Lee bounds rely on
the monotonicity assumption, whose empirical validity is sometimes unclear.
Furthermore, the bounds are often regarded to be wide and less informative even
under monotonicity. To address these issues, this study introduces a stochastic
version of the monotonicity assumption alongside a nonparametric distributional
shape constraint. The former enhances the robustness of the Lee bounds with
respect to monotonicity, while the latter helps tighten these bounds. The
obtained bounds do not rely on the exclusion restriction and are root-$n$
consistently estimable, making them practically viable. The potential
usefulness of the proposed methods is illustrated by their application on
experimental data from the after-school instruction programme studied by
Muralidharan, Singh, and Ganimian (2019).
arXiv link: http://arxiv.org/abs/2311.00439v4
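For context, the following is a minimal implementation of the standard Lee (2009) trimming bounds that the paper robustifies and tightens; the synthetic selection model and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
d = rng.binomial(1, 0.5, n)                       # treatment
s = rng.binomial(1, np.where(d == 1, 0.8, 0.6))   # selection (e.g. employment)
y = np.where(s == 1, 1.0 + 0.5 * d + rng.normal(0, 1, n), np.nan)  # observed only if selected

p1, p0 = s[d == 1].mean(), s[d == 0].mean()
trim = (p1 - p0) / p1                              # share of "extra" treated survivors

y1 = np.sort(y[(d == 1) & (s == 1)])
y0 = y[(d == 0) & (s == 1)]
k = int(np.floor(trim * len(y1)))

lower = y1[:len(y1) - k].mean() - y0.mean()        # trim the largest treated outcomes
upper = y1[k:].mean() - y0.mean()                  # trim the smallest treated outcomes
print(f"Lee bounds for the always-selected population: [{lower:.3f}, {upper:.3f}]")
```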
Semiparametric Discrete Choice Models for Bundles
for bundles. Our first approach is a kernel-weighted rank estimator based on a
matching-based identification strategy. We establish its complete asymptotic
properties and prove the validity of the nonparametric bootstrap for inference.
We then introduce a new multi-index least absolute deviations (LAD) estimator
as an alternative, whose main advantage is its capacity to estimate
preference parameters on both alternative- and agent-specific regressors. Both
methods can account for arbitrary correlation in disturbances across choices,
with the former also allowing for interpersonal heteroskedasticity. We also
demonstrate that the identification strategy underlying these procedures can be
extended naturally to panel data settings, producing an analogous localized
maximum score estimator and a LAD estimator for estimating bundle choice models
with fixed effects. We derive the limiting distribution of the former and
verify the validity of the numerical bootstrap as an inference tool. All our
proposed methods can be applied to general multi-index models. Monte Carlo
experiments show that they perform well in finite samples.
arXiv link: http://arxiv.org/abs/2311.00013v3
Robust Estimation of Realized Correlation: New Insight about Intraday Fluctuations in Market Betas
which causes standard correlation estimators to be inconsistent. The quadrant
correlation estimator is consistent but very inefficient. We propose a novel
subsampled quadrant estimator that improves efficiency while preserving
consistency and robustness. This estimator is particularly well-suited for
high-frequency financial data and we apply it to a large panel of US stocks.
Our empirical analysis sheds new light on intra-day fluctuations in market
betas by decomposing them into time-varying correlations and relative
volatility changes. Our results show that intraday variation in betas is
primarily driven by intraday variation in correlations.
arXiv link: http://arxiv.org/abs/2310.19992v1
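A small sketch of the quadrant correlation estimator and one simple way to subsample it (averaging over non-overlapping blocks); whether this matches the paper's subsampling scheme is an assumption, and the contaminated Gaussian data are illustrative.

```python
import numpy as np

def quadrant_corr(x, y):
    """Sign concordance around the medians, mapped back to a correlation under Gaussianity."""
    s = np.sign((x - np.median(x)) * (y - np.median(y)))
    return np.sin(0.5 * np.pi * s.mean())

def subsampled_quadrant_corr(x, y, n_blocks=10):
    """Average the quadrant estimator over non-overlapping blocks."""
    xb, yb = np.array_split(x, n_blocks), np.array_split(y, n_blocks)
    return np.mean([quadrant_corr(a, b) for a, b in zip(xb, yb)])

rng = np.random.default_rng(4)
n, rho = 5_000, 0.6
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
x, y = z[:, 0].copy(), z[:, 1].copy()
mask = rng.random(n) < 0.01
x[mask] += rng.normal(0, 20, mask.sum())   # contaminate with outliers

print("Pearson:", np.corrcoef(x, y)[0, 1])
print("quadrant:", quadrant_corr(x, y))
print("subsampled quadrant:", subsampled_quadrant_corr(x, y))
```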
Worst-Case Optimal Multi-Armed Gaussian Best Arm Identification with a Fixed Budget
arm with the highest expected outcome, referred to as best arm identification
(BAI). In our experiments, the number of treatment-allocation rounds is fixed.
During each round, a decision-maker allocates an arm and observes a
corresponding outcome, which follows a Gaussian distribution with variances
that can differ among the arms. At the end of the experiment, the
decision-maker recommends one of the arms as an estimate of the best arm. To
design an experiment, we first discuss lower bounds for the probability of
misidentification. Our analysis highlights that the available information on
the outcome distribution, such as means (expected outcomes), variances, and the
choice of the best arm, significantly influences the lower bounds. Because
available information is limited in actual experiments, we develop a lower
bound that is valid when the means and the identity of the best arm are
unknown, which we refer to as the worst-case lower bound. We demonstrate that
the worst-case lower bound depends solely on the variances of the outcomes.
Then, under the assumption that the variances are known, we propose the
Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy, an
extension of the Neyman allocation proposed by Neyman (1934). We show that the
GNA-EBA strategy is asymptotically optimal in the sense that its probability of
misidentification aligns with the lower bounds as the sample size grows to
infinity and the differences between the expected outcomes of the best and
other suboptimal arms converge to the same values across arms. We refer to such
strategies as asymptotically worst-case optimal.
arXiv link: http://arxiv.org/abs/2310.19788v3
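To illustrate the allocation idea, the sketch below samples each arm in proportion to its (assumed known) outcome standard deviation, the classical Neyman-style rule that the paper generalizes, and then recommends the empirical best arm; the means, variances, and budget are illustrative, and the paper's exact multi-armed allocation is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(5)
means = np.array([0.00, 0.05, 0.10])     # unknown to the decision-maker
sds = np.array([1.0, 2.0, 0.5])          # treated as known, as in the setup above
budget = 3_000

shares = sds / sds.sum()                 # Neyman-style allocation shares
counts = np.floor(shares * budget).astype(int)

estimates = np.array([rng.normal(m, s, c).mean() for m, s, c in zip(means, sds, counts)])
print("allocation:", counts, " recommended arm:", int(np.argmax(estimates)))
```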
Characteristics of price related fluctuations in Non-Fungible Token (NFT) market
blockchain technology which parallels the cryptocurrency market. In the present
work we study capitalization, floor price, the number of transactions, the
inter-transaction times, and the transaction volume value of a few selected
popular token collections. The results show that the fluctuations of all these
quantities are characterized by heavy-tailed probability distribution
functions, in most cases well described by stretched exponentials, with a
trace of power-law scaling at times, long-range memory, and in several cases
even the fractal organization of fluctuations, mostly restricted to the larger
fluctuations, however. We conclude that the NFT market, even though young and
governed by somewhat different trading mechanisms, shares several
statistical properties with regular financial markets. However, some
differences are visible in the specific quantitative indicators.
arXiv link: http://arxiv.org/abs/2310.19747v2
A Bayesian Markov-switching SAR model for time-varying cross-price spillovers
switching dynamics for the weight matrix and spatial autoregressive parameter.
The framework enables the identification of regime-specific connectivity
patterns and strengths and the study of the spatiotemporal propagation of
shocks in a system with a time-varying spatial multiplier matrix. The proposed
model is applied to disaggregated CPI data from 15 EU countries to examine
cross-price dependencies. The analysis identifies distinct connectivity
structures and spatial weights across the states, which capture shifts in
consumer behaviour, with marked cross-country differences in the spillover from
one price category to another.
arXiv link: http://arxiv.org/abs/2310.19557v1
Spectral identification and estimation of mixed causal-noncausal invertible-noninvertible models
simulating mixed causal-noncausal invertible-noninvertible models. We propose a
framework that integrates high-order cumulants, merging both the spectrum and
bispectrum into a single estimation function. The model that most adequately
represents the data under the assumption that the error term is i.i.d. is
selected. Our Monte Carlo study reveals unbiased parameter estimates and a high
frequency with which correct models are identified. We illustrate our strategy
through an empirical analysis of returns from 24 Fama-French emerging market
stock portfolios. The findings suggest that each portfolio displays noncausal
dynamics, producing white noise residuals devoid of conditional heteroscedastic
effects.
arXiv link: http://arxiv.org/abs/2310.19543v1
Popularity, face and voice: Predicting and interpreting livestreamers' retail performance using machine learning techniques
the broad spectrum of traditional sales performance determinants. To
investigate the factors that contribute to the success of livestreaming
commerce, we construct a longitudinal firm-level database with 19,175
observations, covering an entire livestreaming subsector. By comparing the
forecasting accuracy of eight machine learning models, we identify a random
forest model that provides the best prediction of gross merchandise volume
(GMV). Furthermore, we utilize explainable artificial intelligence to open the
black box of the machine learning model, discovering four new facts: 1) variables
representing the popularity of livestreaming events are crucial features in
predicting GMV, and voice attributes are more important than appearance; 2)
popularity is a major determinant of sales for female hosts, while vocal
aesthetics is more decisive for their male counterparts; 3) merits and
drawbacks of the voice are not equally valued in the livestreaming market; 4)
based on changes of comments, page views and likes, sales growth can be divided
into three stages. Finally, we innovatively propose a 3D-SHAP diagram that
demonstrates the relationship between predicting feature importance, target
variable, and its predictors. This diagram identifies bottlenecks for both
beginner and top livestreamers, providing insights into ways to optimize their
sales performance.
arXiv link: http://arxiv.org/abs/2310.19200v1
Cluster-Randomized Trials with Cross-Cluster Interference
within but not across clusters. This may be implausible when units are
irregularly distributed across space without well-separated communities, as
clusters in such cases may not align with significant geographic, social, or
economic divisions. This paper develops methods for reducing bias due to
cross-cluster interference. We first propose an estimation strategy that
excludes units not surrounded by clusters assigned to the same treatment arm.
We show that this substantially reduces bias relative to conventional
difference-in-means estimators without significant cost to variance. Second, we
formally establish a bias-variance trade-off in the choice of clusters:
constructing fewer, larger clusters reduces bias due to interference but
increases variance. We provide a rule for choosing the number of clusters to
balance the asymptotic orders of the bias and variance of our estimator.
Finally, we consider unsupervised learning for cluster construction and provide
theoretical guarantees for $k$-medoids.
arXiv link: http://arxiv.org/abs/2310.18836v4
Covariate Balancing and the Equivalence of Weighting and Doubly Robust Estimators of Average Treatment Effects
score is estimated using a specific covariate balancing approach, inverse
probability weighting (IPW), augmented inverse probability weighting (AIPW),
and inverse probability weighted regression adjustment (IPWRA) estimators are
numerically equivalent for the average treatment effect (ATE), and likewise for
the average treatment effect on the treated (ATT). The resulting weights are
inherently normalized, making normalized and unnormalized IPW and AIPW
identical. We discuss implications for instrumental variables and
difference-in-differences estimators and illustrate with two applications how
these numerical equivalences simplify analysis and interpretation.
arXiv link: http://arxiv.org/abs/2310.18563v2
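A compact sketch of the three ATE estimators discussed above. Here the propensity score is fit by ordinary logistic regression, so the exact numerical equivalence (which requires a covariate-balancing propensity score) should not be expected; sklearn and the simulated data are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(6)
n = 5_000
x = rng.normal(size=(n, 3))
ps_true = 1 / (1 + np.exp(-(x @ [0.5, -0.5, 0.2])))
d = rng.binomial(1, ps_true)
y = 1.0 * d + x @ [1.0, 1.0, -1.0] + rng.normal(size=n)

e = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]          # propensity score
m1 = LinearRegression().fit(x[d == 1], y[d == 1]).predict(x)        # outcome model, treated
m0 = LinearRegression().fit(x[d == 0], y[d == 0]).predict(x)        # outcome model, control

ipw = np.mean(d * y / e) - np.mean((1 - d) * y / (1 - e))
ipw_norm = (np.sum(d * y / e) / np.sum(d / e)
            - np.sum((1 - d) * y / (1 - e)) / np.sum((1 - d) / (1 - e)))
aipw = np.mean(m1 - m0 + d * (y - m1) / e - (1 - d) * (y - m0) / (1 - e))
print(f"IPW {ipw:.3f}  normalized IPW {ipw_norm:.3f}  AIPW {aipw:.3f}")
```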
Doubly Robust Identification of Causal Effects of a Continuous Treatment using Discrete Instruments
endogenous variable (treatment) using a binary instrument. Estimation is
typically done through linear 2SLS. This approach requires a change in the
mean of the treatment, and a causal interpretation requires LATE-type
monotonicity in the first stage. An alternative approach is to explore distributional changes in
the treatment, where the first-stage restriction is treatment rank similarity.
We propose causal estimands that are doubly robust in that they are valid under
either of these two restrictions. We apply the doubly robust estimation to
estimate the impacts of sleep on well-being. Our new estimates corroborate the
usual 2SLS estimates.
arXiv link: http://arxiv.org/abs/2310.18504v3
Inside the black box: Neural network-based real-time prediction of US recessions
model US recessions from 1967 to 2021. Their predictive performances are
compared to those of the traditional linear models. The out-of-sample
performance suggests the application of LSTM and GRU in recession forecasting,
especially for longer-term forecasts. The Shapley additive explanations (SHAP)
method is applied to both groups of models. The differing weight assignments
revealed by SHAP indicate that these types of neural networks can capture
business cycle asymmetries and nonlinearities. The SHAP method delivers key
recession indicators, such as the S&P 500 index for short-term forecasting up
to 3 months and the term spread for longer-term forecasting up to 12 months.
These findings are robust against other interpretation methods, such as the
local interpretable model-agnostic explanations (LIME) and the marginal
effects.
arXiv link: http://arxiv.org/abs/2310.17571v3
Tackling Interference Induced by Data Training Loops in A/B Tests: A Weighted Training Approach
machine learning models on historical data to predict user behaviors and
improve recommendations continuously. However, these data training loops can
introduce interference in A/B tests, where data generated by control and
treatment algorithms, potentially with different distributions, are combined.
To address these challenges, we introduce a novel approach called weighted
training. This approach entails training a model to predict the probability of
each data point appearing in either the treatment or control data and
subsequently applying weighted losses during model training. We demonstrate
that this approach achieves the least variance among all estimators that do not
cause shifts in the training distributions. Through simulation studies, we
demonstrate the lower bias and variance of our approach compared to other
methods.
arXiv link: http://arxiv.org/abs/2310.17496v5
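A sketch of the weighted-training idea as described above: fit a classifier for the probability that an example was logged under the treatment algorithm, then weight the training loss. The specific inverse-probability weights used below are an illustrative assumption, not the paper's exact weighting scheme.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(7)
n = 4_000
x = rng.normal(size=(n, 5))
src = rng.binomial(1, 0.5, n)                  # 1 = logged under the treatment algorithm
y = x @ [1, 0.5, 0, 0, -0.5] + 0.3 * src + rng.normal(size=n)

# step 1: probability that a data point comes from the treatment logs
p_treat = LogisticRegression().fit(x, src).predict_proba(x)[:, 1]

# step 2: weighted loss when training the shared prediction model
w = np.where(src == 1, 1.0 / p_treat, 1.0 / (1.0 - p_treat))
model = Ridge(alpha=1.0).fit(x, y, sample_weight=w)
print("coefficients of the weighted model:", np.round(model.coef_, 2))
```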
Bayesian SAR model with stochastic volatility and multiple time-varying weights
incorporates multilayer networks and accounts for time-varying relationships.
Moreover, the proposed approach allows the structural variance to evolve
smoothly over time and enables the analysis of shock propagation in terms of
time-varying spillover effects. The framework is applied to analyse the
dynamics of international relationships among the G7 economies and their impact
on stock market returns and volatilities. The findings underscore the
substantial impact of cooperative interactions and highlight discernible
disparities in network exposure across G7 nations, along with nuanced patterns
in direct and indirect spillover effects.
arXiv link: http://arxiv.org/abs/2310.17473v1
Dynamic Factor Models: a Genealogy
forecasting time series in increasingly high dimensions. While mathematical
statisticians faced with inference problems in high-dimensional observation
spaces were focusing on the so-called spiked-model-asymptotics, econometricians
adopted an entirely different and considerably more effective asymptotic
approach, rooted in the factor models originally considered in psychometrics.
The so-called dynamic factor model methods have, in two decades, grown into a
wide and successful body of techniques that are widely used in central banks, financial
institutions, economic and statistical institutes. The objective of this
chapter is not an extensive survey of the topic but a sketch of its historical
growth, with emphasis on the various assumptions and interpretations, and a
family tree of its main variants.
arXiv link: http://arxiv.org/abs/2310.17278v2
Causal Q-Aggregation for CATE Model Selection
core of personalized decision making. While there is a plethora of models for
CATE estimation, model selection is a nontrivial task, due to the fundamental
problem of causal inference. Recent empirical work provides evidence in favor
of proxy loss metrics with double robust properties and in favor of model
ensembling. However, theoretical understanding is lacking. Direct application
of prior theoretical work leads to suboptimal oracle model selection rates due
to the non-convexity of the model selection problem. We provide regret rates
for the major existing CATE ensembling approaches and propose a new CATE model
ensembling approach based on Q-aggregation using the doubly robust loss. Our
main result shows that causal Q-aggregation achieves statistically optimal
oracle model selection regret rates of $\log(M)/n$ (with $M$ models and
$n$ samples), with the addition of higher-order estimation error terms related
to products of errors in the nuisance functions. Crucially, our regret rate
does not require that any of the candidate CATE models be close to the truth.
We validate our new method on many semi-synthetic datasets and also provide
extensions of our work to CATE model selection with instrumental variables and
unobserved confounding.
arXiv link: http://arxiv.org/abs/2310.16945v5
CATE Lasso: Conditional Average Treatment Effect Estimation with High-Dimensional Linear Regression
Effects (CATEs) play an important role as a quantity representing an
individualized causal effect, defined as a difference between the expected
outcomes of the two treatments conditioned on covariates. This study assumes
two linear regression models between a potential outcome and covariates of the
two treatments and defines CATEs as a difference between the linear regression
models. Then, we propose a method for consistently estimating CATEs even under
high-dimensional and non-sparse parameters. In our study, we demonstrate that
desirable theoretical properties, such as consistency, remain attainable even
without assuming sparsity explicitly if we impose a weaker condition, called
implicit sparsity, that originates from the definition of CATEs. Under this
condition, the parameters of the linear models for the potential outcomes can
be divided into treatment-specific and common parameters, where the
treatment-specific parameters take different values across the two linear
regression models, while the common parameters remain identical. Thus, in a
difference between two linear regression models, the common parameters
disappear, leaving only differences in the treatment-specific parameters.
Consequently, the non-zero parameters in CATEs correspond to the differences in
the treatment-specific parameters. Leveraging this assumption, we develop a
Lasso regression method specialized for CATE estimation and show that the
estimator is consistent. Finally, we confirm the soundness of the proposed
method by simulation studies.
arXiv link: http://arxiv.org/abs/2310.16819v1
Double Debiased Covariate Shift Adaptation Robust to Density-Ratio Estimation
and outcomes while test data only contains covariates. In this scenario, our
primary aim is to predict the missing outcomes of the test data. With this
objective in mind, we train parametric regression models under a covariate
shift, where covariate distributions are different between the train and test
data. For this problem, existing studies have proposed covariate shift
adaptation via importance weighting using the density ratio. This approach
averages the train data losses, each weighted by an estimated ratio of the
covariate densities between the train and test data, to approximate the
test-data risk. Although it allows us to obtain a test-data risk minimizer, its
performance heavily relies on the accuracy of the density ratio estimation.
Moreover, even if the density ratio can be consistently estimated, the
estimation errors of the density ratio also yield bias in the estimators of the
regression model's parameters of interest. To mitigate these challenges, we
introduce a doubly robust estimator for covariate shift adaptation via
importance weighting, which incorporates an additional estimator for the
regression function. Leveraging double machine learning techniques, our
estimator reduces the bias arising from the density ratio estimation errors. We
demonstrate the asymptotic distribution of the regression parameter estimator.
Notably, our estimator remains consistent if either the density ratio estimator
or the regression function is consistent, showcasing its robustness against
potential errors in density ratio estimation. Finally, we confirm the soundness
of our proposed method via simulation studies.
arXiv link: http://arxiv.org/abs/2310.16638v3
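A minimal sketch of the doubly robust flavor of covariate shift adaptation, here for the simpler task of estimating the test-data mean outcome rather than the paper's regression parameters; the classifier-based density-ratio estimator and the synthetic shift are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(8)
n_tr, n_te = 3_000, 3_000
x_tr = rng.normal(0.0, 1.0, (n_tr, 2))
x_te = rng.normal(0.7, 1.0, (n_te, 2))            # shifted test covariates
y_tr = np.sin(x_tr[:, 0]) + x_tr[:, 1] + rng.normal(0, 0.3, n_tr)

# density ratio r(x) = p_test(x) / p_train(x) via a train/test classifier
z = np.vstack([x_tr, x_te])
lab = np.r_[np.zeros(n_tr), np.ones(n_te)]
p = LogisticRegression().fit(z, lab).predict_proba(x_tr)[:, 1]
ratio = (p / (1 - p)) * (n_tr / n_te)

# outcome regression fit on the training data
reg = LinearRegression().fit(x_tr, y_tr)

# doubly robust estimate: regression prediction on test covariates
# plus importance-weighted training residuals
dr = reg.predict(x_te).mean() + np.mean(ratio * (y_tr - reg.predict(x_tr)))
print("DR estimate of E[Y] under the test distribution:", round(dr, 3))
```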
Fair Adaptive Experiments
effectiveness of a treatment or policy. The classical complete randomization
approach assigns treatments based on a prespecified probability and may lead to
inefficient use of data. Adaptive experiments improve upon complete
randomization by sequentially learning and updating treatment assignment
probabilities. However, their application can also raise fairness and equity
concerns, as assignment probabilities may vary drastically across groups of
participants. Furthermore, when treatment is expected to be extremely
beneficial to certain groups of participants, it is more appropriate to expose
many of these participants to favorable treatment. In response to these
challenges, we propose a fair adaptive experiment strategy that simultaneously
enhances data use efficiency, achieves an envy-free treatment assignment
guarantee, and improves the overall welfare of participants. An important
feature of our proposed strategy is that we do not impose parametric modeling
assumptions on the outcome variables, making it more versatile and applicable
to a wider array of applications. Through our theoretical investigation, we
characterize the convergence rate of the estimated treatment effects and the
associated standard deviations at the group level and further prove that our
adaptive treatment assignment algorithm, despite not having a closed-form
expression, approaches the optimal allocation rule asymptotically. Our proof
strategy takes into account the fact that the allocation decisions in our
design depend on sequentially accumulated data, which poses a significant
challenge in characterizing the properties and conducting statistical inference
of our method. We further provide simulation evidence to showcase the
performance of our fair adaptive experiment strategy.
arXiv link: http://arxiv.org/abs/2310.16290v1
Improving Robust Decisions with Data
(DGP), which is only known to belong to a set of sequences of independent but
possibly non-identical distributions. A robust decision maximizes the expected
payoff against the worst possible DGP in this set. This paper characterizes
when and how such robust decisions can be improved with data, measured by the
expected payoff under the true DGP, no matter which possible DGP is the truth.
It further develops novel and simple inference methods to achieve it, as common
methods (e.g., maximum likelihood) may fail to deliver such an improvement.
arXiv link: http://arxiv.org/abs/2310.16281v4
Testing for equivalence of pre-trends in Difference-in-Differences estimation
Difference-in-Differences estimation is usually assessed by a test of the null
hypothesis that the difference between the average outcomes of both groups is
constant over time before the treatment. However, failure to reject the null
hypothesis does not imply the absence of differences in time trends between
both groups. We provide equivalence tests that allow researchers to find
evidence in favor of the parallel trends assumption and thus increase the
credibility of their treatment effect estimates. While we motivate our tests in
the standard two-way fixed effects model, we discuss simple extensions to
settings in which treatment adoption is staggered over time.
arXiv link: http://arxiv.org/abs/2310.15796v1
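A toy version of an equivalence (TOST-type) test applied to a single pre-trend coefficient: a negligible pre-trend is declared only if the estimate is significantly inside (-delta, delta). The tolerance delta, the degrees of freedom, and the reduction to a single coefficient are illustrative simplifications of the tests proposed above.

```python
import numpy as np
from scipy import stats

beta_hat, se = 0.02, 0.015        # estimated pre-trend difference and its standard error
delta = 0.05                      # largest pre-trend deemed negligible (assumed)
df = 200                          # residual degrees of freedom (assumed)

t_low = (beta_hat + delta) / se   # test of H0: beta <= -delta
t_high = (beta_hat - delta) / se  # test of H0: beta >= +delta
p_equiv = max(1 - stats.t.cdf(t_low, df), stats.t.cdf(t_high, df))
print(f"TOST p-value for |pre-trend| < {delta}: {p_equiv:.4f}")
```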
The impact of the Russia-Ukraine conflict on the extreme risk spillovers between agricultural futures and spots
posed significant threats and challenges to the global food system and world
food security. Focusing on the impact of the conflict on the global
agricultural market, we propose a new analytical framework for tail dependence,
and combine the Copula-CoVaR method with the ARMA-GARCH-skewed Student-t model
to examine the tail dependence structure and extreme risk spillover between
agricultural futures and spots over the pre- and post-outbreak periods. Our
results indicate that the tail dependence structures in the futures-spot
markets of soybean, maize, wheat, and rice have all reacted to the
Russia-Ukraine conflict. Furthermore, the outbreak of the conflict has
intensified risks of the four agricultural markets in varying degrees, with the
wheat market being affected the most. Additionally, all the agricultural
futures markets exhibit significant downside and upside risk spillovers to
their corresponding spot markets before and after the outbreak of the conflict,
whereas the strengths of these extreme risk spillover effects demonstrate
significant asymmetries at the directional (downside versus upside) and
temporal (pre-outbreak versus post-outbreak) levels.
arXiv link: http://arxiv.org/abs/2310.16850v1
Correlation structure analysis of the global agricultural futures market
structure of the global agricultural futures market from 2000 to 2020. It is
found that the distribution of correlation coefficients is asymmetric and right
skewed, and many eigenvalues of the correlation matrix deviate from the RMT
prediction. The largest eigenvalue reflects a collective market effect common
to all agricultural futures, the other large deviating eigenvalues can be
used to identify groups of futures, and there are modular structures, based
on regional properties or agricultural commodities, among the significant
participants of the corresponding eigenvectors. Apart from the smallest
eigenvalue, the other small deviating eigenvalues correspond to the pairs of
agricultural futures with the highest correlations. These findings are relevant
for using agricultural futures to manage risk and optimize asset
allocation.
arXiv link: http://arxiv.org/abs/2310.16849v1
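The basic random-matrix comparison underlying the analysis above can be sketched in a few lines: eigenvalues of an empirical correlation matrix versus the Marchenko-Pastur bulk for purely random data. Synthetic one-factor returns stand in for the agricultural futures data.

```python
import numpy as np

rng = np.random.default_rng(9)
T, N = 1_000, 30
market = rng.normal(0, 1, T)
returns = 0.4 * market[:, None] + rng.normal(0, 1, (T, N))   # one common factor plus noise

C = np.corrcoef(returns, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]

q = N / T
lam_max = (1 + np.sqrt(q)) ** 2      # Marchenko-Pastur upper edge
lam_min = (1 - np.sqrt(q)) ** 2      # Marchenko-Pastur lower edge
print("MP bulk:", (round(lam_min, 3), round(lam_max, 3)))
print("eigenvalues above the bulk:", eigvals[eigvals > lam_max])
```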
Inference for Rank-Rank Regressions
intergenerational mobility. In this article, we first show that commonly used
inference methods for this slope parameter are invalid. Second, when the
underlying distribution is not continuous, the OLS estimator and its asymptotic
distribution may be highly sensitive to how ties in the ranks are handled.
Motivated by these findings, we develop a new asymptotic theory for the OLS
estimator in a general class of rank-rank regression specifications without
imposing any assumptions about the continuity of the underlying distribution.
We then extend the asymptotic theory to other regressions involving ranks that
have been used in empirical work. Finally, we apply our new inference methods
to two empirical studies on intergenerational mobility, highlighting the
practical implications of our theoretical findings.
arXiv link: http://arxiv.org/abs/2310.15512v5
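A small sketch of the tie-handling sensitivity flagged above: with mass points in the income distribution, the OLS rank-rank slope can change with the tie-breaking convention. The data and conventions are illustrative.

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(10)
n = 2_000
parent = np.round(rng.exponential(3.0, n))            # discrete incomes -> many ties
child = np.round(0.5 * parent + rng.exponential(2.0, n))

def rank_rank_slope(x, y, method):
    rx = rankdata(x, method=method) / len(x)
    ry = rankdata(y, method=method) / len(y)
    return np.polyfit(rx, ry, 1)[0]                   # OLS slope of child rank on parent rank

for method in ("average", "min", "max"):
    print(method, round(rank_rank_slope(parent, child, method), 4))
```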
Causal clustering: design of cluster experiments under network interference
treatment effect in the presence of network spillovers. We provide a framework
to choose the clustering that minimizes the worst-case mean-squared error of
the estimated global effect. We show that optimal clustering solves a novel
penalized min-cut optimization problem computed via off-the-shelf semi-definite
programming algorithms. Our analysis also characterizes simple conditions to
choose between any two cluster designs, including choosing between a cluster or
individual-level randomization. We illustrate the method's properties using
unique network data from the universe of Facebook's users and existing data
from a field experiment.
arXiv link: http://arxiv.org/abs/2310.14983v3
BVARs and Stochastic Volatility
forecasting. Research in the last decade has established the importance of
allowing time-varying volatility to capture both secular and cyclical
variations in macroeconomic uncertainty. This recognition, together with the
growing availability of large datasets, has propelled a surge in recent
research in building stochastic volatility models suitable for large BVARs.
Some of these new models are also equipped with additional features that are
especially desirable for large systems, such as order invariance -- i.e.,
estimates are not dependent on how the variables are ordered in the BVAR -- and
robustness against COVID-19 outliers. Estimation of these large, flexible
models is made possible by the recently developed equation-by-equation approach
that drastically reduces the computational cost of estimating large systems.
Despite these recent advances, there remains much ongoing work, such as the
development of parsimonious approaches for time-varying coefficients and other
types of nonlinearities in large BVARs.
arXiv link: http://arxiv.org/abs/2310.14438v1
On propensity score matching with a diverging number of matches
matching for average treatment effect estimation. We explore the asymptotic
behavior of these estimators when the number of nearest neighbors, $M$, grows
with the sample size. It is shown, in a result that is hardly surprising but
technically nontrivial, that the modified estimators can improve upon the original
fixed-$M$ estimators in terms of efficiency. Additionally, we demonstrate the
potential to attain the semiparametric efficiency lower bound when the
propensity score achieves "sufficient" dimension reduction, echoing Hahn
(1998)'s insight about the role of dimension reduction in propensity
score-based causal inference.
arXiv link: http://arxiv.org/abs/2310.14142v2
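A sketch of M-nearest-neighbor matching on an estimated propensity score with M growing with the sample size; the rule M of order n^{1/3} is an illustrative choice rather than the paper's, and sklearn plus the simulated design are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(11)
n = 4_000
x = rng.normal(size=(n, 3))
e = 1 / (1 + np.exp(-x @ [0.8, -0.5, 0.3]))
d = rng.binomial(1, e)
y = d * 1.0 + x.sum(axis=1) + rng.normal(size=n)

ps = LogisticRegression().fit(x, d).predict_proba(x)[:, 1].reshape(-1, 1)
M = max(1, int(round(n ** (1 / 3))))                  # number of matches grows with n

def matched_mean(target, source, y_source, M):
    """Average outcome of the M nearest source units (by propensity score) for each target unit."""
    nn = NearestNeighbors(n_neighbors=M).fit(source)
    idx = nn.kneighbors(target, return_distance=False)
    return y_source[idx].mean(axis=1)

y1_hat, y0_hat = y.copy(), y.copy()
y1_hat[d == 0] = matched_mean(ps[d == 0], ps[d == 1], y[d == 1], M)
y0_hat[d == 1] = matched_mean(ps[d == 1], ps[d == 0], y[d == 0], M)
print("matching ATE estimate:", round(np.mean(y1_hat - y0_hat), 3))
```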
Unobserved Grouped Heteroskedasticity and Fixed Effects
allow for heteroskedasticity from a discrete latent group variable. Key
features of GFE are preserved, such as individuals belonging to one of a finite
number of groups and group membership being unrestricted and estimated. Ignoring
group heteroskedasticity may lead to poor classification, which is detrimental
to finite sample bias and standard errors of estimators. I introduce the
"weighted grouped fixed effects" (WGFE) estimator that minimizes a weighted
average of the group sums of squared residuals. I establish $\sqrt{NT}$-consistency
and normality under a concept of group separation based on second moments. A
test of group homoskedasticity is discussed. A fast computation procedure is
provided. Simulations show that WGFE outperforms alternatives that exclude
second moment information. I demonstrate this approach by considering the link
between income and democracy and the effect of unionization on earnings.
arXiv link: http://arxiv.org/abs/2310.14068v2
Bayesian Estimation of Panel Models under Potentially Sparse Heterogeneity
zero ("spike") and a Normal distribution around zero ("slab") into a dynamic
panel data framework to model coefficient heterogeneity. In addition to
homogeneity and full heterogeneity, our specification can also capture sparse
heterogeneity, that is, there is a core group of units that share common
parameters and a set of deviators with idiosyncratic parameters. We fit a model
with unobserved components to income data from the Panel Study of Income
Dynamics. We find evidence for sparse heterogeneity for balanced panels
composed of individuals with long employment histories.
arXiv link: http://arxiv.org/abs/2310.13785v2
Transparency challenges in policy evaluation with causal machine learning -- improving usability and accountability
evaluation tasks to flexibly estimate treatment effects. One issue with these
methods is that the machine learning models used are generally black boxes,
i.e., there is no globally interpretable way to understand how a model makes
estimates. This is a clear problem in policy evaluation applications,
particularly in government, because it is difficult to understand whether such
models are functioning in ways that are fair, based on the correct
interpretation of evidence, and transparent enough to allow for accountability
if things go wrong. However, there has been little discussion of transparency
problems in the causal machine learning literature and how these might be
overcome. This paper explores why transparency issues are a problem for causal
machine learning in public policy evaluation applications and considers ways
these problems might be addressed through explainable AI tools and by
simplifying models in line with interpretable AI principles. It then applies
these ideas to a case-study using a causal forest model to estimate conditional
average treatment effects for a hypothetical change in the school leaving age
in Australia. It shows that existing tools for understanding black-box
predictive models are poorly suited to causal machine learning and that
simplifying the model to make it interpretable leads to an unacceptable
increase in error (in this application). It concludes that new tools are needed
to properly understand causal machine learning models and the algorithms that
fit them.
arXiv link: http://arxiv.org/abs/2310.13240v2
A remark on moment-dependent phase transitions in high-dimensional Gaussian approximations
Gaussian critical values can be used for hypothesis testing but beyond which
they cannot. We are particularly interested in how these growth rates depend on
the number of moments that the observations possess.
arXiv link: http://arxiv.org/abs/2310.12863v3
Nonparametric Regression with Dyadic Data
nonseparable dyadic model where the structural function and the distribution of
the unobservable random terms are assumed to be unknown. The identification and
the estimation of the distribution of the unobservable random term are also
proposed. I assume that the structural function is continuous and strictly
increasing in the unobservable heterogeneity. I propose a suitable
normalization for identification by allowing the structural function to have
desirable properties such as homogeneity of degree one in the unobservable
random term and some of the observables. The consistency and the asymptotic
distribution of the estimators are established. The finite-sample properties of
the proposed estimators are assessed in a Monte Carlo simulation.
arXiv link: http://arxiv.org/abs/2310.12825v1
Survey calibration for causal inference: a simple method to balance covariate distributions
distributions of covariates for causal inference based on observational
studies. The method makes it possible to balance an arbitrary number of
quantiles (e.g., medians, quartiles, or deciles) together with means if
necessary. The proposed approach is based on the theory of calibration
estimators (Deville and S\"arndal 1992), in particular, calibration estimators
for quantiles, proposed by Harms and Duchesne (2006). The method does not
require numerical integration, kernel density estimation or assumptions about
the distributions. Valid estimates can be obtained by drawing on existing
asymptotic theory. An illustrative example of the proposed approach is
presented for the entropy balancing method and the covariate balancing
propensity score method. Results of a simulation study indicate that the method
efficiently estimates the average treatment effect on the treated (ATT), the
average treatment effect (ATE), the quantile treatment effect on the treated
(QTT) and the quantile treatment effect (QTE), especially in the presence of
non-linearity and mis-specification of the models. The proposed approach can be
further generalized to other designs (e.g. multi-category, continuous) or
methods (e.g. synthetic control method). An open source software implementing
proposed methods is available.
arXiv link: http://arxiv.org/abs/2310.11969v2
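A sketch of calibration-style weights that match the treated group's covariate means together with a crude quantile-type moment (the share below the treated median) in the control group. The entropy-balancing dual used below is an illustrative stand-in, not the quantile calibration of Harms and Duchesne (2006) implemented in the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(12)
n1, n0 = 500, 2_000
x_t = rng.normal(1.0, 1.0, (n1, 2))                   # treated covariates
x_c = rng.normal(0.0, 1.5, (n0, 2))                   # control covariates

med = np.median(x_t, axis=0)
def feats(x):
    # means plus the share below the treated median, for each covariate
    return np.column_stack([x, (x <= med).astype(float)])

Z, target = feats(x_c), feats(x_t).mean(axis=0)

def dual(lam):
    # convex dual of entropy balancing: log-sum-exp minus the target moments
    return np.log(np.exp(Z @ lam).sum()) - lam @ target

lam = minimize(dual, np.zeros(Z.shape[1]), method="BFGS").x
w = np.exp(Z @ lam)
w /= w.sum()

print("target moments :", np.round(target, 3))
print("reweighted ctrl:", np.round(w @ Z, 3))
```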
Machine Learning for Staggered Difference-in-Differences and Dynamic Treatment Effect Heterogeneity
methods, extending them to enable the examination of treatment effect
heterogeneity in the staggered adoption setting using machine learning. The
proposed method, machine learning difference-in-differences (MLDID), allows for
estimation of time-varying conditional average treatment effects on the
treated, which can be used to conduct detailed inference on drivers of
treatment effect heterogeneity. We perform simulations to evaluate the
performance of MLDID and find that it accurately identifies the true predictors
of treatment effect heterogeneity. We then use MLDID to evaluate the
heterogeneous impacts of Brazil's Family Health Program on infant mortality,
and find those in poverty and urban locations experienced the impact of the
policy more quickly than other subgroups.
arXiv link: http://arxiv.org/abs/2310.11962v1
Trimmed Mean Group Estimation of Average Effects in Ultra Short T Panels under Correlated Heterogeneity
heterogeneity and can lead to misleading inference. This paper proposes a new
trimmed mean group (TMG) estimator which is consistent at the irregular rate of
$n^{1/3}$ even if the time dimension of the panel is as small as the number of
its regressors. Extensions to panels with time effects are provided, and a
Hausman test of correlated heterogeneity is proposed. Small sample properties
of the TMG estimator (with and without time effects) are investigated by Monte
Carlo experiments and shown to be satisfactory, performing better than other
trimmed estimators proposed in the literature. The proposed test of correlated
heterogeneity is also shown to have the correct size and satisfactory power.
The utility of the TMG approach is illustrated with an empirical application.
arXiv link: http://arxiv.org/abs/2310.11680v2
Adaptive maximization of social welfare
welfare. Welfare is a weighted sum of private utility and public revenue.
Earlier outcomes inform later policies. Utility is not observed, but indirectly
inferred. Response functions are learned through experimentation. We derive a
lower bound on regret, and a matching adversarial upper bound for a variant of
the Exp3 algorithm. Cumulative regret grows at a rate of $T^{2/3}$. This
implies that (i) welfare maximization is harder than the multi-armed bandit
problem (with a rate of $T^{1/2}$ for finite policy sets), and (ii) our
algorithm achieves the optimal rate. For the stochastic setting, if social
welfare is concave, we can achieve a rate of $T^{1/2}$ (for continuous policy
sets), using a dyadic search algorithm. We analyze an extension to nonlinear
income taxation, and sketch an extension to commodity taxation. We compare our
setting to monopoly pricing (which is easier), and price setting for bilateral
trade (which is harder).
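
For reference, the Exp3 algorithm mentioned above can be sketched in a few lines of Python; this is the textbook variant with synthetic bounded rewards, not the paper's modified version or its welfare setting.

import numpy as np

rng = np.random.default_rng(1)
K, T = 5, 3000
gamma = min(1.0, np.sqrt(K * np.log(K) / ((np.e - 1) * T)))   # standard exploration rate
true_means = rng.uniform(0.2, 0.8, size=K)                     # hypothetical mean reward per policy

weights = np.ones(K)
total_reward = 0.0
for t in range(T):
    probs = (1 - gamma) * weights / weights.sum() + gamma / K
    arm = rng.choice(K, p=probs)
    reward = rng.binomial(1, true_means[arm])                  # reward bounded in [0, 1]
    total_reward += reward
    weights[arm] *= np.exp(gamma * (reward / probs[arm]) / K)  # importance-weighted update
    weights /= weights.max()                                   # rescale; only ratios matter

print("empirical regret:", T * true_means.max() - total_reward)
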
arXiv link: http://arxiv.org/abs/2310.09597v2
A Semiparametric Instrumented Difference-in-Differences Approach to Policy Learning
difference-in-differences (DiD) approach to evaluate causal effects. Standard
methods in the literature rely on the parallel trends assumption to identify
the average treatment effect on the treated. However, the parallel trends
assumption may be violated in the presence of unmeasured confounding, and the
average treatment effect on the treated may not be useful in learning a
treatment assignment policy for the entire population. In this article, we
propose a general instrumented DiD approach for learning the optimal treatment
policy. Specifically, we establish identification results using a binary
instrumental variable (IV) when the parallel trends assumption fails to hold.
Additionally, we construct a Wald estimator, novel inverse probability
weighting (IPW) estimators, and a class of semiparametric efficient and
multiply robust estimators, with theoretical guarantees on consistency and
asymptotic normality, even when relying on flexible machine learning algorithms
for nuisance parameters estimation. Furthermore, we extend the instrumented DiD
to the panel data setting. We evaluate our methods in extensive simulations and
a real data application.
arXiv link: http://arxiv.org/abs/2310.09545v1
An In-Depth Examination of Requirements for Disclosure Risk Assessment
2020 Decennial Census of Population and Housing has triggered renewed interest
and debate over how to measure the disclosure risks and societal benefits of
the published data products. Following long-established precedent in economics
and statistics, we argue that any proposal for quantifying disclosure risk
should be based on pre-specified, objective criteria. Such criteria should be
used to compare methodologies to identify those with the most desirable
properties. We illustrate this approach, using simple desiderata, to evaluate
the absolute disclosure risk framework, the counterfactual framework underlying
differential privacy, and prior-to-posterior comparisons. We conclude that
satisfying all the desiderata is impossible, but counterfactual comparisons
satisfy the most while absolute disclosure risk satisfies the fewest.
Furthermore, we explain that many of the criticisms levied against differential
privacy would be levied against any technology that is not equivalent to
direct, unrestricted access to confidential data. Thus, more research is
needed, but in the near-term, the counterfactual approach appears best-suited
for privacy-utility analysis.
arXiv link: http://arxiv.org/abs/2310.09398v1
Estimating Individual Responses when Tomorrow Matters
expectations influence their responses to a counterfactual change. We provide
conditions under which average partial effects based on regression estimates
recover structural effects. We propose a practical three-step estimation method
that relies on panel data on subjective expectations. We illustrate our
approach in a model of consumption and saving, focusing on the impact of an
income tax that not only changes current income but also affects beliefs about
future income. Applying our approach to Italian survey data, we find that
individuals' beliefs matter for evaluating the impact of tax policies on
consumption decisions.
arXiv link: http://arxiv.org/abs/2310.09105v3
Smoothed instrumental variables quantile regression
coefficients of the instrumental variables (IV) quantile regression model
introduced by Chernozhukov and Hansen (2005). The sivqr command offers several
advantages over the existing ivqreg and ivqreg2 commands for estimating this IV
quantile regression model, which complements the alternative "triangular model"
behind cqiv and the "local quantile treatment effect" model of ivqte.
Computationally, sivqr implements the smoothed estimator of Kaplan and Sun
(2017), who show that smoothing improves both computation time and statistical
accuracy. Standard errors are computed analytically or by Bayesian bootstrap;
for non-iid sampling, sivqr is compatible with bootstrap. I discuss syntax and
the underlying methodology, and I compare sivqr with other commands in an
example.
arXiv link: http://arxiv.org/abs/2310.09013v1
Machine Learning Who to Nudge: Causal vs Predictive Targeting in a Field Experiment on Student Financial Aid Renewal
than others, so that targeting interventions may be beneficial. We analyze the
value of targeting in the context of a large-scale field experiment with over
53,000 college students, where the goal was to use "nudges" to encourage
students to renew their financial-aid applications before a non-binding
deadline. We begin with baseline approaches to targeting. First, we target
based on a causal forest that estimates heterogeneous treatment effects and
then assigns students to treatment according to those estimated to have the
highest treatment effects. Next, we evaluate two alternative targeting
policies, one targeting students with low predicted probability of renewing
financial aid in the absence of the treatment, the other targeting those with
high probability. The predicted baseline outcome is not the ideal criterion for
targeting, nor is it a priori clear whether to prioritize low, high, or
intermediate predicted probability. Nonetheless, targeting on low baseline
outcomes is common in practice, for example because the relationship between
individual characteristics and treatment effects is often difficult or
impossible to estimate with historical data. We propose hybrid approaches that
incorporate the strengths of both predictive approaches (accurate estimation)
and causal approaches (correct criterion); we show that targeting intermediate
baseline outcomes is most effective in our specific application, while
targeting based on low baseline outcomes is detrimental. In one year of the
experiment, nudging all students improved early filing by an average of 6.4
percentage points over a baseline average of 37% filing, and we estimate that
targeting half of the students using our preferred policy attains around 75% of
this benefit.
arXiv link: http://arxiv.org/abs/2310.08672v2
Real-time Prediction of the Great Recession and the Covid-19 Recession
the Great Recession and the Covid-19 recession in the US in real time. It
examines the predictability of various macroeconomic and financial indicators
with respect to the NBER recession indicator. The findings strongly support the
use of penalized logistic regression models in recession forecasting. These
models, particularly the ridge logistic regression model, outperform the
standard logistic regression model in predicting the Great Recession in the US
across different forecast horizons. The study also confirms the traditional
significance of the term spread as an important recession indicator. However,
it acknowledges that the Covid-19 recession remains unpredictable due to the
unprecedented nature of the pandemic. The results are validated by creating a
recession indicator through principal component analysis (PCA) on selected
variables, which strongly correlates with the NBER recession indicator and is
less affected by publication lags.
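
A minimal sketch of the core estimator, ridge-penalized logistic regression for a recession indicator, is shown below in Python; synthetic predictors stand in for the macro-financial series used in the paper, and the penalty value and sample split are placeholders.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)

# Synthetic stand-ins for monthly predictors (e.g., term spread, credit spread, ...)
n, p = 480, 20
X = rng.normal(size=(n, p))
# Recession indicator loosely driven by the first predictor (think: inverted term spread)
y = (X[:, 0] + 0.5 * rng.normal(size=n) < -0.8).astype(int)

train, test = slice(0, 360), slice(360, 480)

# Ridge-penalized (L2) logistic regression; C is the inverse penalty strength
model = make_pipeline(StandardScaler(), LogisticRegression(penalty="l2", C=0.1, max_iter=1000))
model.fit(X[train], y[train])

prob = model.predict_proba(X[test])[:, 1]
print("predicted recession probabilities (first 5):", np.round(prob[:5], 3))
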
arXiv link: http://arxiv.org/abs/2310.08536v5
Structural Vector Autoregressions and Higher Moments: Challenges and Solutions in Small Samples
conditions derived from independent shocks can be used to identify and estimate
the simultaneous interaction in structural vector autoregressions. This study
highlights two problems that arise when using these estimators in small
samples. First, imprecise estimates of the asymptotically efficient weighting
matrix and the asymptotic variance lead to volatile estimates and inaccurate
inference. Second, many moment conditions lead to a small sample scaling bias
towards innovations with a variance smaller than the normalizing unit variance
assumption. To address the first problem, I propose utilizing the assumption of
independent structural shocks to estimate the efficient weighting matrix and
the variance of the estimator. For the second issue, I propose incorporating a
continuously updated scaling term into the weighting matrix, eliminating the
scaling bias. To demonstrate the effectiveness of these measures, I conduct a
Monte Carlo simulation, which shows a significant improvement in the performance
of the estimator.
arXiv link: http://arxiv.org/abs/2310.08173v1
Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects
the unobservable joint distribution between potential outcomes. Stratification
on pretreatment covariates can yield sharper bounds; however, unless the
covariates are discrete with relatively small support, this approach typically
requires binning covariates or estimating the conditional distributions of the
potential outcomes given the covariates. Binning can result in substantial
efficiency loss and become challenging to implement, even with a moderate
number of covariates. Estimating conditional distributions, on the other hand,
may yield invalid inference if the distributions are inaccurately estimated,
such as when a misspecified model is used or when the covariates are
high-dimensional. In this paper, we propose a unified and model-agnostic
inferential approach for a wide class of partially identified estimands. Our
method, based on duality theory for optimal transport problems, has four key
properties. First, in randomized experiments, our approach can wrap around any
estimates of the conditional distributions and provide uniformly valid
inference, even if the initial estimates are arbitrarily inaccurate. A simple
extension of our method to observational studies is doubly robust in the usual
sense. Second, if nuisance parameters are estimated at semiparametric rates,
our estimator is asymptotically unbiased for the sharp partial identification
bound. Third, we can apply the multiplier bootstrap to select covariates and
models without sacrificing validity, even if the true model is not selected.
Finally, our method is computationally efficient. Overall, in three empirical
applications, our method consistently reduces the width of estimated identified
sets and confidence intervals without making additional structural assumptions.
arXiv link: http://arxiv.org/abs/2310.08115v2
Inference for Nonlinear Endogenous Treatment Effects Accounting for High-Dimensional Covariate Complexity
using observational data. This paper proposes an inference procedure for a
nonlinear and endogenous marginal effect function, defined as the derivative of
the nonparametric treatment function, with a primary focus on an additive model
that includes high-dimensional covariates. Using the control function approach
for identification, we implement a regularized nonparametric estimation to
obtain an initial estimator of the model. Such an initial estimator suffers
from two biases: the bias in estimating the control function and the
regularization bias for the high-dimensional outcome model. Our key innovation
is to devise the double bias correction procedure that corrects these two
biases simultaneously. Building on this debiased estimator, we further provide
a confidence band of the marginal effect function. Simulations and an empirical
study of air pollution and migration demonstrate the validity of our
procedures.
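
The control function step can be illustrated with a deliberately simplified Python sketch: a linear first stage whose residual enters a polynomial outcome regression, from which the marginal effect of the treatment is read off as the derivative. The paper's regularized high-dimensional estimation and double bias correction are not reproduced here.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000

z = rng.normal(size=n)                    # instrument
u = rng.normal(size=n)                    # unobserved confounder
d = 0.8 * z + u + rng.normal(size=n)      # endogenous treatment
y = np.sin(d) + u + rng.normal(size=n)    # nonlinear outcome equation

# First stage: treatment on instrument; keep the residual as the control function
first = sm.OLS(d, sm.add_constant(z)).fit()
v_hat = d - first.fittedvalues

# Second stage: flexible (polynomial) treatment terms plus the control function
D = np.column_stack([d, d**2, d**3, v_hat])
second = sm.OLS(y, sm.add_constant(D)).fit()

# Marginal effect of the treatment: derivative of the fitted polynomial in d
b = second.params                          # [const, d, d^2, d^3, v_hat]
grid = np.linspace(-2, 2, 5)
marginal_effect = b[1] + 2 * b[2] * grid + 3 * b[3] * grid**2
print(np.round(marginal_effect, 3))        # compare with cos(grid)
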
arXiv link: http://arxiv.org/abs/2310.08063v3
Marital Sorting, Household Inequality and Selection
for married couples with both spouses working full time full year, and its
impact on household income inequality. We also investigate how marriage sorting
patterns have changed over this period. To determine the factors driving income
inequality we estimate a model explaining the joint distribution of wages which
accounts for the spouses' employment decisions. We find that income inequality
has increased for these households and increased assortative matching of wages
has exacerbated the inequality resulting from individual wage growth. We find
that positive sorting partially reflects the correlation across unobservables
influencing the wages of both members of the marriage. We decompose the changes in
sorting patterns over the 47 years comprising our sample into structural,
composition and selection effects and find that the increase in positive
sorting primarily reflects the increased skill premia for both observed and
unobserved characteristics.
arXiv link: http://arxiv.org/abs/2310.07839v1
Integration or fragmentation? A closer look at euro area financial markets
To that end, we estimate overall and country-specific integration indices based
on a panel vector-autoregression with factor stochastic volatility. Our results
indicate a more heterogeneous bond market compared to the market for lending
rates. In both markets, the global financial crisis and the sovereign debt
crisis led to a severe decline in financial integration, which has since fully
recovered. We furthermore identify countries that deviate from their peers
either by responding differently to crisis events or by taking on different
roles in the spillover network. The latter analysis reveals two sets of
countries: a main body of countries that receives and transmits
spillovers and a second, smaller group of spillover-absorbing economies.
Finally, we demonstrate by estimating an augmented Taylor rule that euro area
short-term interest rates are positively linked to the level of integration on
the bond market.
arXiv link: http://arxiv.org/abs/2310.07790v1
Smoothness-Adaptive Dynamic Pricing with Nonparametric Demand Learning
nonparametric and H\"older smooth, and we focus on adaptivity to the unknown
H\"older smoothness parameter $\beta$ of the demand function. Traditionally the
optimal dynamic pricing algorithm heavily relies on the knowledge of $\beta$ to
achieve a minimax optimal regret of
$O(T^{\frac{\beta+1}{2\beta+1}})$. However, we highlight the
challenge of adaptivity in this dynamic pricing problem by proving that no
pricing policy can adaptively achieve this minimax optimal regret without
knowledge of $\beta$. Motivated by the impossibility result, we propose a
self-similarity condition to enable adaptivity. Importantly, we show that the
self-similarity condition does not compromise the problem's inherent complexity
since it preserves the regret lower bound
$\Omega(T^{\frac{\beta+1}{2\beta+1}})$. Furthermore, we develop a
smoothness-adaptive dynamic pricing algorithm and theoretically prove that the
algorithm achieves this minimax optimal regret bound without prior knowledge of
$\beta$.
arXiv link: http://arxiv.org/abs/2310.07558v2
Identification and Estimation of a Semiparametric Logit Model using Network Data
binary network model in which the unobserved social characteristic is
endogenous, that is, the unobserved individual characteristic influences both
the binary outcome of interest and how links are formed within the network. The
exact functional form of the latent social characteristic is not known. The
proposed estimators are obtained based on matching pairs of agents whose
network formation distributions are the same. The consistency and the
asymptotic distribution of the estimators are established. The finite sample
properties of the proposed estimators are assessed in a Monte Carlo simulation.
We conclude this study with an empirical application.
arXiv link: http://arxiv.org/abs/2310.07151v2
Treatment Choice, Mean Square Regret and Partial Identification
welfare is only partially identified from data. We contribute to the literature
by anchoring our finite-sample analysis on mean square regret, a decision
criterion advocated by Kitagawa, Lee, and Qiu (2022). We find that optimal
rules are always fractional, irrespective of the width of the identified set
and precision of its estimate. The optimal treatment fraction is a simple
logistic transformation of the commonly used t-statistic multiplied by a factor
calculated by a simple constrained optimization. This treatment fraction gets
closer to 0.5 as the width of the identified set becomes wider, implying the
decision maker becomes more cautious against the adversarial Nature.
arXiv link: http://arxiv.org/abs/2310.06242v1
Robust Minimum Distance Inference in Structural Models
interest, which is robust to the lack of identification of other structural
nuisance parameters. Some choices of the weighting matrix lead to asymptotic
chi-squared distributions with degrees of freedom that can be consistently
estimated from the data, even under partial identification. In any case,
knowledge of the level of under-identification is not required. We study the
power of our robust test. Several examples show the wide applicability of the
procedure and a Monte Carlo investigates its finite sample performance. Our
identification-robust inference method can be applied to make inferences on
both calibrated (fixed) parameters and any other structural parameter of
interest. We illustrate the method's usefulness by applying it to a structural
model on the non-neutrality of monetary policy, as in Nakamura and Steinsson (2018),
where we empirically evaluate the validity of the calibrated parameters and we
carry out robust inference on the slope of the Phillips curve and the
information effect.
arXiv link: http://arxiv.org/abs/2310.05761v1
Identification and Estimation in a Class of Potential Outcomes Models
three main features: (i) Unobserved heterogeneity can be represented by a
vector of potential outcomes and a type describing the manner in which an
instrument determines the choice of treatment; (ii) The availability of an
instrumental variable that is conditionally independent of unobserved
heterogeneity; and (iii) The imposition of convex restrictions on the
distribution of unobserved heterogeneity. The proposed class of models
encompasses multiple classical and novel research designs, yet possesses a
common structure that permits a unifying analysis of identification and
estimation. In particular, we establish that these models share a common
necessary and sufficient condition for identifying certain causal parameters.
Our identification results are constructive in that they yield estimating
moment conditions for the parameters of interest. Focusing on a leading special
case of our framework, we further show how these estimating moment conditions
may be modified to be doubly robust. The corresponding double robust estimators
are shown to be asymptotically normally distributed, bootstrap based inference
is shown to be asymptotically valid, and the semi-parametric efficiency bound
is derived for those parameters that are root-n estimable. We illustrate the
usefulness of our results for developing, identifying, and estimating causal
models through an empirical evaluation of the role of mental health as a
mediating variable in the Moving To Opportunity experiment.
arXiv link: http://arxiv.org/abs/2310.05311v1
On changepoint detection in functional data using empirical energy distance
changepoints in a sequence of dependent, possibly multivariate,
functional-valued observations. Our approach allows testing for a very general
class of changepoints, including the "classical" case of changes in the mean,
and even changes in the whole distribution. Our statistics are based on a
generalisation of the empirical energy distance; we propose weighted
functionals of the energy distance process, which are designed in order to
enhance the ability to detect breaks occurring at sample endpoints. The
limiting distribution of the maximally selected version of our statistics
requires only the computation of the eigenvalues of the covariance function,
thus being readily implementable in the most commonly employed packages, e.g.
R. We show that, under the alternative, our statistics are able to detect
changepoints occurring even very close to the beginning/end of the sample. In
the presence of multiple changepoints, we propose a binary segmentation
algorithm to estimate the number of breaks and the locations thereof.
Simulations show that our procedures work very well in finite samples. We
complement our theory with applications to financial and temperature data.
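
The basic building block, the empirical energy distance between the pre- and post-break segments, can be sketched as follows in Python; curves are treated as discretized vectors, and the weighting scheme and limiting theory of the paper are omitted.

import numpy as np
from scipy.spatial.distance import cdist

def energy_distance(A, B):
    # Empirical energy distance between two samples of (discretized) curves
    d_ab = cdist(A, B).mean()
    d_aa = cdist(A, A).mean()
    d_bb = cdist(B, B).mean()
    return 2 * d_ab - d_aa - d_bb

rng = np.random.default_rng(4)
T, grid = 200, 50
curves = rng.normal(size=(T, grid))
curves[120:] += 0.6                      # mean shift of the curves after t = 120

# Scan over candidate breakpoints and keep the maximizer of a CUSUM-type statistic
stats = []
for k in range(20, T - 20):
    w = k * (T - k) / T
    stats.append(w * energy_distance(curves[:k], curves[k:]))
k_hat = 20 + int(np.argmax(stats))
print("estimated changepoint:", k_hat)
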
arXiv link: http://arxiv.org/abs/2310.04853v1
Challenges in Statistically Rejecting the Perfect Competition Hypothesis Using Imperfect Competition Data
perfect competition is challenging, known as a common problem in the
literature. We also assess the finite sample performance of the conduct
parameter test in homogeneous goods markets, showing that statistical power
increases with the number of markets, a larger conduct parameter, and a
stronger demand rotation instrument. However, even with a moderate number of
markets and five firms, rejecting the null hypothesis of perfect competition
remains difficult, irrespective of instrument strength or the use of optimal
instruments. Our findings suggest that empirical results failing to reject
perfect competition are due to the limited number of markets rather than
methodological shortcomings.
arXiv link: http://arxiv.org/abs/2310.04576v4
Cutting Feedback in Misspecified Copula Models
separately. We treat these as two modules in a modular Bayesian inference
framework, and propose conducting modified Bayesian inference by "cutting
feedback". Cutting feedback limits the influence of potentially misspecified
modules in posterior inference. We consider two types of cuts. The first limits
the influence of a misspecified copula on inference for the marginals, which is
a Bayesian analogue of the popular Inference for Margins (IFM) estimator. The
second limits the influence of misspecified marginals on inference for the
copula parameters by using a pseudo likelihood of the ranks to define the cut
model. We establish that if only one of the modules is misspecified, then the
appropriate cut posterior gives accurate uncertainty quantification
asymptotically for the parameters in the other module. Computation of the cut
posteriors is difficult, and new variational inference methods to do so are
proposed. The efficacy of the new methodology is demonstrated using both
simulated data and a substantive multivariate time series application from
macroeconomic forecasting. In the latter, cutting feedback from misspecified
marginals to a 1096-dimensional copula improves posterior inference and
predictive accuracy greatly, compared to conventional Bayesian inference.
arXiv link: http://arxiv.org/abs/2310.03521v2
Variational Inference for GARCH-family Models
through Monte Carlo sampling. Variational Inference is gaining popularity and
attention as a robust approach for Bayesian inference in complex machine
learning models; however, its adoption in econometrics and finance is limited.
This paper discusses the extent to which Variational Inference constitutes a
reliable and feasible alternative to Monte Carlo sampling for Bayesian
inference in GARCH-like models. Through a large-scale experiment involving the
constituents of the S&P 500 index, several Variational Inference optimizers, a
variety of volatility models, and a case study, we show that Variational
Inference is an attractive, remarkably well-calibrated, and competitive method
for Bayesian learning.
arXiv link: http://arxiv.org/abs/2310.03435v1
Moran's I Lasso for models with spatially correlated data
in the Moran statistic to develop a selection procedure called Moran's I Lasso
(Mi-Lasso) to solve the Eigenvector Spatial Filtering (ESF) eigenvector
selection problem. ESF uses a subset of eigenvectors from a spatial weights
matrix to efficiently account for any omitted cross-sectional correlation terms
in a classical linear regression framework, thus does not require the
researcher to explicitly specify the spatial part of the underlying structural
model. We derive performance bounds and show the necessary conditions for
consistent eigenvector selection. The key advantages of the proposed estimator
are that it is intuitive, theoretically grounded, and substantially faster than
Lasso based on cross-validation or any proposed forward stepwise procedure. Our
main simulation results show the proposed selection procedure performs well in
finite samples. Compared to existing selection procedures, we find Mi-Lasso has
one of the smallest biases and mean squared errors across a range of sample
sizes and levels of spatial correlation. An application on house prices further
demonstrates Mi-Lasso performs well compared to existing procedures.
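
A generic eigenvector-spatial-filtering sketch in Python conveys the mechanics: compute the Moran eigenvectors of the doubly centered spatial weights matrix and let a Lasso choose which ones enter the regression. This uses cross-validated Lasso and a toy ring-neighbour weights matrix, not the Mi-Lasso tuning rule itself.

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)
n = 200

# Toy row-standardized spatial weights matrix (ring neighbours)
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

# Moran eigenvectors: eigen-decomposition of the doubly centered weights matrix
M = np.eye(n) - np.ones((n, n)) / n
eigval, eigvec = np.linalg.eigh(M @ W @ M)
E = eigvec[:, np.argsort(eigval)[::-1][:50]]   # keep the 50 leading eigenvectors

x = rng.normal(size=n)
y = 1.0 + 2.0 * x + 5.0 * np.sqrt(n) * E[:, 0] + rng.normal(size=n)  # spatially structured error

# Lasso over [x, eigenvectors]; nonzero eigenvector coefficients are the selected filters
# (in practice the focal regressor x would usually be left unpenalized)
Z = np.column_stack([x, E])
fit = LassoCV(cv=5).fit(Z, y)
print("selected eigenvectors:", np.flatnonzero(fit.coef_[1:] != 0))
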
arXiv link: http://arxiv.org/abs/2310.02773v1
Sharp and Robust Estimation of Partially Identified Discrete Response Models
practical applications. While these models are point identified in the presence
of continuous covariates, they can become partially identified when covariates
are discrete. In this paper we find that classical estimators, including the
maximum score estimator (Manski, 1975), lose their attractive statistical
properties without point identification. First, they are not sharp, with
the estimator converging to an outer region of the identified set (Komarova,
2013), and in many discrete designs it weakly converges to a random set.
Second, they are not robust, with their distribution limit changing
discontinuously with respect to the parameters of the model. We propose a novel class
of estimators based on the concept of a quantile of a random set, which we show
to be both sharp and robust. We demonstrate that our approach extends from
cross-sectional settings to classical static and dynamic discrete panel data
models.
arXiv link: http://arxiv.org/abs/2310.02414v4
fmeffects: An R Package for Forward Marginal Effects
effective model-agnostic interpretation method particularly suited for
non-linear and non-parametric prediction models. They provide comprehensible
model explanations of the form: if we change feature values by a pre-specified
step size, what is the change in the predicted outcome? We present the R
package fmeffects, the first software implementation of the theory surrounding
forward marginal effects. The relevant theoretical background, package
functionality and handling, as well as the software design and options for
future extensions are discussed in this paper.
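
fmeffects is an R package; the quantity it implements can be illustrated language-agnostically. The Python sketch below (generic model, hypothetical step size) computes forward marginal effects as the change in prediction when one feature is moved by a fixed step.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
n = 1000
X = rng.uniform(-2, 2, size=(n, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

def forward_marginal_effect(model, X, feature, step):
    # FME: change in prediction when one feature is moved by a fixed step size
    X_step = X.copy()
    X_step[:, feature] += step
    return model.predict(X_step) - model.predict(X)

fme = forward_marginal_effect(model, X, feature=0, step=0.5)
print("average forward marginal effect of x0 (step 0.5):", fme.mean().round(3))
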
arXiv link: http://arxiv.org/abs/2310.02008v2
Specification testing with grouped fixed effects
heterogeneity in both linear and nonlinear fixed-effects panel data models. The
null hypothesis is that heterogeneity is either time-invariant or,
symmetrically, described by homogeneous time effects. We contrast the standard
one-way fixed-effects estimator with the recently developed two-way grouped
fixed-effects estimator, that is consistent in the presence of time-varying
heterogeneity (or heterogeneous time effects) under minimal specification and
distributional assumptions for the unobserved effects. The Hausman test
compares jackknife corrected estimators, removing the leading term of the
incidental parameters and approximation biases, and exploits bootstrap to
obtain the variance of the vector of contrasts. We provide Monte Carlo evidence
on the size and power properties of the test and illustrate its application in
two empirical settings.
arXiv link: http://arxiv.org/abs/2310.01950v3
Impact of Economic Uncertainty, Geopolitical Risk, Pandemic, Financial & Macroeconomic Factors on Crude Oil Returns -- An Empirical Investigation
impact of macroeconomic and financial uncertainty including global pandemic,
geopolitical risk on the futures returns of crude oil (ROC). The data for this
study is sourced from the FRED (Federal Reserve Economic Database) economic
dataset; the importance of the factors has been validated using the variance
inflation factor (VIF) and principal component analysis (PCA). To fully
understand the combined effect of these factors on WTI, the study includes
interaction terms in the multi-factor model. Empirical results suggest that
changes in ROC can have varying impacts depending on the specific period and
market conditions. The results can be used for informed investment decisions
and to construct portfolios that are well-balanced in terms of risk and return.
Structural breaks, such as changes in global economic conditions or shifts in
demand for crude oil, can make crude oil returns sensitive to changes across
different time periods. The uniqueness of this study also lies in
its inclusion of explanatory factors related to the pandemic, geopolitical
risk, and inflation.
arXiv link: http://arxiv.org/abs/2310.01123v2
Multi-period static hedging of European options
asset follows a single-factor Markovian framework. By working in such a
setting, Carr and Wu (2014) derived a spanning relation between
a given option and a continuum of shorter-term options written on the same
asset. In this paper, we have extended their approach to simultaneously include
options over multiple short maturities. We then show a practical implementation
of this with a finite set of shorter-term options to determine the hedging
error using a Gaussian Quadrature method. We perform a wide range of
experiments for both the Black-Scholes and Merton Jump
Diffusion models, illustrating the comparative performance of the two methods.
arXiv link: http://arxiv.org/abs/2310.01104v3
Semidiscrete optimal transport with unknown costs
classical transportation problem in linear programming. The goal is to design a
joint distribution for two random variables (one continuous, one discrete) with
fixed marginals, in a way that minimizes expected cost. We formulate a novel
variant of this problem in which the cost functions are unknown, but can be
learned through noisy observations; however, only one function can be sampled
at a time. We develop a semi-myopic algorithm that couples online learning with
stochastic approximation, and prove that it achieves optimal convergence rates,
despite the non-smoothness of the stochastic gradient and the lack of strong
concavity in the objective function.
arXiv link: http://arxiv.org/abs/2310.00786v3
CausalGPS: An R Package for Causal Inference With Continuous Exposures
interest is critical for social, economic, health, and medical research.
However, most existing software packages focus on binary exposures. We develop
the CausalGPS R package that implements a collection of algorithms to provide
algorithmic solutions for causal inference with continuous exposures. CausalGPS
implements a causal inference workflow, with algorithms based on generalized
propensity scores (GPS) as the core, extending propensity scores (the
probability of a unit being exposed given pre-exposure covariates) from binary
to continuous exposures. As the first step, the package implements efficient
and flexible estimations of the GPS, allowing multiple user-specified modeling
options. As the second step, the package provides two ways to adjust for
confounding: weighting and matching, generating weighted and matched data sets,
respectively. Lastly, the package provides built-in functions to fit flexible
parametric, semi-parametric, or non-parametric regression models on the
weighted or matched data to estimate the exposure-response function relating
the outcome with the exposures. The computationally intensive tasks are
implemented in C++, and efficient shared-memory parallelization is achieved by
OpenMP API. This paper outlines the main components of the CausalGPS R package
and demonstrates its application to assess the effect of long-term exposure to
PM2.5 on educational attainment using zip code-level data from the contiguous
United States from 2000-2016.
arXiv link: http://arxiv.org/abs/2310.00561v1
On Sinkhorn's Algorithm and Choice Modeling
data based on Luce's choice axiom, including the Bradley-Terry-Luce and
Plackett-Luce models, we show that the associated maximum likelihood
estimation problems are equivalent to a classic matrix balancing problem with
target row and column sums. This perspective opens doors between two seemingly
unrelated research areas, and allows us to unify existing algorithms in the
choice modeling literature as special instances or analogs of Sinkhorn's
celebrated algorithm for matrix balancing. We draw inspirations from these
connections and resolve some open problems on the study of Sinkhorn's
algorithm. We establish the global linear convergence of Sinkhorn's algorithm
for non-negative matrices whenever finite scaling matrices exist, and
characterize its linear convergence rate in terms of the algebraic connectivity
of a weighted bipartite graph. We further derive the sharp asymptotic rate of
linear convergence, which generalizes a classic result of Knight (2008). To our
knowledge, these are the first quantitative linear convergence results for
Sinkhorn's algorithm for general non-negative matrices and positive marginals.
Our results highlight the importance of connectivity and orthogonality
structures in matrix balancing and Sinkhorn's algorithm, which could be of
independent interest. More broadly, the connections we establish in this paper
between matrix balancing and choice modeling could also help motivate further
transmission of ideas and lead to interesting results in both disciplines.
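
For readers unfamiliar with matrix balancing, a bare-bones Sinkhorn iteration looks like this (a generic implementation with toy targets, not the paper's choice-model formulation):

import numpy as np

def sinkhorn(A, row_targets, col_targets, iters=500, tol=1e-10):
    # Alternately rescale rows and columns of a nonnegative matrix
    # until its row and column sums match the targets.
    r = np.ones(A.shape[0])
    c = np.ones(A.shape[1])
    for _ in range(iters):
        r = row_targets / (A @ c)
        c = col_targets / (A.T @ r)
        B = A * np.outer(r, c)
        if np.allclose(B.sum(axis=1), row_targets, atol=tol):
            break
    return A * np.outer(r, c), r, c

rng = np.random.default_rng(7)
A = rng.uniform(0.1, 1.0, size=(4, 6))
row_t = np.full(4, 1.5)        # row and column targets must share the same total mass
col_t = np.full(6, 1.0)
B, r, c = sinkhorn(A, row_t, col_t)
print(B.sum(axis=1), B.sum(axis=0))
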
arXiv link: http://arxiv.org/abs/2310.00260v2
Identification, Impacts, and Opportunities of Three Common Measurement Considerations when using Digital Trace Data
new best practice for measuring media use and content consumption. Despite the
apparent accuracy that comes with greater granularity, however, digital traces
may introduce additional ambiguity and new errors into the measurement of media
use. In this note, we identify three new measurement challenges when using
Digital Trace Data that were recently uncovered using a new measurement
framework - Screenomics - that records media use at the granularity of
individual screenshots obtained every few seconds as people interact with
mobile devices. We label the considerations as follows: (1) entangling: the
common measurement error introduced by proxying exposure to content by exposure
to format; (2) flattening: aggregating unique segments of media interaction
without incorporating temporal information, most commonly intraindividually; and
(3) bundling: summation of the durations of segments of media interaction,
indiscriminate with respect to variations across media segments.
arXiv link: http://arxiv.org/abs/2310.00197v1
Smoothing the Nonsmoothness
nonsmooth functions, we propose a sequence of infinitely differentiable
functions to approximate the nonsmooth function under consideration. A rate of
approximation is established and an illustration of its application is then
provided.
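
The abstract does not spell out the particular smoothing sequence, so the sketch below only illustrates the general idea with two standard infinitely differentiable approximations whose uniform errors shrink at rate 1/n.

import numpy as np

def smooth_abs(x, n):
    # Infinitely differentiable approximation of |x|; uniform error is at most 1/n
    return np.sqrt(x ** 2 + 1.0 / n ** 2)

def smooth_relu(x, n):
    # Scaled softplus approximation of max(x, 0); uniform error is at most log(2)/n
    return np.logaddexp(0.0, n * x) / n

x = np.linspace(-1, 1, 2001)
for n in (10, 100, 1000):
    err_abs = np.max(np.abs(smooth_abs(x, n) - np.abs(x)))
    err_relu = np.max(np.abs(smooth_relu(x, n) - np.maximum(x, 0)))
    print(f"n={n:5d}  max error |x|: {err_abs:.4f}   max error relu: {err_relu:.4f}")
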
arXiv link: http://arxiv.org/abs/2309.16348v1
Causal Panel Analysis under Parallel Trends: Lessons from a Large Reanalysis Study
establish causality, but recent methodological discussions highlight their
limitations under heterogeneous treatment effects (HTE) and violations of the
parallel trends (PT) assumption. This growing literature has introduced
numerous new estimators and procedures, causing confusion among researchers
about the reliability of existing results and best practices. To address these
concerns, we replicated and reanalyzed 49 studies from leading journals using
TWFE models for observational panel data with binary treatments. Using six
HTE-robust estimators, diagnostic tests, and sensitivity analyses, we find: (i)
HTE-robust estimators yield qualitatively similar but highly variable results;
(ii) while a few studies show clear signs of PT violations, many lack evidence
to support this assumption; and (iii) many studies are underpowered when
accounting for HTE and potential PT violations. We emphasize the importance of
strong research designs and rigorous validation of key identifying assumptions.
arXiv link: http://arxiv.org/abs/2309.15983v6
Sluggish news reactions: A combinatorial approach for synchronizing stock jumps
delays. Econometricians typically treat these sluggish reactions as
microstructure effects and settle for a coarse sampling grid to guard against
them. Synchronizing mistimed stock returns on a fine sampling grid allows us to
automatically detect noisy jumps and better approximate the true common jumps
in related stock prices.
arXiv link: http://arxiv.org/abs/2309.15705v1
Double machine learning and design in batch adaptive experiments
subjects per batch. First, we propose a semiparametric treatment effect
estimator that efficiently pools information across the batches, and show it
asymptotically dominates alternatives that aggregate single batch estimates.
Then, we consider the design problem of learning propensity scores for
assigning treatment in the later batches of the experiment to maximize the
asymptotic precision of this estimator. For two common causal estimands, we
estimate this precision using observations from previous batches, and then
solve a finite-dimensional concave maximization problem to adaptively learn
flexible propensity scores that converge to suitably defined optima in each
batch at rate $O_p(N^{-1/4})$. By extending the framework of double machine
learning, we show this rate suffices for our pooled estimator to attain the
targeted precision after each batch, as long as nuisance function estimates
converge at rate $o_p(N^{-1/4})$. These relatively weak rate requirements
enable the investigator to avoid the common practice of discretizing the
covariate space for design and estimation in batch adaptive experiments while
maintaining the advantages of pooling. Our numerical study shows that such
discretization often leads to substantial asymptotic and finite sample
precision losses outweighing any gains from design.
arXiv link: http://arxiv.org/abs/2309.15297v1
Free Discontinuity Regression: With an Application to the Economic Effects of Internet Shutdowns
whose locations and magnitudes are unknown, arise in settings as varied as
gene-expression profiling, financial covariance breaks, climate-regime
detection, and urban socioeconomic mapping. Despite their prevalence, there are
no current approaches that jointly estimate the location and size of the
discontinuity set in a one-shot approach with statistical guarantees. We
therefore introduce Free Discontinuity Regression (FDR), a fully nonparametric
estimator that simultaneously (i) smooths a regression surface, (ii) segments
it into contiguous regions, and (iii) provably recovers the precise locations
and sizes of its jumps. By extending a convex relaxation of the Mumford-Shah
functional to random spatial sampling and correlated noise, FDR overcomes the
fixed-grid and i.i.d. noise assumptions of classical image-segmentation
approaches, thus enabling its application to real-world data of any dimension.
This yields the first identification and uniform consistency results for
multivariate jump surfaces: under mild SBV regularity, the estimated function,
its discontinuity set, and all jump sizes converge to their true population
counterparts. Hyperparameters are selected automatically from the data using
Stein's Unbiased Risk Estimate, and large-scale simulations up to three
dimensions validate the theoretical results and demonstrate good finite-sample
performance. Applying FDR to an internet shutdown in India reveals a 25-35%
reduction in economic activity around the estimated shutdown boundaries, much
larger than previous estimates. By unifying smoothing, segmentation, and
effect-size recovery in a general statistical setting, FDR turns
free-discontinuity ideas into a practical tool with formal guarantees for
modern multivariate data.
arXiv link: http://arxiv.org/abs/2309.14630v3
Assessing Utility of Differential Privacy for RCTs
the impact of interventions and policies in many contexts. They are considered
the gold-standard for inference in the biomedical fields and in many social
sciences. Researchers have published an increasing number of studies that rely
on RCTs for at least part of the inference, and these studies typically include
the response data collected, de-identified and sometimes protected through
traditional disclosure limitation methods. In this paper, we empirically assess
the impact of strong privacy-preservation methodology (with DP
guarantees), on published analyses from RCTs, leveraging the availability of
replication packages (research compendia) in economics and policy analysis. We
provide simulations studies and demonstrate how we can replicate the analysis
in a published economics article on privacy-protected data under various
parametrizations. We find that relatively straightforward DP-based methods
allow for inference-valid protection of the published data, though
computational issues may limit more complex analyses from using these methods.
The results have applicability to researchers wishing to share RCT data,
especially in the context of low- and middle-income countries, with strong
privacy protection.
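
As a toy illustration of the kind of protection discussed above (not the paper's methodology or parametrization), the Laplace mechanism can release a differentially private difference in means for bounded RCT outcomes:

import numpy as np

rng = np.random.default_rng(8)

# Synthetic RCT with outcomes bounded in [0, 1]
n = 1000
treat = rng.binomial(1, 0.5, size=n)
y = np.clip(0.4 + 0.1 * treat + rng.normal(scale=0.2, size=n), 0, 1)

def dp_mean(values, lower, upper, epsilon, rng):
    # Laplace mechanism for a mean of bounded values (group size treated as public)
    sensitivity = (upper - lower) / len(values)   # one record moves the mean by at most this
    return values.mean() + rng.laplace(scale=sensitivity / epsilon)

eps = 1.0  # total privacy budget for the two released means (hypothetical)
ate_dp = dp_mean(y[treat == 1], 0, 1, eps / 2, rng) - dp_mean(y[treat == 0], 0, 1, eps / 2, rng)
print("non-private ATE:", round(y[treat == 1].mean() - y[treat == 0].mean(), 4))
print("DP ATE estimate:", round(ate_dp, 4))
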
arXiv link: http://arxiv.org/abs/2309.14581v1
Unified Inference for Dynamic Quantile Predictive Regression
quantile predictive regressions which is useful when examining quantile
predictability in stock returns under possible presence of nonstationarity.
arXiv link: http://arxiv.org/abs/2309.14160v2
Nonparametric estimation of conditional densities by generalized random forests
vector $X$, I propose a nonparametric estimator $\hat{f}(\cdot|x)$ for the
conditional density of $Y$ given $X=x$. This estimator takes the form of an
exponential series whose coefficients $T_x = (T_{x1},\dots,T_{xJ})$ are the
solution of a system of nonlinear equations that depends on an estimator of the
conditional expectation $E[p(Y)|X=x]$, where $p$ is a $J$-dimensional vector of
basis functions. The distinguishing feature of the proposed estimator is that
$E[p(Y)|X=x]$ is estimated by generalized random forest (Athey, Tibshirani, and
Wager, Annals of Statistics, 2019), targeting the heterogeneity of $T_x$ across
$x$. I show that $\hat{f}(\cdot|x)$ is uniformly consistent and asymptotically
normal, allowing $J$ to grow to infinity. I also provide a standard error
formula to construct asymptotically valid confidence intervals. Results from
Monte Carlo experiments are provided.
arXiv link: http://arxiv.org/abs/2309.13251v4
Nonparametric mixed logit model with market-level parameters estimated from market share data
market-level choice share data. The model treats each market as an agent and
represents taste heterogeneity through market-specific parameters by solving a
multiagent inverse utility maximization problem, addressing the limitations of
existing market-level choice models with parametric estimation. A simulation
study is conducted to evaluate the performance of our model in terms of
estimation time, estimation accuracy, and out-of-sample predictive accuracy. In
a real data application, we estimate the travel mode choice of 53.55 million
trips made by 19.53 million residents in New York State. These trips are
aggregated based on population segments and census block group-level
origin-destination (OD) pairs, resulting in 120,740 markets. We benchmark our
model against multinomial logit (MNL), nested logit (NL), inverse product
differentiation logit (IPDL), and the BLP models. The results show that the
proposed model improves the out-of-sample accuracy from 65.30% to 81.78%, with
a computation time less than one-tenth of that taken to estimate the BLP model.
The price elasticities and diversion ratios retrieved from our model and
benchmark models exhibit similar substitution patterns. Moreover, the
market-level parameters estimated by our model provide additional insights and
facilitate their seamless integration into supply-side optimization models for
transportation design. By measuring the compensating variation for the driving
mode, we found that a $9 congestion toll would impact roughly 60% of total
travelers. As an application of supply-demand integration, we showed that a 50%
discount on transit fares could bring a maximum ridership increase of 9,402 trips
per day under a budget of $50,000 per day.
arXiv link: http://arxiv.org/abs/2309.13159v3
Optimal Conditional Inference in Adaptive Experiments
conditional on the realized stopping time, assignment probabilities, and target
parameter, where all of these may be chosen adaptively using information up to
the last batch of the experiment. Absent further restrictions on the
experiment, we show that inference using only the results of the last batch is
optimal. When the adaptive aspects of the experiment are known to be
location-invariant, in the sense that they are unchanged when we shift all
batch-arm means by a constant, we show that there is additional information in
the data, captured by one additional linear function of the batch-arm means. In
the more restrictive case where the stopping time, assignment probabilities,
and target parameter are known to depend on the data only through a collection
of polyhedral events, we derive computationally tractable and optimal
conditional inference procedures.
arXiv link: http://arxiv.org/abs/2309.12162v1
A detection analysis for temporal memory patterns at different time-scales
time-series dependence patterns. A customized statistical test detects memory
dependence in event sequences by analyzing their inter-event time
distributions. Synthetic experiments based on the renewal-aging property assess
the impact of observer latency on the renewal property. Our test uncovers
memory patterns across diverse time scales, emphasizing the event sequence's
probability structure beyond correlations. The time-series analysis produces a
statistical test and graphical plots that help detect dependence patterns, if
any, among events at different time scales.
the renewal assumption through aging experiments, offering valuable
applications in time-series analysis within economics.
arXiv link: http://arxiv.org/abs/2309.12034v1
Transformers versus LSTMs for electronic trading
(LSTM), one kind of recurrent neural network (RNN), has been widely applied in
time series prediction.
Like RNN, Transformer is designed to handle the sequential data. As
Transformer achieved great success in Natural Language Processing (NLP),
researchers got interested in Transformer's performance on time series
prediction, and plenty of Transformer-based solutions on long time series
forecasting have come out recently. However, when it comes to financial time
series prediction, LSTM is still a dominant architecture. Therefore, the
question this study wants to answer is: whether the Transformer-based model can
be applied in financial time series prediction and beat LSTM.
To answer this question, various LSTM-based and Transformer-based models are
compared on multiple financial prediction tasks based on high-frequency limit
order book data. A new LSTM-based model called DLSTM is built and new
architecture for the Transformer-based model is designed to adapt for financial
prediction. The experimental results indicate that the Transformer-based model
has only a limited advantage in absolute price sequence prediction. The
LSTM-based models show better and more robust performance on difference
sequence prediction, such as price difference and price movement.
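
A minimal PyTorch sketch of an LSTM baseline for movement classification is given below with synthetic sequences; the paper's DLSTM architecture, Transformer variants, and limit order book features are not reproduced.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic "price difference" sequences: label 1 if the cumulative change is positive
N, T, F = 2000, 50, 4
X = torch.randn(N, T, F)
y = (X[:, :, 0].sum(dim=1) > 0).long()

class LSTMClassifier(nn.Module):
    def __init__(self, n_features, hidden=32, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, time, hidden)
        return self.head(out[:, -1])   # classify from the last time step

model = LSTMClassifier(F)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    acc = (model(X).argmax(dim=1) == y).float().mean().item()
    print(f"epoch {epoch}: loss={loss.item():.3f}  acc={acc:.3f}")
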
arXiv link: http://arxiv.org/abs/2309.11400v1
Identifying Causal Effects in Information Provision Experiments
effects. Since standard TSLS and panel specifications in information provision
experiments have weights proportional to belief updating in the first-stage,
this dependence attenuates existing estimates. This is natural if people whose
decisions depend on their beliefs gather information before the experiment. I
propose a local least squares estimator that identifies unweighted average
effects in several classes of experiments under progressively stronger versions
of Bayesian updating. In five of six recent studies, average effects are larger
than (in several cases more than double) estimates in standard specifications.
arXiv link: http://arxiv.org/abs/2309.11387v4
require: Package dependencies for reproducible research
lack of version control for community-contributed packages. This article
introduces the require command, a tool designed to ensure Stata package
dependencies are compatible across users and computer systems. Given a list of
Stata packages, require verifies that each package is installed, checks for a
minimum or exact version or package release date, and optionally installs the
package if prompted by the researcher.
arXiv link: http://arxiv.org/abs/2309.11058v2
Correcting Sample Selection Bias in PISA Rankings
cross-country comparisons based on international assessments such as the
Programme for International Student Assessment (PISA). Although PISA is widely
used to benchmark educational performance across countries, it samples only
students who remain enrolled in school at age 15. This introduces survival
bias, particularly in countries with high dropout rates, potentially leading to
distorted comparisons. To correct for this bias, I develop a simple adjustment
of the classical Heckman selection model tailored to settings with fully
truncated outcome data. My approach exploits the joint normality of latent
errors and leverages information on the selection rate, allowing identification
of the counterfactual mean outcome for the full population of 15-year-olds.
Applying this method to PISA 2018 data, I show that adjusting for selection
bias results in substantial changes in country rankings based on average
performance. These results highlight the importance of accounting for
non-random sample selection to ensure accurate and policy-relevant
international comparisons of educational outcomes.
arXiv link: http://arxiv.org/abs/2309.10642v6
Regressing on distributions: The nonlinear effect of temperature on regional economic growth
for the situation where certain explanatory variables are available at a higher
temporal resolution than the dependent variable. The main idea is to use the
moments of the empirical distribution of these variables to construct
regressors with the correct resolution. As the moments are likely to display
nonlinear marginal and interaction effects, an artificial neural network
regression function is proposed. The corresponding model operates within the
traditional stochastic nonlinear least squares framework. In particular, a
numerical Hessian is employed to calculate confidence intervals. The practical
usefulness is demonstrated by analyzing the influence of daily temperatures in
260 European NUTS2 regions on the yearly growth of gross value added in these
regions in the time period 2000 to 2021. In the particular example, the model
allows for an appropriate assessment of regional economic impacts resulting
from (future) changes in the regional temperature distribution (mean and
variance).
arXiv link: http://arxiv.org/abs/2309.10481v1
Bounds on Average Effects in Discrete Choice Panel Data Models
for quantifying the effect of covariates, and for policy evaluation and
counterfactual analysis. This task is challenging in short panels with
individual-specific effects due to partial identification and the incidental
parameter problem. In particular, estimation of the sharp identified set is
practically infeasible at realistic sample sizes whenever the number of support
points of the observed covariates is large, such as when the covariates are
continuous. In this paper, we therefore propose estimating outer bounds on the
identified set of average effects. Our bounds are easy to construct, converge
at the parametric rate, and are computationally simple to obtain even in
moderately large samples, independent of whether the covariates are discrete or
continuous. We also provide asymptotically valid confidence intervals on the
identified set.
arXiv link: http://arxiv.org/abs/2309.09299v3
Optimal Estimation under a Semiparametric Density Ratio Model
samples from various interconnected populations that undeniably exhibit common
latent structures. Utilizing a model that incorporates these latent structures
for such data enhances the efficiency of inferences. Recently, many researchers
have been adopting the semiparametric density ratio model (DRM) to address the
presence of latent structures. The DRM enables estimation of each population
distribution using pooled data, resulting in statistically more efficient
estimations in contrast to nonparametric methods that analyze each sample in
isolation. In this article, we investigate the limit of the efficiency
improvement attainable through the DRM. We focus on situations where one
population's sample size significantly exceeds those of the other populations.
In such scenarios, we demonstrate that the DRM-based inferences for populations
with smaller sample sizes achieve the highest attainable asymptotic efficiency
as if a parametric model is assumed. The estimands we consider include the
model parameters, distribution functions, and quantiles. We use simulation
experiments to support the theoretical findings with a specific focus on
quantile estimation. Additionally, we provide an analysis of real revenue data
from U.S. collegiate sports to illustrate the efficacy of our contribution.
arXiv link: http://arxiv.org/abs/2309.09103v1
Least squares estimation in nonstationary nonlinear cohort panels with learning from experience
cohort panels with learning from experience, showing, inter alia, the
consistency and asymptotic normality of the nonlinear least squares estimator
used in empirical practice. Potential pitfalls for hypothesis testing are
identified and solutions proposed. Monte Carlo simulations verify the
properties of the estimator and corresponding test statistics in finite
samples, while an application to a panel of survey expectations demonstrates
the usefulness of the theory developed.
arXiv link: http://arxiv.org/abs/2309.08982v4
Total-effect Test May Erroneously Reject So-called "Full" or "Complete" Mediation
independent variable X affects a dependent variable Y through some mediator M,
has been under debate. The classic causal steps require that a "total effect"
be significant, now also known as statistically acknowledged. It has been shown
that the total-effect test can erroneously reject competitive mediation and is
superfluous for establishing complementary mediation. Little is known about the
last type, indirect-only mediation, aka "full" or "complete" mediation, in
which the indirect (ab) path passes the statistical partition test while the
direct-and-remainder (d) path fails. This study 1) provides proof that the
total-effect test can erroneously reject indirect-only mediation, including
both sub-types, assuming the least squares estimation (LSE) F-test or the Sobel test; 2)
provides a simulation to duplicate the mathematical proofs and extend the
conclusion to LAD-Z test; 3) provides two real-data examples, one for each
sub-type, to illustrate the mathematical conclusion; 4) in view of the
mathematical findings, proposes to revisit concepts, theories, and techniques
of mediation analysis and other causal dissection analyses, and showcase a more
comprehensive alternative, process-and-product analysis (PAPA).
arXiv link: http://arxiv.org/abs/2309.08910v2
Adaptive Neyman Allocation
practice of allocating units into treated and control groups, potentially in
unequal numbers proportional to their respective standard deviations, with the
objective of minimizing the variance of the treatment effect estimator. This
widely recognized approach increases statistical power in scenarios where the
treated and control groups have different standard deviations, as is often the
case in social experiments, clinical trials, marketing research, and online A/B
testing. However, Neyman allocation cannot be implemented unless the standard
deviations are known in advance. Fortunately, the multi-stage nature of the
aforementioned applications allows the use of earlier stage observations to
estimate the standard deviations, which further guide allocation decisions in
later stages. In this paper, we introduce a competitive analysis framework to
study this multi-stage experimental design problem. We propose a simple
adaptive Neyman allocation algorithm, which almost matches the
information-theoretic limit of conducting experiments. We provide theory for
estimation and inference using data collected from our adaptive Neyman
allocation algorithm. We demonstrate the effectiveness of our adaptive Neyman
allocation algorithm using both online A/B testing data from a social media
site and synthetic data.
arXiv link: http://arxiv.org/abs/2309.08808v4
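To make the allocation idea concrete, here is a minimal Python sketch of two-stage Neyman allocation in the spirit of the abstract above: pilot (first-stage) observations estimate the group standard deviations, and the second-stage sample is then split proportionally to them. The sample sizes, outcome distributions, and estimator are hypothetical illustrations, not the paper's algorithm or its competitive-analysis guarantees.
```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical first-stage (pilot) data: treated outcomes are noisier than control.
y1_stage1 = rng.normal(1.0, 3.0, size=200)   # treated pilot sample
y0_stage1 = rng.normal(0.0, 1.0, size=200)   # control pilot sample

# Estimate standard deviations from the first stage.
s1, s0 = y1_stage1.std(ddof=1), y0_stage1.std(ddof=1)

# Neyman allocation for the second stage: group shares proportional to std devs.
n_stage2 = 1000
n1 = int(round(n_stage2 * s1 / (s1 + s0)))
n0 = n_stage2 - n1
print(f"second-stage allocation: {n1} treated, {n0} control")

# Collect second-stage data under the chosen allocation and estimate the ATE.
y1 = rng.normal(1.0, 3.0, size=n1)
y0 = rng.normal(0.0, 1.0, size=n0)
ate_hat = y1.mean() - y0.mean()
se_hat = np.sqrt(y1.var(ddof=1) / n1 + y0.var(ddof=1) / n0)
print(f"ATE estimate {ate_hat:.3f} (se {se_hat:.3f})")
```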
Ordered Correlation Forest
outcomes with inherent ordering, such as self-evaluations of subjective
well-being and self-assessments in health domains. While ordered choice models,
such as the ordered logit and ordered probit, are popular tools for analyzing
these outcomes, they may impose restrictive parametric and distributional
assumptions. This paper introduces a novel estimator, the ordered correlation
forest, that can naturally handle non-linearities in the data and does not
assume a specific error term distribution. The proposed estimator modifies a
standard random forest splitting criterion to build a collection of forests,
each estimating the conditional probability of a single class. Under an
"honesty" condition, predictions are consistent and asymptotically normal. The
weights induced by each forest are used to obtain standard errors for the
predicted probabilities and the covariates' marginal effects. Evidence from
synthetic data shows that the proposed estimator delivers better prediction
performance than alternative forest-based estimators and demonstrates its
ability to construct valid confidence intervals for the covariates' marginal
effects.
arXiv link: http://arxiv.org/abs/2309.08755v1
Fixed-b Asymptotics for Panel Models with Two-Way Clustering
Hansen and Sasaki (2024) for linear panels. First, we show algebraically that
this variance estimator (CHS estimator, hereafter) is a linear combination of
three common variance estimators: the one-way unit cluster estimator, the "HAC
of averages" estimator, and the "average of HACs" estimator. Based on this
finding, we obtain a fixed-$b$ asymptotic result for the CHS estimator and
corresponding test statistics as the cross-section and time sample sizes
jointly go to infinity. Furthermore, we propose two simple bias-corrected
versions of the variance estimator and derive the fixed-$b$ limits. In a
simulation study, we find that the two bias-corrected variance estimators along
with fixed-$b$ critical values provide improvements in finite sample coverage
probabilities. We illustrate the impact of bias-correction and use of the
fixed-$b$ critical values on inference in an empirical example on the
relationship between industry profitability and market concentration.
arXiv link: http://arxiv.org/abs/2309.08707v4
Causal inference in network experiments: regression-based analysis and design-based properties
avoid endogeneity by randomly assigning treatments to units over networks.
However, it is non-trivial to analyze network experiments properly without
imposing strong modeling assumptions. We show that regression-based point
estimators and standard errors can have strong theoretical guarantees if the
regression functions and robust standard errors are carefully specified to
accommodate the interference patterns under network experiments. We first
recall a well-known result that the Hájek estimator is numerically identical
to the coefficient from the weighted-least-squares fit based on the inverse
probability of the exposure mapping. Moreover, we demonstrate that the
regression-based approach offers three notable advantages: its ease of
implementation, the ability to derive standard errors through the same
regression fit, and the potential to integrate covariates into the analysis to
improve efficiency. Recognizing that the regression-based network-robust
covariance estimator can be anti-conservative under nonconstant effects, we
propose an adjusted covariance estimator to improve the empirical coverage
rates.
arXiv link: http://arxiv.org/abs/2309.07476v3
From Deep Filtering to Deep Econometrics
management. However, it is made difficult by market microstructure noise.
Particle filtering has been proposed to solve this problem as it has favorable
statistical properties, but it relies on assumptions about the underlying market
dynamics. Machine learning methods have also been proposed but lack
interpretability, and often lag in performance. In this paper we implement the
SV-PF-RNN: a hybrid neural network and particle filter architecture. Our
SV-PF-RNN is designed specifically with stochastic volatility estimation in
mind. We then show that it can improve on the performance of a basic particle
filter.
arXiv link: http://arxiv.org/abs/2311.06256v1
Stochastic Learning of Semiparametric Monotone Index Models with Large Sample Size
scenario where the number of observation points $n$ is extremely large and
conventional approaches fail to work due to heavy computational burdens.
Motivated by the mini-batch gradient descent algorithm (MBGD) that is widely
used as a stochastic optimization tool in the machine learning field, I
propose a novel subsample- and iteration-based estimation procedure. In
particular, starting from any initial guess of the true parameter, I
progressively update the parameter using a sequence of subsamples randomly
drawn from the data set whose sample size is much smaller than $n$. The update
is based on the gradient of some well-chosen loss function, where the
nonparametric component is replaced with its Nadaraya-Watson kernel estimator
based on subsamples. My proposed algorithm essentially generalizes MBGD
algorithm to the semiparametric setup. Compared with the full-sample-based
method, the new method reduces the computational time by a factor of roughly
$n$ if the subsample size and the kernel function are chosen properly, so it
can easily be applied when the sample size $n$ is large. Moreover, I show that
if I further average across the estimators produced during the iterations, the
difference between the average estimator and the full-sample-based estimator
will be $1/\sqrt{n}$-trivial. Consequently, the average estimator is
$1/\sqrt{n}$-consistent and asymptotically normally distributed. In other
words, the new estimator substantially improves the computational speed while
maintaining the estimation accuracy.
arXiv link: http://arxiv.org/abs/2309.06693v2
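The following Python sketch illustrates the general recipe described above: mini-batch updates of the index coefficients, with the unknown link function replaced by a leave-one-out Nadaraya-Watson estimate computed on each subsample. The data-generating process, bandwidth, step sizes, and the finite-difference gradient are hypothetical simplifications chosen for readability; the paper's estimator uses its own loss, analytic gradient, and tuning choices.
```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical DGP: monotone index model y = F(x'theta) + noise, F unknown to the estimator.
n, d = 100_000, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -0.5, 2.0])          # first coefficient normalized to 1
y = np.tanh(X @ theta_true) + 0.1 * rng.normal(size=n)

def nw_fit(v, yb, h):
    """Leave-one-out Nadaraya-Watson estimate of E[y | index = v] within a batch."""
    k = np.exp(-0.5 * ((v[:, None] - v[None, :]) / h) ** 2)
    np.fill_diagonal(k, 0.0)
    return k @ yb / k.sum(axis=1)

def batch_loss(beta, Xb, yb, h):
    v = Xb[:, 0] + Xb[:, 1:] @ beta              # index with the normalization theta_1 = 1
    return np.mean((yb - nw_fit(v, yb, h)) ** 2)

# Mini-batch gradient descent with a numerical gradient (a simplification of an
# analytic gradient) and a diminishing step size.
beta, batch, h, lr, eps = np.zeros(d - 1), 500, 0.3, 0.5, 1e-4
for t in range(1, 501):
    idx = rng.choice(n, size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    grad = np.zeros_like(beta)
    for j in range(beta.size):                   # finite-difference gradient, coordinate-wise
        e = np.zeros_like(beta); e[j] = eps
        grad[j] = (batch_loss(beta + e, Xb, yb, h) - batch_loss(beta - e, Xb, yb, h)) / (2 * eps)
    beta -= (lr / np.sqrt(t)) * grad

print("estimated coefficients (theta_2, theta_3):", beta)  # should move toward (-0.5, 2.0)
```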
Sensitivity Analysis for Linear Estimators
identification failures that can be viewed as seeing the wrong outcome
distribution. Our approach measures the degree of identification failure
through the change in measure between the observed distribution and a
hypothetical target distribution that would identify the causal parameter of
interest. The framework yields a sensitivity analysis that generalizes existing
bounds for Average Potential Outcome (APO), Regression Discontinuity (RD), and
instrumental variables (IV) exclusion failure designs. Our partial
identification results extend results from the APO context to allow even
unbounded likelihood ratios. Our proposed sensitivity analysis consistently
estimates sharp bounds under plausible conditions and estimates valid bounds
under mild conditions. We find that our method performs well in simulations
even when targeting a discontinuous and nearly infinite bound.
arXiv link: http://arxiv.org/abs/2309.06305v3
Forecasted Treatment Effects
absence of a control group. We obtain unbiased estimators of individual
(heterogeneous) treatment effects and a consistent and asymptotically normal
estimator of the average treatment effect. Our estimator averages over unbiased
forecasts of individual counterfactuals, based on a (short) time series of
pre-treatment data. The paper emphasizes the importance of focusing on forecast
unbiasedness rather than accuracy when the end goal is estimation of average
treatment effects. We show that simple basis function regressions ensure
forecast unbiasedness for a broad class of data-generating processes for the
counterfactuals, even in short panels. In contrast, model-based forecasting
requires stronger assumptions and is prone to misspecification and estimation
bias. We show that our method can replicate the findings of some previous
empirical studies, but without using a control group.
arXiv link: http://arxiv.org/abs/2309.05639v3
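A minimal sketch of the forecasting idea described above, under the assumption of a linear-in-time basis and a single post-treatment period (both hypothetical choices): each unit's counterfactual is forecast from its own pre-treatment series, and the individual forecast errors are averaged into an ATE estimate.
```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical panel: N treated units, T0 pre-treatment periods, 1 post-treatment period.
N, T0 = 200, 12
t_pre = np.arange(T0)
trends = rng.normal(0.0, 0.05, size=N)            # unit-specific linear trends
levels = rng.normal(2.0, 1.0, size=N)
y_pre = levels[:, None] + trends[:, None] * t_pre + rng.normal(0, 0.3, size=(N, T0))
tau = 1.0                                          # true average treatment effect
y_post = levels + trends * T0 + tau + rng.normal(0, 0.3, size=N)

# Basis-function (here: linear-in-time) regression per unit on pre-treatment data,
# then an out-of-sample forecast of the untreated counterfactual at t = T0.
B_pre = np.column_stack([np.ones(T0), t_pre])      # basis evaluated pre-treatment
b_post = np.array([1.0, T0])                       # basis evaluated at the post period
coefs = np.linalg.lstsq(B_pre, y_pre.T, rcond=None)[0]   # (2, N) per-unit least squares
y0_forecast = b_post @ coefs                       # forecasted counterfactuals

effects = y_post - y0_forecast                     # individual treatment-effect estimates
ate_hat = effects.mean()
se_hat = effects.std(ddof=1) / np.sqrt(N)
print(f"forecasted ATE {ate_hat:.3f} (se {se_hat:.3f}), truth {tau}")
```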
Nonlinear Granger Causality using Kernel Ridge Regression
mlcausality, designed for the identification of nonlinear Granger causal
relationships. This novel algorithm uses a flexible plug-in architecture that
enables researchers to employ any nonlinear regressor as the base prediction
model. Subsequently, I conduct a comprehensive performance analysis of
mlcausality when the prediction regressor is the kernel ridge regressor with
the radial basis function kernel. The results demonstrate that mlcausality
employing kernel ridge regression achieves competitive AUC scores across a
diverse set of simulated data. Furthermore, mlcausality with kernel ridge
regression yields more finely calibrated $p$-values in comparison to rival
algorithms. This enhancement enables mlcausality to attain superior accuracy
scores when using intuitive $p$-value-based thresholding criteria. Finally,
mlcausality with the kernel ridge regression exhibits significantly reduced
computation times compared to existing nonlinear Granger causality algorithms.
In fact, in numerous instances, this innovative approach achieves superior
solutions within computational timeframes that are an order of magnitude
shorter than those required by competing algorithms.
arXiv link: http://arxiv.org/abs/2309.05107v1
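The sketch below is a generic kernel-ridge Granger-causality check in the spirit of the abstract, not the mlcausality package API (whose functions and arguments are not reproduced here): it compares out-of-sample squared errors from a restricted model (own lags only) and an unrestricted model (own lags plus lags of the candidate cause), both fit by RBF kernel ridge regression, and applies a one-sided Wilcoxon test to the error differences. The data, lag length, kernel hyperparameters, and test choice are assumptions.
```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from scipy.stats import wilcoxon

rng = np.random.default_rng(2)

# Hypothetical bivariate series in which x nonlinearly Granger-causes y.
T, lags = 600, 2
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.4 * y[t - 1] + 0.8 * np.tanh(x[t - 1]) + 0.3 * rng.normal()

def lag_matrix(s, lags):
    # column k holds s[t - k - 1] for t = lags, ..., len(s) - 1
    return np.column_stack([s[lags - k - 1: len(s) - k - 1] for k in range(lags)])

Y_target = y[lags:]
Z_restricted = lag_matrix(y, lags)                             # own lags only
Z_full = np.column_stack([Z_restricted, lag_matrix(x, lags)])  # own lags + lags of x

split = len(Y_target) // 2
def oos_sq_errors(Z):
    model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1)
    model.fit(Z[:split], Y_target[:split])
    return (Y_target[split:] - model.predict(Z[split:])) ** 2

e_restricted = oos_sq_errors(Z_restricted)
e_full = oos_sq_errors(Z_full)

# If x helps predict y, the unrestricted errors should be systematically smaller.
stat, pval = wilcoxon(e_restricted - e_full, alternative="greater")
print(f"MSE restricted {e_restricted.mean():.4f}, full {e_full.mean():.4f}, p-value {pval:.4f}")
```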
Testing for Stationary or Persistent Coefficient Randomness in Predictive Regressions
regressions. Our focus is on how tests for coefficient randomness are
influenced by the persistence of the random coefficient. We show that when the
random coefficient is stationary, or I(0), Nyblom's (1989) LM test loses its
optimality (in terms of power), which is established against the alternative of
integrated, or I(1), random coefficient. We demonstrate this by constructing a
test that is more powerful than the LM test when the random coefficient is
stationary, although the test is dominated in terms of power by the LM test
when the random coefficient is integrated. The power comparison is made under
the sequence of local alternatives that approaches the null hypothesis at
different rates depending on the persistence of the random coefficient and
which test is considered. We revisit earlier empirical research and apply the
tests considered in this study to U.S. stock returns data. The result
mostly reverses the earlier finding.
arXiv link: http://arxiv.org/abs/2309.04926v5
Structural Econometric Estimation of the Basic Reproduction Number for Covid-19 Across U.S. States and Selected Countries
reproduction number ($R_{0}$) of Covid-19. This approach identifies
$R_{0}$ in a panel regression model by filtering out the effects of
mitigating factors on disease diffusion and is easy to implement. We apply the
method to data from 48 contiguous U.S. states and a diverse set of countries.
Our results reveal a notable concentration of $R_{0}$ estimates with
an average value of 4.5. Through a counterfactual analysis, we highlight a
significant underestimation of the $R_{0}$ when mitigating factors
are not appropriately accounted for.
arXiv link: http://arxiv.org/abs/2309.08619v1
Non-linear dimension reduction in factor-augmented vector autoregressions
vector autoregressions to analyze the effects of different economic shocks. I
argue that controlling for non-linearities between a large-dimensional dataset
and the latent factors is particularly useful during turbulent times of the
business cycle. In simulations, I show that non-linear dimension reduction
techniques yield good forecasting performance, especially when data is highly
volatile. In an empirical application, I identify a monetary policy shock as
well as an uncertainty shock, both excluding and including observations from
the COVID-19
pandemic. Those two applications suggest that the non-linear FAVAR approaches
are capable of dealing with the large outliers caused by the COVID-19 pandemic
and yield reliable results in both scenarios.
arXiv link: http://arxiv.org/abs/2309.04821v1
Interpreting TSLS Estimators in Information Provision Experiments
information provision experiments. We consider the causal interpretation of
two-stage least squares (TSLS) estimators in these experiments. We characterize
common TSLS estimators as weighted averages of causal effects, and interpret
these weights under general belief updating conditions that nest parametric
models from the literature. Our framework accommodates TSLS estimators for both
passive and active control designs. Notably, we find that some passive control
estimators allow for negative weights, which compromises their causal
interpretation. We give practical guidance on such issues, and illustrate our
results in two empirical applications.
arXiv link: http://arxiv.org/abs/2309.04793v4
Identifying spatial interdependence in panel data with large N and small T
estimate panel spatial autoregressive models, where N, the number of
cross-sectional units, is much larger than T, the number of time periods,
without restricting the spatial effects using a predetermined weighting matrix.
We use Dirichlet-Laplace priors for variable selection and parameter shrinkage.
Without imposing any a priori structures on the spatial linkages between
variables, we let the data speak for themselves. Extensive Monte Carlo studies
show that our method is super-fast and our estimated spatial weights matrices
strongly resemble the true spatial weights matrices. As an illustration, we
investigate the spatial interdependence of European Union regional gross value
added growth rates. In addition to a clear pattern of predominant country
clusters, we have uncovered a number of important between-country spatial
linkages which are yet to be documented in the literature. This new procedure
for estimating spatial effects is of particular relevance for researchers and
policy makers alike.
arXiv link: http://arxiv.org/abs/2309.03740v1
A Causal Perspective on Loan Pricing: Investigating the Impacts of Selection Bias on Identifying Bid-Response Functions
a well-functioning personalized pricing policy in place is essential to
effective business making. Typically, such a policy must be derived from
observational data, which introduces several challenges. While the problem of
“endogeneity” is prominently studied in the established pricing literature,
the problem of selection bias (or, more precisely, bid selection bias) is not.
We take a step towards understanding the effects of selection bias by posing
pricing as a problem of causal inference. Specifically, we consider the
reaction of a customer to price as a treatment effect. In our experiments, we
simulate varying levels of selection bias on a semi-synthetic dataset on
mortgage loan applications in Belgium. We investigate the potential of
parametric and nonparametric methods for the identification of individual
bid-response functions. Our results illustrate how conventional methods such as
logistic regression and neural networks suffer adversely from selection bias.
In contrast, we implement state-of-the-art methods from causal machine learning
and show their capability to overcome selection bias in pricing data.
arXiv link: http://arxiv.org/abs/2309.03730v1
Instrumental variable estimation of the proportional hazards model by presmoothing
model of Cox (1972). The instrument and the endogenous variable are discrete
but there can be (possibly continuous) exogenous covariables. By making a rank
invariance assumption, we can reformulate the proportional hazards model into a
semiparametric version of the instrumental variable quantile regression model
of Chernozhukov and Hansen (2005). A naïve estimation approach based on
conditional moment conditions generated by the model would lead to a highly
nonconvex and nonsmooth objective function. To overcome this problem, we
propose a new presmoothing methodology. First, we estimate the model
nonparametrically - and show that this nonparametric estimator has a
closed-form solution in the leading case of interest of randomized experiments
with one-sided noncompliance. Second, we use the nonparametric estimator to
generate “proxy” observations for which exogeneity holds. Third, we apply the
usual partial likelihood estimator to the “proxy” data. While the paper
focuses on the proportional hazards model, our presmoothing approach could be
applied to estimate other semiparametric formulations of the instrumental
variable quantile regression model. Our estimation procedure allows for random
right-censoring. We show asymptotic normality of the resulting estimator. The
approach is illustrated via simulation studies and an empirical application to
the Illinois
arXiv link: http://arxiv.org/abs/2309.02183v1
On the use of U-statistics for linear dyadic interaction models
(asymptotic) properties of estimation methods only began to be studied recently
in the literature. This paper aims to show, in a step-by-step manner, how
U-statistics tools can be applied to obtain the asymptotic properties of
pairwise differences estimators for a two-way fixed effects model of dyadic
interactions. More specifically, we first propose an estimator for the model
that relies on pairwise differencing such that the fixed effects are
differenced out. As a result, the summands of the influence function will not
be independent anymore, showing dependence on the individual level and
translating to the fact that the usual law of large numbers and central limit
theorems do not straightforwardly apply. To overcome such obstacles, we show
how to generalize tools of U-statistics for single-index variables to the
double-indices context of dyadic datasets. A key result is that there can be
different ways of defining the Hajek projection for a directed dyadic
structure, which will lead to distinct, but equivalent, consistent estimators
for the asymptotic variances. The results presented in this paper are easily
extended to non-linear models.
arXiv link: http://arxiv.org/abs/2309.02089v1
Global Neural Networks and The Data Scaling Effect in Financial Time Series Forecasting
application to financial time series forecasting remains controversial. In this
study, we demonstrate that the conventional practice of estimating models
locally in data-scarce environments may underlie the mixed empirical
performance observed in prior work. By focusing on volatility forecasting, we
employ a dataset comprising over 10,000 global stocks and implement a global
estimation strategy that pools information across cross-sections. Our
econometric analysis reveals that forecasting accuracy improves markedly as the
training dataset becomes larger and more heterogeneous. Notably, even with as
little as 12 months of data, globally trained networks deliver robust
predictions for individual stocks and portfolios that are not even in the
training dataset. Furthermore, our interpretation of the model dynamics shows
that these networks not only capture key stylized facts of volatility but also
exhibit resilience to outliers and rapid adaptation to market regime changes.
These findings underscore the importance of leveraging extensive and diverse
datasets in financial forecasting and advocate for a shift from traditional
local training approaches to integrated global estimation methods.
arXiv link: http://arxiv.org/abs/2309.02072v6
The Local Projection Residual Bootstrap for AR(1) Models
confidence intervals for impulse response coefficients of AR(1) models. Our
bootstrap method is based on the local projection (LP) approach and involves a
residual bootstrap procedure applied to AR(1) models. We present theoretical
results for our bootstrap method and proposed confidence intervals. First, we
prove the uniform consistency of the LP-residual bootstrap over a large class
of AR(1) models that allow for a unit root, conditional heteroskedasticity of
unknown form, and martingale difference shocks. Then, we prove the asymptotic
validity of our confidence intervals over the same class of AR(1) models.
Finally, we show that the LP-residual bootstrap provides asymptotic refinements
for confidence intervals on a restricted class of AR(1) models relative to
those required for the uniform consistency of our bootstrap.
arXiv link: http://arxiv.org/abs/2309.01889v5
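A stripped-down Python sketch of the procedure described above, under hypothetical choices (simulated AR(1) data, horizon 4, and a simple percentile interval rather than the paper's interval construction): the AR(1) is fit by OLS, centered residuals are resampled to regenerate bootstrap series, and the local-projection impulse response is re-estimated on each draw.
```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical AR(1) data.
T, rho = 400, 0.8
y = np.zeros(T)
for t in range(1, T):
    y[t] = rho * y[t - 1] + rng.normal()

def lp_irf(series, h):
    """Local-projection estimate of the horizon-h impulse response: regress y_{t+h} on y_t."""
    Y = series[h:]
    X = np.column_stack([np.ones(len(series) - h), series[:len(series) - h]])
    return np.linalg.lstsq(X, Y, rcond=None)[0][1]

# Fit the AR(1) by OLS and collect re-centered residuals.
X_ar = np.column_stack([np.ones(T - 1), y[:-1]])
c_hat, rho_hat = np.linalg.lstsq(X_ar, y[1:], rcond=None)[0]
resid = y[1:] - X_ar @ np.array([c_hat, rho_hat])
resid -= resid.mean()

h, B = 4, 999
irf_hat = lp_irf(y, h)
boot = np.empty(B)
for b in range(B):
    e_star = rng.choice(resid, size=T, replace=True)      # residual bootstrap draw
    y_star = np.zeros(T)
    y_star[0] = y[0]
    for t in range(1, T):                                  # regenerate the series recursively
        y_star[t] = c_hat + rho_hat * y_star[t - 1] + e_star[t]
    boot[b] = lp_irf(y_star, h)

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"LP IRF at horizon {h}: {irf_hat:.3f}, 95% bootstrap interval [{lo:.3f}, {hi:.3f}], truth {rho**h:.3f}")
```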
Non-Transitivity of the Win Ratio and the Area Under the Receiver Operating Characteristics Curve (AUC): a case for evaluating the strength of stochastic comparisons
that can account for hierarchies within event outcomes. In this paper we report
and study the long-run non-transitive behavior of the win ratio and the closely
related Area Under the Receiver Operating Characteristics Curve (AUC) and argue
that their transitivity cannot be taken for granted. Crucially, traditional
within-group statistics (i.e., comparison of means) are always transitive,
while the WR can detect non-transitivity. Non-transitivity provides valuable
information on the stochastic relationship between two treatment groups, which
should be tested and reported. We specify the necessary conditions for
transitivity, the sufficient conditions for non-transitivity, and demonstrate
non-transitivity in a real-life large randomized controlled trial for the WR of
time-to-death. Our results can be used to rule out or evaluate the possibility
of non-transitivity and show the importance of studying the strength of
stochastic relationships.
arXiv link: http://arxiv.org/abs/2309.01791v2
Generalized Information Criteria for Structured Sparse Models
low-dimensional model in high-dimensional scenarios. Some recent efforts on
this subject focused on creating a unified framework for establishing oracle
bounds, and deriving conditions for support recovery. Under this same
framework, we propose a new Generalized Information Criteria (GIC) that takes
into consideration the sparsity pattern one wishes to recover. We obtain
non-asymptotic model selection bounds and sufficient conditions for model
selection consistency of the GIC. Furthermore, we show that the GIC can also be
used for selecting the regularization parameter within a regularized
$m$-estimation framework, which allows practical use of the GIC for model
selection in high-dimensional scenarios. We provide examples of group LASSO in
the context of generalized linear regression and low rank matrix regression.
arXiv link: http://arxiv.org/abs/2309.01764v1
Design-Based Multi-Way Clustering
cluster dependence, and shows how multi-way clustering can be justified when
clustered assignment and clustered sampling occur on different dimensions, or
when either sampling or assignment is multi-way clustered. Unlike one-way
clustering, the plug-in variance estimator in multi-way clustering is no longer
conservative, so valid inference either requires an assumption on the
correlation of treatment effects or a more conservative variance estimator.
Simulations suggest that the plug-in variance estimator is usually robust, and
the conservative variance estimator is often too conservative.
arXiv link: http://arxiv.org/abs/2309.01658v1
The Robust F-Statistic as a Test for Weak Instruments
for weak instruments in terms of the Nagar bias of the two-stage least squares
(2SLS) estimator relative to a benchmark worst-case bias. We show that their
methodology applies to a class of linear generalized method of moments (GMM)
estimators with an associated class of generalized effective F-statistics. The
standard nonhomoskedasticity robust F-statistic is a member of this class. The
associated GMMf estimator, with the extension f for first-stage, is a novel and
unusual estimator as the weight matrix is based on the first-stage residuals.
As the robust F-statistic can also be used as a test for underidentification,
expressions for the calculation of the weak-instruments critical values in
terms of the Nagar bias of the GMMf estimator relative to the benchmark
simplify and no simulation methods or Patnaik (1949) distributional
approximations are needed. In the grouped-data IV designs of Andrews (2018),
where the robust F-statistic is large but the effective F-statistic is small,
the GMMf estimator is shown to behave much better in terms of bias than the
2SLS estimator, as expected by the weak-instruments test results.
arXiv link: http://arxiv.org/abs/2309.01637v3
Moment-Based Estimation of Diffusion and Adoption Parameters in Networks
is the efficient estimation choice; however, it is not always a feasible one.
In network diffusion models with unobserved signal propagation, MLE requires
integrating out a large number of latent variables, which quickly becomes
computationally infeasible even for moderate network sizes and time horizons.
Limiting the model time horizon, on the other hand, entails a loss of important
information, while approximation techniques entail a (small) error.
Searching for a viable alternative is thus potentially highly beneficial. This
paper proposes two estimators specifically tailored to the network diffusion
model of partially observed adoption and unobserved network diffusion.
arXiv link: http://arxiv.org/abs/2309.01489v1
A Trimming Estimator for the Latent-Diffusion-Observed-Adoption Model
yet network interaction is hard to observe or measure. Whenever the diffusion
process is unobserved, the number of possible realizations of the latent matrix
that captures agents' diffusion statuses grows exponentially with the size of
network. Due to interdependencies, the log likelihood function can not be
factorized in individual components. As a consequence, exact estimation of
latent diffusion models with more than one round of interaction is
computationally infeasible. In the present paper, I propose a trimming
estimator that enables me to establish and maximize an approximate log
likelihood function that almost exactly identifies the peak of the true log
likelihood function whenever no more than one third of eligible agents are
subject to trimming.
arXiv link: http://arxiv.org/abs/2309.01471v1
iCOS: Option-Implied COS Method
non-parametric estimation of risk-neutral densities, option prices, and option
sensitivities. The iCOS method leverages the Fourier-based COS technique,
proposed by Fang and Oosterlee (2008), by utilizing the option-implied cosine
series coefficients. Notably, this procedure does not rely on any model
assumptions about the underlying asset price dynamics, it is fully
non-parametric, and it does not involve any numerical optimization. These
features make it rather general and computationally appealing. Furthermore, we
derive the asymptotic properties of the proposed non-parametric estimators and
study their finite-sample behavior in Monte Carlo simulations. Our empirical
analysis using S&P 500 index options and Amazon equity options illustrates the
effectiveness of the iCOS method in extracting valuable information from option
prices under different market conditions. Additionally, we apply our
methodology to dissect and quantify observation and discretization errors in
the VIX index.
arXiv link: http://arxiv.org/abs/2309.00943v2
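To fix ideas, the sketch below implements only the underlying COS density-recovery step of Fang and Oosterlee (2008) from a known characteristic function; the paper's contribution, extracting the cosine coefficients from observed option prices instead of a model, is not reproduced. The truncation interval and number of terms are arbitrary illustration choices.
```python
import numpy as np

def cos_density(char_fn, a, b, N, x):
    """Recover a density on [a, b] from its characteristic function via the COS expansion."""
    k = np.arange(N)
    u = k * np.pi / (b - a)
    # Cosine coefficients of the density on [a, b].
    F = 2.0 / (b - a) * np.real(char_fn(u) * np.exp(-1j * u * a))
    F[0] *= 0.5                                    # the k = 0 term enters with weight 1/2
    return F @ np.cos(np.outer(u, x - a))

# Sanity check on a normal density, whose characteristic function is known in closed form.
mu, sigma = 0.0, 0.2
cf = lambda u: np.exp(1j * u * mu - 0.5 * sigma**2 * u**2)
x = np.linspace(-0.6, 0.6, 5)
approx = cos_density(cf, a=-1.5, b=1.5, N=128, x=x)
exact = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
print("max abs error:", np.max(np.abs(approx - exact)))   # should be very small
```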
Fairness Implications of Heterogeneous Treatment Effect Estimation with Machine Learning Methods in Policy-making
treatment effect estimates could be very useful tools for governments trying to
make and implement policy. However, as the critical artificial intelligence
literature has shown, governments must be very careful of unintended
consequences when using machine learning models. One way to try and protect
against unintended bad outcomes is with AI Fairness methods which seek to
create machine learning models where sensitive variables like race or gender do
not influence outcomes. In this paper we argue that standard AI Fairness
approaches developed for predictive machine learning are not suitable for all
causal machine learning applications because causal machine learning generally
(at least so far) uses modelling to inform a human who is the ultimate
decision-maker while AI Fairness approaches assume a model that is making
decisions directly. We define these scenarios as indirect and direct
decision-making respectively and suggest that policy-making is best seen as a
joint decision where the causal machine learning model usually only has
indirect power. We lay out a definition of fairness for this scenario - a model
that provides the information a decision-maker needs to accurately make a value
judgement about just policy outcomes - and argue that the complexity of causal
machine learning models can make this difficult to achieve. The solution here
is not traditional AI Fairness adjustments, but careful modelling and awareness
of some of the decision-making biases that these methods might encourage which
we describe.
arXiv link: http://arxiv.org/abs/2309.00805v1
New general dependence measures: construction, estimation and application to high-frequency stock returns
to a wide range of transformations on the marginals, can show tail and risk
asymmetries, are always well-defined, are easy to estimate and can be used on
any dataset. We propose a nonparametric estimator and prove its consistency and
asymptotic normality. Thereby we significantly improve on existing (extreme)
dependence measures used in asset pricing and statistics. To show practical
utility, we use these measures on high-frequency stock return data around
market distress events such as the 2010 Flash Crash and during the GFC.
Contrary to ubiquitously used correlations we find that our measures clearly
show tail asymmetry, non-linearity, lack of diversification and endogenous
buildup of risks present during these distress events. Additionally, our
measures anticipate large (joint) losses during the Flash Crash while also
anticipating the bounce back and flagging the subsequent market fragility. Our
findings have implications for risk management, portfolio construction and
hedging at any frequency.
arXiv link: http://arxiv.org/abs/2309.00025v1
Target PCA: Transfer Learning Large Dimensional Panel Data
large target panel with missing observations by optimally using the information
from auxiliary panel data sets. We refer to our estimator as target-PCA.
Transfer learning from auxiliary panel data allows us to deal with a large
fraction of missing observations and weak signals in the target panel. We show
that our estimator is more efficient and can consistently estimate weak
factors, which are not identifiable with conventional methods. We provide the
asymptotic inferential theory for target-PCA under very general assumptions on
the approximate factor model and missing patterns. In an empirical study of
imputing data in a mixed-frequency macroeconomic panel, we demonstrate that
target-PCA significantly outperforms all benchmark methods.
arXiv link: http://arxiv.org/abs/2308.15627v1
Mixed-Effects Methods for Search and Matching Research
firm effects. In economics such models are usually estimated using
fixed-effects methods. Recent enhancements to those fixed-effects methods
include corrections to the bias in estimating the covariance matrix of the
person and firm effects, which we also consider.
arXiv link: http://arxiv.org/abs/2308.15445v1
Combining predictive distributions of electricity prices: Does minimizing the CRPS lead to optimal decisions in day-ahead bidding?
trading because decisions based on such predictions can yield significantly
higher profits than those made with point forecasts alone. At the same time,
methods are being developed to combine predictive distributions, since no model
is perfect and averaging generally improves forecasting performance. In this
article we address the question of whether using CRPS learning, a novel
weighting technique minimizing the continuous ranked probability score (CRPS),
leads to optimal decisions in day-ahead bidding. To this end, we conduct an
empirical study using hourly day-ahead electricity prices from the German EPEX
market. We find that increasing the diversity of an ensemble can have a
positive impact on accuracy. At the same time, the higher computational cost of
using CRPS learning compared to an equal-weighted aggregation of distributions
is not offset by higher profits, despite significantly more accurate
predictions.
arXiv link: http://arxiv.org/abs/2308.15443v1
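As a simplified, static stand-in for the CRPS-based combination idea discussed above (the actual CRPS learning scheme updates weights adaptively over time), the sketch below combines two hypothetical sets of predictive quantiles with a single weight chosen to minimize an average pinball loss, a standard discretized proxy for the CRPS. All distributions and parameters are invented for illustration.
```python
import numpy as np
from scipy.stats import norm, t
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# Hypothetical setup: two forecasters issue predictive quantiles for T day-ahead prices.
T = 500
taus = np.linspace(0.05, 0.95, 19)                     # quantile levels
y = 50 + 10 * rng.standard_t(df=5, size=T)             # realized prices

qA = 50 + 10 * t.ppf(taus, df=5)                       # roughly calibrated forecaster
qB = 55 + 5 * norm.ppf(taus)                           # biased and overconfident forecaster
QA, QB = np.tile(qA, (T, 1)), np.tile(qB, (T, 1))

def pinball_crps(Q, y, taus):
    """Average pinball loss across quantile levels: a discretized CRPS proxy."""
    d = y[:, None] - Q
    return 2 * np.maximum(taus * d, (taus - 1) * d).mean()

def combined_score(w):
    Q = w * QA + (1 - w) * QB                          # quantile-wise linear pool
    return pinball_crps(Q, y, taus)

res = minimize(lambda w: combined_score(w[0]), x0=[0.5], bounds=[(0.0, 1.0)])
print(f"CRPS-minimizing weight on forecaster A: {res.x[0]:.2f}")
print(f"scores: A {pinball_crps(QA, y, taus):.3f}, B {pinball_crps(QB, y, taus):.3f}, "
      f"combination {combined_score(res.x[0]):.3f}")
```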
Another Look at the Linear Probability Model and Nonlinear Index Models
binary outcomes, focusing on average partial effects (APE). We confirm that
linear projection parameters coincide with APEs in certain scenarios. Through
simulations, we identify other cases where OLS does or does not approximate
APEs and find that having a large fraction of fitted values in [0, 1] is neither
necessary nor sufficient. We also show nonlinear least squares estimation of
the ramp model is consistent and asymptotically normal and is equivalent to
using OLS on an iteratively trimmed sample to reduce bias. Our findings offer
practical guidance for empirical research.
arXiv link: http://arxiv.org/abs/2308.15338v3
Forecasting with Feedback
forecasters' irrationality and/or asymmetric loss. In this paper we propose an
alternative explanation: when forecasts inform economic policy decisions, and
the resulting actions affect the realization of the forecast target itself,
forecasts may be optimally biased even under quadratic loss. The result arises
in environments in which the forecaster is uncertain about the decision maker's
reaction to the forecast, which is presumably the case in most applications. We
illustrate the empirical relevance of our theory by reviewing some stylized
properties of Green Book inflation forecasts and relating them to the
predictions from our model. Our results point out that the presence of policy
feedback poses a challenge to traditional tests of forecast rationality.
arXiv link: http://arxiv.org/abs/2308.15062v3
Stochastic Variational Inference for GARCH Models
heteroskedastic time series models. We examine Gaussian, t, and skew-t response
GARCH models and fit these using Gaussian variational approximating densities.
We implement efficient stochastic gradient ascent procedures based on the use
of control variates or the reparameterization trick and demonstrate that the
proposed implementations provide a fast and accurate alternative to Markov
chain Monte Carlo sampling. Additionally, we present sequential updating
versions of our variational algorithms, which are suitable for efficient
portfolio construction and dynamic asset allocation.
arXiv link: http://arxiv.org/abs/2308.14952v1
Donut Regression Discontinuity Designs
discontinuity (RD) designs, a robustness exercise which involves repeating
estimation and inference without the data points in some area around the
treatment threshold. This approach is often motivated by concerns that possible
systematic sorting of units, or similar data issues, in some neighborhood of
the treatment threshold might distort estimation and inference of RD treatment
effects. We show that donut RD estimators can have substantially larger bias
and variance than conventional RD estimators, and that the corresponding
confidence intervals can be substantially longer. We also provide a formal
testing framework for comparing donut and conventional RD estimation results.
arXiv link: http://arxiv.org/abs/2308.14464v1
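A minimal sketch of the donut construction discussed above, under assumptions made for illustration (uniform kernel, a linear data-generating process, a fixed bandwidth): observations within a chosen radius of the cutoff are dropped, and separate local linear fits on each side are differenced at the cutoff. With this linear DGP the donut mainly inflates variance; with curvature near the cutoff it would also force extrapolation and add bias, as the paper emphasizes.
```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical sharp RD data: running variable x, cutoff 0, true jump of 2 at the cutoff.
n = 5000
x = rng.uniform(-1, 1, size=n)
y = 1 + 0.8 * x + 2.0 * (x >= 0) + rng.normal(0, 1, size=n)

def local_linear_rd(x, y, bandwidth, donut=0.0):
    """Sharp-RD estimate: separate local linear fits on each side, excluding |x| < donut."""
    keep = (np.abs(x) <= bandwidth) & (np.abs(x) >= donut)
    limits = {}
    for side, mask in [("right", x >= 0), ("left", x < 0)]:
        m = keep & mask
        X = np.column_stack([np.ones(m.sum()), x[m]])
        limits[side] = np.linalg.lstsq(X, y[m], rcond=None)[0][0]   # intercept = limit at cutoff
    return limits["right"] - limits["left"]

for donut in [0.0, 0.05, 0.10]:
    est = local_linear_rd(x, y, bandwidth=0.5, donut=donut)
    print(f"donut radius {donut:.2f}: RD estimate {est:.3f}")
```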
Bandwidth Selection for Treatment Choice with Binary Outcomes
binary. We focus on statistical treatment rules that plug in fitted values
based on nonparametric kernel regression and show that optimizing two
parameters enables the calculation of the maximum regret. Using this result, we
propose a novel bandwidth selection method based on the minimax regret
criterion. Finally, we perform a numerical analysis to compare the optimal
bandwidth choices for the binary and normally distributed outcomes.
arXiv link: http://arxiv.org/abs/2308.14375v2
Can Machine Learning Catch Economic Recessions Using Economic and Market Sentiments?
and investors. Predicting an economic recession with high accuracy and
reliability would be very beneficial for society. This paper assesses machine
learning techniques for predicting economic recessions in the United States
using market sentiment and economic indicators (seventy-five explanatory
variables) at a monthly frequency from January 1986 to June 2022. To address
missing time-series data points, the Autoregressive Integrated Moving Average
(ARIMA) method is used to backcast explanatory variables. The analysis begins
by reducing the high-dimensional dataset to its most important features using
the Boruta algorithm and a correlation matrix, and by addressing
multicollinearity. Afterwards, various cross-validated models, both probability
regression methods and machine learning techniques, are built to predict the
binary recession outcome. The methods considered are Probit, Logit, Elastic
Net, Random Forest, Gradient Boosting, and Neural Network. Lastly, the models'
performance is compared using confusion matrices, accuracy, and F1 scores, with
potential reasons for their weaknesses and robustness discussed.
arXiv link: http://arxiv.org/abs/2308.16200v1
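A toy version of one step of such a pipeline, with entirely synthetic stand-in data (the paper's seventy-five indicators, Boruta screening, and full model suite are not reproduced): a cross-validated Logit fit with time-ordered splits, scored by accuracy and F1.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(6)

# Hypothetical monthly panel: 438 months (Jan 1986 - Jun 2022) and 10 stand-in indicators.
T, p = 438, 10
X = rng.normal(size=(T, p))
# Stand-in recession indicator driven by two of the features plus noise.
latent = -1.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1] + rng.logistic(size=T)
y = (latent > 0).astype(int)

# Time-series cross-validation keeps the training data strictly before each test window.
tscv = TimeSeriesSplit(n_splits=5)
acc, f1 = [], []
for train_idx, test_idx in tscv.split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    acc.append(accuracy_score(y[test_idx], pred))
    f1.append(f1_score(y[test_idx], pred, zero_division=0))

print(f"mean accuracy {np.mean(acc):.3f}, mean F1 {np.mean(f1):.3f}")
```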
Identification and Estimation of Demand Models with Endogenous Product Entry and Exit
demand estimation. Product entry decisions lack a single crossing property in
terms of demand unobservables, which causes the inconsistency of conventional
methods dealing with selection. We present a novel and straightforward two-step
approach to estimate demand while addressing endogenous product entry. In the
first step, our method estimates a finite mixture model of product entry
accommodating latent market types. In the second step, it estimates demand
controlling for the propensity scores of all latent market types. We apply this
approach to data from the airline industry.
arXiv link: http://arxiv.org/abs/2308.14196v1
High Dimensional Time Series Regression Models: Applications to Statistical Learning Methods
developments for estimation and inference with high dimensional time series
regression models. First, we present main limit theory results for high
dimensional dependent data which is relevant to covariance matrix structures as
well as to dependent time series sequences. Second, we present main aspects of
the asymptotic theory related to time series regression models with many
covariates. Third, we discuss various applications of statistical learning
methodologies for time series analysis purposes.
arXiv link: http://arxiv.org/abs/2308.16192v1
Break-Point Date Estimation for Nonstationary Autoregressive and Predictive Regression Models
break-point estimators in nonstationary autoregressive and predictive
regression models for testing the presence of a single structural break at an
unknown location in the full sample. Moreover, we investigate aspects such as
how the persistence properties of covariates and the location of the
break-point affect the limiting distribution of the proposed break-point
estimators.
arXiv link: http://arxiv.org/abs/2308.13915v1
Splash! Robustifying Donor Pools for Policy Studies
pool in part by using policy domain expertise so that the untreated units are
most like the treated unit in the pre-intervention period. This potentially leaves
estimation open to biases, especially when researchers have many potential
donors. We compare how functional principal component analysis synthetic
control, forward-selection, and the original synthetic control method select
donors. To do this, we use Gaussian Process simulations as well as policy case
studies from West German Reunification, a hotel moratorium in Barcelona, and a
sugar-sweetened beverage tax in San Francisco. We then summarize the
implications for policy research and provide avenues for future work.
arXiv link: http://arxiv.org/abs/2308.13688v1
GARCHX-NoVaS: A Model-free Approach to Incorporate Exogenous Variables
normalizing and variance-stabilizing (NoVaS) transformation with the possible
inclusion of exogenous variables. From an applied point-of-view, extra
knowledge such as fundamentals- and sentiments-based information could be
beneficial to improve the prediction accuracy of market volatility if they are
incorporated into the forecasting process. In the classical approach, these
models including exogenous variables are typically termed GARCHX-type models.
Being a model-free prediction method, NoVaS has generally shown more accurate,
stable, and robust (to misspecification) performance than classical GARCH-type
methods. This motivates us to extend this framework to GARCHX forecasting as
well. We derive the NoVaS transformation needed to include exogenous covariates
and then construct the corresponding prediction procedure. Extensive simulation
studies bolster our claim that the NoVaS method outperforms traditional ones,
especially for long-term
time aggregated predictions. We also provide an interesting data analysis to
exhibit how our method could possibly shed light on the role of geopolitical
risks in forecasting volatility in national stock market indices for three
different countries in Europe.
arXiv link: http://arxiv.org/abs/2308.13346v3
SGMM: Stochastic Approximation to Generalized Method of Moments
Moments (SGMM), for estimation and inference on (overidentified) moment
restriction models. Our SGMM is a novel stochastic approximation alternative to
the popular Hansen (1982) (offline) GMM, and offers fast and scalable
implementation with the ability to handle streaming datasets in real time. We
establish the almost sure convergence, and the (functional) central limit
theorem for the inefficient online 2SLS and the efficient SGMM. Moreover, we
propose online versions of the Durbin-Wu-Hausman and Sargan-Hansen tests that
can be seamlessly integrated within the SGMM framework. Extensive Monte Carlo
simulations show that as the sample size increases, the SGMM matches the
standard (offline) GMM in terms of estimation accuracy and gains over
computational efficiency, indicating its practical value for both large-scale
and online datasets. We demonstrate the efficacy of our approach by a proof of
concept using two well known empirical examples with large sample sizes.
arXiv link: http://arxiv.org/abs/2308.13564v2
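The sketch below illustrates the streaming flavor of such estimation for the simplest possible case, a just-identified linear IV model processed in one pass with recursive Robbins-Monro-type updates; the DGP and gain sequence are hypothetical, and the paper's SGMM additionally covers overidentified models, efficient weighting, and online inference, none of which are shown.
```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical just-identified linear IV model: y = x'theta + u, instrument z with
# cov(z, u) = 0 while x is endogenous (correlated with u).
n, theta_true = 50_000, np.array([1.0, -2.0])
z = np.column_stack([np.ones(n), rng.normal(size=n)])
u = rng.normal(size=n)
x = np.column_stack([np.ones(n), 0.8 * z[:, 1] + 0.5 * u + 0.3 * rng.normal(size=n)])
y = x @ theta_true + u

# One-pass streaming estimation: recursive updates with a gain built from the running
# sum of z_i x_i'. For this just-identified case the recursion tracks the usual offline
# IV solution as observations arrive.
theta = np.zeros(2)
A = 1e-3 * np.eye(2)                 # ridge start so the 2x2 gain matrix is invertible
for i in range(n):
    A += np.outer(z[i], x[i])
    theta += np.linalg.solve(A, z[i] * (y[i] - x[i] @ theta))

theta_offline = np.linalg.solve(z.T @ x, z.T @ y)   # standard (offline) just-identified IV
print("streaming estimate:", theta)
print("offline IV estimate:", theta_offline)
```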
Spatial and Spatiotemporal Volatility Models: A Review
to capture spatial dependence in the volatility of spatial and spatiotemporal
data. Spatial dependence in the volatility may arise due to spatial spillovers
among locations; that is, if two locations are in close proximity, they can
exhibit similar volatilities. In this paper, we aim to provide a comprehensive
review of the recent literature on spatial and spatiotemporal volatility
models. We first briefly review time series volatility models and their
multivariate extensions to motivate their spatial and spatiotemporal
counterparts. We then review various spatial and spatiotemporal volatility
specifications proposed in the literature along with their underlying
motivations and estimation strategies. Through this analysis, we effectively
compare all models and provide practical recommendations for their appropriate
usage. We highlight possible extensions and conclude by outlining directions
for future research.
arXiv link: http://arxiv.org/abs/2308.13061v1
Optimal Shrinkage Estimation of Fixed Effects in Linear Panel Data Models
squares estimators of fixed effects. However, widely used shrinkage estimators
guarantee improved precision only under strong distributional assumptions. I
develop an estimator for the fixed effects that obtains the best possible mean
squared error within a class of shrinkage estimators. This class includes
conventional shrinkage estimators and the optimality does not require
distributional assumptions. The estimator has an intuitive form and is easy to
implement. Moreover, the fixed effects are allowed to vary with time and to be
serially correlated, in which case the shrinkage optimally incorporates the
underlying correlation structure. I also provide a method to forecast fixed
effects one period ahead in this setting.
arXiv link: http://arxiv.org/abs/2308.12485v4
Scalable Estimation of Multinomial Response Models with Random Consideration Sets
for $J$ mutually exclusive categories is that the responses arise from the same
set of $J$ categories across subjects. However, when responses measure a choice
made by the subject, it is more appropriate to condition the distribution of
multinomial responses on a subject-specific consideration set, drawn from the
power set of $\{1,2,\ldots,J\}$. This leads to a mixture of multinomial
response models governed by a probability distribution over the $J^{\ast} = 2^J
-1$ consideration sets. We introduce a novel method for estimating such
generalized multinomial response models based on the fundamental result that
any mass distribution over $J^{\ast}$ consideration sets can be represented as
a mixture of products of $J$ component-specific inclusion-exclusion
probabilities. Moreover, under time-invariant consideration sets, the
conditional posterior distribution of consideration sets is sparse. These
features enable a scalable MCMC algorithm for sampling the posterior
distribution of parameters, random effects, and consideration sets. Under
regularity conditions, the posterior distributions of the marginal response
probabilities and the model parameters satisfy consistency. The methodology is
demonstrated in a longitudinal data set on weekly cereal purchases that cover
$J = 101$ brands, a dimension substantially beyond the reach of existing
methods.
arXiv link: http://arxiv.org/abs/2308.12470v5
Forecasting inflation using disaggregates and machine learning
predicting inflation, focusing on aggregating disaggregated forecasts - also
known in the literature as the bottom-up approach. Taking the Brazilian case as
an application, we consider different disaggregation levels for inflation and
employ a range of traditional time series techniques as well as linear and
nonlinear machine learning (ML) models to deal with a larger number of
predictors. For many forecast horizons, the aggregation of disaggregated
forecasts performs just as well as survey-based expectations and models that
generate forecasts using the aggregate directly. Overall, ML methods outperform
traditional time series models in predictive accuracy, with outstanding
performance in forecasting disaggregates. Our results reinforce the benefits of
using models in a data-rich environment for inflation forecasting, including
aggregating disaggregated forecasts from ML techniques, mainly during volatile
periods. Starting from the COVID-19 pandemic, the random forest model based on
both aggregate and disaggregated inflation achieves remarkable predictive
performance at intermediate and longer horizons.
arXiv link: http://arxiv.org/abs/2308.11173v1
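A schematic Python illustration of the bottom-up idea on synthetic data (the components, weights, lag length, and random-forest settings are all hypothetical): each disaggregate is forecast from its own lags and the forecasts are aggregated with fixed weights, then compared with forecasting the aggregate directly.
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(8)

# Hypothetical monthly data: 4 inflation sub-components with known aggregation weights.
T, weights = 240, np.array([0.35, 0.30, 0.20, 0.15])
components = np.zeros((T, 4))
for j in range(4):
    for t in range(1, T):
        components[t, j] = 0.5 * components[t - 1, j] + rng.normal(0, 0.3)
aggregate = components @ weights

def lagged_features(series, lags=3):
    X = np.column_stack([series[lags - k - 1: len(series) - k - 1] for k in range(lags)])
    return X, series[lags:]

lags, split = 3, 200
# Bottom-up: forecast each disaggregate with its own model, then aggregate with the weights.
bottom_up = np.zeros(T - split)
for j in range(4):
    Xj, yj = lagged_features(components[:, j], lags)
    rf = RandomForestRegressor(n_estimators=200, random_state=0)
    rf.fit(Xj[: split - lags], yj[: split - lags])
    bottom_up += weights[j] * rf.predict(Xj[split - lags:])

# Direct approach: forecast the aggregate from its own lags.
Xa, ya = lagged_features(aggregate, lags)
rf_agg = RandomForestRegressor(n_estimators=200, random_state=0)
rf_agg.fit(Xa[: split - lags], ya[: split - lags])
direct = rf_agg.predict(Xa[split - lags:])

actual = aggregate[split:]
print("RMSE bottom-up:", np.sqrt(np.mean((actual - bottom_up) ** 2)))
print("RMSE direct:   ", np.sqrt(np.mean((actual - direct) ** 2)))
```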
Econometrics of Machine Learning Methods in Economic Forecasting
economic forecasting. The survey covers the following topics: nowcasting,
textual data, panel and tensor data, high-dimensional Granger causality tests,
time series cross-validation, classification with economic losses.
arXiv link: http://arxiv.org/abs/2308.10993v1
Simulation Experiments as a Causal Problem
statistical science. In particular, statisticians often use simulation to
explore properties of statistical functionals in models for which developed
statistical theory is insufficient or to assess finite sample properties of
theoretical results. We show that the design of simulation experiments can be
viewed from the perspective of causal intervention on a data generating
mechanism. We then demonstrate the use of causal tools and frameworks in this
context. Our perspective is agnostic to the particular domain of the simulation
experiment which increases the potential impact of our proposed approach. In
this paper, we consider two illustrative examples. First, we re-examine a
predictive machine learning example from a popular textbook designed to assess
the relationship between mean function complexity and the mean-squared error.
Second, we discuss a traditional causal inference method problem, simulating
the effect of unmeasured confounding on estimation, specifically to illustrate
bias amplification. In both cases, applying causal principles and using
graphical models with parameters and distributions as nodes in the spirit of
influence diagrams can 1) make precise which estimand the simulation targets,
2) suggest modifications to better attain the simulation goals, and 3) provide
scaffolding to discuss performance criteria for a particular simulation design.
arXiv link: http://arxiv.org/abs/2308.10823v1
Genuinely Robust Inference for Clustered Data
clusters of unignorably large size. We formalize this issue by deriving a
necessary and sufficient condition for its validity, and show that this
condition is frequently violated in practice: specifications from 77% of
empirical research articles in American Economic Review and Econometrica during
2020-2021 appear not to meet it. To address this limitation, we propose a
genuinely robust inference procedure based on a new cluster score bootstrap. We
establish its validity and size control across broad classes of data-generating
processes where conventional methods break down. Simulation studies corroborate
our theoretical findings, and empirical applications illustrate that employing
the proposed method can substantially alter conventional statistical
conclusions.
arXiv link: http://arxiv.org/abs/2308.10138v8
Weak Identification with Many Instruments
effects. Many instruments arise from the use of “technical” instruments and
more recently from the empirical strategy of “judge design”. This paper
surveys and summarizes ideas from recent literature on estimation and
statistical inferences with many instruments for a single endogenous regressor.
We discuss how to assess the strength of the instruments and how to conduct
weak identification-robust inference under heteroskedasticity. We establish new
results for a jack-knifed version of the Lagrange Multiplier (LM) test
statistic. Furthermore, we extend the weak-identification-robust tests to
settings with both many exogenous regressors and many instruments. We propose a
test that properly partials out many exogenous regressors while preserving the
re-centering property of the jack-knife. The proposed tests have correct size
and good power properties.
arXiv link: http://arxiv.org/abs/2308.09535v2
Closed-form approximations of moments and densities of continuous-time Markov models
functions, including transition densities and option prices, of continuous-time
Markov processes, including jump--diffusions. The proposed expansions extend
the ones in Kristensen and Mele (2011) to cover general Markov processes. We
demonstrate that the class of expansions nests the transition density and
option price expansions developed in Yang, Chen, and Wan (2019) and Wan and
Yang (2021) as special cases, thereby connecting seemingly different ideas in a
unified framework. We show how the general expansion can be implemented for
fully general jump--diffusion models. We provide a new theory for the validity
of the expansions which shows that series expansions are not guaranteed to
converge as more terms are added in general. Thus, these methods should be used
with caution. At the same time, the numerical studies in this paper demonstrate
good performance of the proposed implementation in practice when a small number
of terms are included.
arXiv link: http://arxiv.org/abs/2308.09009v1
Linear Regression with Weak Exogeneity
exogeneity is the most used identifying assumption in time series. Weak
exogeneity requires the structural error to have zero conditional expectation
given the present and past regressor values, allowing errors to correlate with
future regressor realizations. We show that weak exogeneity in time series
regressions with many controls may produce substantial biases and even render
the least squares (OLS) estimator inconsistent. The bias arises in settings
with many regressors because the normalized OLS design matrix remains
asymptotically random and correlates with the regression error when only weak
(but not strict) exogeneity holds. This bias's magnitude increases with the
number of regressors and their average autocorrelation. To address this issue,
we propose an innovative approach to bias correction that yields a new
estimator with improved properties relative to OLS. We establish consistency
and conditional asymptotic Gaussianity of this new estimator and provide a
method for inference.
arXiv link: http://arxiv.org/abs/2308.08958v2
Testing Partial Instrument Monotonicity
effects, the monotonicity condition may not hold due to heterogeneity in the
population. Under a partial monotonicity condition, which only requires the
monotonicity to hold for each instrument separately holding all the other
instruments fixed, the 2SLS estimand can still be a positively weighted average
of LATEs. In this paper, we provide a simple nonparametric test for partial
instrument monotonicity. We demonstrate the good finite sample properties of
the test through Monte Carlo simulations. We then apply the test to monetary
incentives and distance from results centers as instruments for the knowledge
of HIV status.
arXiv link: http://arxiv.org/abs/2308.08390v2
Computer vision-enriched discrete choice models, with an application to residential location choice
Examples of such decision situations in travel behaviour research include
residential location choices, vehicle choices, tourist destination choices, and
various safety-related choices. However, current discrete choice models cannot
handle image data and thus cannot incorporate information embedded in images
into their representations of choice behaviour. This gap between discrete
choice models' capabilities and the real-world behaviour they seek to model
leads to incomplete and, possibly, misleading outcomes. To solve this gap, this
study proposes "Computer Vision-enriched Discrete Choice Models" (CV-DCMs).
CV-DCMs can handle choice tasks involving numeric attributes and images by
integrating computer vision and traditional discrete choice models. Moreover,
because CV-DCMs are grounded in random utility maximisation principles, they
maintain the solid behavioural foundation of traditional discrete choice
models. We demonstrate the proposed CV-DCM by applying it to data obtained
through a novel stated choice experiment involving residential location
choices. In this experiment, respondents faced choice tasks with trade-offs
between commute time, monthly housing cost and street-level conditions,
presented using images. As such, this research contributes to the growing body
of literature in the travel behaviour field that seeks to integrate discrete
choice modelling and machine learning.
arXiv link: http://arxiv.org/abs/2308.08276v1
Estimating Effects of Long-Term Treatments
challenging. Treatments, such as updates to product functionalities, user
interface designs, and recommendation algorithms, are intended to persist
within the system for a long duration of time after their initial launches.
However, due to the constraints of conducting long-term experiments,
practitioners often rely on short-term experimental results to make product
launch decisions. It remains open how to accurately estimate the effects of
long-term treatments using short-term experimental data. To address this
question, we introduce a longitudinal surrogate framework that decomposes the
long-term effects into functions based on user attributes, short-term metrics,
and treatment assignments. We outline identification assumptions, estimation
strategies, inferential techniques, and validation methods under this
framework. Empirically, we demonstrate that our approach outperforms existing
solutions by using data from two real-world experiments, each involving more
than a million users on WeChat, one of the world's largest social networking
platforms.
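The pipeline above lends itself to a compact illustration. Below is a minimal surrogate-index-style sketch of the general idea, not the authors' estimator: the historical data, the linear surrogate model, and all variable names are illustrative assumptions.

```python
# Sketch: estimate a long-term effect from a short-term experiment via surrogates.
# Assumes a historical dataset links short-term metrics and user attributes to the
# long-term outcome; this is illustrative, not the paper's estimator.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Historical data: attributes X, short-term metrics S, long-term outcome Y.
n_hist = 5000
X_h = rng.normal(size=(n_hist, 2))
S_h = X_h @ np.array([0.5, -0.2]) + rng.normal(size=n_hist)
Y_h = 2.0 * S_h + X_h @ np.array([0.3, 0.1]) + rng.normal(size=n_hist)

# Step 1: learn the surrogate index g(S, X) ~ E[Y_long | S, X] on historical data.
g = LinearRegression().fit(np.column_stack([S_h, X_h]), Y_h)

# Short-term experiment: treatment shifts the short-term metric.
n_exp = 4000
D = rng.integers(0, 2, size=n_exp)
X_e = rng.normal(size=(n_exp, 2))
S_e = X_e @ np.array([0.5, -0.2]) + 0.4 * D + rng.normal(size=n_exp)

# Step 2: predict long-term outcomes and contrast treatment arms.
Y_hat = g.predict(np.column_stack([S_e, X_e]))
long_term_effect = Y_hat[D == 1].mean() - Y_hat[D == 0].mean()
print(f"estimated long-term effect: {long_term_effect:.3f}")  # true value here is 0.8
```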
arXiv link: http://arxiv.org/abs/2308.08152v2
Emerging Frontiers: Exploring the Impact of Generative AI Platforms on University Quantitative Finance Examinations
(LLM)-enabled platforms - ChatGPT, Bard, and Bing AI - to answer an
undergraduate finance exam with 20 quantitative questions across various
difficulty levels. ChatGPT scored 30 percent, outperforming Bing AI, which
scored 20 percent, while Bard lagged behind with a score of 15 percent. These
models faced common challenges, such as inaccurate computations and formula
selection. While they are currently insufficient for helping students pass the
finance exam, they serve as valuable tools for dedicated learners. Future
advancements are expected to overcome these limitations, allowing for improved
formula selection and accurate computations and potentially enabling students
to score 90 percent or higher.
arXiv link: http://arxiv.org/abs/2308.07979v2
Optimizing B2B Product Offers with Machine Learning, Mixed Logit, and Nonlinear Programming
alternative to discounting. This study outlines a modeling method that uses
customer data (product offers made to each current or potential customer,
features, discounts, and customer purchase decisions) to estimate a mixed logit
choice model. The model is estimated via hierarchical Bayes and machine
learning, delivering customer-level parameter estimates. Customer-level
estimates are input into a nonlinear programming next-offer maximization
problem to select optimal features and discount level for customer segments,
where segments are based on loyalty and discount elasticity. The mixed logit
model is integrated with economic theory (the random utility model), and it
predicts both customer perceived value for and response to alternative future
sales offers. The methodology can be implemented to support value-based pricing
and selling efforts.
Contributions to the literature include: (a) the use of customer-level
parameter estimates from a mixed logit model, delivered via a hierarchical
Bayes estimation procedure, to support value-based pricing decisions; (b)
validation that mixed logit customer-level modeling can deliver strong
predictive accuracy, not as high as random forest but comparing favorably; and
(c) a nonlinear programming problem that uses customer-level mixed logit
estimates to select optimal features and discounts.
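As a rough illustration of the final optimization step, the sketch below picks a profit-maximizing discount for a single customer segment given hypothetical logit coefficients; the hierarchical-Bayes mixed logit estimation itself is not reproduced.

```python
# Sketch: choose the profit-maximizing discount for one customer segment, given
# customer-level logit coefficients (assumed known here; in the paper they come
# from a hierarchical-Bayes mixed logit). Purely illustrative.
import numpy as np
from scipy.optimize import minimize_scalar

list_price, unit_cost = 100.0, 60.0
alpha, beta_discount = -1.0, 0.08   # hypothetical purchase-utility parameters

def purchase_prob(discount):
    """Binary-logit probability of accepting the offer at a given discount."""
    u = alpha + beta_discount * discount
    return 1.0 / (1.0 + np.exp(-u))

def neg_expected_profit(discount):
    margin = list_price - unit_cost - discount
    return -(margin * purchase_prob(discount))

res = minimize_scalar(neg_expected_profit, bounds=(0.0, 40.0), method="bounded")
print(f"optimal discount: {res.x:.2f}, expected profit: {-res.fun:.2f}")
```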
arXiv link: http://arxiv.org/abs/2308.07830v1
Serendipity in Science
the most important breakthroughs, ranging from penicillin to the electric
battery, have been made by scientists who were stimulated by a chance exposure
to unsought but useful information. However, not all scientists are equally
likely to benefit from such serendipitous exposure. Although scholars generally
agree that scientists with a prepared mind are most likely to benefit from
serendipitous encounters, there is much less consensus over what precisely
constitutes a prepared mind, with some research suggesting the importance of
openness and others emphasizing the need for deep prior experience in a
particular domain. In this paper, we empirically investigate the role of
serendipity in science by leveraging a policy change that exogenously shifted
the shelving location of journals in university libraries and subsequently
exposed scientists to unsought scientific information. Using large-scale data
on 2.4 million papers published in 9,750 journals by 520,000 scientists at 115
North American research universities, we find that scientists with greater
openness are more likely to benefit from serendipitous encounters. Following
the policy change, these scientists tended to cite less familiar and newer
work, and ultimately published papers that were more innovative. By contrast,
we find little effect on innovativeness for scientists with greater depth of
experience, who, in our sample, tended to cite more familiar and older work
following the policy change.
arXiv link: http://arxiv.org/abs/2308.07519v1
Quantile Time Series Regression Models Revisited
series models in the cases of stationary and nonstationary underlying stochastic
processes.
arXiv link: http://arxiv.org/abs/2308.06617v3
Driver Heterogeneity in Willingness to Give Control to Conditional Automation
driving is assessed in a virtual reality based driving-rig, through their
choice to give away driving control and through the extent to which automated
driving is adopted in a mixed-traffic environment. Within- and across-class
unobserved heterogeneity and locus of control variations are taken into
account. The choice of giving away control is modelled using the mixed logit
(MIXL) and mixed latent class (LCML) model. The significant latent segments of
the locus of control are developed into internalizers and externalizers by the
latent class model (LCM) based on the taste heterogeneity identified from the
MIXL model. Results suggest that drivers choose to "giveAway" control of the
vehicle when greater concentration/attentiveness is required (e.g., in the
nighttime) or when they are interested in performing a non-driving-related task
(NDRT). In addition, it is observed that internalizers demonstrate more
heterogeneity than externalizers in terms of willingness to give (WTG) control.
arXiv link: http://arxiv.org/abs/2308.06426v1
Characterizing Correlation Matrices that Admit a Clustered Factor Representation
matrix and is commonly used to parameterize correlation matrices. Our results
reveal that the CF model imposes superfluous restrictions on the correlation
matrix. This can be avoided by a different parametrization, involving the
logarithmic transformation of the block correlation matrix.
arXiv link: http://arxiv.org/abs/2308.05895v1
Large Skew-t Copula Models and Asymmetric Dependence in Intraday Equity Returns
because they allow for asymmetric and extreme tail dependence. We show that the
copula implicit in the skew-t distribution of Azzalini and Capitanio (2003)
allows for a higher level of pairwise asymmetric dependence than two popular
alternative skew-t copulas. Estimation of this copula in high dimensions is
challenging, and we propose a fast and accurate Bayesian variational inference
(VI) approach to do so. The method uses a generative representation of the
skew-t distribution to define an augmented posterior that can be approximated
accurately. A stochastic gradient ascent algorithm is used to solve the
variational optimization. The methodology is used to estimate skew-t factor
copula models with up to 15 factors for intraday returns from 2017 to 2021 on
93 U.S. equities. The copula captures substantial heterogeneity in asymmetric
dependence over equity pairs, in addition to the variability in pairwise
correlations. In a moving window study we show that the asymmetric dependencies
also vary over time, and that intraday predictive densities from the skew-t
copula are more accurate than those from benchmark copula models. Portfolio
selection strategies based on the estimated pairwise asymmetric dependencies
improve performance relative to the index.
arXiv link: http://arxiv.org/abs/2308.05564v4
Money Growth and Inflation: A Quantile Sensitivity Approach
for inflation and money growth. By considering all quantiles and leveraging a
novel notion of quantile sensitivity, the method allows the assessment of
changes in the entire distribution of a variable of interest in response to a
perturbation in another variable's quantile. The construction of this
relationship is demonstrated through a system of linear quantile regressions.
Then, the proposed framework is exploited to examine the distributional effects
of money growth on the distributions of inflation and its disaggregate measures
in the United States and the Euro area. The empirical analysis uncovers
significant impacts of the upper quantile of the money growth distribution on
the distribution of inflation and its disaggregate measures. Conversely, the
lower and median quantiles of the money growth distribution are found to have a
negligible influence. Finally, this distributional impact exhibits variation
over time in both the United States and the Euro area.
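The building block behind this analysis, a system of linear quantile regressions, can be sketched on simulated data as follows; the quantile-sensitivity measure itself and the actual money growth and inflation series are not reproduced.

```python
# Sketch: fit linear quantile regressions of inflation on money growth over a
# grid of quantiles. Data are simulated; the noise scale depends on money
# growth, so the slope differs across quantiles.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 400
money_growth = rng.normal(2.0, 1.0, size=T)
inflation = 1.0 + 0.3 * money_growth + (0.5 + 0.4 * money_growth) * rng.normal(size=T)

X = sm.add_constant(money_growth)
for q in (0.1, 0.5, 0.9):
    fit = sm.QuantReg(inflation, X).fit(q=q)
    print(f"tau={q:.1f}: slope on money growth = {fit.params[1]:.3f}")
```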
arXiv link: http://arxiv.org/abs/2308.05486v3
Solving the Forecast Combination Puzzle
the methodology commonly used to produce forecast combinations. By the
combination puzzle, we refer to the empirical finding that predictions formed
by combining multiple forecasts in ways that seek to optimize forecast
performance often do not out-perform more naive, e.g. equally-weighted,
approaches. In particular, we demonstrate that, due to the manner in which such
forecasts are typically produced, tests that aim to discriminate between the
predictive accuracy of competing combination strategies can have low power, and
can lack size control, leading to an outcome that favours the naive approach.
We show that this poor performance is due to the behavior of the corresponding
test statistic, which has a non-standard asymptotic distribution under the null
hypothesis of no inferior predictive accuracy, rather than the standard normal
distribution that is typically adopted. In addition, we demonstrate that the
low power of such predictive accuracy tests in the forecast combination setting
can be completely avoided if more efficient estimation strategies are used in
the production of the combinations, when feasible. We illustrate these findings
both in the context of forecasting a functional of interest and in terms of
predictive densities. A short empirical example using daily financial returns
exemplifies how researchers can avoid the puzzle in practical settings.
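A small simulated example of the objects being compared, equally weighted versus MSE-optimal combination weights, is sketched below; it does not implement the paper's predictive-accuracy tests.

```python
# Sketch: combine two forecasts with (a) equal weights and (b) weights estimated
# by least squares subject to summing to one. Simulated data, illustrative only.
import numpy as np

rng = np.random.default_rng(2)
T = 200
y = rng.normal(size=T)
f1 = y + rng.normal(scale=1.0, size=T)   # noisier forecast
f2 = y + rng.normal(scale=0.7, size=T)   # more accurate forecast

# Optimal weight on f1 (with 1 - w on f2) minimizing in-sample squared error.
d = f1 - f2
w_opt = np.sum((y - f2) * d) / np.sum(d ** 2)

for name, w in [("equal", 0.5), ("estimated-optimal", w_opt)]:
    combo = w * f1 + (1 - w) * f2
    print(f"{name:>18} weights: w1={w:.2f}, MSE={np.mean((y - combo) ** 2):.3f}")
```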
arXiv link: http://arxiv.org/abs/2308.05263v1
Interpolation of numerical series by the Fermat-Torricelli point construction method on the example of the numerical series of inflation in the Czech Republic in 2011-2021
with certain difficulties: when models are constructed, they rest on assumptions
such as a normal law of error distribution and statistically independent
variables. In practice, these conditions do not always hold, which may leave the
constructed economic and mathematical model with no practical value. As an
alternative approach to the study of numerical series, the author suggests
smoothing the series using Fermat-Torricelli points and then interpolating these
points by series of exponents. Using exponential series to interpolate numerical
series makes it possible to achieve model accuracy no worse than that of
regression analysis. At the same time, interpolation by series of exponents
requires neither that the errors of the numerical series obey the normal
distribution law nor that the variables be statistically independent.
Interpolation of numerical series by exponential series is a "black box" type
model: only the input and output parameters matter.
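For readers unfamiliar with the construction, a minimal Weiszfeld-type iteration for the Fermat-Torricelli point (the geometric median) is sketched below; how the author groups series values into point sets is not reproduced.

```python
# Sketch: Weiszfeld iteration for the Fermat-Torricelli (geometric median) point
# of a set of planar points, the basic construction used for smoothing above.
import numpy as np

def fermat_torricelli(points, n_iter=200, eps=1e-12):
    pts = np.asarray(points, dtype=float)
    x = pts.mean(axis=0)                      # start at the centroid
    for _ in range(n_iter):
        d = np.linalg.norm(pts - x, axis=1)
        d = np.maximum(d, eps)                # guard against division by zero
        w = 1.0 / d
        x = (w[:, None] * pts).sum(axis=0) / w.sum()
    return x

# Three consecutive (time, value) points of a series, e.g. inflation readings.
triple = [(1.0, 2.1), (2.0, 1.5), (3.0, 3.0)]
print(fermat_torricelli(triple))
```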
arXiv link: http://arxiv.org/abs/2308.05183v1
Statistical Decision Theory Respecting Stochastic Dominance
state-dependent mean loss (risk) to measure the performance of statistical
decision functions across potential samples. We think it evident that
evaluation of performance should respect stochastic dominance, but we do not
see a compelling reason to focus exclusively on mean loss. We think it
instructive to also measure performance by other functionals that respect
stochastic dominance, such as quantiles of the distribution of loss. This paper
develops general principles and illustrative applications for statistical
decision theory respecting stochastic dominance. We modify the Wald definition
of admissibility to an analogous concept of stochastic dominance (SD)
admissibility, which uses stochastic dominance rather than mean sampling
performance to compare alternative decision rules. We study SD admissibility in
two relatively simple classes of decision problems that arise in treatment
choice. We reevaluate the relationship between the MLE, James-Stein, and
James-Stein positive part estimators from the perspective of SD admissibility.
We consider alternative criteria for choice among SD-admissible rules. We
juxtapose traditional criteria based on risk, regret, or Bayes risk with
analogous ones based on quantiles of state-dependent sampling distributions or
the Bayes distribution of loss.
arXiv link: http://arxiv.org/abs/2308.05171v1
A Guide to Impact Evaluation under Sample Selection and Missing Data: Teacher's Aides and Adolescent Mental Health
testing in causal evaluation problems when data is selective and/or missing. We
leverage recent advances in the literature on graphical methods to provide a
unifying framework for guiding empirical practice. The approach integrates and
connects to prominent identification and testing strategies in the literature
on missing data, causal machine learning, panel data analysis, and more. We
demonstrate its utility in the context of identification and specification
testing in sample selection models and field experiments with attrition. We
provide a novel analysis of a large-scale cluster-randomized controlled
teacher's aide trial in Danish schools at grade 6. Even with detailed
administrative data, the handling of missing data crucially affects broader
conclusions about effects on mental health. Results suggest that teaching
assistants provide an effective way of improving internalizing behavior for
large parts of the student population.
arXiv link: http://arxiv.org/abs/2308.04963v1
Causal Interpretation of Linear Social Interaction Models with Endogenous Networks
interaction models in the presence of endogeneity in network formation under a
heterogeneous treatment effects framework. We consider an experimental setting
in which individuals are randomly assigned to treatments while no interventions
are made for the network structure. We show that running a linear regression
ignoring network endogeneity is not problematic for estimating the average
direct treatment effect. However, it leads to sample selection bias and a
negative-weights problem for the estimation of the average spillover effect. To
overcome these problems, we propose using potential peer treatment as an
instrumental variable (IV), which is automatically a valid IV for actual
spillover exposure. Using this IV, we examine two IV-based estimands and
demonstrate that they have a local average treatment-effect-type causal
interpretation for the spillover effect.
arXiv link: http://arxiv.org/abs/2308.04276v2
Threshold Regression in Heterogeneous Panel Data with Interactive Fixed Effects
regression. We develop a comprehensive asymptotic theory for models with
heterogeneous thresholds, heterogeneous slope coefficients, and interactive
fixed effects. Our estimation methodology employs the Common Correlated Effects
approach, which is able to handle heterogeneous coefficients while maintaining
computational simplicity. We also propose a semi-homogeneous model with
heterogeneous slopes but a common threshold, revealing novel mean group
estimator convergence rates due to the interaction of heterogeneity with the
shrinking threshold assumption. Tests for linearity are provided, and also a
modified information criterion which can choose between the fully heterogeneous
and the semi-homogeneous models. Monte Carlo simulations demonstrate the good
performance of the new methods in small samples. The new theory is applied to
examine the Feldstein-Horioka puzzle and it is found that threshold
nonlinearity with respect to trade openness exists only in a small subset of
countries.
arXiv link: http://arxiv.org/abs/2308.04057v2
Measuring income inequality via percentile relativities
distributions are getting more right skewed and heavily tailed. For such
distributions, the mean is not the best measure of the center, but the
classical indices of income inequality, including the celebrated Gini index,
are all mean-based. In view of this, Professor Gastwirth sounded an alarm back
in 2014 by suggesting that the median be incorporated into the definition of the
Gini index, although he noted a few shortcomings of his proposed index. In the present
paper we make a further step in the modification of classical indices and, to
acknowledge the possibility of differing viewpoints, arrive at three
median-based indices of inequality. They avoid the shortcomings of the previous
indices and can be used even when populations are ultra heavily tailed, that
is, when their first moments are infinite. The new indices are illustrated both
analytically and numerically using parametric families of income distributions,
and further illustrated using capital incomes coming from 2001 and 2018 surveys
of fifteen European countries. We also discuss the performance of the indices
from the perspective of income transfers.
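The flavor of median-anchored measures can be illustrated with generic percentile relativities on a heavy-tailed sample; the specific ratios below are illustrative only and are not the three indices proposed in the paper.

```python
# Sketch: median-anchored percentile relativities on a heavy-tailed income sample.
# The ratios below are generic quantile-based illustrations, not the paper's indices.
import numpy as np

rng = np.random.default_rng(3)
# Pareto tail index below 1: the first moment is infinite, so mean-based indices
# are unreliable, while percentile-based measures remain well defined.
incomes = (1.0 + rng.pareto(a=0.9, size=100_000)) * 20_000

p10, p50, p90 = np.percentile(incomes, [10, 50, 90])
print(f"P90/P50 = {p90 / p50:.2f}")   # upper-tail spread relative to the median
print(f"P50/P10 = {p50 / p10:.2f}")   # lower-tail spread relative to the median
print(f"median = {p50:,.0f}, sample mean = {incomes.mean():,.0f} (unstable)")
```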
arXiv link: http://arxiv.org/abs/2308.03708v1
Treatment Effects in Staggered Adoption Designs with Non-Parallel Trends
staggered treatment adoption setting -- that is, where a researcher has access
to panel data and treatment timing varies across units. We consider the case
where untreated potential outcomes may follow non-parallel trends over time
across groups. This implies that the identifying assumptions of leading
approaches such as difference-in-differences do not hold. We mainly focus on
the case where untreated potential outcomes are generated by an interactive
fixed effects model and show that variation in treatment timing provides
additional moment conditions that can be used to recover a large class of
target causal effect parameters. Our approach exploits the variation in
treatment timing without requiring either (i) a large number of time periods or
(ii) any extra exclusion restrictions. This is in contrast to
essentially all of the literature on interactive fixed effects models which
requires at least one of these extra conditions. Rather, our approach directly
applies in settings where there is variation in treatment timing. Although our
main focus is on a model with interactive fixed effects, our idea of using
variation in treatment timing to recover causal effect parameters is quite
general and could be adapted to other settings with non-parallel trends across
groups such as dynamic panel data models.
arXiv link: http://arxiv.org/abs/2308.02899v1
Composite Quantile Factor Model
factor analysis in high-dimensional panel data. We propose to estimate the
factors and factor loadings across multiple quantiles of the data, allowing the
estimates to better adapt to features of the data at different quantiles while
still modeling the mean of the data. We develop the limiting distribution of
the estimated factors and factor loadings, and an information criterion for
consistent factor number selection is also discussed. Simulations show that the
proposed estimator and the information criterion have good finite sample
properties for several non-normal distributions under consideration. We also
consider an empirical study on the factor analysis for 246 quarterly
macroeconomic variables. A companion R package cqrfactor is developed.
arXiv link: http://arxiv.org/abs/2308.02450v2
Matrix Completion When Missing Is Not at Random and Its Applications in Causal Panel Data Models
missing is not at random and without the requirement of strong signals. Our
development is based on the observation that if the number of missing entries
is small enough compared to the panel size, then they can be estimated well
even when missing is not at random. Taking advantage of this fact, we divide
the missing entries into smaller groups and estimate each group via nuclear
norm regularization. In addition, we show that with appropriate debiasing, our
proposed estimate is asymptotically normal even for fairly weak signals. Our
work is motivated by recent research on the Tick Size Pilot Program, an
experiment conducted by the Securities and Exchange Commission (SEC) to evaluate
the impact of widening the tick size on the market quality of stocks from 2016
to 2018. While previous studies were based on traditional regression or
difference-in-difference methods by assuming that the treatment effect is
invariant with respect to time and unit, our analyses suggest significant
heterogeneity across units and intriguing dynamics over time during the pilot
program.
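The core nuclear-norm machinery can be sketched with a soft-impute style iteration on a synthetic panel; the paper's grouping of missing entries and its debiasing step are not reproduced.

```python
# Sketch: soft-impute (nuclear-norm regularized) completion of a panel with
# missing entries. Illustrative only.
import numpy as np

rng = np.random.default_rng(4)
n, t, r = 60, 40, 3
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, t))    # low-rank truth
mask = rng.random((n, t)) > 0.2                           # ~20% of entries missing
Y = np.where(mask, M + 0.1 * rng.normal(size=(n, t)), 0.0)

def soft_impute(Y, mask, lam=1.0, n_iter=200):
    X = np.zeros_like(Y)
    for _ in range(n_iter):
        filled = np.where(mask, Y, X)                      # keep observed, impute missing
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)                       # soft-threshold singular values
        X = (U * s) @ Vt
    return X

X_hat = soft_impute(Y, mask)
rmse_missing = np.sqrt(np.mean((X_hat[~mask] - M[~mask]) ** 2))
print(f"RMSE on missing entries: {rmse_missing:.3f}")
```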
arXiv link: http://arxiv.org/abs/2308.02364v1
Amortized neural networks for agent-based model forecasting
forecasting in agent-based models. The proposed algorithm is based on the
application of amortized neural networks and consists of two steps. The first
step simulates artificial datasets from the model. In the second step, a neural
network is trained to predict the future values of the variables using the
history of observations. The main advantage of the proposed algorithm is its
speed. This is due to the fact that, after the training procedure, it can be
used to yield predictions for almost any data without additional simulations or
the re-estimation of the neural network.
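A toy version of the two-step procedure is sketched below, with a simple AR(1) simulator standing in for the agent-based model and an off-the-shelf multilayer perceptron as the amortized predictor; all settings are illustrative assumptions.

```python
# Sketch of the two-step idea: (1) simulate many trajectories from a model
# (a toy AR(1) stands in for the agent-based model), (2) train a neural network
# to map a window of past observations to the next value. After training, new
# forecasts need no further simulation or re-estimation.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
window, n_sims, T = 5, 300, 60

def simulate_path(T, phi=0.8):
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.normal(scale=0.3)
    return x

# Step 1: build a training set of (history window -> next value) pairs.
X_train, y_train = [], []
for _ in range(n_sims):
    path = simulate_path(T)
    for t in range(window, T):
        X_train.append(path[t - window:t])
        y_train.append(path[t])

# Step 2: train the amortized predictor once.
net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
net.fit(np.array(X_train), np.array(y_train))

# Fast forecasting for new data, without further simulation.
new_history = simulate_path(T)[-window:]
print("one-step forecast:", net.predict(new_history.reshape(1, -1))[0])
```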
arXiv link: http://arxiv.org/abs/2308.05753v1
Individual Shrinkage for Random Effects
individual-level forecasting in micropanels, targeting individual accuracy
rather than aggregate performance. The conventional shrinkage methods used in
the literature, such as the James-Stein estimator and Empirical Bayes, target
aggregate performance and can lead to inaccurate decisions at the individual
level. We propose a class of shrinkage estimators with individual weights (IW)
that leverage an individual's own past history, instead of the cross-sectional
dimension. This approach overcomes the "tyranny of the majority" inherent in
existing methods, while relying on weaker assumptions. A key contribution is
addressing the challenge of obtaining feasible weights from short time-series
data and under parameter heterogeneity. We discuss the theoretical optimality
of IW and recommend using feasible weights determined through a Minimax Regret
analysis in practice.
arXiv link: http://arxiv.org/abs/2308.01596v3
Limit Theory under Network Dependence and Nonstationarity
time series econometrics and network econometrics. We place emphasis on limit
theory for time series regression models as well as the use of the
local-to-unity parametrization when modeling time series nonstationarity.
Moreover, we present various non-asymptotic theory results for moderate
deviation principles when considering the eigenvalues of covariance matrices as
well as asymptotics for unit root moderate deviations in nonstationary
autoregressive processes. Although not all applications from the literature are
covered, we also discuss some open problems in the time series and network
econometrics literature.
arXiv link: http://arxiv.org/abs/2308.01418v4
Analyzing the Reporting Error of Public Transport Trips in the Danish National Travel Survey Using Smart Card Data
and households' travel behavior. However, self-reported surveys are subject to
recall bias, as respondents might struggle to recall and report their
activities accurately. This study examines the time reporting error of public
transit users in a nationwide household travel survey by matching, at the
individual level, five consecutive years of data from two sources, namely the
Danish National Travel Survey (TU) and the Danish Smart Card system
(Rejsekort). Survey respondents are matched with travel cards from the
Rejsekort data solely based on the respondents' declared spatiotemporal travel
behavior. Approximately 70% of the respondents were successfully matched with
Rejsekort travel cards. The findings reveal a median time reporting error of
11.34 minutes, with an Interquartile Range of 28.14 minutes. Furthermore, a
statistical analysis was performed to explore the relationships between the
survey respondents' reporting error and their socio-economic and demographic
characteristics. The results indicate that females and respondents with a fixed
schedule are in general more accurate than males and respondents with a
flexible schedule in reporting their times of travel. Moreover, trips reported
during weekdays or via the internet displayed higher accuracies compared to
trips reported during weekends and holidays or via telephone interviews. This
disaggregated analysis provides valuable insights that could help in improving
the design and analysis of travel surveys, as well as accounting for reporting
errors/biases in travel survey-based applications. Furthermore, it offers
valuable insights underlying the psychology of travel recall by survey
respondents.
arXiv link: http://arxiv.org/abs/2308.01198v3
The Bayesian Context Trees State Space Model for time series modelling and forecasting
mixture models for time series, partly motivated by applications in finance and
forecasting. At the top level, meaningful discrete states are identified as
appropriately quantised values of some of the most recent samples. At the
bottom level, a different, arbitrary base model is associated with each state.
This defines a very general framework that can be used in conjunction with any
existing model class to build flexible and interpretable mixture models. We
call this the Bayesian Context Trees State Space Model, or the BCT-X framework.
Appropriate algorithmic tools are described, which allow for effective and
efficient Bayesian inference and learning; these algorithms can be updated
sequentially, facilitating online forecasting. The utility of the general
framework is illustrated in the particular instances when AR or ARCH models are
used as base models. The latter results in a mixture model that offers a
powerful way of modelling the well-known volatility asymmetries in financial
data, revealing a novel, important feature of stock market index data, in the
form of an enhanced leverage effect. In forecasting, the BCT-X methods are
found to outperform several state-of-the-art techniques, both in terms of
accuracy and computational requirements.
arXiv link: http://arxiv.org/abs/2308.00913v3
Testing for Threshold Effects in Presence of Heteroskedasticity and Measurement Error with an application to Italian Strikes
conditional mean and in the conditional variance and, in practice, it is
important to investigate separately these two aspects. Here we address the
issue of testing for threshold nonlinearity in the conditional mean, in the
presence of conditional heteroskedasticity. We propose a supremum Lagrange
Multiplier approach to test a linear ARMA-GARCH model against the alternative
of a TARMA-GARCH model. We derive the asymptotic null distribution of the test
statistic and this requires novel results since the difficulties of working
with nuisance parameters, absent under the null hypothesis, are amplified by
the non-linear moving average, combined with GARCH-type innovations. We show
that tests that do not account for heteroskedasticity fail to achieve the
correct size even for large sample sizes. Moreover, we show that the TARMA
specification naturally accounts for the ubiquitous presence of measurement
error that affects macroeconomic data. We apply the results to analyse the time
series of Italian strikes and we show that the TARMA-GARCH specification is
consistent with the relevant macroeconomic theory while capturing the main
features of the Italian strikes dynamics, such as asymmetric cycles and
regime-switching.
arXiv link: http://arxiv.org/abs/2308.00444v1
Randomization Inference of Heterogeneous Treatment Effects under Network Interference
the presence of network interference. Leveraging the exposure mapping
framework, we study a broad class of null hypotheses that represent various
forms of constant treatment effects in networked populations. These null
hypotheses, unlike the classical Fisher sharp null, are not sharp due to
unknown parameters and multiple potential outcomes. Existing conditional
randomization procedures either fail to control size or suffer from low
statistical power in this setting. We propose a testing procedure that
constructs a data-dependent focal assignment set and permits variation in focal
units across focal assignments. These features complicate both estimation and
inference, necessitating new technical developments. We establish the
asymptotic validity of the proposed procedure under general conditions on the
test statistic and characterize the asymptotic size distortion in terms of
observable quantities. The procedure is applied to experimental network data
and evaluated via Monte Carlo simulations.
arXiv link: http://arxiv.org/abs/2308.00202v5
What's Logs Got to do With it: On the Perils of log Dependent Variables and Difference-in-Differences
a difference-in-differences (DD) model. With a dependent variable in logs, the
DD term captures an approximation of the proportional difference in growth
rates across groups. As I show with both simulations and two empirical
examples, if the baseline outcome distributions are sufficiently different
across groups, the DD parameter for a log-specification can differ in
sign from that of a levels-specification. I provide a condition, based on (i) the
aggregate time effect, and (ii) the difference in relative baseline outcome
means, for when the sign-switch will occur.
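A two-group, two-period arithmetic example of the sign switch, with hypothetical numbers:

```python
# Sketch: with very different baseline levels, the 2x2 DD estimate can be
# positive in levels but negative in logs. Numbers are hypothetical.
import numpy as np

# Mean outcomes: [pre, post]
treated = np.array([100.0, 120.0])   # grows by 20 units (+20%)
control = np.array([10.0, 15.0])     # grows by  5 units (+50%)

dd_levels = (treated[1] - treated[0]) - (control[1] - control[0])
dd_logs = (np.log(treated[1]) - np.log(treated[0])) - (np.log(control[1]) - np.log(control[0]))

print(f"DD in levels: {dd_levels:+.2f}")   # +15.00
print(f"DD in logs:   {dd_logs:+.3f}")     # about -0.223
```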
arXiv link: http://arxiv.org/abs/2308.00167v3
A new mapping of technological interdependence
question by examining the influence of neighbors' innovativeness and the
structure of the innovators' network on a sector's capacity to develop new
technologies. We study these two dimensions of technological interdependence by
applying novel methods of text mining and network analysis to the documents of
6.5 million patents granted by the United States Patent and Trademark Office
(USPTO) between 1976 and 2021. We find that, in the long run, the influence of
network linkages is as important as that of neighbor innovativeness. In the
short run, however, positive shocks to neighbor innovativeness yield relatively
rapid effects, while the impact of shocks strengthening network linkages
manifests with delay, even though it lasts longer. Our analysis also highlights
that patent text contains a wealth of information often not captured by
traditional innovation metrics, such as patent citations.
arXiv link: http://arxiv.org/abs/2308.00014v3
Causal Inference for Banking, Finance, and Insurance: A Survey
by statistical models and artificial intelligence models. Of late, this field
started attracting the attention of researchers and practitioners alike. This
paper presents a comprehensive survey of 37 papers published during 1992-2023
and concerning the application of causal inference to banking, finance, and
insurance. The papers are categorized according to the following families of
domains: (i) Banking, (ii) Finance and its subdomains such as corporate
finance, governance finance including financial risk and financial policy,
financial economics, and behavioral finance, and (iii) Insurance. Further, the
paper covers the primary ingredients of causal inference namely, statistical
methods such as Bayesian Causal Network, Granger Causality and jargon used
thereof such as counterfactuals. The review also recommends some important
directions for future research. In conclusion, we observed that the application
of causal inference in the banking and insurance sectors is still in its
infancy, and thus more research is needed to turn it into a viable method.
arXiv link: http://arxiv.org/abs/2307.16427v1
Inference for Low-rank Completion without Sample Splitting with Application to Treatment Effect Estimation
It also provides an inference method for the average treatment effect as an
application. We show that the least squares estimation of eigenvectors following
the nuclear norm penalization attains asymptotic normality. The key
contribution of our method is that it does not require sample splitting. In
addition, this paper allows dependent observation patterns and heterogeneous
observation probabilities. Empirically, we apply the proposed procedure to
estimating the impact of the presidential vote on allocating the U.S. federal
budget to the states.
arXiv link: http://arxiv.org/abs/2307.16370v1
Towards Practical Robustness Auditing for Linear Regression
small subsets of a dataset which, when removed, reverse the sign of a
coefficient in an ordinary least squares regression involving that dataset. We
empirically study the performance of well-established algorithmic techniques
for this task -- mixed integer quadratically constrained optimization for
general linear regression problems and exact greedy methods for special cases.
We show that these methods largely outperform the state of the art and provide
a useful robustness check for regression problems in a few dimensions. However,
significant computational bottlenecks remain, especially for the important task
of disproving the existence of such small sets of influential samples for
regression problems of dimension $3$ or greater. We make some headway on this
challenge via a spectral algorithm using ideas drawn from recent innovations in
algorithmic robust statistics. We summarize the limitations of known techniques
in several challenge datasets to encourage further algorithmic innovation.
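A naive greedy baseline for this task, removing one observation at a time to push an OLS slope toward a sign flip, can be sketched as follows; the exact special-case algorithms and the mixed-integer formulation studied in the paper are not reproduced.

```python
# Sketch: greedy search for a small subset whose removal flips the sign of an
# OLS slope. A naive baseline, not the methods benchmarked in the paper.
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n)
y = 0.05 * x + rng.normal(size=n)        # weak, sign-fragile relationship

def slope(x, y):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

keep = np.ones(n, dtype=bool)
removed = []
# Repeatedly drop the single observation whose removal lowers the slope the most.
while slope(x[keep], y[keep]) > 0 and keep.sum() > 10:
    best_i, best_s = None, np.inf
    for i in np.flatnonzero(keep):
        keep[i] = False
        s = slope(x[keep], y[keep])
        keep[i] = True
        if s < best_s:
            best_i, best_s = i, s
    keep[best_i] = False
    removed.append(best_i)

print(f"OLS slope turned negative after removing {len(removed)} of {n} observations")
```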
arXiv link: http://arxiv.org/abs/2307.16315v1
Panel Data Models with Time-Varying Latent Group Structures
unobserved individual and time heterogeneities that are captured by some latent
group structures and an unknown structural break, respectively. To enhance
realism, the model may have different numbers of groups and/or different group
memberships before and after the break. With the preliminary
nuclear-norm-regularized estimation followed by row- and column-wise linear
regressions, we estimate the break point based on the idea of binary
segmentation, and we simultaneously estimate the latent group structures, together
with the number of groups before and after the break, via a sequential-testing
K-means algorithm. It is shown that the break point, the number of groups and the
group memberships can each be estimated correctly with probability approaching
one. Asymptotic distributions of the estimators of the slope coefficients are
established. Monte Carlo simulations demonstrate excellent finite sample
performance for the proposed estimation algorithm. An empirical application to
real house price data across 377 Metropolitan Statistical Areas in the US from
1975 to 2014 suggests the presence both of structural breaks and of changes in
group membership.
arXiv link: http://arxiv.org/abs/2307.15863v1
Group-Heterogeneous Changes-in-Changes and Distributional Synthetic Controls
controls when there exists group level heterogeneity. For changes-in-changes,
we allow individuals to belong to a large number of heterogeneous groups. The
new method extends the changes-in-changes method in Athey and Imbens (2006) by
finding appropriate subgroups within the control groups which share similar
group level unobserved characteristics to the treatment groups. For
distributional synthetic control, we show that the appropriate synthetic
control needs to be constructed using units in potentially different time
periods in which they have comparable group level heterogeneity to the
treatment group, instead of units that are only in the same time period as in
Gunsilius (2023). Implementation and data requirements for these new methods
are briefly discussed.
arXiv link: http://arxiv.org/abs/2307.15313v1
On the Efficiency of Finely Stratified Experiments
estimation of a large class of treatment effect parameters that arise in the
analysis of experiments. By a "finely stratified" design, we mean experiments
in which units are divided into groups of a fixed size and a proportion within
each group is assigned to a binary treatment uniformly at random. The class of
parameters considered are those that can be expressed as the solution to a set
of moment conditions constructed using a known function of the observed data.
They include, among other things, average treatment effects, quantile treatment
effects, and local average treatment effects as well as the counterparts to
these quantities in experiments in which the unit is itself a cluster. In this
setting, we establish three results. First, we show that under a finely
stratified design, the naive method of moments estimator achieves the same
asymptotic variance as what could typically be attained under alternative
treatment assignment mechanisms only through ex post covariate adjustment.
Second, we argue that the naive method of moments estimator under a finely
stratified design is asymptotically efficient by deriving a lower bound on the
asymptotic variance of regular estimators of the parameter of interest in the
form of a convolution theorem. In this sense, finely stratified experiments are
attractive because they lead to efficient estimators of treatment effect
parameters "by design." Finally, we strengthen this conclusion by establishing
conditions under which a "fast-balancing" property of finely stratified designs
is in fact necessary for the naive method of moments estimator to attain the
efficiency bound.
arXiv link: http://arxiv.org/abs/2307.15181v6
Predictability Tests Robust against Parameter Instability
structural break testing based on the instrumentation method of Phillips and
Magdalinos (2009). We show that under the assumption of nonstationary
predictors: (i) the tests based on the OLS estimators converge to a nonstandard
limiting distribution which depends on the nuisance coefficient of persistence;
and (ii) the tests based on the IVX estimators can filter out the persistence
under certain parameter restrictions due to the supremum functional. These
results contribute to the literature of joint predictability and parameter
instability testing by providing analytical tractable asymptotic theory when
taking into account nonstationary regressors. We compare the finite-sample size
and power performance of the Wald tests under both estimators via extensive
Monte Carlo experiments. Critical values are computed using standard bootstrap
inference methodologies. We illustrate the usefulness of the proposed framework
to test for predictability in the presence of parameter instability by
examining the stock market predictability puzzle for the US equity premium.
arXiv link: http://arxiv.org/abs/2307.15151v1
One-step smoothing splines instrumental regression
is endogeneity and instrumental variables are available. Unlike popular
existing estimators, the resulting estimator is one-step and relies on a unique
regularization parameter. We derive rates of convergence for the estimator
and its first derivative, which are uniform in the support of the endogenous
variable. We also address the issue of imposing monotonicity in estimation and
extend the approach to a partly linear model. Simulations confirm the good
performance of our estimator compared to two-step procedures. Our method
yields economically sensible results when used to estimate Engel curves.
arXiv link: http://arxiv.org/abs/2307.14867v4
Weak (Proxy) Factors Robust Hansen-Jagannathan Distance For Linear Asset Pricing Models
measures of model misspecification. However, the conventional HJ specification
test procedure has poor finite sample performance, and we show that it can be
size distorted even in large samples when (proxy) factors exhibit small
correlations with asset returns. In other words, applied researchers are likely
to falsely reject a model even when it is correctly specified. We provide two
alternatives for the HJ statistic and two corresponding novel procedures for
model specification tests, which are robust against the presence of weak
(proxy) factors, and we also offer a novel robust risk premia estimator.
Simulation exercises support our theory. Our empirical application documents
the unreliability of the traditional HJ test, since it may produce
counter-intuitive results when comparing nested models by rejecting a
four-factor model but not the reduced three-factor model. At the same time, our
proposed methods are practically more appealing and show support for a
four-factor model for Fama French portfolios.
arXiv link: http://arxiv.org/abs/2307.14499v1
Bootstrapping Nonstationary Autoregressive Processes with Predictive Regression Models
proposed by Phillips and Magdalinos (2009) for the predictive regression model
parameter based on a local-to-unity specification of the autoregressive
coefficient which covers both nearly nonstationary and nearly stationary
processes. A mixed Gaussian limit distribution is obtained for the
bootstrap-based IVX estimator. The statistical validity of the theoretical
results is illustrated by Monte Carlo experiments for various statistical
inference problems.
arXiv link: http://arxiv.org/abs/2307.14463v1
Causal Effects in Matching Mechanisms with Strategically Reported Preferences
students to schools in a way that reflects student preferences and school
priorities. However, most real-world mechanisms incentivize students to
strategically misreport their preferences. Misreporting complicates the
identification of causal parameters that depend on true preferences, which are
necessary inputs for a broad class of counterfactual analyses. In this paper,
we provide an identification approach that is robust to strategic misreporting
and derive sharp bounds on causal effects of school assignment on future
outcomes. Our approach applies to any mechanism as long as there exist
placement scores and cutoffs that characterize that mechanism's allocation
rule. We use data from a deferred acceptance mechanism that assigns students to
more than 1,000 university--major combinations in Chile. Matching theory
predicts and empirical evidence suggests that students behave strategically in
Chile because they face constraints on their submission of preferences and have
good a priori information on the schools they will have access to. Our bounds
are informative enough to reveal significant heterogeneity in graduation
success with respect to preferences and school assignment.
arXiv link: http://arxiv.org/abs/2307.14282v3
Dynamic Regression Discontinuity: An Event-Study Approach
intertemporal treatment effects in dynamic regression discontinuity designs
(RDDs). Specifically, I develop a dynamic potential outcomes model and
reformulate two assumptions from the difference-in-differences literature, no
anticipation and common trends, to attain point identification of
cutoff-specific impulse responses. Each target estimand can be
expressed as the sum of two static RDD contrasts, thereby allowing for
nonparametric estimation and inference with standard local polynomial methods.
I also propose a nonparametric approach to aggregate treatment effects across
calendar time and treatment paths, leveraging a limited path independence
restriction to reduce the dimensionality of the parameter space. I apply this
method to estimate the dynamic effects of school district expenditure
authorizations on housing prices in Wisconsin.
arXiv link: http://arxiv.org/abs/2307.14203v5
Using Probabilistic Stated Preference Analyses to Understand Actual Choices
research proposes a novel approach to researchers who have access to both
stated choices in hypothetical scenarios and actual choices. The key idea is to
use probabilistic stated choices to identify the distribution of individual
unobserved heterogeneity, even in the presence of measurement error. If this
unobserved heterogeneity is the source of endogeneity, the researcher can
correct for its influence in a demand function estimation using actual choices,
and recover causal effects. Estimation is possible with an off-the-shelf Group
Fixed Effects estimator.
arXiv link: http://arxiv.org/abs/2307.13966v1
The Core of Bayesian Persuasion
the frequency with which she takes actions conditional on a payoff relevant
state. In this setting, we ask when the analyst can rationalize the agent's
choices as the outcome of the agent learning something about the state before
taking action. Our characterization marries the obedience approach in
information design (Bergemann and Morris, 2016) and the belief approach in
Bayesian persuasion (Kamenica and Gentzkow, 2011), relying on a theorem by
Strassen (1965) and Hall's marriage theorem. We apply our results to
ring-network games and identify conditions under which a data set is
consistent with a public information structure in first-order Bayesian
persuasion games.
arXiv link: http://arxiv.org/abs/2307.13849v1
Source Condition Double Robust Inference on Functionals of Inverse Problems
solutions to linear inverse problems. Any such parameter admits a doubly robust
representation that depends on the solution to a dual linear inverse problem,
where the dual solution can be thought of as a generalization of the inverse
propensity function. We provide the first source condition double robust
inference method that ensures asymptotic normality around the parameter of
interest as long as either the primal or the dual inverse problem is
sufficiently well-posed, without knowledge of which inverse problem is the more
well-posed one. Our result is enabled by novel guarantees for iterated Tikhonov
regularized adversarial estimators for linear inverse problems, over general
hypothesis spaces, which are developments of independent interest.
arXiv link: http://arxiv.org/abs/2307.13793v1
Characteristics and Predictive Modeling of Short-term Impacts of Hurricanes on the US Employment
and the well-being of employees. However, a comprehensive understanding of
these impacts remains elusive, as many studies have focused on narrow subsets of
regions or hurricanes. Here we present an open-source dataset that serves
interdisciplinary research on hurricane impacts on US employment. Compared to
past domain-specific efforts, this dataset has greater spatial-temporal
granularity and variable coverage. To demonstrate potential applications of
this dataset, we focus on the short-term employment disruptions related to
hurricanes during 1990-2020. The observed county-level employment changes in
the initial month are small on average, though large employment losses (>30%)
can occur after extreme storms. The overall small changes partly result from
compensation among different employment sectors, which may obscure large,
concentrated employment losses after hurricanes. Additional econometric
analyses concur on the post-storm employment losses in hospitality and leisure
but disagree on employment changes in the other industries. The dataset also
enables data-driven analyses that highlight vulnerabilities such as pronounced
employment losses related to Puerto Rico and rainy hurricanes. Furthermore,
predictive modeling of short-term employment changes shows promising
performance for service-providing industries and high-impact storms. In the
examined cases, the nonlinear Random Forests model greatly outperforms the
multiple linear regression model. The nonlinear model also suggests that more
severe hurricane hazards projected by physical models may cause more extreme
losses in US service-providing employment. Finally, we share our dataset and
analytical code to facilitate the study and modeling of hurricane impacts in a
changing climate.
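A schematic comparison of the two model classes on synthetic data is sketched below; the actual dataset, features, and tuning are not reproduced.

```python
# Sketch: compare a linear regression with a random forest for predicting
# (synthetic) county-level employment changes from storm and county features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 3000
wind = rng.uniform(0, 1, n)          # storm intensity proxy
rain = rng.uniform(0, 1, n)          # rainfall proxy
exposure = rng.uniform(0, 1, n)      # county exposure proxy
# Nonlinear, interaction-heavy response: losses concentrate in extreme storms.
y = -10 * (wind * rain) ** 2 * exposure + rng.normal(scale=0.5, size=n)

X = np.column_stack([wind, rain, exposure])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("linear", LinearRegression()),
                    ("random forest", RandomForestRegressor(n_estimators=200, random_state=0))]:
    model.fit(X_tr, y_tr)
    print(f"{name:>13}: test MAE = {mean_absolute_error(y_te, model.predict(X_te)):.3f}")
```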
arXiv link: http://arxiv.org/abs/2307.13686v3
Smoothing of numerical series by the triangle method on the example of Hungarian GDP data 1992-2022 based on approximation by series of exponents
of a table in the form of some functional dependence. The observed values, owing
to various circumstances, contain errors. For approximation, it is advisable to
use a functional dependence that smooths out the errors of the observation
results. Approximation also makes it possible to determine intermediate values of
the function that are not listed among the data in the observation table. Using
exponential series for data approximation yields results no worse than
approximation by polynomials. In the economic literature, approximation by power
functions, for example the Cobb-Douglas function, has become widespread. The
advantage of this type of approximation is the simple form of the approximating
function; the disadvantage is that not all processes in nature can be described
by power functions to a given accuracy. An example is a GDP indicator spanning
several decades, for which it is difficult to find an approximating power
function. In this case, as shown in this article, exponential series can be used
to approximate the data. In this paper, the time series of Hungary's GDP from
1992 to 2022 is approximated by a series of thirty exponents of a complex
variable. Smoothing the data by the method of triangles averages the data and
increases the accuracy of the approximation, which is of practical importance
when the observed random variable contains outliers that need to be smoothed out.
arXiv link: http://arxiv.org/abs/2307.14378v1
Large sample properties of GMM estimators under second-order identification
distribution theory for GMM estimators for a p - dimensional globally
identified parameter vector {\phi} when local identification conditions fail at
first-order but hold at second-order. They assumed that the first-order
underidentification is due to the expected Jacobian having rank p-1 at the true
value {\phi}_{0}, i.e., having a rank deficiency of one. After reparametrizing
the model such that the last column of the Jacobian vanishes, they showed that
the GMM estimator of the first p-1 parameters converges at rate T^{-1/2} and
the GMM estimator of the remaining parameter, {\phi}_{p}, converges at rate
T^{-1/4}. They also provided a limiting distribution of
T^{1/4}({\phi}_{p}-{\phi}_{0,p}) subject to a (non-transparent) condition which
they claimed to be not restrictive in general. However, as we show in this
paper, their condition is in fact only satisfied when {\phi} is overidentified
and the limiting distribution of T^{1/4}({\phi}_{p}-{\phi}_{0,p}), which is
non-standard, depends on whether {\phi} is exactly identified or
overidentified. In particular, the limiting distributions of the sign of
T^{1/4}({\phi}_{p}-{\phi}_{0,p}) for the cases of exact and overidentification,
respectively, are different and are obtained by using expansions of the GMM
objective function of different orders. Unsurprisingly, we find that the
limiting distribution theories of Dovonon and Hall (2018) for Indirect
Inference (II) estimation under two different scenarios with second-order
identification where the target function is a GMM estimator of the auxiliary
parameter vector, are incomplete for similar reasons. We discuss how our
results for GMM estimation can be used to complete both theories and how they
can be used to obtain the limiting distributions of the II estimators in the
case of exact identification under either scenario.
arXiv link: http://arxiv.org/abs/2307.13475v2
Testing for sparse idiosyncratic components in factor-augmented regression models
against a sparse-plus-dense alternative, augmenting the model with sparse
idiosyncratic components. The asymptotic properties of the test are established
under time series dependence and polynomial tails. We outline a data-driven
rule to select the tuning parameter and prove its theoretical validity. In
simulation experiments, our procedure exhibits high power against sparse
alternatives and low power against dense deviations from the null. Moreover, we
apply our test to various datasets in macroeconomics and finance and often
reject the null. This suggests the presence of sparsity -- on top of a dense
model -- in commonly studied economic applications. The R package FAS
implements our approach.
arXiv link: http://arxiv.org/abs/2307.13364v4
Inference in Experiments with Matched Pairs and Imperfect Compliance
randomized controlled trials with imperfect compliance where treatment status
is determined according to "matched pairs." By "matched pairs," we mean that
units are sampled i.i.d. from the population of interest, paired according to
observed, baseline covariates and finally, within each pair, one unit is
selected at random for treatment. Under weak assumptions governing the quality
of the pairings, we first derive the limit distribution of the usual Wald
(i.e., two-stage least squares) estimator of the local average treatment
effect. We show further that conventional heteroskedasticity-robust estimators
of the Wald estimator's limiting variance are generally conservative, in that
their probability limits are (typically strictly) larger than the limiting
variance. We therefore provide an alternative estimator of the limiting
variance that is consistent. Finally, we consider the use of additional
observed, baseline covariates not used in pairing units to increase the
precision with which we can estimate the local average treatment effect. To
this end, we derive the limiting behavior of a two-stage least squares
estimator of the local average treatment effect that includes both the
additional covariates and pair fixed effects, and show that its
limiting variance is always less than or equal to that of the Wald estimator.
To complete our analysis, we provide a consistent estimator of this limiting
variance. A simulation study confirms the practical relevance of our
theoretical results. Finally, we apply our results to revisit a prominent
experiment studying the effect of macroinsurance on microenterprise in Egypt.
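The Wald (two-stage least squares) point estimate under pair-randomized assignment with imperfect compliance can be sketched on simulated data as follows; the paper's variance estimators and covariate-adjusted 2SLS are not reproduced.

```python
# Sketch: matched-pair assignment with imperfect compliance, and the Wald
# estimator of the LATE. Illustrative only.
import numpy as np

rng = np.random.default_rng(8)
n_pairs = 500
x = np.sort(rng.normal(size=2 * n_pairs))                 # pair units on a baseline covariate
pairs = x.reshape(n_pairs, 2)

Z = np.zeros((n_pairs, 2), dtype=int)
Z[np.arange(n_pairs), rng.integers(0, 2, n_pairs)] = 1    # one treatment offer per pair

x_flat, z_flat = pairs.ravel(), Z.ravel()
complier = rng.random(2 * n_pairs) < 0.6                  # 60% compliers
D = np.where(complier, z_flat, 0)                          # take-up only if offered and complier
Y = 1.0 * x_flat + 2.0 * D + rng.normal(size=2 * n_pairs)  # true LATE = 2

wald = (Y[z_flat == 1].mean() - Y[z_flat == 0].mean()) / \
       (D[z_flat == 1].mean() - D[z_flat == 0].mean())
print(f"Wald estimate of the LATE: {wald:.3f}")
```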
arXiv link: http://arxiv.org/abs/2307.13094v2
Identification Robust Inference for the Risk Premium in Term Structure Models
risk premia in dynamic affine term structure models. We do so using the moment
equation specification proposed for these models in Adrian et al. (2013). We
extend the subset (factor) Anderson-Rubin test from Guggenberger et al. (2012)
to models with multiple dynamic factors and time-varying risk prices. Unlike
projection-based tests, it provides a computationally tractable way to
conduct identification-robust tests on a larger number of parameters. We
analyze the potential identification issues arising in empirical studies.
Statistical inference based on the three-stage estimator from Adrian et al.
(2013) requires knowledge of the factors' quality and is misleading without
full-rank betas or with sampling errors comparable in size to the loadings.
Empirical applications show that some factors, though potentially weak, may
drive the time variation of risk prices, and weak identification issues are
more prominent in multi-factor models.
arXiv link: http://arxiv.org/abs/2307.12628v1
Scenario Sampling for Large Supermodular Games
log-likelihood function of a large supermodular binary-action game. Covered
examples include (certain types of) peer effect, technology adoption, strategic
network formation, and multi-market entry games. More generally, the algorithm
facilitates simulated maximum likelihood (SML) estimation of games with large
numbers of players, $T$, and/or many binary actions per player, $M$ (e.g.,
games with tens of thousands of strategic actions, $TM=O(10^4)$). In such cases
the likelihood of the observed pure strategy combination is typically (i) very
small and (ii) a $TM$-fold integral whose region of integration has a complicated
geometry. Direct numerical integration, as well as accept-reject Monte Carlo
integration, is computationally impractical in such settings. In contrast, we
introduce a novel importance sampling algorithm which allows for accurate
likelihood simulation with modest numbers of simulation draws.
arXiv link: http://arxiv.org/abs/2307.11857v1
Functional Differencing in Networks
(such as workers or firms) sort and produce. However, most existing estimation
approaches either require the network to be dense, which is at odds with many
empirical networks, or they require restricting the form of heterogeneity and
the network formation process. We show how the functional differencing approach
introduced by Bonhomme (2012) in the context of panel data, can be applied in
network settings to derive moment restrictions on model parameters and average
effects. Those restrictions are valid irrespective of the form of
heterogeneity, and they hold in both dense and sparse networks. We illustrate
the analysis with linear and nonlinear models of matched employer-employee
data, in the spirit of the model introduced by Abowd, Kramarz, and Margolis
(1999).
arXiv link: http://arxiv.org/abs/2307.11484v1
Asymptotically Unbiased Synthetic Control Methods by Density Matching
comparative case studies. The core idea behind SCMs is to estimate treatment
effects by predicting counterfactual outcomes for a treated unit using a
weighted combination of observed outcomes from untreated units. The accuracy of
these predictions is crucial for evaluating the treatment effect of a policy
intervention. Subsequent research has therefore focused on estimating SC
weights. In this study, we highlight a key endogeneity issue in existing
SCMs: the correlation between the outcomes of untreated units and the
error term of the synthetic control, which leads to bias in both counterfactual
outcome prediction and treatment effect estimation. To address this issue, we
propose a novel SCM based on density matching, assuming that the outcome
density of the treated unit can be approximated by a weighted mixture of the
joint density of untreated units. Under this assumption, we estimate SC weights
by matching the moments of the treated outcomes with the weighted sum of the
moments of the untreated outcomes. Our method offers three advantages: first,
under the mixture model assumption, our estimator is asymptotically unbiased;
second, this asymptotic unbiasedness reduces the mean squared error in
counterfactual predictions; and third, our method provides full densities of
the treatment effect rather than just expected values, thereby broadening the
applicability of SCMs. Finally, we present experimental results that
demonstrate the effectiveness of our approach.
arXiv link: http://arxiv.org/abs/2307.11127v4
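One way to read the moment-matching step described above is as a constrained least-squares problem in the weights: choose nonnegative weights summing to one so that a few raw moments of the treated unit's outcomes are reproduced by the weighted sum of the untreated units' moments. The sketch below implements that reading with scipy; the choice of three raw moments, the simulated data, and the SLSQP optimizer are assumptions for illustration only, not the authors' procedure.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
T0, J = 40, 8                                                        # periods, untreated units
Y0 = rng.normal(loc=rng.normal(size=J), scale=1.0, size=(T0, J))     # untreated outcomes
Y1 = 0.6 * Y0[:, 0] + 0.4 * Y0[:, 3] + 0.1 * rng.normal(size=T0)     # treated unit

def moments(y):
    # first three raw moments used as matching targets (an assumption)
    return np.array([y.mean(), (y ** 2).mean(), (y ** 3).mean()])

m_treated = moments(Y1)
m_untreated = np.column_stack([moments(Y0[:, j]) for j in range(J)])  # 3 x J

def objective(w):
    return np.sum((m_treated - m_untreated @ w) ** 2)

cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
bounds = [(0.0, 1.0)] * J
res = minimize(objective, np.full(J, 1.0 / J), bounds=bounds,
               constraints=cons, method="SLSQP")
print("estimated SC weights:", np.round(res.x, 3))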
Real-Time Detection of Local No-Arbitrage Violations
violation of the standard It\^o semimartingale assumption for financial asset
prices in real time that might induce arbitrage opportunities. Our proposed
detectors, defined as stopping rules, are applied sequentially to continually
incoming high-frequency data. We show that they are asymptotically
exponentially distributed in the absence of It\^o semimartingale violations. On
the other hand, when a violation occurs, we can achieve immediate detection
under infill asymptotics. A Monte Carlo study demonstrates that the asymptotic
results provide a good approximation to the finite-sample behavior of the
sequential detectors. An empirical application to S&P 500 index futures data
corroborates the effectiveness of our detectors in swiftly identifying the
emergence of an extreme return persistence episode in real time.
arXiv link: http://arxiv.org/abs/2307.10872v1
PySDTest: a Python/Stata Package for Stochastic Dominance Tests
stochastic dominance. PySDTest implements various testing procedures such as
Barrett and Donald (2003), Linton et al. (2005), Linton et al. (2010), and
Donald and Hsu (2016), along with their extensions. Users can flexibly combine
several resampling methods and test statistics, including the numerical delta
method (D\"umbgen, 1993; Hong and Li, 2018; Fang and Santos, 2019). The package
allows for testing advanced hypotheses on stochastic dominance relations, such
as stochastic maximality among multiple prospects. We first provide an overview
of the concepts of stochastic dominance and testing methods. Then, we offer
practical guidance for using the package and the Stata command pysdtest. We
apply PySDTest to investigate the portfolio choice problem between the daily
returns of Bitcoin and the S&P 500 index as an empirical illustration. Our
findings indicate that the S&P 500 index returns second-order stochastically
dominate the Bitcoin returns.
arXiv link: http://arxiv.org/abs/2307.10694v2
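The package itself should be consulted for its exact interface; as a self-contained illustration of the kind of comparison reported above, the sketch below computes a one-sided Kolmogorov-Smirnov-type statistic for first-order stochastic dominance between two return samples and calibrates it with a simple pooled bootstrap. This is a generic textbook construction in the spirit of Barrett and Donald (2003), not the PySDTest API, and the simulated "returns" are placeholders.

import numpy as np

rng = np.random.default_rng(2)
returns_a = rng.normal(0.0005, 0.01, size=1000)   # placeholder for S&P 500 returns
returns_b = rng.normal(0.0000, 0.03, size=1000)   # placeholder for Bitcoin returns

def sd1_statistic(x, y, grid):
    # H0: x first-order stochastically dominates y, i.e. F_x <= F_y everywhere
    Fx = np.searchsorted(np.sort(x), grid, side="right") / x.size
    Fy = np.searchsorted(np.sort(y), grid, side="right") / y.size
    return np.sqrt(x.size) * np.max(Fx - Fy)

grid = np.linspace(min(returns_a.min(), returns_b.min()),
                   max(returns_a.max(), returns_b.max()), 200)
stat = sd1_statistic(returns_a, returns_b, grid)

# Pooled (least favourable case) bootstrap critical value
pooled = np.concatenate([returns_a, returns_b])
boot = np.empty(500)
for b in range(500):
    xb = rng.choice(pooled, size=returns_a.size, replace=True)
    yb = rng.choice(pooled, size=returns_b.size, replace=True)
    boot[b] = sd1_statistic(xb, yb, grid)

print(f"statistic = {stat:.3f}, bootstrap p-value = {(boot >= stat).mean():.3f}")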
Latent Gaussian dynamic factor modeling and forecasting for multivariate count time series
high-dimensional count time series model constructed from a transformation of a
latent Gaussian dynamic factor series. The estimation of the latent model
parameters is based on second-order properties of the count and underlying
Gaussian time series, yielding estimators of the underlying covariance matrices
for which standard principal component analysis applies. Theoretical
consistency results are established for the proposed estimation, building on
certain concentration results for the models of the type considered. They also
involve the memory of the latent Gaussian process, quantified through a
spectral gap, shown to be suitably bounded as the model dimension increases,
which is of independent interest. In addition, novel cross-validation schemes
are suggested for model selection. The forecasting is carried out through a
particle-based sequential Monte Carlo, leveraging Kalman filtering techniques.
A simulation study and an application are also considered.
arXiv link: http://arxiv.org/abs/2307.10454v3
Asymptotic equivalence of Principal Components and Quasi Maximum Likelihood estimators in Large Approximate Factor Models
of an approximate factor model for an $n$-dimensional vector of stationary time
series. We prove that the factor loadings estimated by Quasi Maximum Likelihood
are asymptotically equivalent, as $n\to\infty$, to those estimated via
Principal Components. Both estimators are, in turn, also asymptotically
equivalent, as $n\to\infty$, to the unfeasible Ordinary Least Squares estimator
we would have if the factors were observed. We also show that the usual
sandwich form of the asymptotic covariance matrix of the Quasi Maximum
Likelihood estimator is asymptotically equivalent to the simpler asymptotic
covariance matrix of the unfeasible Ordinary Least Squares. All these results
hold in the general case in which the idiosyncratic components are
cross-sectionally heteroskedastic, as well as serially and cross-sectionally
weakly correlated. The intuition behind these results is that as $n\to\infty$
the factors can be considered as observed, thus showing that factor models
enjoy a blessing of dimensionality.
arXiv link: http://arxiv.org/abs/2307.09864v5
Risk Preference Types, Limited Consideration, and Welfare
of a mixture model of decision making under risk, when agents make choices in
multiple lines of insurance coverage (contexts) by purchasing a bundle. As a
first departure from the related literature, the model allows for two
preference types. In the first one, agents behave according to standard
expected utility theory with CARA Bernoulli utility function, with an
agent-specific coefficient of absolute risk aversion whose distribution is left
completely unspecified. In the other, agents behave according to the dual
theory of choice under risk (Yaari, 1987) combined with a one-parameter family of distortion functions, where the parameter is agent-specific and is drawn from a
distribution that is left completely unspecified. Within each preference type,
the model allows for unobserved heterogeneity in consideration sets, where the
latter form at the bundle level -- a second departure from the related
literature. Our point identification result rests on observing sufficient
variation in covariates across contexts, without requiring any independent
variation across alternatives within a single context. We estimate the model on
data on households' deductible choices in two lines of property insurance, and
use the results to assess the welfare implications of a hypothetical market
intervention where the two lines of insurance are combined into a single one.
We study the role of limited consideration in mediating the welfare effects of
such intervention.
arXiv link: http://arxiv.org/abs/2307.09411v1
Comparative Analysis of Machine Learning, Hybrid, and Deep Learning Forecasting Models Evidence from European Financial Markets and Bitcoins
financial markets and the cryptocurrency market over an extended period,
encompassing the pre, during, and post-pandemic periods. Daily financial market
indices and price observations are used to assess the forecasting models. We
compare statistical, machine learning, and deep learning forecasting models to
evaluate the financial markets, such as the ARIMA, hybrid ETS-ANN, and kNN
predictive models. The study results indicate that predicting financial market
fluctuations is challenging, and the accuracy levels are generally low in
several instances. ARIMA and hybrid ETS-ANN models perform better over extended
periods compared to the kNN model, with ARIMA being the best-performing model
in 2018-2021 and the hybrid ETS-ANN model being the best-performing model in
most of the other subperiods. Still, the kNN model outperforms the others in
several periods, depending on the observed accuracy measure. Researchers have
advocated using parametric and non-parametric modeling combinations to generate
better results. In this study, the results suggest that the hybrid ETS-ANN
model is the best-performing model despite its moderate level of accuracy.
Thus, the hybrid ETS-ANN model is a promising financial time series forecasting
approach. The findings offer financial analysts an additional source that can
provide valuable insights for investment decisions.
arXiv link: http://arxiv.org/abs/2307.08853v1
Supervised Dynamic PCA: Linear Dynamic Forecasting with Many Predictors
Principal Component Analysis (PCA) when a large number of predictors are
available. The new supervised PCA provides an effective way to bridge the gap
between predictors and the target variable of interest by scaling and combining
the predictors and their lagged values, resulting in an effective dynamic
forecasting. Unlike the traditional diffusion-index approach, which does not
learn the relationships between the predictors and the target variable before
conducting PCA, we first re-scale each predictor according to its significance in forecasting the target variable in a dynamic fashion, and a
PCA is then applied to a re-scaled and additive panel, which establishes a
connection between the predictability of the PCA factors and the target
variable. Furthermore, we also propose to use penalized methods such as the
LASSO approach to select the significant factors that have superior predictive
power over the others. Theoretically, we show that our estimators are
consistent and outperform the traditional methods in prediction under some mild
conditions. We conduct extensive simulations to verify that the proposed method
produces satisfactory forecasting results and outperforms most of the existing
methods using the traditional PCA. A real example of predicting U.S.
macroeconomic variables using a large number of predictors showcases that our
method fares better than most of the existing ones in applications. The
proposed method thus provides a comprehensive and effective approach for
dynamic forecasting in high-dimensional data analysis.
arXiv link: http://arxiv.org/abs/2307.07689v1
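A stripped-down reading of the procedure summarized above could look as follows: each predictor (here, its first lag) is scaled by the slope from a predictive regression on the target, PCA is then applied to the re-scaled panel, and a LASSO selects among the resulting factors. The scaling rule, number of lags, component count, and tuning parameters below are placeholder assumptions, not the authors' recommended choices.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
T, N = 200, 50
common = rng.normal(size=T)
X = np.outer(common, rng.normal(size=N)) + rng.normal(size=(T, N))
y = np.concatenate([[0.0], 0.8 * common[:-1]]) + 0.3 * rng.normal(size=T)  # target led by the factor

# Step 1: scale each (lagged) predictor by its predictive slope for the target
X_lag, y_lead = X[:-1, :], y[1:]
slopes = np.array([np.polyfit(X_lag[:, j], y_lead, 1)[0] for j in range(N)])
X_scaled = X_lag * slopes

# Step 2: PCA on the re-scaled panel
factors = PCA(n_components=10).fit_transform(X_scaled)

# Step 3: LASSO selects the factors with predictive power
lasso = Lasso(alpha=0.05).fit(factors, y_lead)
selected = np.flatnonzero(lasso.coef_)
print("selected factors:", selected, "coefficients:", np.round(lasso.coef_[selected], 3))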
Sparsified Simultaneous Confidence Intervals for High-Dimensional Linear Models
challenging because the uncertainty introduced by the model selection procedure
is hard to account for. A critical question remains unsettled; that is, is it
possible and how to embed the inference of the model into the simultaneous
inference of the coefficients? To this end, we propose a notion of simultaneous
confidence intervals called the sparsified simultaneous confidence intervals.
Our intervals are sparse in the sense that some of the intervals' upper and
lower bounds are shrunken to zero (i.e., $[0,0]$), indicating the unimportance
of the corresponding covariates. These covariates should be excluded from the
final model. The rest of the intervals, either containing zero (e.g., $[-1,1]$
or $[0,1]$) or not containing zero (e.g., $[2,3]$), indicate the plausible and
significant covariates, respectively. The proposed method can be coupled with
various selection procedures, making it ideal for comparing their uncertainty.
For the proposed method, we establish desirable asymptotic properties, develop
intuitive graphical tools for visualization, and justify its superior
performance through simulation and real data analysis.
arXiv link: http://arxiv.org/abs/2307.07574v2
Global path preference and local response: A reward decomposition approach for network path choice analysis in the presence of locally perceived attributes
preferences of network travelers. To this end, a reward decomposition approach
is proposed and integrated into a link-based recursive (Markovian) path choice
model. The approach decomposes the instantaneous reward function associated
with each state-action pair into the global utility, a function of attributes
globally perceived from anywhere in the network, and the local utility, a
function of attributes that are only locally perceived from the current state.
Only the global utility then enters the value function of each state,
representing the future expected utility toward the destination. This
global-local path choice model with decomposed reward functions allows us to
analyze to what extent and which attributes affect the global and local path
choices of agents. Moreover, unlike most adaptive path choice models, the
proposed model can be estimated based on revealed path observations (without
the information of plans) and as efficiently as deterministic recursive path
choice models. The model was applied to the real pedestrian path choice
observations in an urban street network where the green view index was
extracted as a visual street quality from Google Street View images. The result
revealed that pedestrians locally perceive and react to the visual street
quality, rather than having a pre-trip global perception of it.
Furthermore, the simulation results using the estimated models suggested the
importance of location selection of interventions when policy-related
attributes are only locally perceived by travelers.
arXiv link: http://arxiv.org/abs/2307.08646v1
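One plausible reading of the decomposition described above, on a toy network: the value function at each node is computed from the globally perceived utility only, while the locally perceived utility enters the choice probabilities at the current node. Everything below (the graph, the utilities, the unit softmax scale) is invented for illustration and is not the authors' specification or data.

import numpy as np

# Toy directed network: node 3 is the destination (absorbing)
links = {0: [1, 2], 1: [2, 3], 2: [3], 3: []}
u_global = {(0, 1): -1.0, (0, 2): -1.2, (1, 2): -0.5, (1, 3): -1.5, (2, 3): -0.8}
u_local = {(0, 1): 0.3, (0, 2): 0.0, (1, 2): 0.0, (1, 3): 0.4, (2, 3): 0.1}  # e.g. green view seen on the spot

# Value function: only the global utility enters (destination has value 0)
V = {s: 0.0 for s in links}
for _ in range(100):                       # fixed-point iteration
    V_new = dict(V)
    for s, succ in links.items():
        if succ:
            V_new[s] = np.log(sum(np.exp(u_global[(s, a)] + V[a]) for a in succ))
    if max(abs(V_new[s] - V[s]) for s in links) < 1e-10:
        V = V_new
        break
    V = V_new

# Choice probabilities combine global utility, local utility, and downstream value
def choice_probs(s):
    scores = np.array([u_global[(s, a)] + u_local[(s, a)] + V[a] for a in links[s]])
    p = np.exp(scores - scores.max())
    return dict(zip(links[s], p / p.sum()))

print("V:", {s: round(v, 3) for s, v in V.items()})
print("choice probabilities at node 0:", {a: round(p, 3) for a, p in choice_probs(0).items()})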
Choice Models and Permutation Invariance: Demand Estimation in Differentiated Products Markets
competitive landscape affect consumer choices and reshape market equilibria. In
this paper, we propose a fundamental characterization of choice functions that
encompasses a wide variety of extant choice models. We demonstrate how
non-parametric estimators like neural nets can easily approximate such
functionals and overcome the curse of dimensionality that is inherent in the
non-parametric estimation of choice functions. We demonstrate through extensive
simulations that our proposed functionals can flexibly capture underlying
consumer behavior in a completely data-driven fashion and outperform
traditional parametric models. As demand settings often exhibit endogenous
features, we extend our framework to incorporate estimation under endogenous
features. Further, we also describe a formal inference procedure to construct
valid confidence intervals on objects of interest like price elasticity.
Finally, to assess the practical applicability of our estimator, we utilize a
real-world dataset from S. Berry, Levinsohn, and Pakes (1995). Our empirical
analysis confirms that the estimator generates realistic and comparable own-
and cross-price elasticities that are consistent with the observations reported
in the existing literature.
arXiv link: http://arxiv.org/abs/2307.07090v2
The Canonical Decomposition of Factor Models: Weak Factors are Everywhere
factor model, where the factors are loaded contemporaneously by the common
component, and the Generalised Dynamic Factor Model, where the factors are
loaded with lags. In this paper we derive a canonical decomposition which nests
both models by introducing the weak common component which is the difference
between the dynamic and the static common component. This component is driven by potentially infinitely many non-pervasive weak factors that live in the dynamically common space (not to be confused with rate-weak factors, which are pervasive but associated with a slower rate). Our result shows that the relation between the two approaches is far richer and more complex than is usually assumed. We illustrate, through theoretical and empirical examples, why the weak common component should not be neglected. Furthermore, we
propose a simple estimation procedure for the canonical decomposition. Our
empirical estimates on US macroeconomic data reveal that the weak common
component can account for a large part of the variation of individual
variables. Furthermore in a pseudo real-time forecasting evaluation for
industrial production and inflation, we show that gains can be obtained from
considering the dynamic approach over the static approach.
arXiv link: http://arxiv.org/abs/2307.10067v3
The Yule-Frisch-Waugh-Lovell Theorem for Linear Instrumental Variables Estimation
First, I show that the theorem holds for linear instrumental variables
estimation of a multiple regression model that is either exactly or
overidentified. I show that with linear instrumental variables estimation: (a)
coefficients on endogenous variables are identical in full and partial (or
residualized) regressions; (b) residual vectors are identical for full and
partial regressions; and (c) estimated covariance matrices of the coefficient
vectors from full and partial regressions are equal (up to a degree of freedom
correction) if the estimator of the error vector is a function only of the
residual vectors and does not use any information about the covariate matrix
other than its dimensions. While estimation of the full model uses the full set
of instrumental variables, estimation of the partial model uses the
residualized version of the same set of instrumental variables, with
residualization carried out with respect to the set of exogenous variables.
Second, I show that: (a) the theorem applies in large samples to the K-class of
estimators, including the limited information maximum likelihood (LIML)
estimator, and (b) the theorem does not apply in general to linear GMM
estimators, but it does apply to the two-step optimal linear GMM estimator.
Third, I trace the historical and analytical development of the theorem and
suggest that it be renamed as the Yule-Frisch-Waugh-Lovell (YFWL) theorem to
recognize the pioneering contribution of the statistician G. Udny Yule in its
development.
arXiv link: http://arxiv.org/abs/2307.12731v2
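The partialling-out claim in result (a) is easy to verify numerically. The sketch below runs a just-identified 2SLS once with the full set of regressors and once after residualizing the outcome, the endogenous regressor, and the instrument on the exogenous covariates; the coefficient on the endogenous variable coincides. The simulated data and the plain-numpy implementation are only an illustration of the theorem, not of the paper's historical material.

import numpy as np

rng = np.random.default_rng(4)
n = 2000
W = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # exogenous covariates (incl. constant)
z = rng.normal(size=n) + W[:, 1]                            # instrument, correlated with W
u = rng.normal(size=n)                                      # endogeneity source
d = 0.7 * z + W @ np.array([0.2, 0.5, -0.3]) + 0.5 * u      # endogenous regressor
y = 1.5 * d + W @ np.array([1.0, -0.4, 0.8]) + u + rng.normal(size=n)

def iv_2sls(y, X, Z):
    # just-identified 2SLS: beta = (Z'X)^{-1} Z'y
    return np.linalg.solve(Z.T @ X, Z.T @ y)

# Full regression: regressors [d, W], instruments [z, W]
beta_full = iv_2sls(y, np.column_stack([d, W]), np.column_stack([z, W]))

# Partial regression: residualize y, d, and z on W first
def resid(v, W):
    return v - W @ np.linalg.lstsq(W, v, rcond=None)[0]

beta_partial = iv_2sls(resid(y, W), resid(d, W)[:, None], resid(z, W)[:, None])

print(f"full 2SLS coefficient on d: {beta_full[0]:.6f}")
print(f"partialled-out 2SLS coefficient: {beta_partial[0]:.6f}")

The two printed coefficients agree up to floating-point error, as the theorem predicts.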
Stationarity with Occasionally Binding Constraints
known as censored and kinked structural vector autoregressions (CKSVAR), which
are notably able to accommodate series that are subject to occasionally binding
constraints. We develop a set of sufficient conditions for the processes
generated by a CKSVAR to be stationary, ergodic, and weakly dependent. Our
conditions relate directly to the stability of the deterministic part of the
model, and are therefore less conservative than those typically available for
general vector threshold autoregressive (VTAR) models. Though our criteria
refer to quantities, such as refinements of the joint spectral radius, that
cannot feasibly be computed exactly, they can be approximated numerically to a
high degree of precision.
arXiv link: http://arxiv.org/abs/2307.06190v2
Identification in Multiple Treatment Models under Discrete Variation
models with discrete-valued instruments. We allow selection into treatment to
be governed by a general class of threshold crossing models that permits
multidimensional unobserved heterogeneity. Under a semi-parametric restriction
on the distribution of unobserved heterogeneity, we show how a sequence of
linear programs can be used to compute sharp bounds for a number of treatment
effect parameters when the marginal treatment response functions underlying
them remain nonparametric or are additionally parameterized.
arXiv link: http://arxiv.org/abs/2307.06174v1
Robust Impulse Responses using External Instruments: the Role of Information
is not invertible and the measurement error is present. We propose to use this
identification strategy in a structural Dynamic Factor Model, which we call
Proxy DFM. In a simulation analysis, we show that the Proxy DFM always
successfully retrieves the true impulse responses, while the Proxy SVAR
systematically fails to do so when the model is misspecified, does not include all relevant information, or when measurement error is present. In an
application to US monetary policy, the Proxy DFM shows that a tightening shock
is unequivocally contractionary, with deteriorations in domestic demand, labor,
credit, housing, exchange, and financial markets. This holds true for all raw
instruments available in the literature. The variance decomposition analysis
highlights the importance of monetary policy shocks in explaining economic
fluctuations, albeit at different horizons.
arXiv link: http://arxiv.org/abs/2307.06145v1
What Does it Take to Control Global Temperatures? A toolbox for testing and estimating the impact of economic policies on climate
through economic policies. It provides a toolbox for a statistical historical
assessment of a Stochastic Integrated Model of Climate and the Economy, and its
use in (possibly counterfactual) policy analysis. Recognizing that
stabilization requires suppressing a trend, we use an integrated-cointegrated
Vector Autoregressive Model estimated using a newly compiled dataset ranging
between years A.D. 1000-2008, extending previous results on Control Theory in
nonstationary systems. We test statistically whether, and quantify to what
extent, carbon abatement policies can effectively stabilize or reduce global
temperatures. Our formal test of policy feasibility shows that carbon abatement
can have a significant long run impact and policies can render temperatures
stationary around a chosen long run mean. In a counterfactual empirical
illustration of the possibilities of our modeling strategy, we study a
retrospective policy aiming to keep global temperatures close to their 1900
historical level. Achieving this via carbon abatement may cost about 75% of the
observed 2008 level of world GDP, a cost equivalent to reverting to levels of
output historically observed in the mid 1960s. By contrast, investment in
carbon neutral technology could achieve the policy objective and be
self-sustainable as long as it costs less than 50% of 2008 global GDP and 75%
of consumption.
arXiv link: http://arxiv.org/abs/2307.05818v2
Synthetic Decomposition for Counterfactual Predictions
beyond its pre-policy support. However, in many cases, information about the
policy of interest is available from different ("source") regions where a
similar policy has already been implemented. In this paper, we propose a novel
method of using such data from source regions to predict a new policy in a
target region. Instead of relying on extrapolation of a structural relationship
using a parametric specification, we formulate a transferability condition and
construct a synthetic outcome-policy relationship such that it is as close as
possible to meeting the condition. The synthetic relationship weighs both the
similarity in distributions of observables and in structural relationships. We
develop a general procedure to construct asymptotic confidence intervals for
counterfactual predictions and prove its asymptotic validity. We then apply our
proposal to predict average teenage employment in Texas following a
counterfactual increase in the minimum wage.
arXiv link: http://arxiv.org/abs/2307.05122v2
Decentralized Decision-Making in Retail Chains: Evidence from Inventory Management
decision-making in multi-establishment firms using data from a large retail
chain. Analyzing two years of daily data, we find significant heterogeneity
among the inventory decisions made by 634 store managers. By estimating a
dynamic structural model, we reveal substantial heterogeneity in managers'
perceived costs. Moreover, we observe a correlation between the variance of
these perceptions and managers' education and experience. Counterfactual
experiments show that centralized inventory management reduces costs by
eliminating the impact of managers' skill heterogeneity. However, these
benefits are offset by the negative impact of delayed demand information.
arXiv link: http://arxiv.org/abs/2307.05562v1
Are there Dragon Kings in the Stock Market?
roughly five preceding decades. We focus specifically on the time series of
realized volatility (RV) of the S&P500 index and its distribution function. As
expected, the largest values of RV coincide with the largest economic upheavals
of the period: Savings and Loan Crisis, Tech Bubble, Financial Crisis and Covid
Pandemic. We address the question of whether these values belong to one of the
three categories: Black Swans (BS), that is they lie on scale-free, power-law
tails of the distribution; Dragon Kings (DK), defined as statistically
significant upward deviations from BS; or Negative Dragon Kings (nDK), defined
as statistically significant downward deviations from BS. In analyzing the
tails of the distribution with RV > 40, we observe the appearance of
"potential" DK which eventually terminate in an abrupt plunge to nDK. This
phenomenon becomes more pronounced as the number of days over which the average RV is calculated increases -- here from daily, n=1, to “monthly,” n=21.
We fit the entire distribution with a modified Generalized Beta (mGB)
distribution function, which terminates at a finite value of the variable but
exhibits a long power-law stretch prior to that, as well as Generalized Beta
Prime (GB2) distribution function, which has a power-law tail. We also fit the
tails directly with a straight line on a log-log scale. In order to ascertain
BS, DK or nDK behavior, all fits include their confidence intervals and
p-values are evaluated for the data points to check if they can come from the
respective distributions.
arXiv link: http://arxiv.org/abs/2307.03693v1
Generalised Covariances and Correlations
from their respective means. We generalise this well-known measure by replacing
the means with other statistical functionals such as quantiles, expectiles, or
thresholds. Deviations from these functionals are defined via generalised
errors, often induced by identification or moment functions. As a normalised
measure of dependence, a generalised correlation is constructed. Replacing the
common Cauchy-Schwarz normalisation by a novel Fr\'echet-Hoeffding
normalisation, we obtain attainability of the entire interval $[-1, 1]$ for any
given marginals. We uncover favourable properties of these new dependence
measures. The families of quantile and threshold correlations give rise to
function-valued distributional correlations, exhibiting the entire dependence
structure. They lead to tail correlations, which should arguably supersede the
coefficients of tail dependence. Finally, we construct summary covariances
(correlations), which arise as (normalised) weighted averages of distributional
covariances. We retrieve Pearson covariance and Spearman correlation as special
cases. The applicability and usefulness of our new dependence measures is
illustrated on demographic data from the Panel Study of Income Dynamics.
arXiv link: http://arxiv.org/abs/2307.03594v2
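To make the construction concrete, the following sketch computes a quantile-based generalised covariance and correlation at level alpha, taking the generalised error to be the usual quantile identification function 1{X <= q_alpha(X)} - alpha, and normalising by the value attained under a comonotonic (or countermonotonic) coupling of the two samples, in the spirit of a Fréchet-Hoeffding bound. The exact normalisation and estimators in the paper may differ; this is only a plausible reading for illustration.

import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=5000)
y = 0.6 * x + 0.8 * rng.normal(size=5000)
alpha = 0.25

def gen_error(v, alpha):
    # quantile identification function: 1{v <= q_alpha} - alpha
    return (v <= np.quantile(v, alpha)).astype(float) - alpha

def gen_cov(x, y, alpha):
    return np.mean(gen_error(x, alpha) * gen_error(y, alpha))

cov_xy = gen_cov(x, y, alpha)

# Frechet-Hoeffding style normalisation: covariance attained under the
# comonotonic (sorted/sorted) or countermonotonic (sorted/reverse-sorted) coupling
cov_max = gen_cov(np.sort(x), np.sort(y), alpha)
cov_min = gen_cov(np.sort(x), np.sort(y)[::-1], alpha)
gen_corr = cov_xy / cov_max if cov_xy >= 0 else cov_xy / abs(cov_min)

print(f"quantile covariance at alpha={alpha}: {cov_xy:.4f}")
print(f"generalised correlation in [-1, 1]: {gen_corr:.4f}")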
Climate Models Underestimate the Sensitivity of Arctic Sea Ice to Carbon Emissions
concentrations have increased. Using observed data from 1979 to 2019, we
estimate a close contemporaneous linear relationship between Arctic sea ice
area and cumulative carbon dioxide emissions. For comparison, we provide
analogous regression estimates using simulated data from global climate models
(drawn from the CMIP5 and CMIP6 model comparison exercises). The carbon
sensitivity of Arctic sea ice area is considerably stronger in the observed
data than in the climate models. Thus, for a given future emissions path, an
ice-free Arctic is likely to occur much earlier than the climate models
project. Furthermore, little progress has been made in recent global climate
modeling (from CMIP5 to CMIP6) to more accurately match the observed
carbon-climate response of Arctic sea ice.
arXiv link: http://arxiv.org/abs/2307.03552v1
Panel Data Nowcasting: The Case of Price-Earnings Ratios
panel data consisting of series sampled at different frequencies. Motivated by
the problem of predicting corporate earnings for a large cross-section of firms
with macroeconomic, financial, and news time series sampled at different
frequencies, we focus on the sparse-group LASSO regularization which can take
advantage of the mixed frequency time series panel data structures. Our
empirical results show the superior performance of our machine learning panel
data regression models over analysts' predictions, forecast combinations,
firm-specific time series regression models, and standard machine learning
methods.
arXiv link: http://arxiv.org/abs/2307.02673v1
Online Learning of Order Flow and Market Impact with Bayesian Change-Point Detection Methods
(sell) trades are often followed by subsequent buy (sell) trades over extended
periods. This persistence can be attributed to the division and gradual
execution of large orders. Consequently, distinct order flow regimes might
emerge, which can be identified through suitable time series models applied to
market data. In this paper, we propose the use of Bayesian online change-point
detection (BOCPD) methods to identify regime shifts in real-time and enable
online predictions of order flow and market impact. To enhance the
effectiveness of our approach, we have developed a novel BOCPD method using a
score-driven approach. This method accommodates temporal correlations and
time-varying parameters within each regime. Through empirical application to
NASDAQ data, we have found that: (i) Our newly proposed model demonstrates
superior out-of-sample predictive performance compared to existing models that
assume i.i.d. behavior within each regime; (ii) When examining the residuals,
our model demonstrates good specification in terms of both distributional
assumptions and temporal correlations; (iii) Within a given regime, the price
dynamics exhibit a concave relationship with respect to time and volume,
mirroring the characteristics of actual large orders; (iv) By incorporating
regime information, our model produces more accurate online predictions of
order flow and market impact compared to models that do not consider regimes.
arXiv link: http://arxiv.org/abs/2307.02375v2
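For readers unfamiliar with the baseline method, the sketch below is the standard Bayesian online change-point detection recursion of Adams and MacKay (2007) for a Gaussian observation model with known variance and a constant hazard rate; it maintains a posterior over the current run length, which drops sharply when a regime shift occurs. The score-driven extension with within-regime temporal correlation proposed in the paper is not implemented here, and the synthetic series is a placeholder for order-flow data.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
# Synthetic series with a mean shift halfway through
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(2.0, 1.0, 300)])

hazard = 1.0 / 200.0        # constant change-point hazard (assumption)
sigma2 = 1.0                # known observation variance (assumption)
mu0, tau0_2 = 0.0, 4.0      # prior on the regime mean

T = x.size
R = np.zeros(T + 1)         # run-length distribution
R[0] = 1.0
n_obs = np.zeros(T + 1)     # sufficient statistics per run length
sums = np.zeros(T + 1)
map_run = np.zeros(T, dtype=int)

for t, xt in enumerate(x):
    # Predictive density of x_t under each run-length hypothesis
    post_prec = 1.0 / tau0_2 + n_obs[: t + 1] / sigma2
    post_mean = (mu0 / tau0_2 + sums[: t + 1] / sigma2) / post_prec
    pred = norm.pdf(xt, loc=post_mean, scale=np.sqrt(sigma2 + 1.0 / post_prec))

    growth = R[: t + 1] * pred * (1.0 - hazard)       # run continues
    cp = np.sum(R[: t + 1] * pred * hazard)           # run resets to zero
    R_new = np.zeros(T + 1)
    R_new[0] = cp
    R_new[1 : t + 2] = growth
    R = R_new / R_new.sum()

    # Shift sufficient statistics by one run length
    n_obs[1 : t + 2] = n_obs[: t + 1] + 1
    sums[1 : t + 2] = sums[: t + 1] + xt
    n_obs[0], sums[0] = 0.0, 0.0
    map_run[t] = int(np.argmax(R))

print("most probable run length around the true break (t=300):", map_run[295:305])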
Claim Reserving via Inverse Probability Weighting: A Micro-Level Chain-Ladder Method
method being the most widely adopted. These methods were developed heuristically, with minimal statistical foundations, relying on oversimplified
data assumptions and neglecting policyholder heterogeneity, often resulting in
conservative reserve predictions. Micro-level reserving, utilizing stochastic
modeling with granular information, can improve predictions but tends to
involve less attractive and complex models for practitioners. This paper aims
to strike a practical balance between aggregate and individual models by
introducing a methodology that enables the Chain-Ladder method to incorporate
individual information. We achieve this by proposing a novel framework,
formulating the claim reserving problem within a population sampling context.
We introduce a reserve estimator in a frequency and severity distribution-free
manner that utilizes inverse probability weights (IPW) driven by individual
information, akin to propensity scores. We demonstrate that the Chain-Ladder
method emerges as a particular case of such an IPW estimator, thereby
inheriting a statistically sound foundation based on population sampling theory
that enables the use of granular information, and other extensions.
arXiv link: http://arxiv.org/abs/2307.10808v3
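For reference, the aggregate Chain-Ladder step that the paper recovers as a special case of its IPW estimator works as follows on a cumulative run-off triangle: development factors are ratios of column sums over the rows where both columns are observed, and the lower triangle is projected by multiplying through those factors. The triangle below is invented for illustration; the individual-level IPW weighting itself is not implemented here.

import numpy as np

# Cumulative claims triangle (rows: accident years, columns: development years);
# NaN marks the unobserved lower triangle
C = np.array([
    [1000., 1800., 2100., 2200.],
    [1100., 2000., 2400., np.nan],
    [1200., 2300., np.nan, np.nan],
    [1300., np.nan, np.nan, np.nan],
])
n = C.shape[0]

# Chain-Ladder development factors f_j = sum_i C[i, j+1] / sum_i C[i, j]
factors = []
for j in range(n - 1):
    rows = ~np.isnan(C[:, j + 1])
    factors.append(C[rows, j + 1].sum() / C[rows, j].sum())

# Project the lower triangle
C_full = C.copy()
for i in range(n):
    for j in range(n - 1):
        if np.isnan(C_full[i, j + 1]):
            C_full[i, j + 1] = C_full[i, j] * factors[j]

reserve = C_full[:, -1] - np.array([C[i, n - 1 - i] for i in range(n)])
print("development factors:", np.round(factors, 3))
print("estimated reserves by accident year:", np.round(reserve, 1))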
Asymptotics for the Generalized Autoregressive Conditional Duration Model
GARCH literature to prove consistency and asymptotic normality of the
(exponential) QMLE for the generalized autoregressive conditional duration
(ACD) model, the so-called ACD(1,1), under the assumption of strict
stationarity and ergodicity. The GARCH results, however, do not account for the
fact that the number of durations over a given observation period is random.
Thus, in contrast with Engle and Russell (1998), we show that strict
stationarity and ergodicity alone are not sufficient for consistency and
asymptotic normality, and provide additional sufficient conditions to account
for the random number of durations. In particular, we argue that the durations
need to satisfy the stronger requirement that they have finite mean.
arXiv link: http://arxiv.org/abs/2307.01779v1
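As background, the exponential QMLE for the ACD(1,1) model discussed above minimises the criterion sum_i [log psi_i + x_i / psi_i], with the conditional expected duration following psi_i = omega + alpha x_{i-1} + beta psi_{i-1}. The sketch below simulates durations and fits the model with scipy; the starting values, parameter bounds, and initialisation of psi are ad hoc choices.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)

def simulate_acd(n, omega, alpha, beta):
    x = np.empty(n)
    psi = omega / (1 - alpha - beta)
    for i in range(n):
        x[i] = psi * rng.exponential()
        psi = omega + alpha * x[i] + beta * psi
    return x

x = simulate_acd(5000, omega=0.1, alpha=0.15, beta=0.75)

def neg_qmle(params, x):
    omega, alpha, beta = params
    psi = np.empty_like(x)
    psi[0] = x.mean()                     # ad hoc initialisation
    for i in range(1, x.size):
        psi[i] = omega + alpha * x[i - 1] + beta * psi[i - 1]
    return np.sum(np.log(psi) + x / psi)

res = minimize(neg_qmle, x0=[0.05, 0.1, 0.8], args=(x,),
               bounds=[(1e-6, None), (0.0, 1.0), (0.0, 1.0)], method="L-BFGS-B")
print("QMLE estimates (omega, alpha, beta):", np.round(res.x, 3))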
A Double Machine Learning Approach to Combining Experimental and Observational Data
assumptions. We propose a double machine learning approach to combine
experimental and observational studies, allowing practitioners to test for
assumption violations and estimate treatment effects consistently. Our
framework proposes a falsification test for external validity and ignorability
under milder assumptions. We provide consistent treatment effect estimators
even when one of the assumptions is violated. However, our no-free-lunch
theorem highlights the necessity of accurately identifying the violated
assumption for consistent treatment effect estimation. Through comparative
analyses, we show our framework's superiority over existing data fusion
methods. The practical utility of our approach is further exemplified by three
real-world case studies, underscoring its potential for widespread application
in empirical research.
arXiv link: http://arxiv.org/abs/2307.01449v3
Adaptive Principal Component Regression with Applications to Panel Data
error-in-variables regression, a generalization of the linear regression
setting in which the observed covariates are corrupted with random noise. We
provide the first time-uniform finite sample guarantees for (regularized) PCR
whenever data is collected adaptively. Since the proof techniques for analyzing
PCR in the fixed design setting do not readily extend to the online setting,
our results rely on adapting tools from modern martingale concentration to the
error-in-variables setting. We demonstrate the usefulness of our bounds by
applying them to the domain of panel data, a ubiquitous setting in econometrics
and statistics. As our first application, we provide a framework for experiment
design in panel data settings when interventions are assigned adaptively. Our
framework may be thought of as a generalization of the synthetic control and
synthetic interventions frameworks, where data is collected via an adaptive
intervention assignment policy. Our second application is a procedure for
learning such an intervention assignment policy in a setting where units arrive
sequentially to be treated. In addition to providing theoretical performance
guarantees (as measured by regret), we show that our method empirically
outperforms a baseline which does not leverage error-in-variables regression.
arXiv link: http://arxiv.org/abs/2307.01357v3
Nonparametric Estimation of Large Spot Volatility Matrices for High-Frequency Financial Data
of high-frequency data collected for a large number of assets. We first combine
classic nonparametric kernel-based smoothing with a generalised shrinkage
technique in the matrix estimation for noise-free data under a uniform sparsity
assumption, a natural extension of the approximate sparsity commonly used in
the literature. The uniform consistency property is derived for the proposed
spot volatility matrix estimator with convergence rates comparable to the
optimal minimax one. For the high-frequency data contaminated by microstructure
noise, we introduce a localised pre-averaging estimation method that reduces
the effective magnitude of the noise. We then use the estimation tool developed
in the noise-free scenario, and derive the uniform convergence rates for the
developed spot volatility matrix estimator. We further combine the kernel
smoothing with the shrinkage technique to estimate the time-varying volatility
matrix of the high-dimensional noise vector. In addition, we consider large
spot volatility matrix estimation in time-varying factor models with observable
risk factors and derive the uniform convergence property. We provide numerical
studies including simulation and empirical application to examine the
performance of the proposed estimation methods in finite samples.
arXiv link: http://arxiv.org/abs/2307.01348v1
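To fix ideas, a bare-bones version of the noise-free estimator described above can be written as a kernel-weighted sum of outer products of high-frequency returns around the target time, followed by entrywise soft-thresholding of the off-diagonal elements, a simple stand-in for the generalised shrinkage. The bandwidth, kernel, and threshold below are arbitrary, and the pre-averaging step for noisy data and the factor-model extension are omitted.

import numpy as np

rng = np.random.default_rng(8)
n, p = 2000, 20                                   # intraday observations, number of assets
times = np.linspace(0.0, 1.0, n)
# Simulated returns with a common factor, so the true spot covariance is non-diagonal
factor = rng.normal(size=n)
returns = (np.outer(factor, rng.uniform(0.5, 1.5, p)) + rng.normal(size=(n, p))) / np.sqrt(n)

def spot_vol_matrix(returns, times, t0, h, threshold):
    # Kernel-weighted realized covariance around time t0
    w = np.exp(-0.5 * ((times - t0) / h) ** 2)    # Gaussian kernel (assumption)
    w /= w.sum() * (times[1] - times[0])
    sigma = (returns * w[:, None]).T @ returns
    # Soft-threshold the off-diagonal entries (simplified shrinkage)
    off = sigma - np.diag(np.diag(sigma))
    off = np.sign(off) * np.maximum(np.abs(off) - threshold, 0.0)
    return np.diag(np.diag(sigma)) + off

sigma_hat = spot_vol_matrix(returns, times, t0=0.5, h=0.05, threshold=0.02)
print("estimated spot volatility matrix (top-left 3x3 block):")
print(np.round(sigma_hat[:3, :3], 4))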
A maximal inequality for local empirical processes under weak dependence
strongly mixing data. Local empirical processes are defined as the (local)
averages $\frac{1}{nh}\sum_{i=1}^n 1\{x - h \leq X_i \leq x+h\} f(Z_i)$, where $f$ belongs to a class of functions, $x \in \mathbb{R}$ and
$h > 0$ is a bandwidth. Our nonasymptotic bounds control estimation error
uniformly over the function class, evaluation point $x$ and bandwidth $h$. They
are also general enough to accommodate function classes whose complexity
increases with $n$. As an application, we apply our bounds to function classes
that exhibit polynomial decay in their uniform covering numbers. When
specialized to the problem of kernel density estimation, our bounds reveal
that, under weak dependence with exponential decay, these estimators achieve
the same (up to a logarithmic factor) sharp uniform-in-bandwidth rates derived
in the i.i.d. setting by Einmahl and Mason (2005).
arXiv link: http://arxiv.org/abs/2307.01328v1
Does regional variation in wage levels identify the effects of a national minimum wage?
differences to study the effects of a national minimum wage. It shows that
variations of the “fraction affected” and “effective minimum wage” designs
are vulnerable to bias from measurement error and functional form
misspecification, even when standard identification assumptions hold, and that
small deviations from these assumptions can substantially amplify the biases.
Using simulation exercises and a case study of Brazil's minimum wage increase,
the paper illustrates the practical relevance of these issues and assesses the
performance of potential solutions and diagnostic tools.
arXiv link: http://arxiv.org/abs/2307.01284v5
Doubly Robust Estimation of Direct and Indirect Quantile Treatment Effects with Machine Learning
quantile treatment effects under a selection-on-observables assumption. This
permits disentangling the causal effect of a binary treatment at a specific
outcome rank into an indirect component that operates through an intermediate
variable called mediator and an (unmediated) direct impact. The proposed method
is based on the efficient score functions of the cumulative distribution
functions of potential outcomes, which are robust to certain misspecifications
of the nuisance parameters, i.e., the outcome, treatment, and mediator models.
We estimate these nuisance parameters by machine learning and use cross-fitting
to reduce overfitting bias in the estimation of direct and indirect quantile
treatment effects. We establish uniform consistency and asymptotic normality of
our effect estimators. We also propose a multiplier bootstrap for statistical
inference and show the validity of the multiplier bootstrap. Finally, we
investigate the finite sample performance of our method in a simulation study
and apply it to empirical data from the National Job Corp Study to assess the
direct and indirect earnings effects of training.
arXiv link: http://arxiv.org/abs/2307.01049v1
Expected Shortfall LASSO
Expected Shortfall (ES). The estimator is obtained as the solution to a
least-squares problem for an auxiliary dependent variable, which is defined as
a transformation of the dependent variable and a pre-estimated tail quantile.
Leveraging a sparsity condition, we derive a nonasymptotic bound on the
prediction and estimator errors of the ES estimator, accounting for the
estimation error in the dependent variable, and provide conditions under which
the estimator is consistent. Our estimator is applicable to heavy-tailed
time-series data, and we find that the number of parameters in the model may
grow with the sample size at a rate that depends on the dependence and
heavy-tailedness in the data. In an empirical application, we consider the
systemic risk measure CoES and consider a set of regressors that consists of
nonlinear transformations of a set of state variables. We find that the
nonlinear model outperforms an unpenalized and untransformed benchmark
considerably.
arXiv link: http://arxiv.org/abs/2307.01033v2
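The two-step construction described above can be sketched as follows: first pre-estimate the conditional tail quantile, then build the auxiliary dependent variable as the quantile plus the rescaled exceedance (a common transformation whose conditional mean is the Expected Shortfall), and finally run an l1-penalised least-squares regression of that variable on the regressors. The specific transformation, the tuning parameters, and the use of scikit-learn's QuantileRegressor and Lasso are assumptions; the paper's exact construction and theory should be taken from the source.

import numpy as np
from sklearn.linear_model import QuantileRegressor, Lasso

rng = np.random.default_rng(9)
n, p, alpha_tail = 2000, 30, 0.05
X = rng.normal(size=(n, p))
# Only the first two regressors matter; heavier tails via Student-t errors
y = 0.5 * X[:, 0] - 0.8 * X[:, 1] + rng.standard_t(df=5, size=n)

# Step 1: pre-estimate the conditional tail quantile
qr = QuantileRegressor(quantile=alpha_tail, alpha=0.0, solver="highs").fit(X, y)
q_hat = qr.predict(X)

# Step 2: auxiliary dependent variable whose conditional mean is the ES
z = q_hat + (y - q_hat) * (y <= q_hat) / alpha_tail

# Step 3: l1-penalised least squares of z on X
es_lasso = Lasso(alpha=0.05).fit(X, z)
print("nonzero ES coefficients at indices:", np.flatnonzero(es_lasso.coef_))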
Quantifying Distributional Model Risk in Marginal Problems via Optimal Transport
marginal measure is assumed to lie in a Wasserstein ball centered at a fixed
reference measure with a given radius. Theoretically, we establish several
fundamental results including strong duality, finiteness of the proposed
Wasserstein distributional model risk, and the existence of an optimizer at
each radius. In addition, we show continuity of the Wasserstein distributional
model risk as a function of the radius. Using strong duality, we extend the
well-known Makarov bounds for the distribution function of the sum of two
random variables with given marginals to Wasserstein distributionally robust
Makarov bounds. Practically, we illustrate our results on four distinct
applications when the sample information comes from multiple data sources and
only some marginal reference measures are identified. They are: partial
identification of treatment effects; externally valid treatment choice via
robust welfare functions; Wasserstein distributionally robust estimation under
data combination; and evaluation of the worst aggregate risk measures.
arXiv link: http://arxiv.org/abs/2307.00779v1
The Yule-Frisch-Waugh-Lovell Theorem
in the econometrics literature as the Frisch-Waugh-Lovell theorem. This theorem
demonstrates that the coefficients on any subset of covariates in a multiple regression are equal to the coefficients in a regression of the residualized
outcome variable on the residualized subset of covariates, where
residualization uses the complement of the subset of covariates of interest. In
this paper, I suggest that the theorem should be renamed as the
Yule-Frisch-Waugh-Lovell (YFWL) theorem to recognize the pioneering
contribution of the statistician G. Udny Yule in its development. Second, I
highlight recent work by the statistician, P. Ding, which has extended the YFWL
theorem to a comparison of estimated covariance matrices of coefficients from
multiple and partial (i.e., residualized) regressions. Third, I show that, in
cases where Ding's results do not apply, one can still resort to a
computational method to conduct statistical inference about coefficients in
multiple regressions using information from partial regressions.
arXiv link: http://arxiv.org/abs/2307.00369v1
Decomposing cryptocurrency high-frequency price dynamics into recurring and noisy components
cryptocurrency market with a focus on Bitcoin, Ethereum, Dogecoin, and WINkLink
from January 2020 to December 2022. Market activity measures - logarithmic
returns, volume, and transaction number, sampled every 10 seconds, were divided
into intraday and intraweek periods and then further decomposed into recurring
and noise components via correlation matrix formalism. The key findings include
market behavior that is distinct from that of traditional stock markets, owing to the absence of session opening and closing. This was manifested in three
enhanced-activity phases aligning with Asian, European, and U.S. trading
sessions. An intriguing pattern of activity surges at 15-minute intervals, particularly on the full hour, was also observed, pointing to a potential role of
algorithmic trading. Most notably, recurring bursts of activity in bitcoin and
ether were identified to coincide with the release times of significant U.S.
macroeconomic reports such as Nonfarm payrolls, Consumer Price Index data, and
Federal Reserve statements. The most correlated daily patterns of activity
occurred in 2022, possibly reflecting the documented correlations with U.S.
stock indices in the same period. Factors that are external to the inner market
dynamics are found to be responsible for the repeatable components of the
market dynamics, while the internal factors appear to be substantially random,
which manifests itself in a good agreement between the empirical eigenvalue
distributions in their bulk and the random matrix theory predictions expressed
by the Marchenko-Pastur distribution. The findings reported support the growing
integration of cryptocurrencies into the global financial markets.
arXiv link: http://arxiv.org/abs/2306.17095v2
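The random-matrix benchmark invoked above is easy to reproduce: for an N x T panel of standardised returns with q = N/T, the Marchenko-Pastur law predicts that correlation-matrix eigenvalues generated by pure noise fall inside [(1 - sqrt(q))^2, (1 + sqrt(q))^2], and eigenvalues escaping that bulk indicate genuine common structure. The sketch below counts such outliers on simulated data; the dimensions and the single common component are placeholders, not the paper's dataset.

import numpy as np

rng = np.random.default_rng(10)
N, T = 50, 2000                   # series, observations (e.g. 10-second returns)
noise = rng.normal(size=(N, T))
market = rng.normal(size=T)       # one common component, a stand-in for recurring dynamics
R = noise + 0.3 * market          # returns matrix

# Standardise each series and form the correlation matrix
R = (R - R.mean(axis=1, keepdims=True)) / R.std(axis=1, keepdims=True)
C = R @ R.T / T
eigvals = np.linalg.eigvalsh(C)

q = N / T
lam_minus, lam_plus = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2
outliers = eigvals[(eigvals < lam_minus) | (eigvals > lam_plus)]
print(f"Marchenko-Pastur bulk: [{lam_minus:.3f}, {lam_plus:.3f}]")
print(f"eigenvalues outside the bulk: {np.round(outliers, 3)}")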
Nonparametric Causal Decomposition of Group Disparities
identifies the mechanisms by which a treatment variable contributes to a
group-based outcome disparity. Our approach distinguishes three mechanisms:
group differences in 1) treatment prevalence, 2) average treatment effects, and
3) selection into treatment based on individual-level treatment effects. Our
approach reformulates classic Kitagawa-Blinder-Oaxaca decompositions in causal
and nonparametric terms, complements causal mediation analysis by explaining
group disparities instead of group effects, and isolates conceptually distinct
mechanisms conflated in recent random equalization decompositions. In contrast
to all prior approaches, our framework uniquely identifies differential
selection into treatment as a novel disparity-generating mechanism. Our
approach can be used for both the retrospective causal explanation of
disparities and the prospective planning of interventions to change
disparities. We present both an unconditional and a conditional decomposition,
where the latter quantifies the contributions of the treatment within levels of
certain covariates. We develop nonparametric estimators that are
root-$n$-consistent, asymptotically normal, semiparametrically efficient, and
multiply robust. We apply our approach to analyze the mechanisms by which
college graduation causally contributes to intergenerational income persistence
(the disparity in adult income between the children of high- vs low-income
parents). Empirically, we demonstrate a previously undiscovered role played by
the new selection component in intergenerational income persistence.
arXiv link: http://arxiv.org/abs/2306.16591v4
High-Dimensional Canonical Correlation Analysis
an emphasis on the vectors that define canonical variables. The paper shows
that when two dimensions of data grow to infinity jointly and proportionally,
the classical CCA procedure for estimating those vectors fails to deliver a
consistent estimate. This provides the first result on the impossibility of
identification of canonical variables in the CCA procedure when all dimensions
are large. As a countermeasure, the paper derives the magnitude of the
estimation error, which can be used in practice to assess the precision of CCA
estimates. Applications of the results to cyclical vs. non-cyclical stocks and
to a limestone grassland data set are provided.
arXiv link: http://arxiv.org/abs/2306.16393v3
Assessing Heterogeneity of Treatment Effects
example, a poverty reduction measure would be best evaluated by its effects on
those who would be poor in the absence of the treatment, or by the share among
the poor who would increase their earnings because of the treatment. While
these quantities are not identified, we derive nonparametrically sharp bounds
using only the marginal distributions of the control and treated outcomes.
Applications to microfinance and welfare reform demonstrate their utility even
when the average treatment effects are not significant and when economic theory
makes opposite predictions between heterogeneous individuals.
arXiv link: http://arxiv.org/abs/2306.15048v4
Identifying Socially Disruptive Policies
connections between agents. It is a costly side effect of many interventions
and so a growing empirical literature recommends measuring and accounting for
social disruption when evaluating the welfare impact of a policy. However,
there is currently little work characterizing what can actually be learned
about social disruption from data in practice. In this paper, we consider the
problem of identifying social disruption in an experimental setting. We show
that social disruption is not generally point identified, but informative
bounds can be constructed by rearranging the eigenvalues of the marginal
distribution of network connections between pairs of agents identified from the
experiment. We apply our bounds to the setting of Banerjee et al. (2021) and
find large disruptive effects that the authors miss by only considering
regression estimates.
arXiv link: http://arxiv.org/abs/2306.15000v3
Marginal Effects for Probit and Tobit with Endogeneity
structural endogeneity and measurement errors. In contrast to linear models,
these two sources of endogeneity affect partial effects differently in
nonlinear models. We study this issue focusing on the Instrumental Variable
(IV) Probit and Tobit models. We show that even when a valid IV is available,
failing to differentiate between the two types of endogeneity can lead to
either under- or over-estimation of the partial effects. We develop simple
estimators of the bounds on the partial effects and provide easy to implement
confidence intervals that correctly account for both types of endogeneity. We
illustrate the methods in a Monte Carlo simulation and an empirical
application.
arXiv link: http://arxiv.org/abs/2306.14862v4
Optimization of the Generalized Covariance Estimator in Noncausal Processes
estimator (GCov) in estimating and identifying mixed causal and noncausal
models. The GCov estimator is a semi-parametric method that minimizes an
objective function without making any assumptions about the error distribution
and is based on nonlinear autocovariances to identify the causal and noncausal
orders. When the number and type of nonlinear autocovariances included in the objective function of a GCov estimator are insufficient or inadequate, or when the error
density is too close to the Gaussian, identification issues can arise. These
issues result in local minima in the objective function, which correspond to
parameter values associated with incorrect causal and noncausal orders. Then,
depending on the starting point and the optimization algorithm employed, the
algorithm can converge to a local minimum. The paper proposes the use of the
Simulated Annealing (SA) optimization algorithm as an alternative to
conventional numerical optimization methods. The results demonstrate that SA
performs well when applied to mixed causal and noncausal models, successfully
eliminating the effects of local minima. The proposed approach is illustrated
by an empirical application involving a bivariate commodity price series.
arXiv link: http://arxiv.org/abs/2306.14653v3
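As a generic illustration of why a global optimiser helps with the multimodal objectives described above, the snippet below applies scipy's dual_annealing (a simulated-annealing variant) and a local gradient-based routine to a toy two-parameter function with several local minima; only the former reliably finds the global minimum. The toy function merely stands in for a GCov-type objective and is not the estimator from the paper.

import numpy as np
from scipy.optimize import dual_annealing, minimize

def objective(theta):
    # Toy multimodal surface standing in for a GCov-type objective (assumption)
    x, y = theta
    return (x ** 2 + y ** 2) / 20.0 + np.sin(3 * x) ** 2 + np.sin(3 * y) ** 2

bounds = [(-5.0, 5.0), (-5.0, 5.0)]

sa = dual_annealing(objective, bounds=bounds)
local = minimize(objective, x0=[2.5, -2.5], method="BFGS")

print(f"simulated annealing: theta = {np.round(sa.x, 3)}, value = {sa.fun:.4f}")
print(f"local optimiser from a poor start: theta = {np.round(local.x, 3)}, value = {local.fun:.4f}")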
Hybrid unadjusted Langevin methods for high-dimensional latent variable models
challenging. The latents have to be integrated out numerically, and the
dimension of the latent variables increases with the sample size. This paper
develops a novel approximate Bayesian method based on the Langevin diffusion
process. The method employs the Fisher identity to integrate out the latent
variables, which makes it accurate and computationally feasible when applied to
big data. In contrast to other approximate estimation methods, it does not
require the choice of a parametric distribution for the unknowns, which often
leads to inaccuracies. In an empirical discrete choice example with a million
observations, the proposed method accurately estimates the posterior choice
probabilities using only 2% of the computation time of exact MCMC.
arXiv link: http://arxiv.org/abs/2306.14445v1
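For orientation, the basic unadjusted Langevin step underlying the class of methods discussed above is theta_{k+1} = theta_k + (eta/2) * grad log p(theta_k | data) + sqrt(eta) * N(0, I). The sketch below applies it to a simple Bayesian logistic regression where the gradient is available in closed form; the paper's hybrid scheme, with latent variables integrated out via the Fisher identity, is considerably more involved and is not reproduced here. The step size and burn-in are arbitrary choices.

import numpy as np

rng = np.random.default_rng(12)
n, p = 1000, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

def grad_log_post(beta, X, y, prior_var=10.0):
    # Gradient of the logistic log-likelihood plus a Gaussian prior
    probs = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (y - probs) - beta / prior_var

eta = 1e-4                            # step size (assumption)
beta = np.zeros(p)
draws = []
for k in range(5000):
    beta = beta + 0.5 * eta * grad_log_post(beta, X, y) \
           + np.sqrt(eta) * rng.normal(size=p)
    if k >= 1000:                     # discard burn-in
        draws.append(beta.copy())

print("posterior means:", np.round(np.mean(draws, axis=0), 3))
print("true coefficients:", beta_true)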
Simple Estimation of Semiparametric Models with Measurement Errors
problem in the Generalized Method of Moments (GMM) framework. We focus on the
settings in which the variability of the EIV is a fraction of that of the
mismeasured variables, which is typical for empirical applications. For any
initial set of moment conditions our approach provides a "corrected" set of
moment conditions that are robust to the EIV. We show that the GMM estimator
based on these moments is root-n-consistent, with the standard tests and
confidence intervals providing valid inference. This is true even when the EIV
are so large that naive estimators (that ignore the EIV problem) are heavily
biased with their confidence intervals having 0% coverage. Our approach
involves no nonparametric estimation, which is especially important for
applications with many covariates, and settings with multivariate or
non-classical EIV. In particular, the approach makes it easy to use
instrumental variables to address EIV in nonlinear models.
arXiv link: http://arxiv.org/abs/2306.14311v3
Latent Factor Analysis in Short Panels
pseudo maximum likelihood setting under a large cross-sectional dimension n and
a fixed time series dimension T relies on a diagonal TxT covariance matrix of
the errors without imposing sphericity nor Gaussianity. We outline the
asymptotic distributions of the latent factor and error covariance estimates as
well as of an asymptotically uniformly most powerful invariant (AUMPI) test for
the number of factors based on the likelihood ratio statistic. We derive the
AUMPI characterization from inequalities ensuring the monotone likelihood ratio
property for positive definite quadratic forms in normal variables. An
empirical application to a large panel of monthly U.S. stock returns separates
month after month systematic and idiosyncratic risks in short subperiods of
bear vs. bull market based on the selected number of factors. We observe an
uptrend in the paths of total and idiosyncratic volatilities while the
systematic risk explains a large part of the cross-sectional total variance in
bear markets but is not driven by a single factor. Rank tests show that observed factors struggle to span the latent factors, with the discrepancy between the dimensions of the two factor spaces decreasing over time.
arXiv link: http://arxiv.org/abs/2306.14004v2
Multivariate Simulation-based Forecasting for Intraday Power Markets: Modelling Cross-Product Price Effects
the intermittent generation of renewable energy resources, which creates a need
for accurate probabilistic price forecasts. However, research to date has
focused on univariate approaches, while in many European intraday electricity
markets all delivery periods are traded in parallel. Thus, the dependency
structure between different traded products and the corresponding cross-product
effects cannot be ignored. We aim to fill this gap in the literature by using
copulas to model the high-dimensional intraday price return vector. We model
the marginal distribution as a zero-inflated Johnson's $S_U$ distribution with
location, scale and shape parameters that depend on market and fundamental
data. The dependence structure is modelled using latent beta regression to
account for the particular market structure of the intraday electricity market,
such as overlapping but independent trading sessions for different delivery
days. We allow the dependence parameter to be time-varying. We validate our
approach in a simulation study for the German intraday electricity market and
find that modelling the dependence structure improves the forecasting
performance. Additionally, we shed light on the impact of the single intraday
coupling (SIDC) on the trading activity and price distribution and interpret
our results in light of the market efficiency hypothesis. The approach is
directly applicable to other European electricity markets.
arXiv link: http://arxiv.org/abs/2306.13419v1
Factor-augmented sparse MIDAS regressions with an application to nowcasting
regressions for high-dimensional time series data, which may be observed at
different frequencies. Our novel approach integrates sparse and dense
dimensionality reduction techniques. We derive the convergence rate of our
estimator under misspecification, $\tau$-mixing dependence, and polynomial
tails. Our method's finite sample performance is assessed via Monte Carlo
simulations. We apply the methodology to nowcasting U.S. GDP growth and
demonstrate that it outperforms both sparse regression and standard
factor-augmented regression during the COVID-19 pandemic. To ensure the
robustness of these results, we also implement factor-augmented sparse logistic
regression, which further confirms the superior accuracy of our nowcast
probabilities during recessions. These findings indicate that recessions are
influenced by both idiosyncratic (sparse) and common (dense) shocks.
arXiv link: http://arxiv.org/abs/2306.13362v3
A Discrimination Report Card
informativeness of the assigned grades against the expected frequency of
ranking errors. Applying the method to a massive correspondence experiment, we
grade the racial biases of 97 U.S. employers. A four-grade ranking limits the
chances that a randomly selected pair of firms is mis-ranked to 5% while
explaining nearly half of the variation in firms' racial contact gaps. The
grades are presented alongside measures of uncertainty about each firm's
contact gap in an accessible rubric that is easily adapted to other settings
where ranks and levels are of simultaneous interest.
arXiv link: http://arxiv.org/abs/2306.13005v1
Price elasticity of electricity demand: Using instrumental variable regressions to address endogeneity and autocorrelation of high-frequency time series
aggregated electricity demand to high-frequency price signals, the short-term
elasticity of electricity demand. We investigate how the endogeneity of prices
and the autocorrelation of the time series, which are particularly pronounced
at hourly granularity, affect and distort common estimators. After developing a
controlled test environment with synthetic data that replicate key statistical
properties of electricity demand, we show that not only is the ordinary least
squares (OLS) estimator inconsistent (due to simultaneity), but so is a regular
instrumental variable (IV) regression (due to autocorrelation). Using wind as
an instrument, as is commonly done, may result in an estimate of the demand
elasticity that is inflated by an order of magnitude. We visualize the reason
for this bias using causal graphs and show that its magnitude depends on the
autocorrelation of both the instrument and the dependent variable. We further
incorporate and adapt two extensions of IV estimation,
conditional IV and nuisance IV, which have recently been proposed by Thams et
al. (2022). We show that these extensions can identify the true short-term
elasticity in a synthetic setting and are thus particularly promising for
future empirical research in this field.
arXiv link: http://arxiv.org/abs/2306.12863v1
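For intuition on the simultaneity part of the problem described above, here is a minimal synthetic illustration: price and quantity are determined jointly, OLS of quantity on price is biased, and a two-stage least squares estimator using a supply-side instrument (a stand-in for wind) recovers the demand elasticity. The data-generating process is a toy log-linear system, not the paper's test environment, and it deliberately omits the autocorrelation that distorts the regular IV estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
beta_demand = -0.3                     # true price elasticity of demand

# Toy simultaneous system: demand q = beta*p + u_d, supply q = 0.5*p + z + u_s.
z = rng.normal(size=n)                 # instrument: shifts supply only
u_d = rng.normal(size=n)
u_s = rng.normal(size=n)
p = (u_d - u_s - z) / (0.5 - beta_demand)   # equilibrium price
q = beta_demand * p + u_d                   # equilibrium quantity

def ols(y, X):
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print("OLS slope : %.3f" % ols(q, p)[1])    # biased by simultaneity

# Manual 2SLS: project price on the instrument, then regress quantity on it.
p_hat = np.column_stack([np.ones(n), z]) @ ols(p, z)
print("2SLS slope: %.3f" % ols(q, p_hat)[1])
print("true slope: %.3f" % beta_demand)
```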
Estimating the Value of Evidence-Based Decision Making
experiments and observational studies. In this article we propose an empirical
framework to estimate the value of evidence-based decision making (EBDM) and
the return on the investment in statistical precision.
arXiv link: http://arxiv.org/abs/2306.13681v2
A Nonparametric Test of $m$th-degree Inverse Stochastic Dominance
dominance which is a powerful tool for ranking distribution functions according
to social welfare. We construct the test based on empirical process theory. The
test is shown to be asymptotically size controlled and consistent. The good
finite sample properties of the test are illustrated via Monte Carlo
simulations. We apply our test to the inequality growth in the United Kingdom
from 1995 to 2010.
arXiv link: http://arxiv.org/abs/2306.12271v3
Difference-in-Differences with Interference
outcomes are not only dependent upon the unit's own treatment but also its
neighbors' treatment. Despite this, "difference-in-differences" (DID) type
estimators typically ignore such interference among neighbors. I show in this
paper that the canonical DID estimators generally fail to identify interesting
causal effects in the presence of neighborhood interference. To incorporate
interference structure into DID estimation, I propose doubly robust estimators
for the direct average treatment effect on the treated as well as the average
spillover effects under a modified parallel trends assumption. I later relax
common restrictions in the literature, such as immediate neighborhood
interference and correctly specified spillover functions. Moreover, robust
inference is discussed based on the asymptotic distribution of the proposed
estimators.
arXiv link: http://arxiv.org/abs/2306.12003v6
Qini Curves for Multi-Armed Treatment Rules
the benefit of data-driven targeting rules for treatment allocation. We propose
a generalization of the Qini curve to multiple costly treatment arms, that
quantifies the value of optimally selecting among both units and treatment arms
at different budget levels. We develop an efficient algorithm for computing
these curves and propose bootstrap-based confidence intervals that are exact in
large samples for any point on the curve. These confidence intervals can be
used to conduct hypothesis tests comparing the value of treatment targeting
using an optimal combination of arms with using just a subset of arms, or with
a non-targeting assignment rule ignoring covariates, at different budget
levels. We demonstrate the statistical performance in a simulation experiment
and an application to treatment targeting for election turnout.
arXiv link: http://arxiv.org/abs/2306.11979v4
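As a simplified illustration of the object generalized in this paper, the sketch below computes a Qini-style curve for a single treatment arm: units are ranked by an estimated treatment effect, and the curve traces the cumulative incremental gain from treating the top fraction of units at each budget level. The scores are assumed given (here, a noisy proxy); the multi-armed, multi-cost generalization and the exact confidence intervals proposed in the paper are not reproduced.

```python
import numpy as np

def qini_curve(cate_hat, y, w, grid=50):
    """Cumulative incremental gain from treating the top-ranked units.

    cate_hat : scores used only for ranking (e.g., estimated CATEs)
    y, w     : observed outcome and binary treatment indicator from an RCT
    """
    order = np.argsort(-cate_hat)              # treat highest scores first
    y, w = y[order], w[order]
    fracs = np.linspace(0, 1, grid + 1)[1:]
    gains = []
    for f in fracs:
        k = int(np.ceil(f * len(y)))
        yk, wk = y[:k], w[:k]
        # Difference-in-means effect among the targeted group, scaled by its size.
        gains.append((yk[wk == 1].mean() - yk[wk == 0].mean()) * k)
    return fracs, np.array(gains)

rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
w = rng.integers(0, 2, size=n)
tau = np.maximum(x, 0)                         # effect only for x > 0
y = x + w * tau + rng.normal(size=n)
fracs, gains = qini_curve(x, y, w)             # use x as the targeting score
print("incremental gain at 50%% budget: %.1f, at 100%%: %.1f"
      % (gains[len(gains) // 2 - 1], gains[-1]))
```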
Statistical Tests for Replacing Human Decision Makers with Algorithms
to improve human decision making. The performance of each human decision maker
is benchmarked against that of machine predictions. We replace the diagnoses
made by a subset of the decision makers with the recommendation from the
machine learning algorithm. We apply both a heuristic frequentist approach and
a Bayesian posterior loss function approach to abnormal birth detection using a
nationwide dataset of doctor diagnoses from prepregnancy checkups of
reproductive age couples and pregnancy outcomes. We find that our algorithm on
a test dataset results in a higher overall true positive rate and a lower false
positive rate than the diagnoses made by doctors only.
arXiv link: http://arxiv.org/abs/2306.11689v2
Assumption-lean falsification tests of rate double-robustness of double-machine-learning estimators
(2021) is of central importance in economics and biostatistics. It strictly
includes both (i) the class of mean-square continuous functionals that can be
written as an expectation of an affine functional of a conditional expectation
studied by Chernozhukov et al. (2022b) and (ii) the class of functionals
studied by Robins et al. (2008). The present state-of-the-art estimators for DR
functionals $\psi$ are double-machine-learning (DML) estimators (Chernozhukov
et al., 2018). A DML estimator $\widehat{\psi}_{1}$ of $\psi$ depends on
estimates $\widehat{p}(x)$ and $\widehat{b}(x)$ of a pair of nuisance
functions $p(x)$ and $b(x)$, and is said to satisfy "rate double-robustness" if
the Cauchy--Schwarz upper bound of its bias is $o (n^{- 1/2})$. Were it
achievable, our scientific goal would have been to construct valid,
assumption-lean (i.e. no complexity-reducing assumptions on $b$ or $p$) tests
of the validity of a nominal $(1 - \alpha)$ Wald confidence interval (CI)
centered at $\widehat{\psi}_{1}$. But this would require a test of the bias to
be $o (n^{-1/2})$, which can be shown not to exist. We therefore adopt the less
ambitious goal of falsifying, when possible, an analyst's justification for her
claim that the reported $(1 - \alpha)$ Wald CI is valid. In many instances, an
analyst justifies her claim by imposing complexity-reducing assumptions on $b$
and $p$ to ensure "rate double-robustness". Here we exhibit valid,
assumption-lean tests of $H_{0}$: "rate double-robustness holds", with
non-trivial power against certain alternatives. If $H_{0}$ is rejected, we will
have falsified her justification. However, no assumption-lean test of $H_{0}$,
including ours, can be a consistent test. Thus, the failure of our test to
reject is not meaningful evidence in favor of $H_{0}$.
arXiv link: http://arxiv.org/abs/2306.10590v4
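For readers unfamiliar with the estimators whose justification is being tested, here is a minimal cross-fitted DML sketch for one canonical doubly robust functional, the average treatment effect under unconfoundedness, using the augmented inverse-probability-weighting score with generic random-forest nuisance estimates. It illustrates the class of DML estimators discussed above; it is not the paper's falsification test.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def dml_ate(y, d, X, n_splits=5, seed=0):
    """Cross-fitted AIPW/DML estimate of the ATE and its standard error."""
    psi = np.zeros(len(y))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        # Nuisance estimates: propensity p(x) and outcome regressions.
        p_hat = RandomForestClassifier(random_state=seed).fit(
            X[train], d[train]).predict_proba(X[test])[:, 1]
        p_hat = np.clip(p_hat, 0.01, 0.99)
        m1 = RandomForestRegressor(random_state=seed).fit(
            X[train][d[train] == 1], y[train][d[train] == 1]).predict(X[test])
        m0 = RandomForestRegressor(random_state=seed).fit(
            X[train][d[train] == 0], y[train][d[train] == 0]).predict(X[test])
        # Doubly robust (efficient influence function) score on the held-out fold.
        psi[test] = (m1 - m0
                     + d[test] * (y[test] - m1) / p_hat
                     - (1 - d[test]) * (y[test] - m0) / (1 - p_hat))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))

rng = np.random.default_rng(0)
n = 4_000
X = rng.normal(size=(n, 5))
d = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
y = 1.0 * d + X[:, 0] + rng.normal(size=n)     # true ATE = 1
print("ATE = %.3f (SE %.3f)" % dml_ate(y, d, X))
```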
Formal Covariate Benchmarking to Bound Omitted Variable Bias
omitted variable bias and can be used to bound the strength of the unobserved
confounder using information and judgments about observed covariates. It is
common to carry out formal covariate benchmarking after residualizing the
unobserved confounder on the set of observed covariates. In this paper, I
explain the rationale and details of this procedure. I clarify some important
details of the process of formal covariate benchmarking and highlight some of
the difficulties of interpretation that researchers face in reasoning about the
residualized part of unobserved confounders. I explain all the points with
several empirical examples.
arXiv link: http://arxiv.org/abs/2306.10562v1
Testing for Peer Effects without Specifying the Network Structure
effects in panel data without the need to specify the network structure. The
unrestricted model of our test is a linear panel data model of social
interactions with dyad-specific peer effect coefficients for all potential
peers. The proposed AR test evaluates if these peer effect coefficients are all
zero. As the number of peer effect coefficients increases with the sample size,
so does the number of instrumental variables (IVs) employed to test the
restrictions under the null, rendering Bekker's many-IV environment. By
extending existing many-IV asymptotic results to panel data, we establish the
asymptotic validity of the proposed AR test. Our Monte Carlo simulations show
the robustness and superior performance of the proposed test compared to some
existing tests with misspecified networks. We provide two applications to
demonstrate its empirical relevance.
arXiv link: http://arxiv.org/abs/2306.09806v3
Modelling and Forecasting Macroeconomic Risk with Time Varying Skewness Stochastic Volatility Models
is critical for effective policymaking aimed at maintaining economic stability.
In this paper I propose a parametric framework for modelling and forecasting
macroeconomic risk based on stochastic volatility models with Skew-Normal and
Skew-t shocks featuring time varying skewness. Exploiting a mixture stochastic
representation of the Skew-Normal and Skew-t random variables, I develop
efficient posterior simulation samplers for Bayesian estimation of both
univariate and VAR models of this type. In an application, I use the models to
predict downside risk to GDP growth in the US and I show that these models
represent a competitive alternative to semi-parametric approaches such as
quantile regression. Finally, estimating a medium scale VAR on US data I show
that time varying skewness is a relevant feature of macroeconomic and financial
shocks.
arXiv link: http://arxiv.org/abs/2306.09287v2
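The stochastic representation mentioned above can be illustrated for the Skew-Normal case: a skew-normal draw can be written as a combination of a half-normal component and an independent normal component, which is the kind of latent structure such samplers augment on. The snippet below only checks the representation against scipy's skewnorm moments; it is not the paper's posterior sampler.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 4.0                               # skewness (shape) parameter
delta = alpha / np.sqrt(1 + alpha**2)

# Representation: X = delta*|Z0| + sqrt(1 - delta^2)*Z1, with Z0, Z1 independent
# standard normals, is distributed SkewNormal(alpha).
z0 = np.abs(rng.normal(size=200_000))     # half-normal component
z1 = rng.normal(size=200_000)
x = delta * z0 + np.sqrt(1 - delta**2) * z1

m, v = stats.skewnorm.stats(alpha, moments="mv")
print("simulated mean/std: %.4f %.4f" % (x.mean(), x.std()))
print("skewnorm  mean/std: %.4f %.4f" % (m, np.sqrt(v)))
```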
Inference in clustered IV models with many and weak instruments
observations towards the number of clusters. For instrumental variable models
this reduced effective sample size makes the instruments more likely to be
weak, in the sense that they contain little information about the endogenous
regressor, and many, in the sense that their number is large compared to the
sample size. Consequently, weak and many instrument problems for estimators and
tests in instrumental variable models are also more likely. None of the
previously developed many and weak instrument robust tests, however, can be
applied to clustered data as they all require independent observations.
Therefore, I adapt the many and weak instrument robust jackknife
Anderson--Rubin and jackknife score tests to clustered data by removing
clusters rather than individual observations from the statistics. Simulations
and a revisitation of a study on the effect of queenly reign on war show the
empirical relevance of the new tests.
arXiv link: http://arxiv.org/abs/2306.08559v3
Machine Learning for Zombie Hunting: Predicting Distress from Firms' Accounts and Missing Values
zombie firms. First, we derive the risk of failure by training and testing our
algorithms on disclosed financial information and non-random missing values of
304,906 firms active in Italy from 2008 to 2017. Then, we spot the highest
financial distress conditional on predictions that lie above a threshold for
which a combination of false positive rate (false prediction of firm failure)
and false negative rate (false prediction of active firms) is minimized.
Therefore, we identify zombies as firms that persist in a state of financial
distress, i.e., their forecasts fall into the risk category above the threshold
for at least three consecutive years. For our purpose, we implement a gradient
boosting algorithm (XGBoost) that exploits information about missing values.
The inclusion of missing values in our predictive model is crucial because
patterns of undisclosed accounts are correlated with firm failure. Finally, we
show that our preferred machine learning algorithm outperforms (i) proxy models
such as Z-scores and the Distance-to-Default, (ii) traditional econometric
methods, and (iii) other widely used machine learning techniques. We provide
evidence that zombies are on average less productive and smaller, and that they
tend to increase in times of crisis. Finally, we argue that our application can
help financial institutions and public authorities design evidence-based
policies, e.g., optimal bankruptcy laws and information disclosure policies.
arXiv link: http://arxiv.org/abs/2306.08165v1
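A minimal sketch of the kind of gradient-boosting setup described above, relying on xgboost's native handling of missing values (NaN entries are routed to a learned default direction at each split). The synthetic data, in which missingness itself is informative about failure, is purely illustrative.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 20_000
X = rng.normal(size=(n, 10))
fail_prob = 1 / (1 + np.exp(-(X[:, 0] - X[:, 1])))
y = rng.binomial(1, fail_prob)            # 1 = firm failure / distress

# Non-random missingness: distressed firms are more likely to leave accounts
# undisclosed, so the missingness pattern itself carries predictive content.
mask = rng.random((n, 10)) < np.where(y[:, None] == 1, 0.4, 0.1)
X_obs = np.where(mask, np.nan, X)

X_tr, X_te, y_tr, y_te = train_test_split(X_obs, y, test_size=0.3, random_state=0)
clf = xgb.XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1,
                        eval_metric="logloss")
clf.fit(X_tr, y_tr)                       # NaNs are handled natively by XGBoost
print("test AUC: %.3f" % roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```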
Kernel Choice Matters for Local Polynomial Density Estimators at Boundaries
estimators at boundary points. Contrary to conventional wisdom, we demonstrate
that the choice of kernel has a substantial impact on the efficiency of LPD
estimators. In particular, we provide theoretical results and present
simulation and empirical evidence showing that commonly used kernels, such as
the triangular kernel, suffer from several efficiency issues: They yield a
larger mean squared error than our preferred Laplace kernel. For inference, the
efficiency loss is even more pronounced, with confidence intervals based on
popular kernels being wide, whereas those based on the Laplace kernel are
markedly tighter. Furthermore, the variance of the LPD estimator with such
popular kernels explodes as the sample size decreases, reflecting the fact --
formally proven here -- that its finite-sample variance is infinite. This
small-sample problem, however, can be avoided by employing kernels with
unbounded support. Taken together, both asymptotic and finite-sample analyses
justify the use of the Laplace kernel: Simply changing the kernel function
improves the reliability of LPD estimation and inference, and its effect is
numerically significant.
arXiv link: http://arxiv.org/abs/2306.07619v3
Instrument-based estimation of full treatment effects with movers
evaluation, while often only the effect of a subset of treatment is estimated.
We partially identify the local average treatment effect of receiving full
treatment (LAFTE) using an instrumental variable that may induce individuals
into only a subset of treatment (movers). We show that movers violate the
standard exclusion restriction, necessary conditions on the presence of movers
are testable, and partial identification holds under a double exclusion
restriction. We identify movers in four empirical applications and estimate
informative bounds on the LAFTE in three of them.
arXiv link: http://arxiv.org/abs/2306.07018v1
Localized Neural Network Modelling of Time Series: A Case Study on US Monetary Policy
context of treatment effects via a localized neural network (LNN) approach. Due
to a vast number of parameters involved, we reduce the number of effective
parameters by (i) exploring the use of identification restrictions; and (ii)
adopting a variable selection method based on the group-LASSO technique.
Subsequently, we derive the corresponding estimation theory and propose a
dependent wild bootstrap procedure to construct valid inferences accounting for
the dependence of data. Finally, we validate our theoretical findings through
extensive numerical studies. In an empirical study, we revisit the impacts of a
tightening monetary policy action on a variety of economic variables, including
short-/long-term interest rate, inflation, unemployment rate, industrial price
and equity return via the newly proposed framework using a monthly dataset of
the US.
arXiv link: http://arxiv.org/abs/2306.05593v2
Maximally Machine-Learnable Portfolios
risk-adjusted profitability. We develop a collaborative machine learning
algorithm that optimizes portfolio weights so that the resulting synthetic
security is maximally predictable. Precisely, we introduce MACE, a multivariate
extension of Alternating Conditional Expectations that achieves the
aforementioned goal by wielding a Random Forest on one side of the equation,
and a constrained Ridge Regression on the other. There are two key improvements
with respect to Lo and MacKinlay's original maximally predictable portfolio
approach. First, it accommodates any (nonlinear) forecasting algorithm and
predictor set. Second, it handles large portfolios. We conduct exercises at the
daily and monthly frequency and report significant increases in predictability
and profitability using very little conditioning information. Interestingly,
predictability is found in bad as well as good times, and MACE successfully
navigates the debacle of 2022.
arXiv link: http://arxiv.org/abs/2306.05568v2
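A rough caricature of the alternating idea described above: given portfolio weights, fit a flexible forecaster for the resulting portfolio return; given the forecasts, refit the weights by ridge regression; iterate. This in-sample sketch uses generic scikit-learn models and only a normalization constraint, so it is a simplified stand-in for MACE, not its implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
T, n_assets, n_feat = 600, 10, 5
X = rng.normal(size=(T, n_feat))                      # lagged predictors
signal = np.tanh(X[:, 0] * X[:, 1])                   # nonlinear predictable part
R = (0.1 * signal[:, None] * rng.normal(1.0, 0.2, n_assets)
     + rng.normal(scale=0.5, size=(T, n_assets)))     # asset returns

w = np.ones(n_assets) / n_assets                      # start equal-weighted
for it in range(5):
    y = R @ w                                         # synthetic portfolio return
    # Step 1: fit a flexible forecaster of the portfolio return.
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    f = rf.predict(X)
    # Step 2: choose weights so that R @ w tracks the forecast (ridge step),
    # then renormalize the weights.
    w = Ridge(alpha=1.0, fit_intercept=False).fit(R, f).coef_
    w = w / np.abs(w).sum()
    print("iter %d, in-sample corr(forecast, portfolio) = %.3f"
          % (it, np.corrcoef(f, R @ w)[0, 1]))
```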
Heterogeneous Autoregressions in Short T Panel Data Models
individual-specific effects and heterogeneous autoregressive coefficients
defined on the interval (-1,1], thus allowing for some of the individual
processes to have unit roots. It proposes estimators for the moments of the
cross-sectional distribution of the autoregressive (AR) coefficients, assuming
a random coefficient model for the autoregressive coefficients without imposing
any restrictions on the fixed effects. It is shown that the standard generalized
method of moments estimators obtained under homogeneous slopes are biased.
Small sample properties of the proposed estimators are investigated by Monte
Carlo experiments and compared with a number of alternatives, both under
homogeneous and heterogeneous slopes. It is found that a simple moment
estimator of the mean of heterogeneous AR coefficients performs very well even
for moderate sample sizes, but to reliably estimate the variance of AR
coefficients much larger samples are required. It is also required that the
true value of this variance is not too close to zero. The utility of the
heterogeneous approach is illustrated in the case of earnings dynamics.
arXiv link: http://arxiv.org/abs/2306.05299v3
Matrix GARCH Model: Inference and Application
However, no attempt has been made to study their conditional heteroskedasticity
that is often observed in economic and financial data. To address this gap, we
propose a novel matrix generalized autoregressive conditional
heteroskedasticity (GARCH) model to capture the dynamics of conditional row and
column covariance matrices of matrix time series. The key innovation of the
matrix GARCH model is the use of a univariate GARCH specification for the trace
of conditional row or column covariance matrix, which allows for the
identification of conditional row and column covariance matrices. Moreover, we
introduce a quasi maximum likelihood estimator (QMLE) for model estimation and
develop a portmanteau test for model diagnostic checking. Simulation studies
are conducted to assess the finite-sample performance of the QMLE and
portmanteau test. To handle large dimensional matrix time series, we also
propose a matrix factor GARCH model. Finally, we demonstrate the superiority of
the matrix GARCH and matrix factor GARCH models over existing multivariate
GARCH-type models in volatility forecasting and portfolio allocations using
three applications on credit default swap prices, global stock sector indices,
and futures prices.
arXiv link: http://arxiv.org/abs/2306.05169v1
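To fix ideas about the identification device described above, the sketch below runs a plain univariate GARCH(1,1) recursion on a crude realized proxy for the trace of the row covariance of a matrix-valued series. The matrix data and the (omega, alpha, beta) values are placeholders rather than QMLE estimates, and the normalization of the column covariance is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
T, p, q = 500, 3, 4                      # time periods, rows, columns
Y = rng.normal(size=(T, p, q))           # placeholder matrix-valued series

# Crude realized proxy for the trace of the conditional row covariance:
# tr(Y_t Y_t') / q (assuming the column covariance is normalized to trace q).
x = np.einsum('tij,tij->t', Y, Y) / q

# Univariate GARCH(1,1) recursion for the conditional trace h_t.
omega, alpha, beta = 0.1 * x.mean(), 0.1, 0.8
h = np.empty(T)
h[0] = x.mean()
for t in range(1, T):
    h[t] = omega + alpha * x[t - 1] + beta * h[t - 1]

print("average realized trace proxy    : %.3f" % x.mean())
print("average fitted conditional trace: %.3f" % h.mean())
```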
Network-based Representations and Dynamic Discrete Choice Models for Multiple Discrete Choice Analysis
characterized as multiple discrete, which means that people choose multiple
items simultaneously. The analysis and prediction of people behavior in
multiple discrete choice situations pose several challenges. In this paper, to
address this, we propose a random utility maximization (RUM) based model that
considers each subset of choice alternatives as a composite alternative, where
individuals choose a subset according to the RUM framework. While this approach
offers a natural and intuitive modeling approach for multiple-choice analysis,
the large number of subsets of choices in the formulation makes its estimation
and application intractable. To overcome this challenge, we introduce directed
acyclic graph (DAG) based representations of choices where each node of the DAG
is associated with an elemental alternative and additional information such
as the number of selected elemental alternatives. Our innovation is to show
that the multi-choice model is equivalent to a recursive route choice model on
the DAG, leading to the development of new efficient estimation algorithms
based on dynamic programming. In addition, the DAG representations enable us to
bring some advanced route choice models to capture the correlation between
subset choice alternatives. Numerical experiments based on synthetic and real
datasets show many advantages of our modeling approach and the proposed
estimation algorithms.
arXiv link: http://arxiv.org/abs/2306.04606v1
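The equivalence mentioned above reduces estimation to a route-choice-style value recursion on a DAG. The sketch below illustrates only that dynamic-programming ingredient: log-sum-exp value functions computed backward over a topologically ordered DAG, from which logit probabilities of each outgoing arc follow. The graph and arc utilities are hypothetical.

```python
import numpy as np

# Hypothetical DAG: node -> list of (next_node, utility of that arc).
# Node "T" is the terminal (absorbing) state with value 0.
dag = {
    "s": [("a", 1.0), ("b", 0.5)],
    "a": [("c", 0.2), ("T", 0.0)],
    "b": [("c", 0.8), ("T", -0.1)],
    "c": [("T", 0.3)],
    "T": [],
}
topo_order = ["s", "a", "b", "c", "T"]     # any topological order works

# Backward recursion: V(k) = log sum_a exp(v(k, a) + V(next(k, a))).
V = {"T": 0.0}
for node in reversed(topo_order[:-1]):
    vals = [u + V[nxt] for nxt, u in dag[node]]
    V[node] = np.log(np.sum(np.exp(vals)))

# Logit choice probabilities of each outgoing arc follow from the values.
probs = {node: {nxt: np.exp(u + V[nxt] - V[node]) for nxt, u in dag[node]}
         for node in topo_order[:-1]}
print({k: round(v, 3) for k, v in V.items()})
print("choice probabilities at 's':", {k: round(p, 3) for k, p in probs["s"].items()})
```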
Evaluating the Impact of Regulatory Policies on Social Welfare in Difference-in-Difference Settings
requires the identification of counterfactual distributions. Many of these
policies (e.g. minimum wages or minimum working time) generate mass points
and/or discontinuities in the outcome distribution. Existing approaches in the
difference-in-difference literature cannot accommodate these discontinuities
while accounting for selection on unobservables and non-stationary outcome
distributions. We provide a unifying partial identification result that can
account for these features. Our main identifying assumption is the stability of
the dependence (copula) between the distribution of the untreated potential
outcome and group membership (treatment assignment) across time. Exploiting
this copula stability assumption allows us to provide an identification result
that is invariant to monotonic transformations. We provide sharp bounds on the
counterfactual distribution of the treatment group suitable for any outcome,
whether discrete, continuous, or mixed. Our bounds collapse to the
point-identification result in Athey and Imbens (2006) for continuous outcomes
with strictly increasing distribution functions. We illustrate our approach and
the informativeness of our bounds by analyzing the impact of an increase in the
legal minimum wage using data from a recent minimum wage study (Cengiz, Dube,
Lindner, and Zipperer, 2019).
arXiv link: http://arxiv.org/abs/2306.04494v2
Semiparametric Efficiency Gains From Parametric Restrictions on Propensity Scores
improves semiparametric efficiency bounds in the potential outcome framework.
For stratified propensity scores, considered as a parametric model, we derive
explicit formulas for the efficiency gain from knowing how the covariate space
is split. Based on these, we find that the efficiency gain decreases as the
partition of the stratification becomes finer. For general parametric models,
where it is hard to obtain explicit representations of efficiency bounds, we
propose a novel framework that enables us to see whether knowing a parametric
model is valuable in terms of efficiency even when it is high-dimensional. In
addition to the intuitive fact that knowing the parametric model does not help
much if it is sufficiently flexible, we discover that the efficiency gain can
be nearly zero even though the parametric assumption significantly restricts
the space of possible propensity scores.
arXiv link: http://arxiv.org/abs/2306.04177v3
Semiparametric Discrete Choice Models for Bundles
for bundles. Our first approach is a kernel-weighted rank estimator based on a
matching-based identification strategy. We establish its complete asymptotic
properties and prove the validity of the nonparametric bootstrap for inference.
We then introduce a new multi-index least absolute deviations (LAD) estimator
as an alternative, of which the main advantage is its capacity to estimate
preference parameters on both alternative- and agent-specific regressors. Both
methods can account for arbitrary correlation in disturbances across choices,
with the former also allowing for interpersonal heteroskedasticity. We also
demonstrate that the identification strategy underlying these procedures can be
extended naturally to panel data settings, producing an analogous localized
maximum score estimator and a LAD estimator for estimating bundle choice models
with fixed effects. We derive the limiting distribution of the former and
verify the validity of the numerical bootstrap as an inference tool. All our
proposed methods can be applied to general multi-index models. Monte Carlo
experiments show that they perform well in finite samples.
arXiv link: http://arxiv.org/abs/2306.04135v3
Marijuana on Main Streets? The Story Continues in Colombia: An Endogenous Three-part Model
relevant to analyze the potential implications of its legalization. This paper
proposes an endogenous three-part model taking into account incidental
truncation and access restrictions to study demand for marijuana in Colombia,
and analyze the potential effects of its legalization. Our application suggests
that modeling simultaneously access, intensive and extensive margin is
relevant, and that selection into access is important for the intensive margin.
We find that younger men who have consumed alcohol and cigarettes, live in a
neighborhood with drug suppliers, and have friends who consume marijuana face a
higher probability of having access to and using this drug. In addition, we
find that marijuana is an inelastic good (-0.45 elasticity). Our results are
robust to different specifications and definitions. If marijuana were
legalized, younger individuals with a medium or low risk perception about
marijuana use would increase their probability of use by 3.8 percentage points,
from 13.6% to 17.4%. Overall, legalization would increase the probability of
consumption by 0.7 p.p. (from 2.3% to 3.0%). Under different price settings,
annual tax revenues would fluctuate between USD 11.0 million and USD 54.2
million, with USD 32 million as a potential benchmark.
arXiv link: http://arxiv.org/abs/2306.10031v1
Parametrization, Prior Independence, and the Semiparametric Bernstein-von Mises Theorem for the Partially Linear Model
regression model with independent priors for the low-dimensional parameter of
interest and the infinite-dimensional nuisance parameters. My result avoids a
challenging prior invariance condition that arises from a loss of information
associated with not knowing the nuisance parameter. The key idea is to employ a
feasible reparametrization of the partially linear regression model that
reflects the semiparametric structure of the model. This allows a researcher to
assume independent priors for the model parameters while automatically
accounting for the loss of information associated with not knowing the nuisance
parameters. The theorem is verified for uniform wavelet series priors and
Mat\'{e}rn Gaussian process priors.
arXiv link: http://arxiv.org/abs/2306.03816v5
Uniform Inference for Cointegrated Vector Autoregressive Processes
has so far proven difficult due to certain discontinuities arising in the
asymptotic distribution of the least squares estimator. We extend asymptotic
results from the univariate case to multiple dimensions and show how inference
can be based on these results. Furthermore, we show that lag augmentation and a
recent instrumental variable procedure can also yield uniformly valid tests and
confidence regions. We verify the theoretical findings and investigate finite
sample properties in simulation experiments for two specific examples.
arXiv link: http://arxiv.org/abs/2306.03632v2
Forecasting the Performance of US Stock Market Indices During COVID-19: RF vs LSTM
(2007-2009). COVID-19 poses a significant challenge to US stock traders and
investors. Traders and investors should keep up with the stock market. This is
to mitigate risks and improve profits by using forecasting models that account
for the effects of the pandemic. With consideration of the COVID-19 pandemic
after the recession, two machine learning models, including Random Forest and
LSTM are used to forecast two major US stock market indices. Data on historical
prices after the big recession is used for developing machine learning models
and forecasting index returns. To evaluate model performance during training,
cross-validation is used. Additionally, hyperparameter optimization,
regularization (such as dropout and weight decay), and preprocessing improve
the performance of the machine learning techniques. Using high-accuracy machine
learning techniques, traders and investors can forecast stock market behavior,
stay ahead of their competition, and improve profitability. Keywords: COVID-19,
LSTM, S&P500, Random Forest, Russell 2000, Forecasting, Machine Learning, Time
Series JEL Code: C6, C8, G4.
arXiv link: http://arxiv.org/abs/2306.03620v1
Robust inference for the treatment effect variance in experiments using machine learning
the first valid confidence intervals for the VCATE, the treatment effect
variance explained by observables. Conventional approaches yield incorrect
coverage when the VCATE is zero. As a result, practitioners could be prone to
detect heterogeneity even when none exists. The reason why coverage worsens at
the boundary is that all efficient estimators have a locally-degenerate
influence function and may not be asymptotically normal. I solve the problem
for a broad class of multistep estimators with a predictive first stage. My
confidence intervals account for higher-order terms in the limiting
distribution and are fast to compute. I also find new connections between the
VCATE and the problem of deciding whom to treat. The gains of targeting
treatment are (sharply) bounded by half the square root of the VCATE. Finally,
I document excellent performance in simulation and reanalyze an experiment from
Malawi.
arXiv link: http://arxiv.org/abs/2306.03363v1
Inference for Local Projections
interesting challenges and opportunities. Analysts typically want to assess the
precision of individual estimates, explore the dynamic evolution of the
response over particular regions, and generally determine whether the impulse
generates a response that is any different from the null of no effect. Each of
these goals requires a different approach to inference. In this article, we
provide an overview of results that have appeared in the literature in the past
20 years along with some new procedures that we introduce here.
arXiv link: http://arxiv.org/abs/2306.03073v2
Improving the accuracy of bubble date estimators under time-varying volatility
time-varying volatility and propose the algorithm of estimating the break dates
with volatility correction: First, we estimate the emerging date of the
explosive bubble, its collapsing date, and the recovering date to the normal
market under the assumption of homoskedasticity; second, we collect the
residuals and then employ WLS-based estimation of the bubble dates. We
demonstrate by Monte Carlo simulations that the accuracy of the break date
estimators improves significantly with this two-step procedure in some cases
compared to those based
on the OLS method.
arXiv link: http://arxiv.org/abs/2306.02977v1
Synthetic Regressing Control Method
sparse weights where only a few control units have non-zero weights, involves
an optimization procedure that simultaneously selects and aligns control units
to closely match the treated unit. However, this simultaneous selection and
alignment of control units may lead to a loss of efficiency. Another concern
arising from the aforementioned procedure is its susceptibility to
under-fitting due to imperfect pre-treatment fit. It is not uncommon for the
linear combination, using nonnegative weights, of pre-treatment period outcomes
for the control units to inadequately approximate the pre-treatment outcomes
for the treated unit. To address both of these issues, this paper proposes a
simple and effective method called Synthetic Regressing Control (SRC). The SRC
method begins by performing univariate linear regressions to appropriately
align the pre-treatment periods of the control units with the treated unit.
Subsequently, an SRC estimator is obtained by synthesizing (taking a weighted
average of) the fitted controls. To determine the weights in the synthesis
procedure, we propose an approach that utilizes an unbiased risk estimator
criterion. Theoretically, we show that this synthesis is asymptotically
optimal in the sense of achieving the lowest possible squared error. Extensive
numerical experiments highlight the advantages of the SRC method.
arXiv link: http://arxiv.org/abs/2306.02584v2
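A schematic numpy sketch of the two steps described above: (i) univariate regressions of the treated unit's pre-treatment outcomes on each control unit to obtain fitted ("aligned") controls, and (ii) a weighted average of those fitted controls. For simplicity, the weights below come from nonnegative least squares on the pre-period with a normalization, a stand-in for the paper's unbiased-risk criterion; the data are simulated placeholders.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
T0, T1, J = 40, 10, 8                       # pre/post periods, control units
common = rng.normal(size=T0 + T1)
Y0 = 0.5 * common[:, None] + rng.normal(scale=0.3, size=(T0 + T1, J))  # controls
y1 = 0.8 * common + rng.normal(scale=0.3, size=T0 + T1)                # treated
y1[T0:] += 2.0                              # treatment effect of 2 after T0

# Step 1: align each control with the treated unit via a univariate regression
# on the pre-treatment period, keeping fitted values for all periods.
fitted = np.empty_like(Y0)
for j in range(J):
    Xj = np.column_stack([np.ones(T0), Y0[:T0, j]])
    coef = np.linalg.lstsq(Xj, y1[:T0], rcond=None)[0]
    fitted[:, j] = coef[0] + coef[1] * Y0[:, j]

# Step 2: synthesize the fitted controls with pre-period weights
# (nonnegative least squares + normalization as a simple stand-in criterion).
w, _ = nnls(fitted[:T0], y1[:T0])
w = w / w.sum()
synthetic = fitted @ w

print("estimated treatment effect: %.2f" % (y1[T0:] - synthetic[T0:]).mean())
```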
Individual Causal Inference Using Panel Data With Multiple Outcomes
the average treatment effect and more recently the heterogeneous treatment
effects, often relying on the unconfoundedness assumption. We propose a method
based on the interactive fixed effects model to estimate treatment effects at
the individual level, which allows both the treatment assignment and the
potential outcomes to be correlated with the unobserved individual
characteristics. This method is suitable for panel datasets where multiple
related outcomes are observed for a large number of individuals over a small
number of time periods. Monte Carlo simulations show that our method
outperforms related methods. To illustrate our method, we provide an example of
estimating the effect of health insurance coverage on individual usage of
hospital emergency departments using the Oregon Health Insurance Experiment
data.
arXiv link: http://arxiv.org/abs/2306.01969v1
The Synthetic Control Method with Nonlinear Outcomes: Estimating the Impact of the 2019 Anti-Extradition Law Amendments Bill Protests on Hong Kong's Economy
unbiased assuming that the outcome is a linear function of the underlying
predictors and that the treated unit can be well approximated by the synthetic
control before the treatment. When the outcome is nonlinear, the bias of the
synthetic control estimator can be severe. In this paper, we provide conditions
for the synthetic control estimator to be asymptotically unbiased when the
outcome is nonlinear, and propose a flexible and data-driven method to choose
the synthetic control weights. Monte Carlo simulations show that compared with
the competing methods, the nonlinear synthetic control method has similar or
better performance when the outcome is linear, and better performance when the
outcome is nonlinear, and that the confidence intervals have good coverage
probabilities across settings. In the empirical application, we illustrate the
method by estimating the impact of the 2019 anti-extradition law amendments
bill protests on Hong Kong's economy, and find that the year-long protests
reduced real GDP per capita in Hong Kong by 11.27% in the first quarter of
2020, which was larger in magnitude than the economic decline during the 1997
Asian financial crisis or the 2008 global financial crisis.
arXiv link: http://arxiv.org/abs/2306.01967v1
Load Asymptotics and Dynamic Speed Optimization for the Greenest Path Problem: A Comprehensive Analysis
of the most fuel-efficient (greenest) path for different trucks in various
urban environments. We adapt a variant of the Comprehensive Modal Emission
Model (CMEM) to show that the optimal speed and the greenest path are slope
dependent (dynamic). When there are no elevation changes in a road network, the
most fuel-efficient path is the shortest path with a constant (static) optimal
speed throughout. However, if the network is not flat, then the shortest path
is not necessarily the greenest path, and the optimal driving speed is dynamic.
We prove that the greenest path converges to an asymptotic greenest path as the
payload approaches infinity and that this limiting path is attained for a
finite load. In a set of extensive numerical experiments, we benchmark the CO2
emissions reduction of our dynamic speed and greenest path policies against
policies that ignore elevation data. We use the geospatial data of 25 major
cities across 6 continents. We observe numerically that the greenest path
quickly diverges from the shortest path and attains the asymptotic greenest
path even for moderate payloads. Based on an analysis of variance, the main
determinants of the CO2 emissions reduction potential are the variation of the
road gradients along the shortest path as well as the relative elevation of the
source from the target. Using speed data estimates for rush hour in New York
City, we test CO2 emissions reduction by comparing the greenest paths with
optimized speeds against the fastest paths with traffic speed. We observe that
selecting the greenest paths instead of the fastest paths can significantly
reduce CO2 emissions. Additionally, our results show that while speed
optimization on uphill arcs can significantly help CO2 reduction, the potential
to leverage gravity for acceleration on downhill arcs is limited due to traffic
congestion.
arXiv link: http://arxiv.org/abs/2306.01687v2
Social Interactions in Endogenous Groups
a two-sided many-to-one matching model, where individuals select groups based
on preferences, while groups admit individuals based on qualifications until
reaching capacities. Endogenous formation of groups leads to selection bias in
peer effect estimation, which is complicated by equilibrium effects and
alternative groups. We propose novel methods to simplify selection bias and
develop a sieve OLS estimator for peer effects that is root-$n$-consistent and
asymptotically normal. Using Chilean data, we find that ignoring selection into
high schools leads to overestimated peer influence and distorts the estimation
of school effectiveness.
arXiv link: http://arxiv.org/abs/2306.01544v2
Rank-heterogeneous Preference Models for School Choice
and predict families' preferences. The most widely-used choice model, the
multinomial logit (MNL), is linear in school and/or household attributes. While
the model is simple and interpretable, it assumes the ranked preference lists
arise from a choice process that is uniform throughout the ranking, from top to
bottom. In this work, we introduce two strategies for rank-heterogeneous choice
modeling tailored for school choice. First, we adapt a context-dependent random
utility model (CDM), considering down-rank choices as occurring in the context
of earlier up-rank choices. Second, we consider stratifying the choice modeling
by rank, regularizing rank-adjacent models towards one another when
appropriate. Using data on household preferences from the San Francisco Unified
School District (SFUSD) across multiple years, we show that the contextual
models considerably improve our out-of-sample evaluation metrics across all
rank positions over the non-contextual models in the literature. Meanwhile,
stratifying the model by rank can yield more accurate first-choice predictions
while down-rank predictions are relatively unimproved. These models provide
performance upgrades that school choice researchers can adopt to improve
predictions and counterfactual analyses.
arXiv link: http://arxiv.org/abs/2306.01801v1
Causal Estimation of User Learning in Personalized Systems
change over time as 1) users learn about the intervention, and 2) the system
personalization, such as individualized recommendations, change over time. We
introduce a non-parametric causal model of user actions in a personalized
system. We show that the Cookie-Cookie-Day (CCD) experiment, designed for the
measurement of the user learning effect, is biased when there is
personalization. We derive new experimental designs that intervene in the
personalization system to generate the variation necessary to separately
identify the causal effect mediated through user learning and personalization.
Making parametric assumptions allows for the estimation of long-term causal
effects based on medium-term experiments. In simulations, we show that our new
designs successfully recover the dynamic causal effects of interest.
arXiv link: http://arxiv.org/abs/2306.00485v1
Inference in Predictive Quantile Regressions
predictive regressor has a near-unit root. We derive asymptotic distributions
for the quantile regression estimator and its heteroskedasticity and
autocorrelation consistent (HAC) t-statistic in terms of functionals of
Ornstein-Uhlenbeck processes. We then propose a switching-fully modified (FM)
predictive test for quantile predictability. The proposed test employs an FM
style correction with a Bonferroni bound for the local-to-unity parameter when
the predictor has a near unit root. It switches to a standard predictive
quantile regression test with a slightly conservative critical value when the
largest root of the predictor lies in the stationary range. Simulations
indicate that the test has a reliable size in small samples and good power. We
employ this new methodology to test the ability of three commonly employed,
highly persistent and endogenous lagged valuation regressors - the dividend
price ratio, earnings price ratio, and book-to-market ratio - to predict the
median, shoulders, and tails of the stock return distribution.
arXiv link: http://arxiv.org/abs/2306.00296v2
Deep Neural Network Estimation in Panel Data Models
data models. We provide asymptotic guarantees on deep feed-forward neural
network estimation of the conditional mean, building on the work of Farrell et
al. (2021), and explore latent patterns in the cross-section. We use the
proposed estimators to forecast the progression of new COVID-19 cases across
the G7 countries during the pandemic. We find significant forecasting gains
over both linear panel and nonlinear time series models. Containment or
lockdown policies, as instigated at the national-level by governments, are
found to have out-of-sample predictive power for new COVID-19 cases. We
illustrate how the use of partial derivatives can help open the "black-box" of
neural networks and facilitate semi-structural analysis: school and workplace
closures are found to have been effective policies at restricting the
progression of the pandemic across the G7 countries. But our methods illustrate
significant heterogeneity and time-variation in the effectiveness of specific
containment policies.
arXiv link: http://arxiv.org/abs/2305.19921v1
Quasi-Score Matching Estimation for Spatial Autoregressive Model with Random Weights Matrix and Regressors
application of the spatial autoregressive (SAR) model has become increasingly
prevalent in real-world analysis, particularly when dealing with large
datasets. However, the commonly used quasi-maximum likelihood estimation (QMLE)
for the SAR model is not computationally scalable to handle the data with a
large size. In addition, when establishing the asymptotic properties of the
parameter estimators of the SAR model, both weights matrix and regressors are
assumed to be nonstochastic in classical spatial econometrics, which is perhaps
not realistic in real applications. Motivated by the machine learning
literature, this paper proposes quasi-score matching estimation for the SAR
model. This new estimation approach is developed based on the likelihood, but
significantly reduces the computational complexity of the QMLE. The asymptotic
properties of parameter estimators under the random weights matrix and
regressors are established, which provides a new theoretical framework for the
asymptotic inference of the SAR-type models. The usefulness of the quasi-score
matching estimation and its asymptotic inference is illustrated via extensive
simulation studies and a case study of an anti-conflict social network
experiment for middle school students.
arXiv link: http://arxiv.org/abs/2305.19721v2
A Simple Method for Predicting Covariance Matrices of Financial Returns
covariance matrix of a vector of financial returns. Popular methods range from
simple predictors like rolling window or exponentially weighted moving average
(EWMA) to more sophisticated predictors such as generalized autoregressive
conditional heteroscedastic (GARCH) type methods. Building on a specific
covariance estimator suggested by Engle in 2002, we propose a relatively simple
extension that requires little or no tuning or fitting, is interpretable, and
produces results at least as good as MGARCH, a popular extension of GARCH that
handles multiple assets. To evaluate predictors we introduce a novel approach,
evaluating the regret of the log-likelihood over a time period such as a
quarter. This metric allows us to see not only how well a covariance predictor
does overall, but also how quickly it reacts to changes in market conditions.
Our simple predictor outperforms MGARCH in terms of regret. We also test
covariance predictors on downstream applications such as portfolio optimization
methods that depend on the covariance matrix. For these applications our simple
covariance predictor and MGARCH perform similarly.
arXiv link: http://arxiv.org/abs/2305.19484v2
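A compact sketch of an EWMA-style covariance predictor together with a log-likelihood regret evaluation in the spirit described above. The half-life, the Gaussian log-likelihood, and the use of the realized sample covariance of the evaluation window as the "oracle" are illustrative choices, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 1_000, 5
true_cov = np.diag([1.0, 1.5, 0.5, 2.0, 1.0])
R = rng.multivariate_normal(np.zeros(n), true_cov, size=T)   # returns

def ewma_cov(R, halflife=60, warmup=50):
    """One-step-ahead EWMA covariance forecasts built from past returns only."""
    lam = 0.5 ** (1.0 / halflife)
    S = np.cov(R[:warmup].T)                 # warm-start on the first periods
    preds = []
    for t in range(warmup, len(R)):
        preds.append(S.copy())
        r = R[t][:, None]
        S = lam * S + (1 - lam) * (r @ r.T)
    return np.array(preds)

def avg_gaussian_loglik(R_eval, covs):
    ll = 0.0
    for r, S in zip(R_eval, covs):
        _, logdet = np.linalg.slogdet(S)
        ll += -0.5 * (logdet + r @ np.linalg.solve(S, r) + len(r) * np.log(2 * np.pi))
    return ll / len(R_eval)

covs = ewma_cov(R)
R_eval = R[50:]
ll_pred = avg_gaussian_loglik(R_eval, covs)
# "Oracle" benchmark: constant covariance equal to the realized sample covariance.
oracle = np.repeat(np.cov(R_eval.T)[None], len(R_eval), axis=0)
ll_oracle = avg_gaussian_loglik(R_eval, oracle)
print("regret (oracle minus predictor log-likelihood): %.4f" % (ll_oracle - ll_pred))
```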
Impulse Response Analysis of Structural Nonlinear Time Series Models
response functions of nonlinear time series within a general class of
structural autoregressive models. We prove that a two-step procedure can
flexibly accommodate nonlinear specifications while avoiding the need to choose
fixed parametric forms. Sieve impulse responses are proven to be consistent by
deriving uniform estimation guarantees, and an iterative algorithm makes it
straightforward to compute them in practice. With simulations, we show that the
proposed semiparametric approach proves effective against misspecification
while suffering only from minor efficiency losses. In a U.S. monetary policy
application, the pointwise sieve GDP response associated with an interest rate
increase is larger than that of a linear model. Finally, in an analysis of
interest rate uncertainty shocks, sieve responses indicate more substantial
contractionary effects on production and inflation.
arXiv link: http://arxiv.org/abs/2305.19089v6
Incorporating Domain Knowledge in Deep Neural Networks for Discrete Choice Models
a powerful theoretical econometric framework for understanding and predicting
choice behaviors. DCMs are formed as random utility models (RUM), with their
key advantage of interpretability. However, a core requirement for the
estimation of these models is a priori specification of the associated utility
functions, making them sensitive to modelers' subjective beliefs. Recently,
machine learning (ML) approaches have emerged as a promising avenue for
learning unobserved non-linear relationships in DCMs. However, ML models are
considered "black box" and may not correspond with expected relationships. This
paper proposes a framework that expands the potential of data-driven approaches
for DCM by supporting the development of interpretable models that incorporate
domain knowledge and prior beliefs through constraints. The proposed framework
includes pseudo data samples that represent required relationships and a loss
function that measures their fulfillment, along with observed data, for model
training. The developed framework aims to improve model interpretability by
combining ML's specification flexibility with econometrics and interpretable
behavioral analysis. A case study demonstrates the potential of this framework
for discrete choice analysis.
arXiv link: http://arxiv.org/abs/2306.00016v1
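A toy sketch of the pseudo-data-plus-constraint-loss idea: a small utility network for a binary choice is trained on observed data with an added penalty evaluated on pseudo samples, punishing violations of a prior belief (here, that utility should not increase with price). Everything below, from the architecture to the penalty weight, is an illustrative assumption rather than the framework's specification.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Observed data: choose the alternative (vs. an outside option) given [price, quality].
n = 2_000
X = torch.rand(n, 2)
true_u = -2.0 * X[:, 0] + 1.5 * X[:, 1]
y = torch.bernoulli(torch.sigmoid(true_u))

util = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(util.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()

# Pseudo samples encoding the prior belief: raising price (feature 0) by delta
# should not raise utility; only violations are penalized.
X_pseudo = torch.rand(500, 2)
delta = torch.tensor([0.1, 0.0])
lam = 1.0                                   # weight on the constraint loss

for epoch in range(300):
    opt.zero_grad()
    fit_loss = bce(util(X).squeeze(-1), y)
    du = util(X_pseudo + delta) - util(X_pseudo)   # utility change as price rises
    constraint_loss = torch.relu(du).mean()
    loss = fit_loss + lam * constraint_loss
    loss.backward()
    opt.step()

print("final fit loss %.3f, constraint violation %.4f"
      % (fit_loss.item(), constraint_loss.item()))
```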
Generalized Autoregressive Score Trees and Forests
score (GAS) models (Creal et. al, 2013; Harvey, 2013) by localizing their
parameters using decision trees and random forests. These methods avoid the
curse of dimensionality faced by kernel-based approaches, and allow one to draw
on information from multiple state variables simultaneously. We apply the new
models to four distinct empirical analyses, and in all applications the
proposed new methods significantly outperform the baseline GAS model. In our
applications to stock return volatility and density prediction, the optimal GAS
tree model reveals a leverage effect and a variance risk premium effect. Our
study of stock-bond dependence finds evidence of a flight-to-quality effect in
the optimal GAS forest forecasts, while our analysis of high-frequency trade
durations uncovers a volume-volatility effect.
arXiv link: http://arxiv.org/abs/2305.18991v1
Nonlinear Impulse Response Functions and Local Projections
Response Functions (IRF) by means of local projections in the nonlinear dynamic
framework. We discuss the existence of a nonlinear autoregressive
representation for Markov processes and explain how their IRFs are directly
linked to the Nonlinear Local Projection (NLP), as in the case for the linear
setting. We present a fully nonparametric LP estimator in the one dimensional
nonlinear framework, compare its asymptotic properties to that of IRFs implied
by the nonlinear autoregressive model and show that the two approaches are
asymptotically equivalent. This extends the well-known result in the linear
autoregressive model by Plagborg-Moller and Wolf (2017). We also consider
extensions to the multivariate framework through the lens of semiparametric
models, and demonstrate that the indirect approach by the NLP is less accurate
than the direct estimation approach of the IRF.
arXiv link: http://arxiv.org/abs/2305.18145v2
Dynamic LATEs with a Static Instrument
of an irreversible treatment with a time-invariant binary instrumental variable
(IV). For example, in evaluations of dynamic effects of training programs with
a single lottery determining eligibility. A common approach in these situations
is to report per-period IV estimates. Under a dynamic extension of standard IV
assumptions, we show that such IV estimands identify a weighted sum of
treatment effects for different latent groups and treatment exposures. However,
there is possibility of negative weights. We discuss point and partial
identification of dynamic treatment effects in this setting under different
sets of assumptions.
arXiv link: http://arxiv.org/abs/2305.18114v3
Time-Varying Vector Error-Correction Models: Estimation and Inference
for different time series behaviours (e.g., unit-root and locally stationary
processes) to interact with each other and co-exist. From practical
perspectives, this framework can be used to estimate shifts in the
predictability of non-stationary variables, test whether economic theories hold
periodically, etc. We first develop a time-varying Granger Representation
Theorem, which facilitates the establishment of asymptotic properties for the
model, and then propose estimation and inferential methods and theory for both
short-run and long-run coefficients. We also propose an information criterion
to estimate the lag length, a singular-value ratio test to determine the
cointegration rank, and a hypothesis test to examine the parameter stability.
To validate the theoretical findings, we conduct extensive simulations.
Finally, we demonstrate the empirical relevance by applying the framework to
investigate the rational expectations hypothesis of the U.S. term structure.
arXiv link: http://arxiv.org/abs/2305.17829v1
Estimating overidentified linear models with heteroskedasticity and outliers
conventional heuristic rule used to motivate new estimators in this context is
approximate bias. This paper formalizes the definition of approximate bias and
expands the applicability of approximate bias to various classes of estimators
that bridge OLS, TSLS, and Jackknife IV estimators (JIVEs). By evaluating their
approximate biases, I propose new approximately unbiased estimators, including
UOJIVE1 and UOJIVE2. UOJIVE1 can be interpreted as a generalization of an
existing estimator UIJIVE1. Both UOJIVEs are proven to be consistent and
asymptotically normal under a fixed number of instruments and controls. The
asymptotic proofs for UOJIVE1 in this paper require the absence of high
leverage points, whereas proofs for UOJIVE2 do not. In addition, UOJIVE2 is
consistent under many-instrument asymptotics. The simulation results align with
the theorems in this paper: (i) Both UOJIVEs perform well under many instrument
scenarios with or without heteroskedasticity, (ii) When a high leverage point
coincides with a high variance of the error term, an outlier is generated and
the performance of UOJIVE1 is much poorer than that of UOJIVE2.
arXiv link: http://arxiv.org/abs/2305.17615v5
Using Limited Trial Evidence to Credibly Choose Treatment Dosage when Efficacy and Adverse Effects Weakly Increase with Dose
intensity (dosage) on evidence in randomized trials. Yet it has been rare to
study how outcomes vary with dosage. In trials to obtain drug approval, the
norm has been to specify some dose of a new drug and compare it with an
established therapy or placebo. Design-based trial analysis views each trial
arm as qualitatively different, but it may be highly credible to assume that
efficacy and adverse effects (AEs) weakly increase with dosage. Optimization of
patient care requires joint attention to both, as well as to treatment cost.
This paper develops methodology to credibly use limited trial evidence to
choose dosage when efficacy and AEs weakly increase with dose. I suppose that
dosage is an integer choice $t \in \{0, 1, \ldots, T\}$, T being a specified maximum
dose. I study dosage choice when trial evidence on outcomes is available for
only K dose levels, where K < T + 1. Then the population distribution of dose
response is partially rather than point identified. The identification region
is a convex polygon determined by linear equalities and inequalities. I
characterize clinical and public-health decision making using the
minimax-regret criterion. A simple analytical solution exists when T = 2 and
computation is tractable when T is larger.
arXiv link: http://arxiv.org/abs/2305.17206v1
A Policy Gradient Method for Confounded POMDPs
observable Markov decision processes (POMDPs) with continuous state and
observation spaces in the offline setting. We first establish a novel
identification result to non-parametrically estimate any history-dependent
policy gradient under POMDPs using the offline data. The identification enables
us to solve a sequence of conditional moment restrictions and adopt the min-max
learning procedure with general function approximation for estimating the
policy gradient. We then provide a finite-sample non-asymptotic bound for
estimating the gradient uniformly over a pre-specified policy class in terms of
the sample size, length of horizon, concentratability coefficient and the
measure of ill-posedness in solving the conditional moment restrictions.
Lastly, by deploying the proposed gradient estimation in the gradient ascent
algorithm, we show the global convergence of the proposed algorithm in finding
the history-dependent optimal policy under some technical conditions. To the
best of our knowledge, this is the first work studying the policy gradient
method for POMDPs under the offline setting.
arXiv link: http://arxiv.org/abs/2305.17083v2
When is cross impact relevant?
referred to as cross impact. Using tick-by-tick data spanning 5 years for 500
assets listed in the United States, we identify the features that make
cross-impact relevant to explain the variance of price returns. We show that
price formation occurs endogenously within highly liquid assets. Then, trades
in these assets influence the prices of less liquid correlated products, with
an impact velocity constrained by their minimum trading frequency. We
investigate the implications of such a multidimensional price formation
mechanism on interest rate markets. We find that the 10-year bond future serves
as the primary liquidity reservoir, influencing the prices of cash bonds and
futures contracts within the interest rate curve. Such behaviour challenges the
validity of the theory in Financial Economics that regards long-term rates as
agents' anticipations of future short-term rates.
arXiv link: http://arxiv.org/abs/2305.16915v2
Fast and Order-invariant Inference in Bayesian VARs with Non-Parametric Shocks
(VARs) have the potential to be non-Gaussian, exhibiting asymmetries and fat
tails. This consideration motivates the VAR developed in this paper which uses
a Dirichlet process mixture (DPM) to model the shocks. However, we do not
follow the obvious strategy of simply modeling the VAR errors with a DPM since
this would lead to computationally infeasible Bayesian inference in larger VARs
and potentially a sensitivity to the way the variables are ordered in the VAR.
Instead we develop a particular additive error structure inspired by Bayesian
nonparametric treatments of random effects in panel data models. We show that
this leads to a model which allows for computationally fast and order-invariant
inference in large VARs with nonparametric shocks. Our empirical results with
nonparametric VARs of various dimensions show that nonparametric treatment of
the VAR errors is particularly useful in periods such as the financial crisis
and the pandemic.
arXiv link: http://arxiv.org/abs/2305.16827v1
Hierarchical forecasting for aggregated curves with an application to day-ahead electricity price auctions
most prominent examples are supply and demand curves. In this study, we exploit
the fact that all aggregated curves have an intrinsic hierarchical structure,
and thus hierarchical reconciliation methods can be used to improve the
forecast accuracy. We provide an in-depth theory on how aggregated curves can
be constructed or deconstructed, and conclude that these methods are equivalent
under weak assumptions. We consider multiple reconciliation methods for
aggregated curves, including previously established bottom-up, top-down, and
linear optimal reconciliation approaches. We also present a new benchmark
reconciliation method, called 'aggregated-down', which has complexity similar to the bottom-up and top-down approaches but tends to provide better accuracy in this setup. We conduct an empirical forecasting study on the German day-ahead
power auction market by predicting the demand and supply curves, where their
equilibrium determines the electricity price for the next day. Our results
demonstrate that hierarchical reconciliation methods can be used to improve the
forecasting accuracy of aggregated curves.
arXiv link: http://arxiv.org/abs/2305.16255v1
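As a toy illustration of the reconciliation ideas above (not the paper's 'aggregated-down' method), the sketch below reconciles base forecasts of three bottom-level curve segments and their aggregate using bottom-up aggregation and an OLS (identity-weight) projection onto the coherent subspace. The summing matrix and all numbers are hypothetical.

```python
import numpy as np

# Hypothetical hierarchy: one aggregate curve = sum of 3 bottom-level curves.
# Summing matrix S maps bottom-level series to the full hierarchy (aggregate first).
S = np.vstack([np.ones((1, 3)), np.eye(3)])          # shape (4, 3)

# Base (unreconciled) forecasts for [aggregate, bottom1, bottom2, bottom3]
# at a single grid point of the curve; the numbers are made up.
y_hat = np.array([10.3, 3.0, 3.5, 3.2])

# Bottom-up: discard the aggregate forecast and re-aggregate the bottom ones.
y_bu = S @ y_hat[1:]

# OLS (identity-weight) reconciliation: project the base forecasts onto the
# column space of S so that the hierarchy adds up exactly.
P = S @ np.linalg.inv(S.T @ S) @ S.T
y_ols = P @ y_hat

print("bottom-up      :", np.round(y_bu, 3))
print("OLS reconciled :", np.round(y_ols, 3))
print("coherent?      ", np.allclose(y_ols[0], y_ols[1:].sum()))
```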
Validating a dynamic input-output model for the propagation of supply and demand shocks during the COVID-19 pandemic in Belgium
impact of economic shocks caused by COVID-19 in the UK, using data for Belgium.
Because the model was published early during the 2020 COVID-19 pandemic, it
relied on several assumptions regarding the magnitude of the observed economic
shocks, for which more accurate data have become available in the meantime. We
refined the propagated shocks to align with observed data collected during the
pandemic and calibrated some less well-informed parameters using 115 economic
time series. The refined model effectively captures the evolution of GDP,
revenue, and employment during the COVID-19 pandemic in Belgium at both
individual economic activity and aggregate levels. However, the reduction in
business-to-business demand is overestimated, revealing structural shortcomings
in accounting for businesses' motivations to sustain trade despite the shocks induced by the pandemic. We confirm that relaxing the stringent Leontief production function, informed by a survey on the criticality of inputs,
significantly improved the model's accuracy. However, despite a large dataset,
distinguishing between varying degrees of relaxation proved challenging.
Overall, this work demonstrates the model's validity in assessing the impact of
economic shocks caused by an epidemic in Belgium.
arXiv link: http://arxiv.org/abs/2305.16377v2
Adapting to Misspecification
researcher seeking to estimate a scalar parameter can invoke strong assumptions
to motivate a restricted estimator that is precise but may be heavily biased,
or they can relax some of these assumptions to motivate a more robust, but
variable, unrestricted estimator. When a bound on the bias of the restricted
estimator is available, it is optimal to shrink the unrestricted estimator
towards the restricted estimator. For settings where a bound on the bias of the
restricted estimator is unknown, we propose adaptive estimators that minimize
the percentage increase in worst case risk relative to an oracle that knows the
bound. We show that adaptive estimators solve a weighted convex minimax problem
and provide lookup tables facilitating their rapid computation. Revisiting some
well known empirical studies where questions of model specification arise, we
examine the advantages of adapting to -- rather than testing for --
misspecification.
arXiv link: http://arxiv.org/abs/2305.14265v6
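The known-bound case mentioned above admits a simple closed form under stylized assumptions. The sketch below shrinks a noisy unrestricted estimate toward a precise but possibly biased restricted estimate, choosing the weight to minimize a worst-case MSE proxy that ignores the restricted estimator's variance and any covariance between the two; the formula, function name, and numbers are illustrative and do not reproduce the paper's adaptive procedure.

```python
import numpy as np

def shrink_toward_restricted(theta_u, var_u, theta_r, bias_bound):
    """Convex combination lam * theta_r + (1 - lam) * theta_u.

    Worst-case MSE proxy (restricted estimator's variance and covariance
    ignored): lam**2 * B**2 + (1 - lam)**2 * var_u, minimized at
    lam* = var_u / (var_u + B**2).
    """
    lam = var_u / (var_u + bias_bound ** 2)
    return lam * theta_r + (1.0 - lam) * theta_u, lam

# Illustrative numbers: noisy unrestricted estimate, precise but possibly
# biased restricted estimate, and a bias bound B = 0.5.
theta_hat, lam = shrink_toward_restricted(theta_u=1.8, var_u=0.4,
                                          theta_r=1.0, bias_bound=0.5)
print(f"weight on restricted = {lam:.2f}, shrunk estimate = {theta_hat:.3f}")
```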
Flexible Bayesian Quantile Analysis of Residential Rental Rates
data that allows for increased distributional flexibility, multivariate
heterogeneity, and time-invariant covariates in situations where mean
regression may be unsuitable. Our approach is Bayesian and builds upon the
generalized asymmetric Laplace distribution to decouple the modeling of
skewness from the quantile parameter. We derive an efficient simulation-based
estimation algorithm, demonstrate its properties and performance in targeted
simulation studies, and employ it in the computation of marginal likelihoods to
enable formal Bayesian model comparisons. The methodology is applied in a study
of U.S. residential rental rates following the Global Financial Crisis. Our
empirical results provide interesting insights on the interaction between rents
and economic, demographic and policy variables, weigh in on key modeling
features, and overwhelmingly support the additional flexibility at nearly all
quantiles and across several sub-samples. The practical differences that arise
as a result of allowing for flexible modeling can be nontrivial, especially for
quantiles away from the median.
arXiv link: http://arxiv.org/abs/2305.13687v2
Prediction Risk and Estimation Risk of the Ridgeless Least Squares Estimator under General Assumptions on Regression Errors
minimum $\ell_2$ norm (ridgeless) interpolation least squares estimators.
However, the majority of these analyses have been limited to an unrealistic
regression error structure, assuming independent and identically distributed
errors with zero mean and common variance. In this paper, we explore prediction
risk as well as estimation risk under more general regression error
assumptions, highlighting the benefits of overparameterization in a more
realistic setting that allows for clustered or serial dependence. Notably, we
establish that the estimation difficulties associated with the variance
components of both risks can be summarized through the trace of the
variance-covariance matrix of the regression errors. Our findings suggest that
the benefits of overparameterization can extend to time series, panel and
grouped data.
arXiv link: http://arxiv.org/abs/2305.12883v3
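For readers unfamiliar with the estimator in question, here is a minimal numerical sketch of the minimum-norm (ridgeless) least squares interpolator in an overparameterized linear model with clustered errors; the data-generating process is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                          # more parameters than observations
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0                          # sparse "true" coefficients

# Clustered errors: a shared shock within each of 10 clusters plus noise.
clusters = np.repeat(np.arange(10), n // 10)
errors = rng.standard_normal(10)[clusters] + 0.5 * rng.standard_normal(n)
y = X @ beta + errors

# Ridgeless (minimum-norm) least squares via the Moore-Penrose pseudoinverse;
# equivalently the limit of ridge regression as the penalty goes to zero.
beta_hat = np.linalg.pinv(X) @ y
print("interpolates the training data:", np.allclose(X @ beta_hat, y))

# Out-of-sample prediction risk on fresh data.
X_new = rng.standard_normal((1000, p))
pred_mse = np.mean((X_new @ beta_hat - X_new @ beta) ** 2)
print(f"prediction MSE of the interpolator: {pred_mse:.3f}")
```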
Federated Offline Policy Learning
observational bandit feedback data across multiple heterogeneous data sources.
In our approach, we introduce a novel regret analysis that establishes
finite-sample upper bounds on distinguishing notions of global regret for all
data sources on aggregate and of local regret for any given data source. We
characterize these regret bounds by expressions of source heterogeneity and
distribution shift. Moreover, we examine the practical considerations of this
problem in the federated setting where a central server aims to train a policy
on data distributed across the heterogeneous sources without collecting any of
their raw data. We present a policy learning algorithm amenable to federation
based on the aggregation of local policies trained with doubly robust offline
policy evaluation strategies. Our analysis and supporting experimental results
provide insights into tradeoffs in the participation of heterogeneous data
sources in offline policy learning.
arXiv link: http://arxiv.org/abs/2305.12407v2
Identification and Estimation of Production Function with Unobserved Heterogeneity
functions, considering firm heterogeneity beyond Hicks-neutral technology
terms. We propose a finite mixture model to account for unobserved
heterogeneity in production technology and productivity growth processes. Our
analysis demonstrates that the production function for each latent type can be
nonparametrically identified using four periods of panel data, relying on
assumptions similar to those employed in existing literature on production
function and panel data identification. By analyzing Japanese plant-level panel
data, we uncover significant disparities in estimated input elasticities and
productivity growth processes among latent types within narrowly defined
industries. We further show that neglecting unobserved heterogeneity in input
elasticities may lead to substantial and systematic bias in the estimation of
productivity growth.
arXiv link: http://arxiv.org/abs/2305.12067v1
Statistical Estimation for Covariance Structures with Tail Estimates using Nodewise Quantile Predictive Regression Models
estimates. We focus on two aspects: (i) the estimation of the VaR-CoVaR risk
matrix in the case of a larger number of time-series observations than assets in a portfolio, using quantile predictive regression models without assuming the presence of nonstationary regressors; and (ii) the construction of a novel variable selection algorithm, the so-called Feature Ordering by Centrality
Exclusion (FOCE), which is based on an assumption-lean regression framework,
has no tuning parameters and is proved to be consistent under general sparsity
assumptions. We illustrate the usefulness of our proposed methodology with
numerical studies of real and simulated datasets when modelling systemic risk
in a network.
arXiv link: http://arxiv.org/abs/2305.11282v2
Context-Dependent Heterogeneous Preferences: A Comment on Barseghyan and Molinari (2023)
semi-nonparametric point identification of parameters of interest in a mixture
model of decision-making under risk, allowing for unobserved heterogeneity in
utility functions and limited consideration. A key assumption in the model is
that the heterogeneity of risk preferences is unobservable but
context-independent. In this comment, we build on their insights and present
identification results in a setting where the risk preferences are allowed to
be context-dependent.
arXiv link: http://arxiv.org/abs/2305.10934v1
Modeling Interference Using Experiment Roll-out
interference, where the outcome of a unit is impacted by the treatment status
of other units. We propose a framework for modeling interference using a
ubiquitous deployment mechanism for experiments, staggered roll-out designs,
which slowly increase the fraction of units exposed to the treatment to
mitigate any unanticipated adverse side effects. Our main idea is to leverage
the temporal variations in treatment assignments introduced by roll-outs to
model the interference structure. Since there are often multiple competing
models of interference in practice, we first develop a model selection method
that evaluates models based on their ability to explain outcome variation
observed along the roll-out. Through simulations, we show that our heuristic
model selection method, Leave-One-Period-Out, outperforms other baselines.
Next, we present a set of model identification conditions under which the
estimation of common estimands is possible and show how these conditions are
aided by roll-out designs. We conclude with a set of considerations, robustness
checks, and potential limitations for practitioners wishing to use our
framework.
arXiv link: http://arxiv.org/abs/2305.10728v2
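The leave-one-period-out idea can be illustrated with a deliberately simplified sketch: two hypothetical interference models (no interference versus a linear-in-exposure outcome model) are compared by leaving out one roll-out period at a time, fitting on the remaining periods, and scoring held-out prediction error. The simulated roll-out, the exposure measure, and the outcome models are illustrative choices, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, periods = 200, 5
roll_frac = np.linspace(0.1, 0.9, periods)               # staggered roll-out

# Monotone treatment roll-out and a crude "exposure": the treated share
# among 10 randomly drawn peers of each unit.
order = rng.permutation(n_units)
treated = np.array([(order < f * n_units).astype(float) for f in roll_frac])
peers = rng.integers(0, n_units, size=(n_units, 10))
exposure = treated[:, peers].mean(axis=2)                # (periods, n_units)

# Simulated outcomes with genuine spillovers, so the exposure model should win.
y = 1.0 + 2.0 * treated + 1.5 * exposure + rng.standard_normal((periods, n_units))

def fit_predict(X_fit, y_fit, X_new):
    """OLS fit on the stacked training periods, prediction for the held-out one."""
    coef, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)
    return X_new @ coef

models = {
    "no interference": lambda t, e: np.column_stack([np.ones_like(t), t]),
    "linear exposure": lambda t, e: np.column_stack([np.ones_like(t), t, e]),
}

for name, make_X in models.items():
    errs = []
    for holdout in range(periods):
        keep = [p for p in range(periods) if p != holdout]
        X_fit = make_X(treated[keep].ravel(), exposure[keep].ravel())
        X_out = make_X(treated[holdout], exposure[holdout])
        pred = fit_predict(X_fit, y[keep].ravel(), X_out)
        errs.append(np.mean((y[holdout] - pred) ** 2))
    print(f"{name}: leave-one-period-out MSE = {np.mean(errs):.3f}")
```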
Nowcasting with signature methods
month. The nowcasting literature has arisen to provide fast, reliable estimates
of delayed economic indicators and is closely related to filtering methods in
signal processing. The path signature is a mathematical object which captures
geometric properties of sequential data; it naturally handles missing data from
mixed frequency and/or irregular sampling -- issues often encountered when
merging multiple data sources -- by embedding the observed data in continuous
time. Calculating path signatures and using them as features in models has
achieved state-of-the-art results in fields such as finance, medicine, and
cyber security. We look at the nowcasting problem by applying regression on
signatures, a simple linear model on these nonlinear objects that we show
subsumes the popular Kalman filter. We quantify the performance via a
simulation exercise, and through application to nowcasting US GDP growth, where
we see a lower error than a dynamic factor model based on the New York Fed
staff nowcasting model. Finally we demonstrate the flexibility of this method
by applying regression on signatures to nowcast weekly fuel prices using daily
data. Regression on signatures is an easy-to-apply approach that allows great
flexibility for data with complex sampling patterns.
arXiv link: http://arxiv.org/abs/2305.10256v1
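For readers new to signatures, the sketch below computes depth-2 signature features of two-dimensional piecewise-linear paths by hand (level 1: total increments; level 2: iterated integrals) and feeds them into an ordinary linear regression. It is a bare-bones illustration of regression on signatures, not the paper's nowcasting pipeline; the simulated paths and target are made up.

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path of shape (T + 1, d)."""
    inc = np.diff(path, axis=0)                      # segment increments
    level1 = inc.sum(axis=0)                         # total increments
    run = np.vstack([np.zeros(path.shape[1]),        # cumulative increment
                     np.cumsum(inc, axis=0)[:-1]])   # before each segment
    # S^{ij} = sum_k [ run_k^i * inc_k^j + 0.5 * inc_k^i * inc_k^j ]
    level2 = run.T @ inc + 0.5 * (inc.T @ inc)
    return np.concatenate([level1, level2.ravel()])

# Toy mixed-frequency setup: each "quarter" is a short 2-D path (say, two
# higher-frequency indicators); the target is a scalar to be nowcast.
rng = np.random.default_rng(2)
n_quarters, steps = 120, 6
paths = rng.standard_normal((n_quarters, steps + 1, 2)).cumsum(axis=1)
features = np.array([signature_depth2(p) for p in paths])
target = 0.8 * features[:, 0] - 0.3 * features[:, 3] \
         + 0.1 * rng.standard_normal(n_quarters)

# Regression on signatures: an ordinary linear model on nonlinear path features.
X = np.column_stack([np.ones(n_quarters), features])
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
print("fitted coefficients:", np.round(coef, 2))
```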
Monitoring multicountry macroeconomic risk
(QFAVAR) to model heterogeneities both across countries and across
characteristics of the distributions of macroeconomic time series. The presence
of quantile factors allows for summarizing these two heterogeneities in a
parsimonious way. We develop two algorithms for posterior inference that
feature varying level of trade-off between estimation precision and
computational speed. Using monthly data for the euro area, we establish the
good empirical properties of the QFAVAR as a tool for assessing the effects of
global shocks on country-level macroeconomic risks. In particular, QFAVAR
short-run tail forecasts are more accurate compared to a FAVAR with symmetric
Gaussian errors, as well as univariate quantile autoregressions that ignore
comovements among quantiles of macroeconomic variables. We also illustrate how
quantile impulse response functions and quantile connectedness measures,
resulting from the new model, can be used to implement joint risk scenario
analysis.
arXiv link: http://arxiv.org/abs/2305.09563v1
Grenander-type Density Estimation under Myerson Regularity
values from second-price auctions, diverging from the conventional use of
smoothing-based estimators. We introduce a Grenander-type estimator,
constructed based on a shape restriction in the form of a convexity constraint.
This constraint corresponds to the renowned Myerson regularity condition in
auction theory, which is equivalent to the concavity of the revenue function
for selling the auction item. Our estimator is nonparametric and does not
require any tuning parameters. Under mild assumptions, we establish the
cube-root consistency and show that the estimator asymptotically follows the
scaled Chernoff's distribution. Moreover, we demonstrate that the estimator
achieves the minimax optimal convergence rate.
arXiv link: http://arxiv.org/abs/2305.09052v1
Designing Discontinuities
on outcomes in larger systems. Indeed, their arbitrariness is why they have
been used to infer causal relationships among variables in numerous settings.
Regression discontinuity from econometrics assumes the existence of a
discontinuous variable that splits the population into distinct partitions to
estimate the causal effects of a given phenomenon. Here we consider the design
of partitions for a given discontinuous variable to optimize a certain effect
previously studied using regression discontinuity. To do so, we propose a
quantization-theoretic approach to optimize the effect of interest, first
learning the causal effect size of a given discontinuous variable and then
applying dynamic programming for optimal quantization design of discontinuities
to balance the gain and loss in that effect size. We also develop a
computationally efficient reinforcement learning algorithm for the dynamic
programming formulation of optimal quantization. We demonstrate our approach by
designing optimal time zone borders for counterfactuals of social capital,
social mobility, and health. This is based on regression discontinuity analyses
we perform on novel data, which may be of independent empirical interest.
arXiv link: http://arxiv.org/abs/2305.08559v3
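The dynamic-programming step can be pictured with a generic one-dimensional quantizer: split a sorted running variable into k contiguous intervals so as to minimize within-interval squared deviation. This is a standard partitioning DP, not the paper's effect-size objective or its reinforcement-learning accelerator; the data and the number of intervals are placeholders.

```python
import numpy as np

def interval_cost(prefix, prefix_sq, a, b):
    """Sum of squared deviations from the mean for sorted points x[a:b]."""
    n = b - a
    s = prefix[b] - prefix[a]
    s2 = prefix_sq[b] - prefix_sq[a]
    return s2 - s * s / n

def optimal_partition(x_sorted, k):
    """DP for the cost-minimizing split of sorted data into k intervals."""
    n = len(x_sorted)
    prefix = np.concatenate([[0.0], np.cumsum(x_sorted)])
    prefix_sq = np.concatenate([[0.0], np.cumsum(x_sorted ** 2)])
    dp = np.full((k + 1, n + 1), np.inf)
    arg = np.zeros((k + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for split in range(j - 1, i):
                cost = dp[j - 1, split] + interval_cost(prefix, prefix_sq, split, i)
                if cost < dp[j, i]:
                    dp[j, i], arg[j, i] = cost, split
    # Recover the indices at which a new interval starts (interior breakpoints).
    cuts, i = [], n
    for j in range(k, 0, -1):
        cuts.append(arg[j, i])
        i = arg[j, i]
    return sorted(cuts[:-1]), dp[k, n]

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 24, size=200))     # e.g. longitudes or clock times
cuts, cost = optimal_partition(x, k=4)
print("interior breakpoints (indices):", cuts, "| total cost:", round(cost, 2))
```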
Hierarchical DCC-HEAVY Model for High-Dimensional Covariance Matrices
high-dimensional covariance matrices, employing the realized measures built
from higher-frequency data. The modelling approach features straightforward
estimation and forecasting schemes, independent of the cross-sectional
dimension of the assets under consideration, and accounts for sophisticated
asymmetric dynamics in the covariances. Empirical analyses suggest that the HD
DCC-HEAVY models have a better in-sample fit and deliver statistically and
economically significant out-of-sample gains relative to the existing
hierarchical factor model and standard benchmarks. The results are robust under
different frequencies and market conditions.
arXiv link: http://arxiv.org/abs/2305.08488v2
Efficient Semiparametric Estimation of Average Treatment Effects Under Covariate Adaptive Randomization
in applied economics and other fields. In such experiments, the experimenter
first stratifies the sample according to observed baseline covariates and then
assigns treatment randomly within these strata so as to achieve balance
according to pre-specified stratum-specific target assignment proportions. In
this paper, we compute the semiparametric efficiency bound for estimating the
average treatment effect (ATE) in such experiments with binary treatments
allowing for the class of CAR procedures considered in Bugni, Canay, and Shaikh
(2018, 2019). This is a broad class of procedures and is motivated by those
used in practice. The stratum-specific target proportions play the role of the
propensity score conditional on all baseline covariates (and not just the
strata) in these experiments. Thus, the efficiency bound is a special case of
the bound in Hahn (1998), but conditional on all baseline covariates.
Additionally, this efficiency bound is shown to be achievable under the same
conditions as those used to derive the bound by using a cross-fitted
Nadaraya-Watson kernel estimator to form nonparametric regression adjustments.
arXiv link: http://arxiv.org/abs/2305.08340v1
The Nonstationary Newsvendor with (and without) Predictions
selecting a quantity of inventory, under the assumption that the demand is
drawn from a known distribution. Motivated by applications such as cloud
provisioning and staffing, we consider a setting in which newsvendor-type
decisions must be made sequentially, in the face of demand drawn from a
stochastic process that is both unknown and nonstationary. All prior work on
this problem either (a) assumes that the level of nonstationarity is known, or
(b) imposes additional statistical assumptions that enable accurate predictions
of the unknown demand. Our research tackles the Nonstationary Newsvendor
without these assumptions, both with and without predictions.
We first, in the setting without predictions, design a policy which we prove
achieves order-optimal regret -- ours is the first policy to accomplish this
without being given the level of nonstationarity of the underlying demand. We
then, for the first time, introduce a model for generic (i.e. with no
statistical assumptions) predictions with arbitrary accuracy, and propose a
policy that incorporates these predictions without being given their accuracy.
We upper bound the regret of this policy, and show that it matches the best
achievable regret had the accuracy of the predictions been known.
Our findings provide valuable insights on inventory management. Managers can
make more informed and effective decisions in dynamic environments, reducing
costs and enhancing service levels despite uncertain demand patterns. We
empirically validate our new policy with experiments based on three real-world
datasets containing thousands of time-series, showing that it succeeds in
closing approximately 74% of the gap between the best approaches based on
nonstationarity and predictions alone.
arXiv link: http://arxiv.org/abs/2305.07993v4
Semiparametrically Optimal Cointegration Test
cointegration rank testing in finite-order vector autoregressive models, where
the innovation distribution is considered an infinite-dimensional nuisance
parameter. Our asymptotic analysis relies on Le Cam's theory of limit
experiment, which in this context takes the form of Locally Asymptotically
Brownian Functional (LABF). By leveraging the structural version of LABF, an
Ornstein-Uhlenbeck experiment, we develop the asymptotic power envelopes of
asymptotically invariant tests for both cases with and without a time trend. We
propose feasible tests based on a nonparametrically estimated density and
demonstrate that their power can achieve the semiparametric power envelopes,
making them semiparametrically optimal. We validate the theoretical results
through large-sample simulations and illustrate satisfactory size control and
excellent power performance of our tests in small samples. In both cases, with and without a time trend, we show that a remarkable amount of additional
power can be obtained from non-Gaussian distributions.
arXiv link: http://arxiv.org/abs/2305.08880v1
Band-Pass Filtering with High-Dimensional Time Series
growth, obtained by projecting a quarterly measure of aggregate economic
activity, namely gross domestic product (GDP), into the space spanned by a
finite number of smooth principal components, representative of the
medium-to-long-run component of economic growth of a high-dimensional time
series, available at the monthly frequency. The smooth principal components
result from applying a cross-sectional filter distilling the low-pass component
of growth in real time. The outcome of the projection is a monthly nowcast of
the medium-to-long-run component of GDP growth. After discussing the
theoretical properties of the indicator, we deal with the assessment of its
reliability and predictive validity with reference to a panel of macroeconomic
U.S. time series.
arXiv link: http://arxiv.org/abs/2305.06618v1
The price elasticity of Gleevec in patients with Chronic Myeloid Leukemia enrolled in Medicare Part D: Evidence from a regression discontinuity design
myeloid leukemia (CML) patients on Medicare Part D to determine if high
out-of-pocket payments (OOP) are driving the substantial levels of
non-adherence observed in this population.
Data sources and study setting: We use data from the TriNetX Diamond Network
(TDN) United States database for the period from first availability in 2011
through the end of patent exclusivity following the introduction of generic
imatinib in early 2016.
Study design: We implement a fuzzy regression discontinuity design to
separately estimate the effect of Medicare Part D enrollment at age 65 on
adherence and OOP in newly-diagnosed CML patients initiating branded imatinib.
The corresponding price elasticity of demand (PED) is estimated and results are
assessed across a variety of specifications and robustness checks.
Data collection/extraction methods: Data from eligible patients following the
application of inclusion and exclusion criteria were analyzed.
Principal findings: Our analysis suggests that there is a significant increase
in initial OOP of $232 (95% Confidence interval (CI): $102 to $362) for
individuals who enrolled in Part D due to expanded eligibility at age 65. The
relatively smaller and non-significant decrease in adherence of only 6
percentage points (95% CI: -0.21 to 0.08) led to a PED of -0.02 (95% CI:
-0.056, 0.015).
Conclusion: This study provides evidence regarding the financial impact of
coinsurance-based benefit designs on Medicare-age patients with CML initiating
branded imatinib. Results indicate that factors besides high OOP are driving
the substantial non-adherence observed in this population and add to the
growing literature on PED for specialty drugs.
arXiv link: http://arxiv.org/abs/2305.06076v1
On the Time-Varying Structure of the Arbitrage Pricing Theory using the Japanese Sector Indices
the Japanese stock market. In particular, we measure how changes in each risk
factor affect the stock risk premiums to investigate the validity of the APT
over time, applying the rolling window method to Fama and MacBeth's (1973)
two-step regression and Kamstra and Shi's (2023) generalized GRS test. We
summarize our empirical results as follows: (1) the changes in monetary policy
by major central banks greatly affect the validity of the APT in Japan, and (2)
the time-varying estimates of the risk premiums for each factor are also
unstable over time, and they are affected by the business cycle and economic
crises. Therefore, we conclude that the validity of the APT as an appropriate
model to explain the Japanese sector index is not stable over time.
arXiv link: http://arxiv.org/abs/2305.05998v4
Does Principal Component Analysis Preserve the Sparsity in Sparse Weak Factor Models?
weak factor models with sparse loadings. We uncover an intrinsic near-sparsity
preservation property for the PC estimators of loadings, which comes from the
approximately upper triangular (block) structure of the rotation matrix. It
implies an asymmetric relationship among factors: the rotated loadings for a
stronger factor can be contaminated by those from a weaker one, but the
loadings for a weaker factor are almost free of the impact of those from a
stronger one. More importantly, the finding implies that there is no need to
use complicated penalties to sparsify the loading estimators. Instead, we adopt
a simple screening method to recover the sparsity and construct estimators for
various factor strengths. In addition, for sparse weak factor models, we
provide a singular value thresholding-based approach to determine the number of
factors and establish uniform convergence rates for PC estimators, which
complement Bai and Ng (2023). The accuracy and efficiency of the proposed
estimators are investigated via Monte Carlo simulations. The application to the
FRED-QD dataset reveals the underlying factor strengths and loading sparsity as
well as their dynamic features.
arXiv link: http://arxiv.org/abs/2305.05934v2
Volatility of Volatility and Leverage Effect from Options
volatility and leverage effect using high-frequency observations of short-dated
options. At each point in time, we integrate available options into estimates
of the conditional characteristic function of the price increment until the
options' expiration and we use these estimates to recover spot volatility. Our
volatility of volatility estimator is then formed from the sample variance and
first-order autocovariance of the spot volatility increments, with the latter
correcting for the bias in the former due to option observation errors. The
leverage effect estimator is the sample covariance between price increments and
the estimated volatility increments. The rate of convergence of the estimators
depends on the diffusive innovations in the latent volatility process as well
as on the observation error in the options with strikes in the vicinity of the
current spot price. Feasible inference is developed in a way that does not
require prior knowledge of the source of estimation error that is
asymptotically dominating.
arXiv link: http://arxiv.org/abs/2305.04137v2
Risk management in the use of published statistical results for policy decisions
for decision-making purposes. For a policy implementer, the value of
implementing published policy research depends critically upon this
reliability. For a policy researcher, the value of policy implementation may
depend weakly or not at all upon the policy's outcome. Some researchers might
benefit from overstating the reliability of statistical results. Implementers
may find it difficult or impossible to determine whether researchers are
overstating reliability. This information asymmetry between researchers and
implementers can lead to an adverse selection problem where, at best, the full
benefits of a policy are not realized or, at worst, a policy is deemed too
risky to implement at any scale. Researchers can remedy this by guaranteeing
the policy outcome. Researchers can overcome their own risk aversion and wealth
constraints by exchanging risks with other researchers or offering only partial
insurance. The problem and remedy are illustrated using a confidence interval
for the success probability of a binomial policy outcome.
arXiv link: http://arxiv.org/abs/2305.03205v2
Debiased Inference for Dynamic Nonlinear Panels with Multi-dimensional Heterogeneities
models that incorporate individual and time fixed effects in both the intercept
and slope. These models are subject to the incidental parameter problem, in
that the limiting distribution of the point estimator is not centered at zero,
and that test statistics do not follow their standard asymptotic distributions
as in the absence of the fixed effects. To address the problem, we develop an
analytical bias correction procedure to construct a bias-corrected likelihood.
The resulting estimator follows an asymptotic normal distribution with mean
zero. Moreover, likelihood-based test statistics -- including
likelihood-ratio, Lagrange-multiplier, and Wald tests -- follow the limiting
chi-squared distribution under the null hypothesis. Simulations demonstrate the
effectiveness of the proposed correction method, and an empirical application
on the labor force participation of single mothers underscores its practical
importance.
arXiv link: http://arxiv.org/abs/2305.03134v4
Doubly Robust Uniform Confidence Bands for Group-Time Conditional Average Treatment Effects in Difference-in-Differences
effects with respect to groups, periods, and a pre-treatment covariate of
interest in the staggered difference-in-differences setting of Callaway and
Sant'Anna (2021). Under standard identification conditions, a doubly robust
estimand conditional on the covariate identifies the group-time conditional
average treatment effect given the covariate. Focusing on the case of a
continuous covariate, we propose a three-step estimation procedure based on
nonparametric local polynomial regressions and parametric estimation methods.
Using uniformly valid distributional approximation results for empirical
processes and weighted/multiplier bootstrapping, we develop doubly robust
inference methods to construct uniform confidence bands for the group-time
conditional average treatment effect function and a variety of useful summary
parameters. The accompanying R package didhetero allows for easy implementation
of our methods.
arXiv link: http://arxiv.org/abs/2305.02185v4
Large Global Volatility Matrix Analysis Based on Observation Structural Information
procedure for analyzing global financial markets. Practitioners often use
lower-frequency data, such as weekly or monthly returns, to address the issue
of different trading hours in the international financial market. However, this
approach can lead to inefficiency due to information loss. To mitigate this
problem, our proposed method, called Structured Principal Orthogonal complEment
Thresholding (Structured-POET), incorporates observation structural information
for both global and national factor models. We establish the asymptotic
properties of the Structured-POET estimator, and also demonstrate the drawbacks
of conventional covariance matrix estimation procedures when using
lower-frequency data. Finally, we apply the Structured-POET estimator to an
out-of-sample portfolio allocation study using international stock market data.
arXiv link: http://arxiv.org/abs/2305.01464v3
Transfer Estimates for Causal Effects across Heterogeneous Sites
heterogeneous populations ("sites"/"contexts"). We consider an idealized scenario in which the researcher observes cross-sectional data for a large number of units across several "experimental" sites in which an intervention has already been implemented, and seeks to transfer estimates to a new "target" site for which a baseline survey of unit-specific, pre-treatment outcomes and relevant attributes is
available. Our approach treats the baseline as functional data, and this choice
is motivated by the observation that unobserved site-specific confounders
manifest themselves not only in average levels of outcomes, but also in how these
interact with observed unit-specific attributes. We consider the problem of
determining the optimal finite-dimensional feature space in which to solve that
prediction problem. Our approach is design-based in the sense that the
performance of the predictor is evaluated given the specific, finite selection
of experimental and target sites. Our approach is nonparametric, and our formal
results concern the construction of an optimal basis of predictors as well as
convergence rates for the estimated conditional average treatment effect
relative to the constrained-optimal population predictor for the target site.
We quantify the potential gains from adapting experimental estimates to a
target location in an application to conditional cash transfer (CCT) programs
using a combined data set from five multi-site randomized controlled trials.
arXiv link: http://arxiv.org/abs/2305.01435v7
Estimating Input Coefficients for Regional Input-Output Tables Using Deep Learning with Mixup
situation of a region. Generally, the input-output table for each region
(regional input-output table) in Japan is not always publicly available, so it
is necessary to estimate the table. In particular, various methods have been
developed for estimating input coefficients, which are an important part of the
input-output table. Currently, non-survey methods are often used to estimate
input coefficients because they require less data and computation, but these
methods have some problems, such as discarding information and requiring
additional data for estimation.
In this study, the input coefficients are estimated by approximating the
generation process with an artificial neural network (ANN) to mitigate the
problems of the non-survey methods and to estimate the input coefficients with
higher precision. To avoid over-fitting due to the small amount of data available, a data augmentation technique called mixup is introduced to increase the data size by
generating virtual regions through region composition and scaling.
By comparing the estimates of the input coefficients with those of Japan as a
whole, it is shown that the proposed method is more accurate and more stable than the conventional non-survey methods. In addition,
the estimated input coefficients for the three cities in Japan are generally
close to the published values for each city.
arXiv link: http://arxiv.org/abs/2305.01201v3
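Mixup itself is easy to state. The sketch below shows the generic augmentation step, convex combinations of randomly paired feature/target vectors with Beta-distributed weights, applied to made-up "regional" features and flattened coefficient blocks rather than the paper's actual regional data.

```python
import numpy as np

def mixup(X, Y, n_virtual, alpha=0.2, rng=None):
    """Create virtual samples as convex combinations of random pairs.

    X: (n, p) regional features, Y: (n, q) targets (e.g. input coefficients).
    """
    rng = rng or np.random.default_rng()
    i = rng.integers(0, len(X), size=n_virtual)
    j = rng.integers(0, len(X), size=n_virtual)
    lam = rng.beta(alpha, alpha, size=(n_virtual, 1))
    X_mix = lam * X[i] + (1 - lam) * X[j]
    Y_mix = lam * Y[i] + (1 - lam) * Y[j]
    return X_mix, Y_mix

rng = np.random.default_rng(4)
X = rng.random((12, 8))     # 12 hypothetical regions, 8 sector-size features
Y = rng.random((12, 16))    # flattened 4x4 input-coefficient blocks (made up)
X_aug, Y_aug = mixup(X, Y, n_virtual=100, rng=rng)
print("augmented training set:", X_aug.shape, Y_aug.shape)
```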
Estimation and Inference in Threshold Predictive Regression Models with Locally Explosive Regressors
model with hybrid stochastic local unit root predictors. We demonstrate the
estimation procedure and derive the asymptotic distribution of the least square
estimator and the IV based estimator proposed by Magdalinos and Phillips
(2009), under the null hypothesis of a diminishing threshold effect. Simulation
experiments focus on the finite sample performance of our proposed estimators
and the corresponding predictability tests as in Gonzalo and Pitarakis (2012),
under the presence of threshold effects with stochastic local unit roots. An
empirical application to stock return equity indices illustrates the usefulness
of our framework in uncovering regimes of predictability during certain
periods. In particular, we focus on an aspect not previously examined in the
predictability literature, that is, the effect of economic policy uncertainty.
arXiv link: http://arxiv.org/abs/2305.00860v3
Double and Single Descent in Causal Inference with an Application to High-Dimensional Synthetic Control
learning, we consider highly over-parameterized models in causal inference,
including synthetic control with many control units. In such models, there may
be so many free parameters that the model fits the training data perfectly. We
first investigate high-dimensional linear regression for imputing wage data and
estimating average treatment effects, where we find that models with many more
covariates than sample size can outperform simple ones. We then document the
performance of high-dimensional synthetic control estimators with many control
units. We find that adding control units can help improve imputation
performance even beyond the point where the pre-treatment fit is perfect. We
provide a unified theoretical perspective on the performance of these
high-dimensional models. Specifically, we show that more complex models can be
interpreted as model-averaging estimators over simpler ones, which we link to
an improvement in average performance. This perspective yields concrete
insights into the use of synthetic control when control units are many relative
to the number of pre-treatment periods.
arXiv link: http://arxiv.org/abs/2305.00700v3
Optimal tests following sequential experiments
sequential experiments. While these experiments are not always designed with
hypothesis testing in mind, researchers may still be interested in performing
tests after the experiment is completed. The purpose of this paper is to aid in
the development of optimal tests for sequential experiments by analyzing their
asymptotic properties. Our key finding is that the asymptotic power function of
any test can be matched by a test in a limit experiment where a Gaussian
process is observed for each treatment, and inference is made for the drifts of
these processes. This result has important implications, including a powerful
sufficiency result: any candidate test only needs to rely on a fixed set of
statistics, regardless of the type of sequential experiment. These statistics
are the number of times each treatment has been sampled by the end of the
experiment, along with the final value of the score (for parametric models) or
efficient influence function (for non-parametric models) process for each
treatment. We then characterize asymptotically optimal tests under various
restrictions such as unbiasedness and \(\alpha\)-spending constraints. Finally, we apply our results to three key classes of sequential experiments: costly
sampling, group sequential trials, and bandit experiments, and show how optimal
inference can be conducted in these scenarios.
arXiv link: http://arxiv.org/abs/2305.00403v2
Augmented balancing weights as linear regression
known as automatic debiased machine learning (AutoDML). These popular doubly
robust or de-biased machine learning estimators combine outcome modeling with
balancing weights -- weights that achieve covariate balance directly in lieu of
estimating and inverting the propensity score. When the outcome and weighting
models are both linear in some (possibly infinite) basis, we show that the
augmented estimator is equivalent to a single linear model with coefficients
that combine the coefficients from the original outcome model and coefficients
from an unpenalized ordinary least squares (OLS) fit on the same data. We see
that, under certain choices of regularization parameters, the augmented
estimator often collapses to the OLS estimator alone; this occurs for example
in a re-analysis of the Lalonde 1986 dataset. We then extend these results to
specific choices of outcome and weighting models. We first show that the
augmented estimator that uses (kernel) ridge regression for both outcome and
weighting models is equivalent to a single, undersmoothed (kernel) ridge
regression. This holds numerically in finite samples and lays the groundwork
for a novel analysis of undersmoothing and asymptotic rates of convergence.
When the weighting model is instead lasso-penalized regression, we give
closed-form expressions for special cases and demonstrate a “double
selection” property. Our framework opens the black box on this increasingly
popular class of estimators, bridges the gap between existing results on the
semiparametric efficiency of undersmoothed and doubly robust estimators, and
provides new insights into the performance of augmented balancing weights.
arXiv link: http://arxiv.org/abs/2304.14545v3
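As a reference point for the estimators being analyzed, the sketch below computes a generic augmented balancing-weight estimate of the treated group's untreated counterfactual mean: a ridge outcome model plus a weighted sum of its control-group residuals. The normalized odds-of-treatment weights stand in for balancing weights, so this illustrates the estimator's structure only, not the paper's equivalence results; all names and numbers are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(5)
n, p = 500, 10
X = rng.standard_normal((n, p))
propensity = 1 / (1 + np.exp(-X[:, 0]))
D = rng.binomial(1, propensity)                             # treatment indicator
Y = X @ np.linspace(1.0, 0.1, p) + rng.standard_normal(n)   # untreated outcome

# Outcome model fitted on controls only (ridge regression).
outcome = Ridge(alpha=1.0).fit(X[D == 0], Y[D == 0])
m_hat = outcome.predict(X)

# Stand-in "balancing" weights for controls: normalized odds of treatment.
ps = LogisticRegression(max_iter=1000).fit(X, D).predict_proba(X)[:, 1]
w = ps[D == 0] / (1 - ps[D == 0])
w = w / w.sum()

# Augmented estimator: outcome-model prediction averaged over the treated,
# plus the weighted residuals from the control group.
mu_aug = m_hat[D == 1].mean() + np.sum(w * (Y[D == 0] - m_hat[D == 0]))
print(f"augmented estimate of E[Y(0) | D=1]: {mu_aug:.3f}")
```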
Assessing Text Mining and Technical Analyses on Forecasting Financial Time Series
economics that anticipates market movements in financial markets. This paper
investigates the accuracy of text mining and technical analyses in forecasting
financial time series. It focuses on the S&P500 stock market index during the
pandemic, which tracks the performance of the largest publicly traded companies
in the US. The study compares two methods of forecasting the future price of
the S&P500: text mining, which uses NLP techniques to extract meaningful
insights from financial news, and technical analysis, which uses historical
price and volume data to make predictions. The study examines the advantages
and limitations of both methods and analyzes their performance in predicting the
S&P500. The FinBERT model outperforms other models in terms of S&P500 price
prediction, as evidenced by its lower RMSE value, and has the potential to
revolutionize financial analysis and prediction using financial news data.
Keywords: ARIMA, BERT, FinBERT, Forecasting Financial Time Series, GARCH, LSTM, Technical Analysis, Text Mining. JEL classifications: G4, C8.
arXiv link: http://arxiv.org/abs/2304.14544v1
Convexity Not Required: Estimation of Smooth Moment Condition Models
structural economic models. Yet, it is commonly reported that optimization is
challenging because the corresponding objective function is non-convex. For
smooth problems, this paper shows that convexity is not required: under
conditions involving the Jacobian of the moments, certain algorithms are
globally convergent. These include a gradient-descent and a Gauss-Newton
algorithm with appropriate choice of tuning parameters. The results are robust
to 1) non-convexity, 2) one-to-one moderately non-linear reparameterizations,
and 3) moderate misspecification. The conditions preclude non-global optima.
Numerical and empirical examples illustrate the condition, non-convexity, and
convergence properties of different optimizers.
arXiv link: http://arxiv.org/abs/2304.14386v2
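To make the algorithmic claim concrete, here is a minimal damped Gauss-Newton iteration for a generic GMM-style objective g(theta)'W g(theta) with a numerical Jacobian; the moment function, weighting matrix, and tuning constants are placeholders, not the conditions studied in the paper.

```python
import numpy as np

def gauss_newton(g, theta0, W, n_iter=50, damping=1e-6):
    """Minimize g(theta)' W g(theta) via damped Gauss-Newton steps."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        m = g(theta)
        # Forward-difference Jacobian of the moment vector.
        eps = 1e-6
        J = np.column_stack([
            (g(theta + eps * e) - m) / eps for e in np.eye(len(theta))
        ])
        step = np.linalg.solve(J.T @ W @ J + damping * np.eye(len(theta)),
                               J.T @ W @ m)
        theta = theta - step
    return theta

# Placeholder moment conditions: a mildly nonlinear, internally consistent
# system whose solution is theta = (1, 2).
def g(theta):
    a, b = theta
    return np.array([np.exp(a) - np.e, a * b - 2.0, b - 2.0])

theta_hat = gauss_newton(g, theta0=[0.2, 0.5], W=np.eye(3))
print("estimate:", np.round(theta_hat, 4))
```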
A universal model for the Lorenz curve with novel applications for datasets containing zeros and/or exhibiting extreme inequality
not fit all possible size distributions, a universal parametric functional form
is introduced. By using the empirical data from different scientific
disciplines and also the hypothetical data, this study shows that, the proposed
model fits not only the data whose actual Lorenz plots have a typical convex
segment but also the data whose actual Lorenz plots have both horizontal and
convex segments practically well. It also perfectly fits data in which one observation is larger while the remaining observations are smaller and equal in size, as characterized by two positive-slope linear segments. In
addition, the proposed model has a closed-form expression for the Gini index,
making it computationally convenient to calculate. Considering that the Lorenz
curve and the Gini index are widely used in various disciplines of sciences,
the proposed model and the closed-form expression for the Gini index could be
used as alternative tools to analyze size distributions of non-negative
quantities and examine their inequalities or unevennesses.
arXiv link: http://arxiv.org/abs/2304.13934v1
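The two objects at the core of this and the following entries are simple to compute empirically. The sketch below evaluates the empirical Lorenz curve and the discrete Gini index for a synthetic non-negative sample that includes zeros; it does not implement the proposed parametric model.

```python
import numpy as np

def lorenz_curve(x):
    """Empirical Lorenz curve: cumulative population share vs. share of x."""
    x = np.sort(np.asarray(x, dtype=float))
    cum_share = np.concatenate([[0.0], np.cumsum(x) / x.sum()])
    pop_share = np.linspace(0.0, 1.0, len(x) + 1)
    return pop_share, cum_share

def gini(x):
    """Discrete Gini index: G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1.0) / n

rng = np.random.default_rng(6)
sample = np.concatenate([np.zeros(20), rng.pareto(2.0, size=180)])  # zeros + heavy tail
pop, cum = lorenz_curve(sample)
print(f"Gini index: {gini(sample):.3f}")
print("Lorenz curve at population shares 0.5 and 0.9:",
      np.round(np.interp([0.5, 0.9], pop, cum), 3))
```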
Difference-in-Differences with Compositional Changes
cross-sectional data and potential compositional changes across time periods.
We begin our analysis by deriving the efficient influence function and the
semiparametric efficiency bound for the average treatment effect on the treated
(ATT). We introduce nonparametric estimators that attain the semiparametric
efficiency bound under mild rate conditions on the estimators of the nuisance
functions, exhibiting a type of rate doubly robust (DR) property. Additionally,
we document a trade-off related to compositional changes: We derive the
asymptotic bias of DR DiD estimators that erroneously exclude compositional
changes and the efficiency loss when one fails to correctly rule out
compositional changes. We propose a nonparametric Hausman-type test for
compositional changes based on these trade-offs. The finite sample performance
of the proposed DiD tools is evaluated through Monte Carlo experiments and an
empirical application. We consider extensions of our framework that accommodate
double machine learning procedures with cross-fitting, and setups when some
units are observed in both pre- and post-treatment periods. As a by-product of
our analysis, we present a new uniform stochastic expansion of the local
polynomial multinomial logit estimator, which may be of independent interest.
arXiv link: http://arxiv.org/abs/2304.13925v2
Estimation of Characteristics-based Quantile Factor Models
models where the factor loadings are unknown functions of observed individual
characteristics while the idiosyncratic error terms are subject to conditional
quantile restrictions. We propose a three-stage estimation procedure that is
easily implementable in practice and has nice properties. The convergence
rates, the limiting distributions of the estimated factors and loading
functions, and a consistent selection criterion for the number of factors at
each quantile are derived under general conditions. The proposed estimation
methodology is shown to work satisfactorily when: (i) the idiosyncratic errors
have heavy tails, (ii) the time dimension of the panel dataset is not large,
and (iii) the number of factors exceeds the number of characteristics. Finite
sample simulations and an empirical application aimed at estimating the loading
functions of the daily returns of a large panel of S&P500 index securities
help illustrate these properties.
arXiv link: http://arxiv.org/abs/2304.13206v1
Common Correlated Effects Estimation of Nonlinear Panel Data Models
of observed regressors in nonlinear panel data models with interactive fixed
effects, using the common correlated effects (CCE) framework. The proposed
two-step estimation method involves applying principal component analysis to
estimate latent factors based on cross-sectional averages of the regressors in
the first step, and jointly estimating the coefficients of the regressors and
factor loadings in the second step. The asymptotic distributions of the
proposed estimators are derived under general conditions, assuming that the
number of time-series observations is comparable to the number of
cross-sectional observations. To correct for asymptotic biases of the
estimators, we introduce both analytical and split-panel jackknife methods, and
confirm their good performance in finite samples using Monte Carlo simulations.
An empirical application utilizes the proposed method to study the arbitrage
behaviour of nonfinancial firms across different security markets.
arXiv link: http://arxiv.org/abs/2304.13199v1
Enhanced multilayer perceptron with feature selection and grid search for travel mode choice prediction
for developing multi-mode urban transportation systems, conducting
transportation planning and formulating traffic demand management strategies.
Traditional discrete choice models have dominated the modelling methods for
decades yet suffer from strict model assumptions and low prediction accuracy.
In recent years, machine learning (ML) models, such as neural networks and
boosting models, are widely used by researchers for travel mode choice
prediction and have yielded promising results. However, despite the superior
prediction performance, a large body of ML methods, especially the branch of
neural network models, is also limited by overfitting and tedious model
structure determination process. To bridge this gap, this study proposes an
enhanced multilayer perceptron (MLP; a neural network) with two hidden layers
for travel mode choice prediction; this MLP is enhanced by XGBoost (a boosting
method) for feature selection and a grid search method for determining the optimal number of hidden neurons in each hidden layer. The proposed method was trained and
tested on a real resident travel diary dataset collected in Chengdu, China.
arXiv link: http://arxiv.org/abs/2304.12698v2
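A stripped-down version of such a pipeline, using scikit-learn's gradient boosting as a stand-in for XGBoost-based feature selection and a grid search over the widths of the two hidden layers, might look as follows; the dataset, feature counts, and grid values are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder travel-survey-like data: 4 travel modes, 30 candidate features.
X, y = make_classification(n_samples=3000, n_features=30, n_informative=12,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # Boosting-based feature selection (stand-in for XGBoost importances).
    ("select", SelectFromModel(GradientBoostingClassifier(random_state=0))),
    ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
])

# Grid search over the widths of the two hidden layers.
grid = {"mlp__hidden_layer_sizes": [(32, 16), (64, 32), (128, 64)]}
search = GridSearchCV(pipe, grid, cv=3, n_jobs=-1)
search.fit(X_tr, y_tr)
print("best hidden layers:", search.best_params_)
print("test accuracy:", round(search.score(X_te, y_te), 3))
```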
The Ordinary Least Eigenvalues Estimator
network data with interacted (unobservable) individual effects. The estimator
achieves a faster rate of convergence $N$ compared to the standard estimators' $\sqrt{N}$ rate and is efficient in cases that we discuss. We observe that the
individual effects alter the eigenvalue distribution of the data's matrix
representation in significant and distinctive ways. We subsequently offer a
correction for the ordinary least squares' objective function to
attenuate the statistical noise that arises due to the individual effects, and
in some cases, completely eliminate it. The new estimator is asymptotically
normal and we provide a valid estimator for its asymptotic covariance matrix.
While this paper only considers models accounting for first-order interactions
between individual effects, our estimation procedure is naturally extendable to
higher-order interactions and more general specifications of the error terms.
arXiv link: http://arxiv.org/abs/2304.12554v1
Determination of the effective cointegration rank in high-dimensional time-series predictive regressions
rank in high-dimensional unit-root (HDUR) time series from a prediction
perspective using reduced-rank regression. For an HDUR process $x_t \in \mathbb{R}^N$ and a stationary series $y_t \in \mathbb{R}^p$ of
interest, our goal is to predict future values of $y_t$ using
$x_t$ and lagged values of $y_t$. The proposed framework
consists of a two-step estimation procedure. First, Principal Component
Analysis is used to identify all cointegrating vectors of $x_t$.
Second, the co-integrated stationary series are used as regressors, together
with some lagged variables of $y_t$, to predict $y_t$. The
estimated reduced rank is then defined as the effective cointegration rank of
$x_t$. Under the scenario that the autoregressive coefficient matrices
are sparse (or of low-rank), we apply the Least Absolute Shrinkage and
Selection Operator (or the reduced-rank techniques) to estimate the
autoregressive coefficients when the dimension involved is high. Theoretical
properties of the estimators are established under the assumption that the dimensions $p$ and $N$ and the sample size $T$ all tend to infinity. Both simulated and
real examples are used to illustrate the proposed framework, and the empirical
application suggests that the proposed procedure fares well in predicting stock
returns.
arXiv link: http://arxiv.org/abs/2304.12134v2
Policy Learning under Biased Sample Selection
treatment assignment policy that can be deployed on a target population. A
recurring concern in doing so is that, even if the randomized trial was
well-executed (i.e., internal validity holds), the study participants may not
represent a random sample of the target population (i.e., external validity
fails)--and this may lead to policies that perform suboptimally on the target
population. We consider a model where observable attributes can impact sample
selection probabilities arbitrarily but the effect of unobservable attributes
is bounded by a constant, and we aim to learn policies with the best possible
performance guarantees that hold under any sampling bias of this type. In
particular, we derive the partial identification result for the worst-case
welfare in the presence of sampling bias and show that the optimal max-min,
max-min gain, and minimax regret policies depend on both the conditional
average treatment effect (CATE) and the conditional value-at-risk (CVaR) of
potential outcomes given covariates. To avoid finite-sample inefficiencies of
plug-in estimates, we further provide an end-to-end procedure for learning the
optimal max-min and max-min gain policies that does not require the separate
estimation of nuisance parameters.
arXiv link: http://arxiv.org/abs/2304.11735v1
The Impact of Industrial Zone: Evidence from China's National High-tech Zone Policy
in China from 2000 to 2020, this study regards the policy of establishing national high-tech zones as a quasi-natural experiment. Using this experiment, the study first estimates the treatment effect of the policy and checks the robustness of the estimation. It then examines the heterogeneity of the effect across different geographic regions and city tiers of China. After that, the study explores the possible influence mechanisms of the policy, finding that they include financial support, industrial agglomeration in the secondary industry, and spillovers. Finally, the study examines the spillovers in more depth and shows the distribution of the spillover effect.
arXiv link: http://arxiv.org/abs/2304.09775v1
A hybrid model for day-ahead electricity price forecasting: Combining fundamental and stochastic modelling
effective trading strategies, power plant scheduling, profit maximisation and
efficient system operation. However, uncertainties in supply and demand make
such predictions challenging. We propose a hybrid model that combines a
techno-economic energy system model with stochastic models to address this
challenge. The techno-economic model in our hybrid approach provides a deep
understanding of the market. It captures the underlying factors and their
impacts on electricity prices, which is impossible with statistical models
alone. The statistical models incorporate non-techno-economic aspects, such as
the expectations and speculative behaviour of market participants, through the
interpretation of prices. The hybrid model generates both conventional point
predictions and probabilistic forecasts, providing a comprehensive
understanding of the market landscape. Probabilistic forecasts are particularly
valuable because they account for market uncertainty, facilitating informed
decision-making and risk management. Our model delivers state-of-the-art
results, helping market participants to make informed decisions and operate
their systems more efficiently.
arXiv link: http://arxiv.org/abs/2304.09336v1
Club coefficients in the UEFA Champions League: Time for shift to an Elo-based formula
will see a fundamental reform from the 2024/25 season: the traditional group
stage will be replaced by one league where each of the 36 teams plays eight
matches. To guarantee that the opponents of the clubs are of the same strength
in the new design, it is crucial to forecast the performance of the teams
before the tournament as well as possible. This paper investigates whether the
currently used rating of the teams, the UEFA club coefficient, can be improved
by taking the games played in the national leagues into account. According to
our logistic regression models, a variant of the Elo method provides a higher
accuracy in terms of explanatory power in the Champions League matches. The
Union of European Football Associations (UEFA) is encouraged to follow the
example of the FIFA World Ranking and reform the calculation of the club
coefficients in order to avoid unbalanced schedules in the novel tournament
format of the Champions League.
arXiv link: http://arxiv.org/abs/2304.09078v6
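The Elo ingredient can be sketched generically: update ratings match by match, then check explanatory power by regressing match outcomes on pre-match rating differences with a logistic model. The K-factor, initial ratings, and simulated fixtures are illustrative choices, not UEFA's coefficients or the paper's exact variant.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def elo_update(r_home, r_away, home_win, k=20.0):
    """One Elo update; expected score from the standard logistic curve."""
    expected_home = 1.0 / (1.0 + 10.0 ** ((r_away - r_home) / 400.0))
    r_home += k * (home_win - expected_home)
    r_away += k * ((1.0 - home_win) - (1.0 - expected_home))
    return r_home, r_away

rng = np.random.default_rng(7)
n_teams, n_matches = 36, 2000
strength = rng.normal(0.0, 1.0, n_teams)            # latent team strengths
ratings = np.full(n_teams, 1500.0)

diffs, outcomes = [], []
for _ in range(n_matches):
    i, j = rng.choice(n_teams, size=2, replace=False)
    p_win = 1.0 / (1.0 + np.exp(-(strength[i] - strength[j])))
    win = float(rng.random() < p_win)
    diffs.append(ratings[i] - ratings[j])            # rating gap before the match
    outcomes.append(win)
    ratings[i], ratings[j] = elo_update(ratings[i], ratings[j], win)

# Logistic regression of match results on pre-match Elo differences.
model = LogisticRegression().fit(np.array(diffs).reshape(-1, 1), np.array(outcomes))
print("coefficient on Elo difference:", round(model.coef_[0, 0], 4))
```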
Doubly Robust Estimators with Weak Overlap
treatment effect estimands that is also robust against weak covariate overlap.
Our proposed estimator relies on trimming observations with extreme propensity
scores and uses a bias correction device for trimming bias. Our framework
accommodates many research designs, such as unconfoundedness, local treatment
effects, and difference-in-differences. Simulation exercises illustrate that
our proposed tools indeed have attractive finite sample properties, which are
aligned with our theoretical asymptotic results.
arXiv link: http://arxiv.org/abs/2304.08974v2
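One way to picture the weak-overlap issue is a plain AIPW estimator combined with a fixed trimming rule that discards propensity scores outside [eps, 1 - eps]; the paper's bias-correction device for the trimming bias is not reproduced, and the threshold, data, and nuisance models below are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(8)
n = 2000
X = rng.standard_normal((n, 3))
ps_true = 1 / (1 + np.exp(-2.5 * X[:, 0]))          # strong covariate: weak overlap
D = rng.binomial(1, ps_true)
Y = X.sum(axis=1) + 1.0 * D + rng.standard_normal(n)

# Nuisance estimates: propensity score and outcome regressions by arm.
ps = LogisticRegression(max_iter=1000).fit(X, D).predict_proba(X)[:, 1]
m1 = LinearRegression().fit(X[D == 1], Y[D == 1]).predict(X)
m0 = LinearRegression().fit(X[D == 0], Y[D == 0]).predict(X)

eps = 0.05
keep = (ps > eps) & (ps < 1 - eps)                  # trim extreme propensity scores

# AIPW scores, averaged only over the trimmed subpopulation.
scores = (m1 - m0
          + D * (Y - m1) / ps
          - (1 - D) * (Y - m0) / (1 - ps))
ate_trimmed = scores[keep].mean()
print(f"trimmed share: {1 - keep.mean():.2%}, trimmed AIPW ATE: {ate_trimmed:.3f}")
```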
Adjustment with Many Regressors Under Covariate-Adaptive Randomizations
causal inference under covariate-adaptive randomizations (CARs). On one hand,
RAs can improve the efficiency of causal estimators by incorporating
information from covariates that are not used in the randomization. On the
other hand, RAs can degrade estimation efficiency due to their estimation
errors, which are not asymptotically negligible when the number of regressors
is of the same order as the sample size. Ignoring the estimation errors of RAs
may result in serious over-rejection of causal inference under the null
hypothesis. To address the issue, we construct a new ATE estimator by optimally
linearly combining the estimators with and without RAs. We then develop a
unified inference theory for this estimator under CARs. It has two features:
(1) the Wald test based on it achieves the exact asymptotic size under the null
hypothesis, regardless of whether the number of covariates is fixed or diverges
no faster than the sample size; and (2) it guarantees weak efficiency
improvement over estimators both with and without RAs.
arXiv link: http://arxiv.org/abs/2304.08184v5
Coarsened Bayesian VARs -- Correcting BVARs for Incorrect Specification
Model misspecification in multivariate econometric models can strongly influence estimates of quantities of interest such as structural parameters, forecast distributions or responses to structural shocks, even more so if higher-order forecasts or responses are considered, due to parameter convolution. We propose
a simple method for addressing these specification issues in the context of
Bayesian VARs. Our method, called coarsened Bayesian VARs (cBVARs), replaces
the exact likelihood with a coarsened likelihood that takes into account that
the model might be misspecified along important but unknown dimensions. Since
endogenous variables in a VAR can feature different degrees of
misspecification, our model allows for this and automatically detects the
degree of misspecification. The resulting cBVARs perform well in simulations
for several types of misspecification. Applied to US data, cBVARs improve point
and density forecasts compared to standard BVARs.
arXiv link: http://arxiv.org/abs/2304.07856v3
Penalized Likelihood Inference with Survey Data
$C(\alpha)$ and Selective Inference to a survey environment. We establish the
asymptotic validity of the inference procedures in generalized linear models
with survey weights and/or heteroskedasticity. Moreover, we generalize the
methods to inference on nonlinear parameter functions, e.g., the average marginal
effect in survey logit models. We illustrate the effectiveness of the approach
in simulated data and Canadian Internet Use Survey 2020 data.
arXiv link: http://arxiv.org/abs/2304.07855v1
Gini-stable Lorenz curves and their relation to the generalised Pareto distribution
can extend ordered normalised vectors by new elements based on a simple affine
transformation, while preserving the predefined level of inequality, G, as
measured by the Gini index.
Then, we derive the family of empirical Lorenz curves of the corresponding
vectors and prove that it is stochastically ordered with respect to both the
sample size and G, which plays the role of the uncertainty parameter. We prove
that asymptotically, we obtain all, and only, Lorenz curves generated by a new,
intuitive parametrisation of the finite-mean Pickands' Generalised Pareto
Distribution (GPD) that unifies three other families, namely: the Pareto Type
II, exponential, and scaled beta distributions. The family is not only totally
ordered with respect to the parameter G, but also, thanks to our derivations,
has a nice underlying interpretation. Our result may thus shed new light on
the genesis of this family of distributions.
Our model fits bibliometric, informetric, socioeconomic, and environmental
data reasonably well. It is quite user-friendly, as it depends only on the
sample size and its Gini index.
arXiv link: http://arxiv.org/abs/2304.07480v3
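Since the construction above revolves around the empirical Lorenz curve and the Gini index, a short sketch of their standard definitions may help; it does not reproduce the paper's affine extension that preserves a target Gini level.

```python
# Empirical Lorenz curve and Gini index from a nonnegative sample.
# Standard textbook definitions for orientation only.
import numpy as np

def lorenz_curve(x):
    s = np.sort(np.asarray(x, dtype=float))
    cum = np.cumsum(s) / s.sum()
    return np.insert(cum, 0, 0.0)             # L(0)=0, ..., L(1)=1 on grid i/n

def gini(x):
    L = lorenz_curve(x)
    n = len(L) - 1
    area = (L[:-1] + L[1:]).sum() / (2.0 * n)  # trapezoidal area under L
    return 1.0 - 2.0 * area

incomes = np.array([1.0, 2.0, 2.0, 5.0, 10.0])
print(gini(incomes))
```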
Equivalence of inequality indices: Three dimensions of impact revisited
incomes, talents, resources, and citations, amongst many others. Its intensity
varies across different environments: from relatively evenly distributed ones,
to where a small group of stakeholders controls the majority of the available
resources. We would like to understand why inequality naturally arises as a
consequence of the natural evolution of any system. Studying simple
mathematical models governed by intuitive assumptions can bring many insights
into this problem. In particular, we recently observed (Siudem et al., PNAS
117:13896-13900, 2020) that impact distribution might be modelled accurately by
a time-dependent agent-based model involving a mixture of the rich-get-richer
and sheer chance components. Here we point out its relationship to an iterative
process that generates rank distributions of any length and a predefined level
of inequality, as measured by the Gini index.
Many indices quantifying the degree of inequality have been proposed. Which
of them is the most informative? We show that, under our model, indices such as
the Bonferroni, De Vergottini, and Hoover ones are equivalent. Given one of
them, we can recreate the value of any other measure using the derived
functional relationships. Also, thanks to the obtained formulae, we can
understand how they depend on the sample size. An empirical analysis of a large
sample of citation records in economics (RePEc) as well as countrywise family
income data, confirms our theoretical observations. Therefore, we can safely
and effectively remain faithful to the simplest measure: the Gini index.
arXiv link: http://arxiv.org/abs/2304.07479v1
Generalized Automatic Least Squares: Efficiency Gains from Misspecified Heteroscedasticity Models
squares estimator is not efficient. I propose a generalized automatic least
squares estimator (GALS) that makes partial correction of heteroscedasticity
based on a (potentially) misspecified model without a pretest. Such an
estimator is guaranteed to be at least as efficient as either OLS or WLS but
can provide some asymptotic efficiency gains over OLS if the misspecified model
is approximately correct. If the heteroscedasticity model is correct, the
proposed estimator achieves full asymptotic efficiency. The idea is to frame the moment conditions corresponding to OLS and to WLS based on the misspecified heteroscedasticity model as a joint generalized method of moments estimation problem.
The resulting optimal GMM estimator is equivalent to a feasible GLS with
estimated weight matrix. I also propose an optimal GMM variance-covariance
estimator for GALS to account for any remaining heteroscedasticity in the
residuals.
arXiv link: http://arxiv.org/abs/2304.07331v1
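To fix ideas, the sketch below implements ordinary feasible WLS based on a working exponential skedastic model; it is a simplified stand-in for the feasible-GLS representation mentioned above, not the GALS/GMM construction itself, and the variance regressors Z and the simulated data are assumptions.

```python
# Feasible WLS with a working variance model exp(z'gamma): a simplified
# stand-in for exploiting a possibly misspecified heteroscedasticity model.
import numpy as np

def feasible_wls(y, X, Z):
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)                      # step 1: OLS
    resid = y - X @ b_ols
    g, *_ = np.linalg.lstsq(Z, np.log(resid**2 + 1e-12), rcond=None)   # step 2: skedastic fit
    w = np.exp(-(Z @ g))                                               # inverse fitted variances
    Xw = X * w[:, None]
    b_wls = np.linalg.solve(X.T @ Xw, Xw.T @ y)                        # step 3: WLS
    return b_ols, b_wls

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = X                                          # working variance regressors (assumed)
sigma = np.exp(0.5 * X[:, 1])                  # true skedastic function
y = X @ np.array([1.0, 2.0]) + sigma * rng.normal(size=n)
print(feasible_wls(y, X, Z))
```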
Detection and Estimation of Structural Breaks in High-Dimensional Functional Time Series
mean functions of high-dimensional functional time series which are allowed to
be cross-sectionally correlated and temporally dependent. A new test statistic
combining the functional CUSUM statistic and power enhancement component is
proposed with asymptotic null distribution theory comparable to the
conventional CUSUM theory derived for a single functional time series. In
particular, the extra power enhancement component enlarges the region where the
proposed test has power, and results in stable power performance when breaks
are sparse in the alternative hypothesis. Furthermore, we impose a latent group
structure on the subjects with heterogeneous break points and introduce an
easy-to-implement clustering algorithm with an information criterion to
consistently estimate the unknown group number and membership. The estimated
group structure can subsequently improve the convergence property of the
post-clustering break point estimate. Monte-Carlo simulation studies and
empirical applications show that the proposed estimation and testing techniques
have satisfactory performance in finite samples.
arXiv link: http://arxiv.org/abs/2304.07003v1
Predictive Incrementality by Experimentation (PIE) for Ad Measurement
use exogenous variation in advertising exposure (RCTs) for a subset of ad
campaigns to build a model that can predict the causal effect of ad campaigns
that were run without RCTs. This approach -- Predictive Incrementality by
Experimentation (PIE) -- frames the task of estimating the causal effect of an
ad campaign as a prediction problem, with the unit of observation being an RCT
itself. In contrast, traditional causal inference approaches with observational
data seek to adjust covariate imbalance at the user level. A key insight is to
use post-campaign features, such as last-click conversion counts, that do not
require an RCT, as features in our predictive model. We find that our PIE model
recovers RCT-derived incremental conversions per dollar (ICPD) much better than
the program evaluation approaches analyzed in Gordon et al. (forthcoming). The
prediction errors from the best PIE model are 48%, 42%, and 62% of the
RCT-based average ICPD for upper-, mid-, and lower-funnel conversion outcomes,
respectively. In contrast, across the same data, the average prediction error
of stratified propensity score matching exceeds 491%, and that of
double/debiased machine learning exceeds 2,904%. Using a decision-making
framework inspired by industry, we show that PIE leads to different decisions
compared to RCTs for only 6% of upper-funnel, 7% of mid-funnel, and 13% of
lower-funnel outcomes. We conclude that PIE could enable advertising platforms
to scale causal ad measurement by extrapolating from a limited number of RCTs
to a large set of non-experimental ad campaigns.
arXiv link: http://arxiv.org/abs/2304.06828v1
GDP nowcasting with artificial neural networks: How much does long-term memory matter?
for the U.S. economy. Using the monthly FRED-MD database, we compare the
nowcasting performance of five different ANN architectures: the multilayer
perceptron (MLP), the one-dimensional convolutional neural network (1D CNN),
the Elman recurrent neural network (RNN), the long short-term memory network
(LSTM), and the gated recurrent unit (GRU). The empirical analysis presents
results from two distinctively different evaluation periods. The first (2012:Q1
-- 2019:Q4) is characterized by balanced economic growth, while the second
(2012:Q1 -- 2024:Q2) also includes periods of the COVID-19 recession. During
the first evaluation period, longer input sequences slightly improve nowcasting
performance for some ANNs, but the best accuracy is still achieved with
8-month-long input sequences at the end of the nowcasting window. Results from
the second test period depict the role of long-term memory even more clearly.
The MLP, the 1D CNN, and the Elman RNN work best with 8-month-long input
sequences at each step of the nowcasting window. The relatively weak
performance of the gated RNNs also suggests that architectural features
enabling long-term memory do not result in more accurate nowcasts for GDP
growth. The combined results indicate that the 1D CNN seems to represent a
“sweet spot” between the simple time-agnostic MLP and the more
complex (gated) RNNs. The network generates nearly as accurate nowcasts as the
best competitor for the first test period, while it achieves the overall best
accuracy during the second evaluation period. Consequently, as a first in the
literature, we propose the application of the 1D CNN for economic nowcasting.
arXiv link: http://arxiv.org/abs/2304.05805v4
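A minimal 1D CNN nowcaster in PyTorch is sketched below to make the architecture concrete; the layer sizes, window length, and number of monthly indicators are illustrative assumptions, not the authors' specification.

```python
# Minimal 1D CNN nowcaster: maps an 8-month window of monthly indicators to a
# single GDP growth figure. Purely illustrative architecture.
import torch
import torch.nn as nn

class CNNNowcaster(nn.Module):
    def __init__(self, n_indicators: int, window: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_indicators, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * window, 1),
        )

    def forward(self, x):             # x: (batch, n_indicators, window)
        return self.net(x).squeeze(-1)

model = CNNNowcaster(n_indicators=120, window=8)
dummy = torch.randn(4, 120, 8)        # 4 samples, 120 monthly series, 8 months
print(model(dummy).shape)             # torch.Size([4])
```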
Financial Time Series Forecasting using CNN and Transformer
decision-making. In particular, financial time series such as stock prices can
be hard to predict as it is difficult to model short-term and long-term
temporal dependencies between data points. Convolutional Neural Networks (CNN)
are good at capturing local patterns for modeling short-term dependencies.
However, CNNs cannot learn long-term dependencies due to the limited receptive
field. Transformers on the other hand are capable of learning global context
and long-term dependencies. In this paper, we propose to harness the power of
CNNs and Transformers to model both short-term and long-term dependencies
within a time series, and forecast if the price would go up, down or remain the
same (flat) in the future. In our experiments, we demonstrated the success of
the proposed method in comparison to commonly adopted statistical and deep
learning methods for forecasting intraday stock price changes of S&P 500
constituents.
arXiv link: http://arxiv.org/abs/2304.04912v1
Adaptive Student's t-distribution with method of moments moving estimator for nonstationary time series
question of model adaptation. Classical approaches like ARMA-ARCH assume an arbitrarily chosen type of dependence. To avoid their bias, we focus on the recently proposed agnostic philosophy of the moving estimator: at time $t$, find the parameters optimizing, e.g., the moving log-likelihood $F_t=\sum_{\tau<t} (1-\eta)^{t-\tau} \ln \rho_\theta(x_\tau)$, which evolves in time. This allows, for example, parameters to be estimated with inexpensive exponential moving averages (EMA), such as the absolute central moments $m_p=E[|x-\mu|^p]$ evolving for one or multiple powers $p \in \mathbb{R}^+$ via $m_{p,t+1} = m_{p,t} + \eta\,(|x_t-\mu_t|^p - m_{p,t})$.
The application of such general adaptive methods of moments is presented for the Student's t-distribution, which is especially popular in economic applications and is here applied to log-returns of DJIA companies. While standard ARMA-ARCH approaches provide the evolution of $\mu$ and $\sigma$, here we also get the evolution of $\nu$ describing the tail shape $\rho(x)\sim |x|^{-\nu-1}$, i.e., the probability of extreme events, which might turn out catastrophic and destabilize the market.
arXiv link: http://arxiv.org/abs/2304.03069v4
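The abstract states the moving method-of-moments update explicitly, so a direct sketch is possible; the learning rate, initial values, and simulated input below are assumptions, and the mapping from the evolving moments to the Student's t parameters is left to the paper.

```python
# Moving (exponentially weighted) method-of-moments updates from the abstract:
# m_{p,t+1} = m_{p,t} + eta * (|x_t - mu_t|^p - m_{p,t}).
# Here we only track mu and two absolute central moments.
import numpy as np

def moving_moments(x, eta=0.05, powers=(1.0, 2.0)):
    mu = x[0]
    m = {p: 1.0 for p in powers}                  # arbitrary initial values
    path = []
    for xt in x:
        for p in powers:
            m[p] += eta * (abs(xt - mu) ** p - m[p])
        mu += eta * (xt - mu)                     # EMA estimate of the mean
        path.append((mu, *[m[p] for p in powers]))
    return np.array(path)

rng = np.random.default_rng(2)
log_returns = rng.standard_t(df=4, size=1000) * 0.01   # stand-in for DJIA log-returns
print(moving_moments(log_returns)[-1])
```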
Faster estimation of dynamic discrete choice models using index invertibility
heterogeneity have desirable statistical properties but are computationally
intensive. In this paper we propose a method to quicken estimation for a broad
class of dynamic discrete choice problems by exploiting semiparametric index
restrictions. Specifically, we propose an estimator for models whose reduced
form parameters are invertible functions of one or more linear indices (Ahn,
Ichimura, Powell and Ruud 2018), a property we term index invertibility. We
establish that index invertibility implies a set of equality constraints on the
model parameters. Our proposed estimator uses the equality constraints to
decrease the dimension of the optimization problem, thereby generating
computational gains. Our main result shows that the proposed estimator is
asymptotically equivalent to the unconstrained, computationally heavy
estimator. In addition, we provide a series of results on the number of
independent index restrictions on the model parameters, providing theoretical
guidance on the extent of computational gains. Finally, we demonstrate the
advantages of our approach via Monte Carlo simulations.
arXiv link: http://arxiv.org/abs/2304.02171v4
Individual Welfare Analysis: Random Quasilinear Utility, Independence, and Confidence Bounds
builds on a parametric model for continuous demand with a quasilinear utility
function, allowing for heterogeneous coefficients and unobserved
individual-good-level preference shocks. We obtain bounds on the
individual-level consumer welfare loss at any confidence level due to a
hypothetical price increase, solving a scalable optimization problem
constrained by a novel confidence set under an independence restriction. This
confidence set is computationally simple and robust to weak instruments,
nonlinearity, and partial identification. The validity of the confidence set is
guaranteed by our new results on the joint limiting distribution of the
independence test by Chatterjee (2021). These results together with the
confidence set may have applications beyond welfare analysis. Monte Carlo
simulations and two empirical applications on gasoline and food demand
demonstrate the effectiveness of our method.
arXiv link: http://arxiv.org/abs/2304.01921v4
Torch-Choice: A PyTorch Package for Large-Scale Choice Modeling with Python
choice modeling with Python and PyTorch. $torch-choice$ provides a
$ChoiceDataset$ data structure to manage databases flexibly and
memory-efficiently. The paper demonstrates how to construct a
$ChoiceDataset$ from databases of various formats and illustrates the
functionalities of $ChoiceDataset$. The package implements two widely used models,
namely the multinomial logit and nested logit models, and supports
regularization during model estimation. The package incorporates the option to
take advantage of GPUs for estimation, allowing it to scale to massive datasets
while being computationally efficient. Models can be initialized using either
R-style formula strings or Python dictionaries. We conclude with a comparison
of the computational efficiencies of $torch-choice$ and
$mlogit$ in R as (1) the number of observations increases, (2) the
number of covariates increases, and (3) the expansion of item sets. Finally, we
demonstrate the scalability of $torch-choice$ on large-scale datasets.
arXiv link: http://arxiv.org/abs/2304.01906v4
Heterogeneity-robust granular instruments
empirical macro-finance. The methodology's rise showcases granularity's
potential for identification across many economic environments, like the
estimation of spillovers and demand systems. I propose a new estimator--called
robust granular instrumental variables (RGIV)--that enables studying unit-level
heterogeneity in spillovers. Unlike existing methods that assume heterogeneity
is a function of observables, RGIV leaves heterogeneity unrestricted. In
contrast to the baseline GIV estimator, RGIV allows for unknown shock variances
and equal-sized units. Applied to the Euro area, I find strong evidence of
country-level heterogeneity in sovereign yield spillovers.
arXiv link: http://arxiv.org/abs/2304.01273v3
Testing for idiosyncratic Treatment Effect Heterogeneity
treatment effect heterogeneity. Importantly, I consider the presence of
heterogeneity that is not explained by observed characteristics, or so-called
idiosyncratic heterogeneity. When examining this heterogeneity, common
statistical tests encounter a nuisance parameter problem in the average
treatment effect which renders the asymptotic distribution of the test
statistic dependent on that parameter. I propose an asymptotically valid test
that circumvents the estimation of that parameter using the empirical
characteristic function. A simulation study illustrates not only the test's
validity but its higher power in rejecting a false null as compared to current
tests. Furthermore, I show the method's usefulness through its application to a
microfinance experiment in Bosnia and Herzegovina. In this experiment and for
outcomes related to loan take-up and self-employment, the tests suggest that
treatment effect heterogeneity does not seem to be completely accounted for by
baseline characteristics. For those outcomes, researchers could try to collect more baseline characteristics to inspect the remaining treatment effect heterogeneity and, potentially, improve treatment targeting.
arXiv link: http://arxiv.org/abs/2304.01141v1
Artificial neural networks and time series of counts: A class of nonlinear INGARCH models
integer-valued autoregressive models with conditional heteroskedasticity
(INGARCH). These models employ response functions to map a vector of past
observations and past conditional expectations to the conditional expectation
of the present observation. In this paper, it is shown how INGARCH models can
be combined with artificial neural network (ANN) response functions to obtain a
class of nonlinear INGARCH models. The ANN framework allows for the
interpretation of many existing INGARCH models as a degenerate version of a
corresponding neural model. Details on maximum likelihood estimation, marginal
effects and confidence intervals are given. The empirical analysis of time
series of bounded and unbounded counts reveals that the neural INGARCH models
are able to outperform reasonable degenerate competitor models in terms of the
information loss.
arXiv link: http://arxiv.org/abs/2304.01025v1
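A toy version of the idea, assuming a Poisson INGARCH(1,1) with a small MLP response function and a softplus link, is sketched below; the architecture, link, data, and training loop are illustrative choices, not the paper's model class or estimation details.

```python
# Toy neural INGARCH(1,1): the conditional mean lambda_t is an MLP of
# (y_{t-1}, lambda_{t-1}) passed through softplus, fit by Poisson likelihood.
import torch
import torch.nn as nn

class NeuralINGARCH(nn.Module):
    def __init__(self, hidden: int = 8):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(2, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def neg_log_lik(self, y):
        lam = y.mean().reshape(1)                  # initialize at the sample mean
        nll = torch.zeros(())
        for t in range(1, len(y)):
            inp = torch.stack([y[t - 1], lam.squeeze()]).unsqueeze(0)   # (1, 2)
            lam = nn.functional.softplus(self.f(inp)).squeeze(0)        # (1,)
            nll = nll - torch.distributions.Poisson(lam).log_prob(y[t]).sum()
        return nll

y = torch.poisson(torch.full((300,), 5.0))          # placeholder count series
model = NeuralINGARCH()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(50):
    opt.zero_grad()
    loss = model.neg_log_lik(y)
    loss.backward()
    opt.step()
print(float(loss))
```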
Testing and Identifying Substitution and Complementarity Patterns
complementarity patterns between two goods using a panel multinomial choice
model with bundles. The model allows the two goods to be either substitutes or
complements and admits heterogeneous complementarity through observed
characteristics. I first provide testable implications for the complementarity
relationship between goods. I then characterize the sharp identified set for
the model parameters and provide sufficient conditions for point
identification. The identification analysis accommodates endogenous covariates
through flexible dependence structures between observed characteristics and
fixed effects while placing no distributional assumptions on unobserved
preference shocks. My method is shown to perform more robustly than the
parametric method through Monte Carlo simulations. As an extension, I allow for
unobserved heterogeneity in the complementarity, investigate scenarios
involving more than two goods, and study a class of nonseparable utility
functions.
arXiv link: http://arxiv.org/abs/2304.00678v1
IV Regressions without Exclusion Restrictions
regression models without excluded instrumental variables, based on the
standard mean independence condition and a nonlinear relevance condition. Based
on the identification results, we propose two semiparametric estimators as well
as a discretization-based estimator that does not require any nonparametric
regressions. We establish their asymptotic normality and demonstrate via
simulations their robust finite-sample performance with respect to violations of exclusion restrictions and endogeneity. Our approach is applied to study the
returns to education, and to test the direct effects of college proximity
indicators as well as family background variables on the outcome.
arXiv link: http://arxiv.org/abs/2304.00626v3
Hypothesis testing on invariant subspaces of non-diagonalizable matrices with applications to network statistics
matrices of Tyler (1981) to that of invariant and singular subspaces of
non-diagonalizable matrices. Wald tests for invariant vectors and $t$-tests for
their individual coefficients perform well in simulations, despite the matrix
not being symmetric. Using these results, it is now possible to perform
inference on network statistics that depend on eigenvectors of non-symmetric
adjacency matrices as they arise in empirical applications from directed
networks. Further, we find that statisticians only need control over the
first-order Davis-Kahan bound to control convergence rates of invariant
subspace estimators to higher orders. For general invariant subspaces, the
minimal eigenvalue separation dominates the first-order bound, potentially
slowing convergence rates considerably. In an example, we find that accounting
for uncertainty in network estimates changes empirical conclusions about the
ranking of nodes' popularity.
arXiv link: http://arxiv.org/abs/2303.18233v5
Under-Identification of Structural Models Based on Timing and Information Set Assumptions
structural models, which have been used in the context of production functions,
demand equations, and hedonic pricing models (e.g. Olley and Pakes (1996),
Blundell and Bond (2000)). First, we demonstrate a general under-identification
problem using these assumptions in a simple version of the Blundell-Bond
dynamic panel model. In particular, the basic moment conditions can yield
multiple discrete solutions: one at the persistence parameter in the main
equation and another at the persistence parameter governing the regressor. We
then show that the problem can persist in a broader set of models but
disappears in models under stronger timing assumptions. We then propose
possible solutions in the simple setting by enforcing an assumed sign
restriction and conclude by using lessons from our basic identification
approach to propose more general practical advice for empirical researchers.
arXiv link: http://arxiv.org/abs/2303.15170v1
Sensitivity Analysis in Unconditional Quantile Effects
policies on the unconditional quantiles of an outcome variable. For a given
counterfactual policy, we obtain identified sets for the effect of both
marginal and global changes in the proportion of treated individuals. To
conduct a sensitivity analysis, we introduce the quantile breakdown frontier, a
curve that (i) indicates whether a sensitivity analysis is possible or not, and
(ii) when a sensitivity analysis is possible, quantifies the amount of
selection bias consistent with a given conclusion of interest across different
quantiles. To illustrate our method, we perform a sensitivity analysis on the
effect of unionizing low income workers on the quantiles of the distribution of
(log) wages.
arXiv link: http://arxiv.org/abs/2303.14298v3
Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions
interventions. Our goal is to learn unit-specific potential outcomes for any
combination of these $p$ interventions, i.e., $N \times 2^p$ causal parameters.
Choosing a combination of interventions is a problem that naturally arises in a
variety of applications such as factorial design experiments, recommendation
engines, combination therapies in medicine, conjoint analysis, etc. Running $N
\times 2^p$ experiments to estimate the various parameters is likely expensive
and/or infeasible as $N$ and $p$ grow. Further, with observational data there
is likely confounding, i.e., whether or not a unit is seen under a combination
is correlated with its potential outcome under that combination. To address
these challenges, we propose a novel latent factor model that imposes structure
across units (i.e., the matrix of potential outcomes is approximately rank
$r$), and combinations of interventions (i.e., the coefficients in the Fourier
expansion of the potential outcomes are approximately $s$-sparse). We establish
identification for all $N \times 2^p$ parameters despite unobserved
confounding. We propose an estimation procedure, Synthetic Combinations, and
establish it is finite-sample consistent and asymptotically normal under
precise conditions on the observation pattern. Our results imply consistent
estimation given $poly(r) \times \left( N + s^2p\right)$ observations,
while previous methods have sample complexity scaling as $\min(N \times s^2p,\; poly(r) \times (N + 2^p))$. We use Synthetic Combinations to propose a
data-efficient experimental design. Empirically, Synthetic Combinations
outperforms competing approaches on a real-world dataset on movie
recommendations. Lastly, we extend our analysis to do causal inference where
the intervention is a permutation over $p$ items (e.g., rankings).
arXiv link: http://arxiv.org/abs/2303.14226v2
On the failure of the bootstrap for Chatterjee's rank correlation
of us have realized that the standard bootstrap, in general, does not work for
Chatterjee's rank correlation. In this paper, we provide proof of this issue
under an additional independence assumption, and complement our theory with
simulation evidence for general settings. Chatterjee's rank correlation thus
falls into a category of statistics that are asymptotically normal but
bootstrap inconsistent. Valid inferential methods in this case are Chatterjee's
original proposal (for testing independence) and Lin and Han (2022)'s analytic
asymptotic variance estimator (for more general purposes).
arXiv link: http://arxiv.org/abs/2303.14088v2
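For concreteness, the statistic under discussion (Chatterjee's rank correlation, no-ties version) can be computed as follows; tie handling and the valid inference procedures cited above are omitted.

```python
# Chatterjee's rank correlation xi_n (no-ties version).
import numpy as np

def chatterjee_xi(x, y):
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    order = np.argsort(x, kind="stable")           # sort pairs by x
    ranks = np.argsort(np.argsort(y[order])) + 1   # ranks of y in that order
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)

rng = np.random.default_rng(3)
x = rng.normal(size=500)
print(chatterjee_xi(x, np.sin(3 * x) + 0.1 * rng.normal(size=500)))  # high: strong dependence
print(chatterjee_xi(x, rng.normal(size=500)))                        # near 0: independence
```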
Point Identification of LATE with Two Imperfect Instruments
treatment effect (LATE) using two imperfect instruments. The classical approach
(Imbens and Angrist (1994)) establishes the identification of LATE via an
instrument that satisfies exclusion, monotonicity, and independence. However,
it may be challenging to find a single instrument that satisfies all these
assumptions simultaneously. My paper uses two instruments but imposes weaker
assumptions on both instruments. The first instrument is allowed to violate the
exclusion restriction and the second instrument does not need to satisfy
monotonicity. Therefore, the first instrument can affect the outcome via both
direct effects and a shift in the treatment status. The direct effects can be
identified via exogenous variation in the second instrument and therefore the
local average treatment effect is identified. An estimator is proposed, and
using Monte Carlo simulations, it is shown to perform more robustly than the
instrumental variable estimator.
arXiv link: http://arxiv.org/abs/2303.13795v1
Bootstrap-Assisted Inference for Generalized Grenander-type Estimators
distributional properties of generalized Grenander-type estimators, a versatile
class of nonparametric estimators of monotone functions. The limiting
distribution of those estimators is representable as the left derivative of the
greatest convex minorant of a Gaussian process whose monomial mean can be of
unknown order (when the degree of flatness of the function of interest is
unknown). The standard nonparametric bootstrap is unable to consistently
approximate the large sample distribution of the generalized Grenander-type
estimators even if the monomial order of the mean is known, making statistical
inference a challenging endeavour in applications. To address this inferential
problem, we present a bootstrap-assisted inference procedure for generalized
Grenander-type estimators. The procedure relies on a carefully crafted, yet
automatic, transformation of the estimator. Moreover, our proposed method can
be made “flatness robust” in the sense that it can be made adaptive to the
(possibly unknown) degree of flatness of the function of interest. The method
requires only the consistent estimation of a single scalar quantity, for which
we propose an automatic procedure based on numerical derivative estimation and
the generalized jackknife. Under random sampling, our inference method can be
implemented using a computationally attractive exchangeable bootstrap
procedure. We illustrate our methods with examples and we also provide a small
simulation study. The development of formal results is made possible by some
technical results that may be of independent interest.
arXiv link: http://arxiv.org/abs/2303.13598v3
Sequential Cauchy Combination Test for Multiple Testing Problems with Financial Applications
individual signals in scenarios involving many tests, dependent test
statistics, and potentially sparse signals. The tool applies the Cauchy
combination test recursively on a sequence of expanding subsets of $p$-values
and is referred to as the sequential Cauchy combination test. While the
original Cauchy combination test aims to make a global statement about a set of
null hypotheses by summing transformed $p$-values, our sequential version
determines which $p$-values trigger the rejection of the global null. The
sequential test achieves strong familywise error rate control, exhibits less
conservatism compared to existing controlling procedures when dealing with
dependent test statistics, and provides a power boost. As illustrations, we
revisit two well-known large-scale multiple testing problems in finance for
which the test statistics have either serial dependence or cross-sectional
dependence, namely monitoring drift bursts in asset prices and searching for
assets with a nonzero alpha. In both applications, the sequential Cauchy
combination test proves to be a preferable alternative. It overcomes many of
the drawbacks inherent to inequality-based controlling procedures, extreme
value approaches, resampling and screening methods, and it improves the power
in simulations, leading to distinct empirical outcomes.
arXiv link: http://arxiv.org/abs/2303.13406v2
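The global Cauchy combination p-value has a simple closed form, sketched below together with a naive expanding-subset scan in the spirit of the sequential version; the stopping rule and familywise error control of the actual procedure are not reproduced.

```python
# Global Cauchy combination p-value plus a naive expanding-subset scan.
import numpy as np

def cauchy_combination_pvalue(pvals):
    p = np.clip(np.asarray(pvals, dtype=float), 1e-15, 1 - 1e-15)
    t = np.mean(np.tan((0.5 - p) * np.pi))     # equal weights
    return 0.5 - np.arctan(t) / np.pi          # standard Cauchy tail probability

def naive_sequential_scan(pvals, alpha=0.05):
    """Prefix lengths k whose combined p-value falls below alpha."""
    return [k for k in range(1, len(pvals) + 1)
            if cauchy_combination_pvalue(pvals[:k]) < alpha]

pvals = np.array([0.40, 0.0002, 0.30, 0.01, 0.70])
print(cauchy_combination_pvalue(pvals), naive_sequential_scan(pvals))
```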
Uncertain Short-Run Restrictions and Statistically Identified Structural Vector Autoregressions
with potentially invalid short-run zero restrictions. The estimator shrinks
towards imposed restrictions and stops shrinkage when the data provide evidence
against a restriction. Simulation results demonstrate how incorporating valid
restrictions through the shrinkage approach enhances the accuracy of the
statistically identified estimator and how the impact of invalid restrictions
decreases with the sample size. The estimator is applied to analyze the
interaction between the stock and oil market. The results indicate that
incorporating stock market data into the analysis is crucial, as it enables the
identification of information shocks, which are shown to be important drivers
of the oil price.
arXiv link: http://arxiv.org/abs/2303.13281v2
Functional-Coefficient Quantile Regression for Panel Data with Latent Group Structure
quantile regression with individual effects, allowing for cross-sectional and temporal dependence in large panel observations. A latent group structure is imposed on the heterogeneous quantile regression models so that the number of
nonparametric functional coefficients to be estimated can be reduced
considerably. With the preliminary local linear quantile estimates of the
subject-specific functional coefficients, a classic agglomerative clustering
algorithm is used to estimate the unknown group structure and an
easy-to-implement ratio criterion is proposed to determine the group number.
The estimated group number and structure are shown to be consistent.
Furthermore, a post-grouping local linear smoothing method is introduced to
estimate the group-specific functional coefficients, and the relevant
asymptotic normal distribution theory is derived with a normalisation rate
comparable to that in the literature. The developed methodologies and theory
are verified through a simulation study and showcased with an application to
house price data from UK local authority districts, which reveals different
homogeneity structures at different quantile levels.
arXiv link: http://arxiv.org/abs/2303.13218v1
sparseDFM: An R Package to Estimate Dynamic Factor Models with Sparse Loadings
methods for dynamic factor models (DFMs) including the novel Sparse DFM
approach of Mosley et al. (2023). The Sparse DFM ameliorates interpretability
issues of factor structure in classic DFMs by constraining the loading matrices
to have few non-zero entries (i.e., to be sparse). Mosley et al. (2023) construct
an efficient expectation maximisation (EM) algorithm to enable estimation of
model parameters using a regularised quasi-maximum likelihood. We provide
detail on the estimation strategy in this paper and show how we implement this
in a computationally efficient way. We then provide two real-data case studies
to act as tutorials on how one may use the sparseDFM package. The first case
study focuses on summarising the structure of a small subset of quarterly CPI
(consumer price inflation) index data for the UK, while the second applies the
package to a large-scale set of monthly time series for the purpose of
nowcasting nine of the main trade commodities the UK exports worldwide.
arXiv link: http://arxiv.org/abs/2303.14125v1
Forecasting Large Realized Covariance Matrices: The Benefits of Factor Models and Shrinkage
applying it to the constituents of the S&P 500 daily. To address the curse of
dimensionality, we decompose the return covariance matrix using standard
firm-level factors (e.g., size, value, and profitability) and use sectoral
restrictions in the residual covariance matrix. This restricted model is then
estimated using vector heterogeneous autoregressive (VHAR) models with the
least absolute shrinkage and selection operator (LASSO). Our methodology
improves forecasting precision relative to standard benchmarks and leads to
better estimates of minimum variance portfolios.
arXiv link: http://arxiv.org/abs/2303.16151v1
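A univariate HAR regression with a LASSO fit is sketched below to illustrate the building blocks; the paper's vector HAR on factor- and sector-restricted covariance elements is not reproduced, and the simulated data and tuning parameter are placeholders.

```python
# HAR (heterogeneous autoregressive) regressors with a LASSO fit on a single
# realized-variance series; simplified stand-in for the VHAR-LASSO setup.
import numpy as np
from sklearn.linear_model import Lasso

def har_design(rv):
    rv = np.asarray(rv, dtype=float)
    d = rv[21:-1]                                                     # daily lag
    w = np.array([rv[t - 5:t].mean() for t in range(22, len(rv))])    # weekly average
    m = np.array([rv[t - 22:t].mean() for t in range(22, len(rv))])   # monthly average
    y = rv[22:]
    return np.column_stack([d, w, m]), y

rv = np.abs(np.random.default_rng(4).normal(size=1000)) ** 2          # placeholder RV series
X, y = har_design(rv)
model = Lasso(alpha=0.001).fit(X, y)
print(model.coef_, model.intercept_)
```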
Don't (fully) exclude me, it's not necessary! Causal inference with semi-IVs
to instrumental variables (IVs) to identify the causal effect of a binary (or
discrete) endogenous treatment. A semi-IV is a less restrictive form of
instrument: it affects the selection into treatment but is excluded only from
one, not necessarily both, potential outcomes. Having two continuously
distributed semi-IVs, one excluded from the potential outcome under treatment
and the other from the potential outcome under control, is sufficient to
nonparametrically point identify marginal treatment effect (MTE) and local
average treatment effect (LATE) parameters. In practice, semi-IVs provide a
solution to the challenge of finding valid IVs because they are often easier to
find: many selection-specific shocks, policies, prices, costs, or benefits are
valid semi-IVs. As an application, I estimate the returns to working in the
manufacturing sector on earnings using sector-specific characteristics as
semi-IVs.
arXiv link: http://arxiv.org/abs/2303.12667v5
Quasi Maximum Likelihood Estimation of High-Dimensional Factor Models: A Critical Review
high-dimensional panels of time series. We consider two cases: (1) estimation
when no dynamic model for the factors is specified (Bai and Li, 2012, 2016);
(2) estimation based on the Kalman smoother and the Expectation Maximization
algorithm, thus allowing the factor dynamics to be modelled explicitly (Doz et al.,
2012, Barigozzi and Luciani, 2019). Our interest is in approximate factor
models, i.e., when we allow for the idiosyncratic components to be mildly
cross-sectionally, as well as serially, correlated. Although such a setting apparently makes estimation harder, we show, in fact, that factor models do not suffer from the {\it curse of dimensionality} problem, but instead they enjoy a
{\it blessing of dimensionality} property. In particular, given an approximate
factor structure, if the cross-sectional dimension of the data, $N$, grows to
infinity, we show that: (i) identification of the model is still possible, (ii)
the mis-specification error due to the use of an exact factor model
log-likelihood vanishes. Moreover, if we also let the sample size, $T$, grow to
infinity, we can consistently estimate all parameters of the model and
make inference. The same is true for estimation of the latent factors which can
be carried out by weighted least-squares, linear projection, or Kalman
filtering/smoothing. We also compare the approaches presented with: Principal
Component analysis and the classical, fixed $N$, exact Maximum Likelihood
approach. We conclude with a discussion of the efficiency of the considered
estimators.
arXiv link: http://arxiv.org/abs/2303.11777v5
Using Forests in Multivariate Regression Discontinuity Designs
regression discontinuity (RD) designs with multiple scores. In addition to
local linear regressions and the minimax-optimal estimator more recently
proposed by Imbens and Wager (2019), we argue that two variants of random
forests, honest regression forests and local linear forests, should be added to
the toolkit of applied researchers working with multivariate RD designs; their
validity follows from results in Wager and Athey (2018) and Friedberg et al.
(2020). We design a systematic Monte Carlo study with data generating processes
built both from functional forms that we specify and from Wasserstein
Generative Adversarial Networks that closely mimic the observed data. We find
no single estimator dominates across all specifications: (i) local linear
regressions perform well in univariate settings, but the common practice of
reducing multivariate scores to a univariate one can incur under-coverage,
possibly due to vanishing density at the transformed cutoff; (ii) good
performance of the minimax-optimal estimator depends on accurate estimation of
a nuisance parameter and its current implementation only accepts up to two
scores; (iii) forest-based estimators are not designed for estimation at
boundary points and are susceptible to finite-sample bias, but their
flexibility in modeling multivariate scores opens the door to a wide range of
empirical applications, as illustrated by an empirical study of COVID-19
hospital funding with three eligibility criteria.
arXiv link: http://arxiv.org/abs/2303.11721v3
On the Existence and Information of Orthogonal Moments
machine learning first steps, but their existence has not been investigated for
general parameters. In this paper, we provide a necessary and sufficient
condition, referred to as Restricted Local Non-surjectivity (RLN), for the
existence of such orthogonal moments to conduct robust inference on general
parameters of interest in regular semiparametric models. Importantly, RLN does
not require either identification of the parameters of interest or the nuisance
parameters. However, for orthogonal moments to be informative, the efficient
Fisher Information matrix for the parameter must be non-zero (though possibly
singular). Thus, orthogonal moments exist and are informative under more
general conditions than previously recognized. We demonstrate the utility of
our general results by characterizing orthogonal moments in a class of models
with Unobserved Heterogeneity (UH). For this class of models our method
delivers functional differencing as a special case. Orthogonality for general
smooth functionals of the distribution of UH is also characterized. As a second
major application, we investigate the existence of orthogonal moments and their
relevance for models defined by moment restrictions with possibly different
conditioning variables. We find orthogonal moments for the fully saturated two
stage least squares, for heterogeneous parameters in treatment effects, for
sample selection models, and for popular models of demand for differentiated
products. We apply our results to the Oregon Health Experiment to study
heterogeneous treatment effects of Medicaid on different health outcomes.
arXiv link: http://arxiv.org/abs/2303.11418v2
How Much Should We Trust Instrumental Variable Estimates in Political Science? Practical Advice Based on Over 60 Replicated Studies
establish causal relationships. However, the identifying assumptions required
by an IV design are demanding, and it remains challenging for researchers to
assess their validity. In this paper, we replicate 67 papers published in three
top journals in political science during 2010-2022 and identify several
troubling patterns. First, researchers often overestimate the strength of their
IVs due to non-i.i.d. errors, such as a clustering structure. Second, the most
commonly used t-test for the two-stage-least-squares (2SLS) estimates often
severely underestimates uncertainty. Using more robust inferential methods, we
find that around 19-30% of the 2SLS estimates in our sample are underpowered.
Third, in the majority of the replicated studies, the 2SLS estimates are much
larger than the ordinary-least-squares estimates, and their ratio is negatively
correlated with the strength of the IVs in studies where the IVs are not
experimentally generated, suggesting potential violations of unconfoundedness
or the exclusion restriction. To help researchers avoid these pitfalls, we
provide a checklist for better practice.
arXiv link: http://arxiv.org/abs/2303.11399v3
Network log-ARCH models for forecasting stock market volatility
heteroscedasticity (ARCH) model based on spatiotemporal ARCH models to forecast
volatility in the US stock market. To improve the forecasting accuracy, the
model integrates temporally lagged volatility information and information from
adjacent nodes, which may instantaneously spill across the entire network. The
model is also suitable for high-dimensional cases where multivariate ARCH
models are typically no longer applicable. We adopt the theoretical foundations
from spatiotemporal statistics and transfer the dynamic ARCH model for
processes to networks. This new approach is compared with independent
univariate log-ARCH models. We quantify the improvements due to the
instantaneous network ARCH effects, which are studied for the first time in
this paper. The edges are determined based on various distance and correlation
measures between the time series. The performance of the alternative network
definitions is compared in terms of out-of-sample accuracy. Furthermore, we
consider ensemble forecasts based on different network definitions.
arXiv link: http://arxiv.org/abs/2303.11064v1
Standard errors when a regressor is randomly assigned
regressor of interest are assigned randomly and independently of other
regressors. We find that the OLS variance formula in this case is often
simplified, sometimes substantially. In particular, when the regressor of
interest is independent not only of other regressors but also of the error
term, the textbook homoskedastic variance formula is valid even if the error
term and auxiliary regressors exhibit a general dependence structure. In the
context of randomized controlled trials, this conclusion holds in completely
randomized experiments with constant treatment effects. When the error term is heteroskedastic with respect to the regressor of interest, the variance formula has to be adjusted not only for heteroskedasticity but also for the correlation structure of the error term. However, even in the latter case, some simplifications are possible, as only part of the correlation structure of the error term needs to be taken into account. In the context of randomized controlled trials, this implies that the textbook homoskedastic variance formula is typically not valid if treatment effects are heterogeneous, but heteroskedasticity-robust variance formulas are valid if treatment effects are independent across units, even if the error term exhibits a general dependence
structure. In addition, we extend the results to the case when the regressor of
interest is assigned randomly at a group level, such as in randomized control
trials with treatment assignment determined at a group (e.g., school/village)
level.
arXiv link: http://arxiv.org/abs/2303.10306v1
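A small simulation in the spirit of the claim above: with a randomly assigned binary regressor, a constant effect, and an error that is heteroskedastic only in an auxiliary covariate, the conventional and robust standard errors for that regressor come out close. The DGP and sample size are assumptions, not the paper's design.

```python
# Assumed DGP: d randomly assigned with constant effect; error depends on w only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 5000
w = rng.normal(size=n)                        # auxiliary regressor
d = rng.binomial(1, 0.5, size=n)              # randomly assigned regressor
u = np.exp(0.5 * w) * rng.normal(size=n)      # heteroskedastic in w, not in d
y = 1.0 + 2.0 * d + 0.5 * w + u               # constant effect of d

X = sm.add_constant(np.column_stack([d, w]))  # columns: const, d, w
print("homoskedastic SE of d:", sm.OLS(y, X).fit().bse[1])
print("HC1-robust SE of d:   ", sm.OLS(y, X).fit(cov_type="HC1").bse[1])
```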
Estimation of Grouped Time-Varying Network Vector Autoregression Models
model framework for large-scale time series. A latent group structure is
imposed on the heterogeneous and node-specific time-varying momentum and
network spillover effects so that the number of unknown time-varying
coefficients to be estimated can be reduced considerably. A classic
agglomerative clustering algorithm with nonparametrically estimated distance
matrix is combined with a ratio criterion to consistently estimate the latent
group number and membership. A post-grouping local linear smoothing method is
proposed to estimate the group-specific time-varying momentum and network
effects, substantially improving the convergence rates of the preliminary
estimates which ignore the latent structure. We further modify the methodology
and theory to allow for structural breaks in either the group membership, group
number or group-specific coefficient functions. Numerical studies including
Monte-Carlo simulation and an empirical application are presented to examine
the finite-sample performance of the developed model and methodology.
arXiv link: http://arxiv.org/abs/2303.10117v2
Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices
multivariate probabilistic forecasts, considering dependencies between
quantiles and marginals through a smoothing procedure that allows for online
learning. We discuss two smoothing methods: dimensionality reduction using
basis matrices and penalized smoothing. The new online learning algorithm
generalizes the standard CRPS learning framework into multivariate dimensions.
It is based on Bernstein Online Aggregation (BOA) and yields optimal asymptotic
learning properties. The procedure uses horizontal aggregation, i.e.,
aggregation across quantiles. We provide an in-depth discussion on possible
extensions of the algorithm and several nested cases related to the existing
literature on online forecast combination. We apply the proposed methodology to
forecasting day-ahead electricity prices, which are 24-dimensional
distributional forecasts. The proposed method yields significant improvements
over uniform combination in terms of continuous ranked probability score
(CRPS). We discuss the temporal evolution of the weights and hyperparameters
and present the results of reduced versions of the preferred model. A fast C++
implementation of the proposed algorithm is provided in the open-source
R-Package profoc on CRAN.
arXiv link: http://arxiv.org/abs/2303.10019v3
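A bare-bones stand-in for the idea of combining quantile forecasts online, using exponential weights and the pinball loss with one weight per quantile level; this is not Bernstein Online Aggregation or the paper's smoothing procedure, and the experts, learning rate, and data are assumptions.

```python
# Simplified online combination of two quantile forecasters with exponential
# weights and the pinball (quantile) loss, one weight per quantile level.
import numpy as np

def pinball(y, q, tau):
    return np.maximum(tau * (y - q), (tau - 1) * (y - q))

rng = np.random.default_rng(6)
taus = np.array([0.1, 0.5, 0.9])
weights = np.full((len(taus), 2), 0.5)            # two experts per quantile
eta = 2.0                                         # learning rate (assumed)

for t in range(500):
    y = rng.normal()
    experts = np.column_stack([                   # experts' quantile forecasts
        np.quantile(rng.normal(size=100), taus),  # expert 1: noisy but centered
        np.full(len(taus), 0.3),                  # expert 2: constant forecast
    ])
    combined = (weights * experts).sum(axis=1)    # weighted quantile forecast
    losses = pinball(y, experts, taus[:, None])
    weights *= np.exp(-eta * losses)              # exponential-weights update
    weights /= weights.sum(axis=1, keepdims=True)

print(weights)
```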
Bootstrap based asymptotic refinements for high-dimensional nonlinear models
nonlinear model that is sparse in the sense that most of its parameters are
zero but some are not. We use the SCAD penalty function, which provides model
selection consistent and oracle efficient estimates under suitable conditions.
However, asymptotic approximations based on the oracle model can be inaccurate
with the sample sizes found in many applications. This paper gives conditions
under which the bootstrap, based on estimates obtained through SCAD
penalization with thresholding, provides asymptotic refinements of size \(O
\left( n^{- 2} \right)\) for the error in the rejection (coverage) probability
of a symmetric hypothesis test (confidence interval) and \(O \left( n^{- 1}
\right)\) for the error in the rejection (coverage) probability of a one-sided
or equal tailed test (confidence interval). The results of Monte Carlo
experiments show that the bootstrap can provide large reductions in errors in
rejection and coverage probabilities. The bootstrap is consistent, though it
does not necessarily provide asymptotic refinements, even if some parameters
are close but not equal to zero. Random-coefficients logit and probit models
and nonlinear moment models are examples of models to which the procedure
applies.
arXiv link: http://arxiv.org/abs/2303.09680v2
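For reference, the SCAD penalty of Fan and Li (2001) used above has the closed form below; the thresholding and bootstrap steps of the paper's procedure are not reproduced.

```python
# SCAD penalty (Fan and Li, 2001) with the conventional a = 3.7.
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    t = np.abs(np.asarray(theta, dtype=float))
    small = t <= lam
    mid = (t > lam) & (t <= a * lam)
    pen = np.empty_like(t)
    pen[small] = lam * t[small]
    pen[mid] = (2 * a * lam * t[mid] - t[mid] ** 2 - lam**2) / (2 * (a - 1))
    pen[~small & ~mid] = lam**2 * (a + 1) / 2
    return pen

print(scad_penalty([0.05, 0.5, 3.0], lam=0.1))
```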
On the robustness of posterior means
with known $\sigma^2$. Suppose $\theta \sim G_0$, where the prior $G_0$ has
zero mean and variance bounded by $V$. Let $G_1$ be a possibly misspecified
prior with zero mean and variance bounded by $V$. We show that the squared
error Bayes risk of the posterior mean under $G_1$ is bounded, subject to an
additional tail condition on $G_1$, uniformly over $G_0, G_1, \sigma^2 > 0$.
arXiv link: http://arxiv.org/abs/2303.08653v2
Identifying an Earnings Process With Dependent Contemporaneous Income Shocks
earnings dynamics model with arbitrarily dependent contemporaneous income
shocks. Traditional methods relying on second moments fail to identify these
coefficients, emphasizing the need for non-Gaussianity assumptions that capture
information from higher moments. Our results contribute to the literature on
earnings dynamics by allowing models of earnings to have, for example, the
permanent income shock of a job change to be linked to the contemporaneous
transitory income shock of a relocation bonus.
arXiv link: http://arxiv.org/abs/2303.08460v2
Identification- and many moment-robust inference via invariant moment conditions
updating GMM objective function. When the number of moment conditions grows
proportionally with the sample size, the large-dimensional weighting matrix
prohibits the use of conventional asymptotic approximations and the behavior of
these tests remains unknown. We show that the structure of the weighting matrix
opens up an alternative route to asymptotic results when, under the null
hypothesis, the distribution of the moment conditions satisfies a symmetry
condition known as reflection invariance. We provide several examples in which
the invariance follows from standard assumptions. Our results show that
existing tests will be asymptotically conservative, and we propose an
adjustment to attain nominal size in large samples. We illustrate our findings
through simulations for various linear and nonlinear models, and an empirical
application on the effect of the concentration of financial activities in banks
on systemic risk.
arXiv link: http://arxiv.org/abs/2303.07822v5
Tight Non-asymptotic Inference via Sub-Gaussian Intrinsic Moment Norm
distributions are of paramount importance. However, directly estimating these
parameters using the empirical moment generating function (MGF) is infeasible.
To address this, we suggest using the sub-Gaussian intrinsic moment norm
[Buldygin and Kozachenko (2000), Theorem 1.3] achieved by maximizing a sequence
of normalized moments. Significantly, the suggested norm can not only
reconstruct the exponential moment bounds of MGFs but also provide tighter
sub-Gaussian concentration inequalities. In practice, we provide an intuitive
method for assessing whether data with a finite sample size is sub-Gaussian,
utilizing the sub-Gaussian plot. The intrinsic moment norm can be robustly
estimated via a simple plug-in approach. Our theoretical findings are also
applicable to reinforcement learning, including the multi-armed bandit
scenario.
arXiv link: http://arxiv.org/abs/2303.07287v2
Inflation forecasting with attention based transformer neural networks
a fundamental aim of governments and central banks. However, forecasting
inflation is not a trivial task, as its prediction relies on low frequency,
highly fluctuating data with unclear explanatory variables. While classical
models show some possibility of predicting inflation, reliably beating the
random walk benchmark remains difficult. Recently, (deep) neural networks have
shown impressive results in a multitude of applications, increasingly setting
the new state-of-the-art. This paper investigates the potential of the
transformer deep neural network architecture to forecast different inflation
rates. The results are compared to a study on classical time series and machine
learning models. We show that our adapted transformer, on average, outperforms
the baseline in 6 out of 16 experiments, showing best scores in two out of four
investigated inflation rates. Our results demonstrate that a transformer based
neural network can outperform classical regression and machine learning models
in certain inflation rates and forecasting horizons.
arXiv link: http://arxiv.org/abs/2303.15364v2
Counterfactual Copula and Its Application to the Effects of College Education on Intergenerational Mobility
two outcome variables that would be affected by a policy intervention. The
proposed estimator allows policymakers to conduct ex-ante evaluations by
comparing the estimated counterfactual and actual copulas as well as their
corresponding measures of association. Asymptotic properties of the
counterfactual copula estimator are established under regularity conditions.
These conditions are also used to validate the nonparametric bootstrap for
inference on counterfactual quantities. Simulation results indicate that our
estimation and inference procedures perform well in moderately sized samples.
Applying the proposed method to studying the effects of college education on
intergenerational income mobility under two counterfactual scenarios, we find
that while providing some college education to all children is unlikely to
promote mobility, offering a college degree to children from less educated
families can significantly reduce income persistence across generations.
arXiv link: http://arxiv.org/abs/2303.06658v1
Distributional Vector Autoregression: Eliciting Macro and Financial Dependence
finance for understanding the dynamic interdependencies among multivariate time
series. In this study, we expand the scope of vector autoregression by
incorporating a multivariate distributional regression framework and
introducing a distributional impulse response function, providing a
comprehensive view of dynamic heterogeneity. We propose a straightforward yet
flexible estimation method and establish its asymptotic properties under weak
dependence assumptions. Our empirical analysis examines the conditional joint
distribution of GDP growth and financial conditions in the United States, with
a focus on the global financial crisis. Our results show that tight financial
conditions lead to a multimodal conditional joint distribution of GDP growth
and financial conditions, and easing financial conditions significantly impacts
long-term GDP growth, while improving GDP growth during the global
financial crisis has limited effects on financial conditions.
arXiv link: http://arxiv.org/abs/2303.04994v1
Inference on Optimal Dynamic Policies via Softmax Approximation
problem in dynamic decision making. In the context of causal inference, the
problem is known as estimating the optimal dynamic treatment regime. Even
though there exists a plethora of methods for estimation, constructing
confidence intervals for the value of the optimal regime and structural
parameters associated with it is inherently harder, as it involves non-linear
and non-differentiable functionals of unknown quantities that need to be
estimated. Prior work resorted to sub-sample approaches that can deteriorate
the quality of the estimate. We show that a simple soft-max approximation to
the optimal treatment regime, for an appropriately fast growing temperature
parameter, can achieve valid inference on the truly optimal regime. We
illustrate our result for a two-period optimal dynamic regime, though our
approach should directly extend to the finite horizon case. Our work combines
techniques from semi-parametric inference and $g$-estimation, together with an
appropriate triangular array central limit theorem, as well as a novel analysis
of the asymptotic influence and asymptotic bias of softmax approximations.
arXiv link: http://arxiv.org/abs/2303.04416v3
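A numeric illustration of the softmax approximation itself: as the temperature grows, the softmax-weighted value approaches the value of the best action. The action values and temperatures are hypothetical, and the paper's inferential machinery is not shown.

```python
# Softmax-weighted value approaches the best action's value as beta grows.
import numpy as np

def softmax_value(q, beta):
    w = np.exp(beta * (q - q.max()))          # stabilized softmax weights
    w /= w.sum()
    return float(w @ q)

q = np.array([1.00, 1.05, 0.40])              # hypothetical action values
for beta in (1, 10, 100):
    print(beta, softmax_value(q, beta))       # approaches max(q) = 1.05
```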
Just Ask Them Twice: Choice Probabilities and Identification of Ex ante returns and Willingness-To-Pay
use of probabilistic stated preference experiments to estimate semi-parametric
population distributions of ex ante returns and willingness-to-pay (WTP) for a
choice attribute. This relies on eliciting several choices per individual, and
estimating separate demand functions, at the cost of possibly long survey
instruments. This paper shows that the distributions of interest can be
recovered from at most two stated choices, without requiring ad-hoc parametric
assumptions. Hence, it allows for significantly shorter survey instruments. The
paper also shows that eliciting probabilistic stated choices allows identifying
much richer objects than has been done so far and therefore provides better
tools for ex ante policy evaluation. Finally, it showcases the feasibility and
relevance of the results by studying the preferences of high-ability students in
Cote d'Ivoire for public sector jobs, exploiting a unique survey of this
population. Our analysis supports the claim that public sector jobs might
significantly increase the cost of hiring elite students for the private
sector.
arXiv link: http://arxiv.org/abs/2303.03009v4
EnsembleIV: Creating Instrumental Variables from Ensemble Learners for Robust Statistical Inference
machine learning generated variables into regression models for statistical
inference suffers from the measurement error problem, which can bias estimation
and threaten the validity of inferences. In this paper, we develop a novel
approach to alleviate associated estimation biases. Our proposed approach,
EnsembleIV, creates valid and strong instrumental variables from weak learners
in an ensemble model, and uses them to obtain consistent estimates that are
robust against the measurement error problem. Our empirical evaluations, using
both synthetic and real-world datasets, show that EnsembleIV can effectively
reduce estimation biases across several common regression specifications, and
can be combined with modern deep learning techniques when dealing with
unstructured data.
arXiv link: http://arxiv.org/abs/2303.02820v2
Censored Quantile Regression with Many Controls
regression models with high-dimensional controls. The methods are based on the
application of double/debiased machine learning (DML) framework to the censored
quantile regression estimator of Buchinsky and Hahn (1998). I provide valid
inference for low-dimensional parameters of interest in the presence of
high-dimensional nuisance parameters when implementing machine learning
estimators. The proposed estimator is shown to be consistent and asymptotically
normal. The performance of the estimator with high-dimensional controls is
illustrated with numerical simulation and an empirical application that
examines the effect of 401(k) eligibility on savings.
arXiv link: http://arxiv.org/abs/2303.02784v1
Deterministic, quenched and annealed parameter estimation for heterogeneous network models
the analysis of economic systems exist: the typical, econometric one,
interpreting the Gravity Model specification as the expected link weight of an
arbitrary probability distribution, and the one rooted into statistical
physics, constructing maximum-entropy distributions constrained to satisfy
certain network properties. In two recent companion papers, these approaches have
been successfully integrated within the framework induced by the constrained
minimisation of the Kullback-Leibler divergence: specifically, two broad
classes of models have been devised, i.e. the integrated and the conditional
ones, defined by different probabilistic rules to place links, load them with
weights and turn them into proper econometric prescriptions. Still, the
recipes adopted by the two approaches to estimate the parameters entering into
the definition of each model differ. In econometrics, a likelihood that
decouples the binary and weighted parts of a model, treating a network as
deterministic, is typically maximised; to restore its random character, two
alternatives exist: either solving the likelihood maximisation on each
configuration of the ensemble and taking the average of the parameters
afterwards or taking the average of the likelihood function and maximising the
latter one. The difference between these approaches lies in the order in which
the operations of averaging and maximisation are taken - a difference that is
reminiscent of the quenched and annealed ways of averaging out the disorder in
spin glasses. The results of the present contribution, devoted to comparing
these recipes in the case of continuous conditional network models, indicate
that the annealed estimation recipe represents the best alternative to the
deterministic one.
arXiv link: http://arxiv.org/abs/2303.02716v5
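The order-of-operations distinction between the two recipes can be seen in a toy scalar example (exponential link weights, not the network models of the paper); the design and names below are purely illustrative.
```python
# Toy sketch: quenched (average of per-configuration MLEs) vs annealed
# (maximiser of the ensemble-averaged likelihood) estimation of an exponential rate.
import numpy as np

rng = np.random.default_rng(11)
true_rate = 2.0
ensemble = [rng.exponential(1 / true_rate, size=50) for _ in range(200)]  # sampled "configurations"

# quenched: maximise the likelihood on each configuration, then average the estimates
quenched = np.mean([1 / w.mean() for w in ensemble])      # exponential-rate MLE is 1 / sample mean

# annealed: average the log-likelihood across configurations, then maximise once;
# for this model the maximiser is 1 over the mean of the per-configuration sample means
annealed = 1 / np.mean([w.mean() for w in ensemble])

print(quenched, annealed)      # the two recipes generally disagree (here by a Jensen-gap amount)
```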
Fast Forecasting of Unstable Data Streams for On-Demand Service Platforms
collection of high-frequency regional demand data streams that exhibit
instabilities. This paper develops a novel forecast framework that is fast and
scalable, and automatically assesses changing environments without human
intervention. We empirically test our framework on a large-scale demand data
set from a leading on-demand delivery platform in Europe, and find strong
performance gains from using our framework against several industry benchmarks,
across all geographical regions, loss functions, and both pre- and post-Covid
periods. We translate forecast gains to economic impacts for this on-demand
service platform by computing financial gains and reductions in computing
costs.
arXiv link: http://arxiv.org/abs/2303.01887v2
Constructing High Frequency Economic Indicators by Imputation
common factor estimated from high and low frequency data, either separately or
jointly. To incorporate mixed frequency information without directly modeling
them, we target a low frequency diffusion index that is already available, and
treat high frequency values as missing. We impute these values using multiple
factors estimated from the high frequency data. In the empirical examples
considered, static matrix completion that does not account for serial
correlation in the idiosyncratic errors yields imprecise estimates of the
missing values irrespective of how the factors are estimated. Single equation
and systems-based dynamic procedures that account for serial correlation yield
imputed values that are closer to the observed low frequency ones. This is the
case in the counterfactual exercise that imputes the monthly values of consumer
sentiment series before 1978 when the data was released only on a quarterly
basis. This is also the case for a weekly version of the CFNAI index of
economic activity that is imputed using seasonally unadjusted data. The imputed
series reveals episodes of increased variability of weekly economic information
that are masked by the monthly data, notably around the 2014-15 collapse in oil
prices.
arXiv link: http://arxiv.org/abs/2303.01863v3
Debiased Machine Learning of Aggregated Intersection Bounds and Other Causal Parameters
regression functions, where the target parameter is obtained by averaging the
minimum (or maximum) of a collection of regression functions over the covariate
space. Such quantities include the lower and upper bounds on distributional
effects (Frechet-Hoeffding, Makarov) and the optimal welfare in the statistical
treatment choice problem. The proposed estimator -- the envelope score
estimator -- is shown to have an oracle property, where the oracle knows the
identity of the minimizer for each covariate value. I apply this result to the
bounds in the Roy model and the Horowitz-Manski-Lee bounds with a discrete
outcome. The proposed approach performs well empirically on the data from the
Oregon Health Insurance Experiment.
arXiv link: http://arxiv.org/abs/2303.00982v3
$21^{st}$ Century Statistical Disclosure Limitation: Motivations and Challenges
statistical agencies approach statistical disclosure limitation for official
data product releases. It discusses the implications for agencies' broader data
governance and decision-making, and it identifies challenges that agencies will
likely face along the way. In conclusion, the chapter proposes some principles
and best practices that we believe can help guide agencies in navigating the
transformation of their confidentiality programs.
arXiv link: http://arxiv.org/abs/2303.00845v1
Generalized Cumulative Shrinkage Process Priors with Applications to Sparse Bayesian Factor Analysis
sequence of parameters. We review the cumulative shrinkage process (CUSP) prior
of Legramanti et al. (2020), which is a spike-and-slab shrinkage prior where
the spike probability is stochastically increasing and constructed from the
stick-breaking representation of a Dirichlet process prior. As a first
contribution, this CUSP prior is extended by involving arbitrary stick-breaking
representations arising from beta distributions. As a second contribution, we
prove that exchangeable spike-and-slab priors, which are popular and widely
used in sparse Bayesian factor analysis, can be represented as a finite
generalized CUSP prior, which is easily obtained from the decreasing order
statistics of the slab probabilities. Hence, exchangeable spike-and-slab
shrinkage priors imply increasing shrinkage as the column index in the loading
matrix increases, without imposing explicit order constraints on the slab
probabilities. An application to sparse Bayesian factor analysis illustrates
the usefulness of the findings of this paper. A new exchangeable spike-and-slab
shrinkage prior based on the triple gamma prior of Cadonna et al. (2020) is
introduced and shown to be helpful for estimating the unknown number of factors
in a simulation study.
arXiv link: http://arxiv.org/abs/2303.00473v1
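A small sketch of the stick-breaking construction underlying CUSP-type spike probabilities discussed above; the Beta(a, b) choice and the function name are illustrative assumptions rather than the paper's implementation.
```python
# Minimal sketch: nondecreasing spike probabilities pi_1 <= pi_2 <= ... built by stick-breaking.
import numpy as np

rng = np.random.default_rng(2)

def cusp_spike_probs(H, a=1.0, b=5.0):
    """Draw cumulative spike probabilities for H factor columns from Beta(a, b) sticks."""
    v = rng.beta(a, b, size=H)                                    # stick-breaking fractions
    sticks = v * np.concatenate(([1.0], np.cumprod(1 - v)[:-1]))  # weights omega_h
    return np.cumsum(sticks)                                      # pi_h = sum_{l <= h} omega_l

print(np.round(cusp_spike_probs(8), 3))    # increasing shrinkage across factor columns
```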
Consumer Welfare Under Individual Heterogeneity
welfare from cross-sectional data with no restrictions on individual
preferences. First demonstrating that moments of demand identify the curvature
of the expenditure function, we use these moments to approximate money-metric
welfare measures. Our approach captures both nonhomotheticity and heterogeneity
in preferences in the behavioral responses to price changes. We apply our
method to US household scanner data to evaluate the impacts of the price shock
between December 2020 and 2021 on the cost-of-living index. We document
substantial heterogeneity in welfare losses within and across demographic
groups. For most groups, a naive measure of consumer welfare would
significantly underestimate the welfare loss. By decomposing the behavioral
responses into the components arising from nonhomotheticity and heterogeneity
in preferences, we find that both factors are essential for accurate welfare
measurement, with heterogeneity contributing more substantially.
arXiv link: http://arxiv.org/abs/2303.01231v5
Disentangling Structural Breaks in Factor Models for Macroeconomic Data
estimating factor models in macroeconomics do not distinguish between breaks of
the factor variance and factor loadings. We argue that it is important to
distinguish between structural breaks in the factor variance and loadings
within factor models commonly employed in macroeconomics as both can lead to
markedly different interpretations when viewed through the lens of the underlying
dynamic factor model. We then develop a projection-based decomposition that
leads to two standard and easy-to-implement Wald tests to disentangle
structural breaks in the factor variance and factor loadings. Applying our
procedure to U.S. macroeconomic data, we find evidence of both types of breaks
associated with the Great Moderation and the Great Recession. Through our
projection-based decomposition, we estimate that the Great Moderation is
associated with an over 60% reduction in the total factor variance,
highlighting the relevance of disentangling breaks in the factor structure.
arXiv link: http://arxiv.org/abs/2303.00178v2
Transition Probabilities and Moment Restrictions in Dynamic Fixed Effects Logit Models
dependence. This paper introduces a new method to derive moment restrictions in
a large class of such models with strictly exogenous regressors and fixed
effects. We exploit the common structure of logit-type transition probabilities
and elementary properties of rational fractions to formulate a systematic
procedure that scales naturally with model complexity (e.g., the lag order or the
number of observed time periods). We detail the construction of moment
restrictions in binary response models of arbitrary lag order as well as
first-order panel vector autoregressions and dynamic multinomial logit models.
Identification of common parameters and average marginal effects is also
discussed for the binary response case. Finally, we illustrate our results by
studying the dynamics of drug consumption amongst young people, inspired by Deza
(2015).
arXiv link: http://arxiv.org/abs/2303.00083v2
The First-stage F Test with Many Weak Instruments
first-stage $F$ statistic. While this method was developed with a fixed number
of instruments, its performance with many instruments remains insufficiently
explored. We show that the first-stage $F$ test exhibits distorted sizes for
detecting many weak instruments, regardless of the choice of pretested
estimators or Wald tests. These distortions occur due to the inadequate
approximation using classical noncentral Chi-squared distributions. As a
byproduct of our main result, we present an alternative approach to pre-test
many weak instruments with the corrected first-stage $F$ statistic. An
empirical illustration with Angrist and Krueger (1991)'s returns to education
data confirms its usefulness.
arXiv link: http://arxiv.org/abs/2302.14423v2
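For concreteness, here is a minimal simulation sketch of the classical first-stage F statistic in a many-weak-instrument design; the design and constants are assumed, not taken from the paper.
```python
# Minimal sketch: first-stage F statistic with many (weak) instruments.
import numpy as np

rng = np.random.default_rng(3)
n, K = 1000, 100
z = rng.normal(size=(n, K))
x = z @ np.full(K, 0.02) + rng.normal(size=n)       # first stage with many weak instruments

Z = np.column_stack([np.ones(n), z])                # include an intercept
coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
rss1 = ((x - Z @ coef) ** 2).sum()                  # unrestricted residual sum of squares
rss0 = ((x - x.mean()) ** 2).sum()                  # restricted: intercept only
F = ((rss0 - rss1) / K) / (rss1 / (n - K - 1))
print(F)    # the statistic whose usual noncentral chi-squared approximation breaks down here
```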
A specification test for the strength of instrumental variables
the number of instruments $K_n$ is large with a magnitude comparable to the
sample size $n$. The test relies on the fact that the difference between the
two-stage least squares (2SLS) estimator and the ordinary least squares (OLS)
estimator asymptotically disappears when there are many weak instruments, but
otherwise converges to a non-zero limit. We establish the limiting distribution
of the difference within the above two specifications, and introduce a
delete-$d$ Jackknife procedure to consistently estimate the asymptotic
variance/covariance of the difference. Monte Carlo experiments demonstrate the
good performance of the test procedure for both cases of single and multiple
endogenous variables. Additionally, we re-examine the analysis of returns to
education data in Angrist and Krueger (1991) using our proposed test. Both the
simulation results and empirical analysis indicate the reliability of the test.
arXiv link: http://arxiv.org/abs/2302.14396v1
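A schematic sketch of the contrast underlying the test: compute the 2SLS-OLS difference and studentise it with a jackknife variance estimate. For simplicity this uses a delete-1 jackknife rather than the paper's delete-$d$ procedure, and the data-generating design is assumed.
```python
# Minimal sketch: studentised difference between 2SLS and OLS with a delete-1 jackknife variance.
import numpy as np

rng = np.random.default_rng(9)
n, K = 300, 3
z = rng.normal(size=(n, K))
u = rng.normal(size=n)
x = z @ np.full(K, 0.3) + 0.5 * u + rng.normal(size=n)   # endogenous regressor
y = 1.0 * x + u                                          # outcome with endogeneity via u

def ols(x, y):
    return (x @ y) / (x @ x)

def tsls(z, x, y):
    xhat = z @ np.linalg.lstsq(z, x, rcond=None)[0]      # first-stage fitted values
    return (xhat @ y) / (xhat @ x)

delta = tsls(z, x, y) - ols(x, y)                        # the contrast under test
deltas = np.array([
    tsls(np.delete(z, i, axis=0), np.delete(x, i), np.delete(y, i))
    - ols(np.delete(x, i), np.delete(y, i))
    for i in range(n)
])
var_jack = (n - 1) / n * ((deltas - deltas.mean()) ** 2).sum()
print(delta, delta / np.sqrt(var_jack))                  # studentised 2SLS-OLS contrast
```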
Unified and robust Lagrange multiplier type tests for cross-sectional independence in large panel data models
of no cross-sectional dependence in large panel data models. We propose a
unified test procedure and its power enhancement version, which show robustness
for a wide class of panel model contexts. Specifically, the two procedures are
applicable to both heterogeneous and fixed effects panel data models with the
presence of weakly exogenous as well as lagged dependent regressors, allowing
for a general form of nonnormal error distribution. With the tools from Random
Matrix Theory, the asymptotic validity of the test procedures is established
under the simultaneous limit scheme where the number of time periods and the
number of cross-sectional units go to infinity proportionally. The derived
theories are accompanied by detailed Monte Carlo experiments, which confirm the
robustness of the two tests and also suggest the validity of the power
enhancement technique.
arXiv link: http://arxiv.org/abs/2302.14387v1
Identification and Estimation of Categorical Random Coefficient Models
the random coefficients follow parametric categorical distributions. The
distributional parameters are identified based on a linear recurrence structure
of moments of the random coefficients. A Generalized Method of Moments
estimation procedure is proposed, of the kind also employed by Peter Schmidt and
his coauthors to address heterogeneity in time effects in panel data models. Using
Monte Carlo simulations, we find that moments of the random coefficients can be
estimated reasonably accurately, but large samples are required for estimation
of the parameters of the underlying categorical distribution. The utility of
the proposed estimator is illustrated by estimating the distribution of returns
to education in the U.S. by gender and educational levels. We find that rising
heterogeneity between educational groups is mainly due to the increasing
returns to education for those with postsecondary education, whereas within
group heterogeneity has been rising mostly in the case of individuals with high
school or less education.
arXiv link: http://arxiv.org/abs/2302.14380v1
Macroeconomic Forecasting using Dynamic Factor Models: The Case of Morocco
forecasting, with a focus on the Factor-Augmented Error Correction Model
(FECM). The FECM combines the advantages of cointegration and dynamic factor
models, providing a flexible and reliable approach to macroeconomic
forecasting, especially for non-stationary variables. We evaluate the
forecasting performance of the FECM model on a large dataset of 117 Moroccan
economic series with quarterly frequency. Our study shows that FECM outperforms
traditional econometric models in terms of forecasting accuracy and robustness.
The inclusion of long-term information and common factors in FECM enhances its
ability to capture economic dynamics and leads to better forecasting
performance than other competing models. Our results suggest that FECM can be a
valuable tool for macroeconomic forecasting in Morocco and other similar
economies.
arXiv link: http://arxiv.org/abs/2302.14180v3
Forecasting Macroeconomic Tail Risk in Real Time: Do Textual Data Add Value?
economic indicators for quantile predictions of employment, output, inflation
and consumer sentiment in a high-dimensional setting. Our results suggest that
news data contain valuable information that is not captured by a large set of
economic indicators. We provide empirical evidence that this information can be
exploited to improve tail risk predictions. The added value is largest when
media coverage and sentiment are combined to compute text-based predictors.
Methods that capture quantile-specific non-linearities produce overall superior
forecasts relative to methods that feature linear predictive relationships. The
results are robust along different modeling choices.
arXiv link: http://arxiv.org/abs/2302.13999v2
Multicell experiments for marginal treatment effect estimation of digital ads
tool to measure the impacts of interventions. However, in experimental settings
with one-sided noncompliance, extant empirical approaches may not produce the
estimands a decision maker needs to solve the problem of interest. For example,
these experimental designs are common in digital advertising settings but
typical methods do not yield effects that inform the intensive margin: how many
consumers should be reached or how much should be spent on a campaign. We
propose a solution that combines a novel multicell experimental design with
modern estimation techniques that enable decision makers to solve problems
with an intensive margin. Our design is straightforward to implement and does
not require additional budget. We illustrate our method through simulations
calibrated using an advertising experiment at Facebook, demonstrating its
superior performance in various scenarios and its advantage over direct
optimization approaches.
arXiv link: http://arxiv.org/abs/2302.13857v4
Nickell Bias in Panel Local Projection: Financial Crises Are Worse Than You Think
adopted for evaluating the economic consequences of financial crises across
countries. This paper highlights a fundamental methodological issue: the
presence of the Nickell bias in the panel FE estimator due to inherent dynamic
structures of panel predictive specifications, even if the regressors have no
lagged dependent variables. The Nickell bias invalidates the standard
inferential procedure based on the $t$-statistic. We propose the split-panel
jackknife (SPJ) estimator as a simple, easy-to-implement, and yet effective
solution to eliminate the bias and restore valid statistical inference. We
revisit four influential empirical studies on the impact of financial crises,
and find that the FE method underestimates the economic losses of financial
crises relative to the SPJ estimates.
arXiv link: http://arxiv.org/abs/2302.13455v4
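The half-panel (split-panel) jackknife correction itself is simple to state; below is a minimal sketch on a simulated AR(1) panel with the within (FE) estimator, which is subject to Nickell bias. The design is assumed and the code is illustrative, not the paper's.
```python
# Minimal sketch: SPJ correction 2*theta_full - (theta_first_half + theta_second_half)/2.
import numpy as np

rng = np.random.default_rng(4)
N, T, rho = 200, 12, 0.5
alpha = rng.normal(size=N)
y = np.zeros((N, T))
for t in range(1, T):
    y[:, t] = alpha + rho * y[:, t - 1] + rng.normal(size=N)

def fe_ar1(y):
    """Within estimator of the AR(1) coefficient (subject to Nickell bias)."""
    x, lag = y[:, 1:], y[:, :-1]
    xd = x - x.mean(axis=1, keepdims=True)
    ld = lag - lag.mean(axis=1, keepdims=True)
    return (ld * xd).sum() / (ld * ld).sum()

full = fe_ar1(y)
half = T // 2
spj = 2 * full - 0.5 * (fe_ar1(y[:, :half]) + fe_ar1(y[:, half:]))
print(full, spj)    # the SPJ estimate is typically much closer to rho than the raw FE estimate
```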
Estimating Fiscal Multipliers by Combining Statistical Identification with Potentially Endogenous Proxies
conclusions regarding the size of fiscal multipliers. Our analysis suggests
that the conflicting results may stem from violations of the proxy exogeneity
assumptions. We propose a novel approach to include proxy variables into a
Bayesian non-Gaussian SVAR, tailored to accommodate potentially endogenous
proxies. Using our model, we find that increasing government spending is more
effective in stimulating the economy than reducing taxes.
arXiv link: http://arxiv.org/abs/2302.13066v6
On the Misspecification of Linear Assumptions in Synthetic Control
treatment effects from observational panel data. It rests on a crucial
assumption that we can write the treated unit as a linear combination of the
untreated units. This linearity assumption, however, can be unlikely to hold in
practice and, when violated, the resulting SC estimates are incorrect. In this
paper we examine two questions: (1) How large can the misspecification error
be? (2) How can we limit it? First, we provide theoretical bounds to quantify
the misspecification error. The bounds are comforting: small misspecifications
induce small errors. With these bounds in hand, we then develop new SC
estimators that are specially designed to minimize misspecification error. The
estimators are based on additional data about each unit, which is used to
produce the SC weights. (For example, if the units are countries then the
additional data might be demographic information about each.) We study our
estimators on synthetic data; we find they produce more accurate causal
estimates than standard synthetic controls. We then re-analyze the California
tobacco-program data of the original SC paper, now including additional data
from the US census about per-state demographics. Our estimators show that the
observations in the pre-treatment period lie within the bounds of
misspecification error, and that the observations post-treatment lie outside of
those bounds. This is evidence that our SC methods have uncovered a true
effect.
arXiv link: http://arxiv.org/abs/2302.12777v1
Personalized Pricing with Invalid Instrumental Variables: Identification, Estimation, and Policy Learning
maximize sellers' revenues. This work studies offline personalized pricing
under endogeneity using an instrumental variable approach. Standard
instrumental variable methods in causal inference/econometrics either focus on
a discrete treatment space or require the exclusion restriction of instruments
from having a direct effect on the outcome, which limits their applicability in
personalized pricing. In this paper, we propose a new policy learning method
for Personalized pRicing using Invalid iNsTrumental variables (PRINT) for
a continuous treatment that allows direct effects on the outcome. Specifically,
relying on the structural models of revenue and price, we establish the
identifiability condition of an optimal pricing strategy under endogeneity with
the help of invalid instrumental variables. Based on this new identification,
which leads to solving conditional moment restrictions with generalized
residual functions, we construct an adversarial min-max estimator and learn an
optimal pricing strategy. Furthermore, we establish an asymptotic regret bound
to find an optimal pricing strategy. Finally, we demonstrate the effectiveness
of the proposed method via extensive simulation studies as well as a real data
application from a US online auto loan company.
arXiv link: http://arxiv.org/abs/2302.12670v1
Variable Importance Matching for Causal Inference
auditable, easy to troubleshoot, accurate for treatment effect estimation, and
scalable to high-dimensional data. We describe a general framework called
Model-to-Match that achieves these goals by (i) learning a distance metric via
outcome modeling, (ii) creating matched groups using the distance metric, and
(iii) using the matched groups to estimate treatment effects. Model-to-Match
uses variable importance measurements to construct a distance metric, making it
a flexible framework that can be adapted to various applications. Concentrating
on the scalability of the problem in the number of potential confounders, we
operationalize the Model-to-Match framework with LASSO. We derive performance
guarantees for settings where LASSO outcome modeling consistently identifies
all confounders (importantly without requiring the linear model to be correctly
specified). We also provide experimental results demonstrating the method's
auditability, accuracy, and scalability as well as extensions to more general
nonparametric outcome modeling.
arXiv link: http://arxiv.org/abs/2302.11715v2
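A rough sketch of steps (i)-(iii) in a LASSO-based instantiation: learn importance weights from an outcome model, form a weighted distance, match, and average within-match differences. The design, the ATT-style aggregation, and all names are illustrative assumptions rather than the authors' implementation.
```python
# Minimal sketch: LASSO-weighted distance matching in the spirit of Model-to-Match.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(10)
n, p = 1000, 10
X = rng.normal(size=(n, p))
d = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))          # treatment depends on the first covariate
y = 1.0 * d + 2.0 * X[:, 0] + X[:, 1] + rng.normal(size=n)

# (i) learn variable importance from an outcome model fit on the control group
lasso = LassoCV(cv=5).fit(X[d == 0], y[d == 0])
w = np.abs(lasso.coef_)                                  # importance weights for the distance metric

# (ii)-(iii) match each treated unit to its nearest control in the weighted metric
treated, controls = np.where(d == 1)[0], np.where(d == 0)[0]
dist = ((X[treated, None, :] - X[None, controls, :]) ** 2 * w).sum(axis=2)
match = controls[dist.argmin(axis=1)]
print((y[treated] - y[match]).mean())                    # rough effect-on-the-treated estimate
```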
Decomposition and Interpretation of Treatment Effects in Settings with Delayed Outcomes
and estimating the average direct causal effect of a binary treatment on
an outcome. We consider a setup in which the outcome realization does not get
immediately realized after the treatment assignment, a feature that is
ubiquitous in empirical settings. The period between the treatment and the
realization of the outcome allows other observed actions to occur and affect
the outcome. In this context, we study several regression-based estimands
routinely used in empirical work to capture the average treatment effect and
shed light on interpreting them in terms of ceteris paribus effects, indirect
causal effects, and selection terms. We obtain three main and related takeaways
under a common set of assumptions. First, the three most popular estimands do
not generally satisfy what we call strong sign preservation, in the
sense that these estimands may be negative even when the treatment positively
affects the outcome conditional on any possible combination of other actions.
Second, the most popular regression that includes the other actions as controls
satisfies strong sign preservation if and only if these actions are
mutually exclusive binary variables. Finally, we show that a linear regression
that fully stratifies the other actions leads to estimands that satisfy strong
sign preservation.
arXiv link: http://arxiv.org/abs/2302.11505v5
Attitudes and Latent Class Choice Models using Machine learning
(DCMs) that capture unobserved heterogeneity in the choice process by
segmenting the population based on the assumption of preference similarities.
We present a method of efficiently incorporating attitudinal indicators in the
specification of LCCM, by introducing Artificial Neural Networks (ANN) to
formulate latent variable constructs. This formulation improves upon structural
equation approaches in its capability of exploring the relationship between the
attitudinal indicators and the decision choice, given the Machine Learning (ML)
flexibility and power in capturing unobserved and complex behavioural features,
such as attitudes and beliefs. All of this while still maintaining the
consistency of the theoretical assumptions presented in the Generalized Random
Utility model and the interpretability of the estimated parameters. We test our
proposed framework for estimating a Car-Sharing (CS) service subscription
choice with stated preference data from Copenhagen, Denmark. The results show
that our proposed approach provides a complete and realistic segmentation,
which helps design better policies.
arXiv link: http://arxiv.org/abs/2302.09871v1
Identification-robust inference for the LATE with high-dimensional covariates
effect (LATE) in the presence of high-dimensional covariates, irrespective of
the strength of identification. We propose a novel high-dimensional conditional
test statistic with uniformly correct asymptotic size. We provide an
easy-to-implement algorithm to infer the high-dimensional LATE by inverting our
test statistic and employing the double/debiased machine learning method.
Simulations indicate that our test is robust against both weak identification
and high dimensionality concerning size control and power performance,
outperforming other conventional tests. Applying the proposed method to
railroad and population data to study the effect of railroad access on urban
population growth, we observe that our methodology yields confidence intervals
that are 49% to 92% shorter than conventional results, depending on
specifications.
arXiv link: http://arxiv.org/abs/2302.09756v4
Clustered Covariate Regression
and existing techniques to address this issue typically require sparsity or
discrete heterogeneity of the unobservable parameter vector. However,
neither restriction may be supported by economic theory in some empirical
contexts, leading to severe bias and misleading inference. The clustering-based
grouped parameter estimator (GPE) introduced in this paper drops both
restrictions and maintains the natural one that the parameter support be
bounded. GPE exhibits robust large sample properties under standard conditions
and accommodates both sparse and non-sparse parameters whose support can be
bounded away from zero. Extensive Monte Carlo simulations demonstrate the
excellent performance of GPE in terms of bias reduction and size control
compared to competing estimators. An empirical application of GPE to estimating
price and income elasticities of demand for gasoline highlights its practical
utility.
arXiv link: http://arxiv.org/abs/2302.09255v4
Post Reinforcement Learning Inference
learning (RL) algorithms. These algorithms adaptively experiment by interacting
with individual units over multiple stages, updating their strategies based on
past outcomes. Our goal is to evaluate a counterfactual policy after data
collection and estimate structural parameters, such as dynamic treatment
effects, that support credit assignment and quantify the impact of early
actions on final outcomes. These parameters can often be defined as solutions
to moment equations, motivating moment-based estimation methods developed for
static data. In RL settings, however, data are often collected adaptively under
nonstationary behavior policies. As a result, standard estimators fail to
achieve asymptotic normality due to time-varying variance. We propose a
weighted generalized method of moments (GMM) approach that uses adaptive
weights to stabilize this variance. We characterize weighting schemes that
ensure consistency and asymptotic normality of the weighted GMM estimators,
enabling valid hypothesis testing and uniform confidence region construction.
Key applications include dynamic treatment effect estimation and dynamic
off-policy evaluation.
arXiv link: http://arxiv.org/abs/2302.08854v5
New $\sqrt{n}$-consistent, numerically stable higher-order influence function estimators
constructing rate-optimal estimators for a large class of low-dimensional
(smooth) statistical functionals/parameters (and sometimes even
infinite-dimensional functions) that arise in substantive fields including
epidemiology, economics, and the social sciences. Since the introduction of
HOIFs by Robins et al. (2008), they have been viewed mostly as a theoretical
benchmark rather than a useful tool for statistical practice. Works aimed at
flipping the script are scant, but a few recent papers (Liu et al., 2017, 2021b)
make some partial progress. In this paper, we make a fresh attempt at achieving
this goal by constructing new, numerically stable HOIF estimators (or sHOIF
estimators for short with “s” standing for “stable”) with provable
statistical, numerical, and computational guarantees. This new class of sHOIF
estimators (up to the 2nd order) was foreshadowed in synthetic experiments
conducted by Liu et al. (2020a).
arXiv link: http://arxiv.org/abs/2302.08097v1
Deep Learning Enhanced Realized GARCH
(LSTM) and realized volatility measures. This LSTM-enhanced realized GARCH
framework incorporates and distills modeling advances from financial
econometrics, high frequency trading data and deep learning. Bayesian inference
via the Sequential Monte Carlo method is employed for statistical inference and
forecasting. The new framework can jointly model the returns and realized
volatility measures, has an excellent in-sample fit and superior predictive
performance compared to several benchmark models, while being able to adapt
well to the stylized facts in volatility. The performance of the new framework
is tested using a wide range of metrics, from marginal likelihood, volatility
forecasting, to tail risk forecasting and option pricing. We report on a
comprehensive empirical study using 31 widely traded stock indices over a time
period that includes the COVID-19 pandemic.
arXiv link: http://arxiv.org/abs/2302.08002v2
A Guide to Regression Discontinuity Designs in Medical Applications
(RD) designs in biomedical contexts. We begin by introducing key concepts,
assumptions, and estimands within both the continuity-based framework and the
local randomization framework. We then discuss modern estimation and inference
methods within both frameworks, including approaches for bandwidth or local
neighborhood selection, optimal treatment effect point estimation, and robust
bias-corrected inference methods for uncertainty quantification. We also
overview empirical falsification tests that can be used to support key
assumptions. Our discussion focuses on two particular features that are
relevant in biomedical research: (i) fuzzy RD designs, which often arise when
therapeutic treatments are based on clinical guidelines but patients with
scores near the cutoff are treated contrary to the assignment rule; and (ii) RD
designs with discrete scores, which are ubiquitous in biomedical applications.
We illustrate our discussion with three empirical applications: the effect of
CD4 guidelines for anti-retroviral therapy on retention of HIV patients in
South Africa, the effect of genetic guidelines for chemotherapy on breast
cancer recurrence in the United States, and the effects of age-based patient
cost-sharing on healthcare utilization in Taiwan. We provide replication
materials employing publicly available statistical software in Python, R and
Stata, offering researchers all necessary tools to conduct an RD analysis.
arXiv link: http://arxiv.org/abs/2302.07413v2
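As a complement to the discussion of continuity-based estimation, here is a bare-bones sharp-RD sketch using separate linear fits within a fixed bandwidth on each side of the cutoff. It is a toy illustration with an assumed design, not the robust bias-corrected procedures or the replication materials referenced above.
```python
# Minimal sketch: sharp RD jump from side-by-side local linear fits within a bandwidth h.
import numpy as np

rng = np.random.default_rng(12)
n, cutoff, h, tau = 2000, 0.0, 0.5, 0.8
score = rng.uniform(-1, 1, size=n)                  # running variable
d = (score >= cutoff).astype(float)                 # sharp assignment at the cutoff
y = tau * d + 1.5 * score + rng.normal(scale=0.5, size=n)

def fit_at_cutoff(mask):
    """Linear fit of y on the score within one side of the window, evaluated at the cutoff."""
    b = np.polyfit(score[mask], y[mask], 1)
    return np.polyval(b, cutoff)

left = (score < cutoff) & (score >= cutoff - h)
right = (score >= cutoff) & (score <= cutoff + h)
print(fit_at_cutoff(right) - fit_at_cutoff(left))   # jump at the cutoff, approximately tau
```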
Sequential Estimation of Multivariate Factor Stochastic Volatility Models
stochastic volatility models with latent factor structures. These models are
very useful as they alleviate the standard curse of dimensionality, allowing
the number of parameters to increase only linearly with the number of
return series. Although theoretically very appealing, these models have only
found limited practical application due to huge computational burdens. Our
estimation method is simple in implementation as it consists of two steps:
first, we estimate the loadings and the unconditional variances by maximum
likelihood, and then we use the efficient method of moments to estimate the
parameters of the stochastic volatility structure with GARCH as an auxiliary
model. In a comprehensive Monte Carlo study we show the good performance of our
method to estimate the parameters of interest accurately. The simulation study
and an application to real vectors of daily returns of dimensions up to 148
show the method's computation advantage over the existing estimation
procedures.
arXiv link: http://arxiv.org/abs/2302.07052v1
Quantiled conditional variance, skewness, and kurtosis by Cornish-Fisher expansion
series analysis. These three conditional moments (CMs) are often studied with
parametric models, but this raises two major issues: the risk of model
mis-specification and the instability of model estimation. To avoid the above
two issues, this paper proposes a novel method to estimate these three CMs by
the so-called quantiled CMs (QCMs). The QCM method first adopts the idea of
Cornish-Fisher expansion to construct a linear regression model, based on $n$
different estimated conditional quantiles. Next, it computes the QCMs simply
and simultaneously by using the ordinary least squares estimator of this
regression model, without any prior estimation of the conditional mean. Under
certain conditions, the QCMs are shown to be consistent with the convergence
rate $n^{-1/2}$. Simulation studies indicate that the QCMs perform well under
different scenarios of Cornish-Fisher expansion errors and quantile estimation
errors. In the application, the study of QCMs for three exchange rates
demonstrates the effectiveness of financial rescue plans during the COVID-19
pandemic outbreak, and suggests that the existing “news impact curve”
functions for the conditional skewness and kurtosis may not be suitable.
arXiv link: http://arxiv.org/abs/2302.06799v2
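A minimal sketch of the Cornish-Fisher regression idea: given a grid of conditional quantiles, regress them on the Cornish-Fisher basis and read off scale, skewness and excess kurtosis from the OLS coefficients. The algebra follows the standard first-order expansion; the notation and the skew-normal example are assumptions, not the paper's exact specification.
```python
# Minimal sketch: recovering scale, skewness and excess kurtosis from a quantile grid by OLS.
import numpy as np
from scipy.stats import norm, skewnorm

taus = np.linspace(0.05, 0.95, 19)
# stand-in "estimated" conditional quantiles: here the exact quantiles of a skewed distribution
q = skewnorm.ppf(taus, a=4, loc=0.0, scale=1.5)

z = norm.ppf(taus)
X = np.column_stack([np.ones_like(z), z, z ** 2 - 1, z ** 3 - 3 * z])   # Cornish-Fisher terms
b, *_ = np.linalg.lstsq(X, q, rcond=None)

sigma = b[1]                   # scale implied by the quantiles
skew = 6 * b[2] / sigma        # coefficient on (z^2 - 1) equals sigma * skewness / 6
exkurt = 24 * b[3] / sigma     # coefficient on (z^3 - 3z) equals sigma * excess kurtosis / 24
print(sigma, skew, exkurt)     # approximate moments implied by the quantile grid
```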
Individualized Treatment Allocation in Sequential Network Games
equilibrium welfare of interacting agents has many policy-relevant
applications. Focusing on sequential decision games of interacting agents, this
paper develops a method to obtain optimal treatment assignment rules that
maximize a social welfare criterion by evaluating stationary distributions of
outcomes. Stationary distributions in sequential decision games are given by
Gibbs distributions, which are difficult to optimize with respect to a
treatment allocation due to analytical and computational complexity. We apply a
variational approximation to the stationary distribution and optimize the
approximated equilibrium welfare with respect to treatment allocation using a
greedy optimization algorithm. We characterize the performance of the
variational approximation, deriving a performance guarantee for the greedy
optimization algorithm via a welfare regret bound. We implement our proposed
method in simulation exercises and an empirical application using the Indian
microfinance data (Banerjee et al., 2013), and show it delivers significant
welfare gains.
arXiv link: http://arxiv.org/abs/2302.05747v5
Minimax Instrumental Variable Regression and $L_2$ Convergence Guarantees without Identification or Closedness
(IV) regressions. Recently, many flexible machine learning methods have been
developed for instrumental variable estimation. However, these methods have at
least one of the following limitations: (1) restricting the IV regression to be
uniquely identified; (2) only obtaining estimation error rates in terms of
pseudometrics (e.g., projected norm) rather than valid metrics
(e.g., $L_2$ norm); or (3) imposing the so-called closedness condition
that requires a certain conditional expectation operator to be sufficiently
smooth. In this paper, we present the first method and analysis that can avoid
all three limitations, while still permitting general function approximation.
Specifically, we propose a new penalized minimax estimator that can converge to
a fixed IV solution even when there are multiple solutions, and we derive a
strong $L_2$ error rate for our estimator under lax conditions. Notably, this
guarantee only needs a widely-used source condition and realizability
assumptions, but not the so-called closedness condition. We argue that the
source condition and the closedness condition are inherently conflicting, so
relaxing the latter significantly improves upon the existing literature that
requires both conditions. Our estimator can achieve this improvement because it
builds on a novel formulation of the IV estimation problem as a constrained
optimization problem.
arXiv link: http://arxiv.org/abs/2302.05404v1
Policy Learning with Rare Outcomes
(CATE) can guide policy decisions, either by allowing targeting of individuals
with beneficial CATE estimates, or as inputs to decision trees that optimise
overall outcomes. There is limited information available regarding how well
these algorithms perform in real-world policy evaluation scenarios. Using
synthetic data, we compare the finite sample performance of different policy
learning algorithms, machine learning techniques employed during their learning
phases, and methods for presenting estimated policy values. For each algorithm,
we assess the resulting treatment allocation by measuring deviation from the
ideal ("oracle") policy. Our main finding is that policy trees based on
estimated CATEs outperform trees learned from doubly-robust scores. Across
settings, Causal Forests and the Normalised Double-Robust Learner perform
consistently well, while Bayesian Additive Regression Trees perform poorly.
These methods are then applied to a case study targeting optimal allocation of
subsidised health insurance, with the goal of reducing infant mortality in
Indonesia.
arXiv link: http://arxiv.org/abs/2302.05260v2
Structural Break Detection in Quantile Predictive Regression Models with Persistent Covariates
nonstationary quantile predictive regressions. We establish the limit
distributions for a class of Wald and fluctuation type statistics based on both
the ordinary least squares estimator and the endogenous instrumental regression
estimator proposed by Phillips and Magdalinos (2009a, Econometric Inference in
the Vicinity of Unity. Working paper, Singapore Management University).
Although the asymptotic distribution of these test statistics appears to depend
on the chosen estimator, the IVX based tests are shown to be asymptotically
nuisance parameter-free regardless of the degree of persistence and consistent
under local alternatives. The finite-sample performance of both tests is
evaluated via simulation experiments. An empirical application to house pricing
index returns demonstrates the practicality of the proposed break tests for
regression quantiles of nonstationary time series data.
arXiv link: http://arxiv.org/abs/2302.05193v1
On semiparametric estimation of the intercept of the sample selection model: a kernel approach
the intercept of the sample selection model as identification at the boundary
via a transformation of the selection index. This perspective suggests
generalizations of estimation at infinity to kernel regression estimation at
the boundary and further to local linear estimation at the boundary. The
proposed kernel-type estimators with an estimated transformation are proven to
be nonparametric-rate consistent and asymptotically normal under mild
regularity conditions. A fully data-driven method of selecting the optimal
bandwidths for the estimators is developed. The Monte Carlo simulation shows
the desirable finite sample properties of the proposed estimators and bandwidth
selection procedures.
arXiv link: http://arxiv.org/abs/2302.05089v1
Covariate Adjustment in Experiments with Matched Pairs
in which treatment status is determined according to "matched pairs" and it is
additionally desired to adjust for observed, baseline covariates to gain
further precision. By a "matched pairs" design, we mean that units are sampled
i.i.d. from the population of interest, paired according to observed, baseline
covariates and finally, within each pair, one unit is selected at random for
treatment. Importantly, we presume that not all observed, baseline covariates
are used in determining treatment assignment. We study a broad class of
estimators based on a "doubly robust" moment condition that permits us to study
estimators with both finite-dimensional and high-dimensional forms of covariate
adjustment. We find that estimators with finite-dimensional, linear adjustments
need not lead to improvements in precision relative to the unadjusted
difference-in-means estimator. This phenomenon persists even if the adjustments
are interacted with treatment; in fact, doing so leads to no changes in
precision. However, gains in precision can be ensured by including fixed
effects for each of the pairs. Indeed, we show that this adjustment is the
"optimal" finite-dimensional, linear adjustment. We additionally study two
estimators with high-dimensional forms of covariate adjustment based on the
LASSO. For each such estimator, we show that it leads to improvements in
precision relative to the unadjusted difference-in-means estimator and also
provide conditions under which it leads to the "optimal" nonparametric,
covariate adjustment. A simulation study confirms the practical relevance of
our theoretical analysis, and the methods are employed to reanalyze data from
an experiment using a "matched pairs" design to study the effect of
macroinsurance on microenterprise.
arXiv link: http://arxiv.org/abs/2302.04380v3
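A small Monte Carlo sketch of the contrast emphasised above: pairs formed on one covariate, with a second observed covariate left out of the pairing. The unadjusted difference-in-means is compared with a regression that adds pair fixed effects (implemented here by within-pair demeaning) and the leftover covariate. The design is assumed and the code is illustrative, not the paper's.
```python
# Minimal sketch: difference-in-means vs pair-fixed-effects adjustment in a matched-pairs design.
import numpy as np

rng = np.random.default_rng(5)

def one_draw(n_pairs=200, tau=1.0):
    x1 = np.sort(rng.normal(size=2 * n_pairs))     # pairing covariate: adjacent units form a pair
    w = rng.normal(size=2 * n_pairs)               # baseline covariate not used in pairing
    pair = np.repeat(np.arange(n_pairs), 2)
    d = np.zeros(2 * n_pairs)
    d[2 * np.arange(n_pairs) + rng.integers(2, size=n_pairs)] = 1.0   # randomise within pairs
    y = tau * d + x1 + 2.0 * w + rng.normal(size=2 * n_pairs)

    dim = y[d == 1].mean() - y[d == 0].mean()      # unadjusted difference in means

    def within(v):                                 # demean within pairs (pair fixed effects)
        return v - (np.bincount(pair, v) / 2.0)[pair]

    X = np.column_stack([within(d), within(w)])
    beta, *_ = np.linalg.lstsq(X, within(y), rcond=None)
    return dim, beta[0]

draws = np.array([one_draw() for _ in range(500)])
print(draws.mean(axis=0))   # both estimators are centred near tau = 1
print(draws.std(axis=0))    # the pair-FE adjusted estimator has the smaller spread here
```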
Consider or Choose? The Role and Power of Consideration Sets
customers often form consideration sets in the first stage and then use a
second-stage choice mechanism to select the product with the highest utility.
While many recent studies aim to improve choice models by incorporating more
sophisticated second-stage choice mechanisms, this paper takes a step back and
goes into the opposite extreme. We simplify the second-stage choice mechanism
to its most basic form and instead focus on modeling customer choice by
emphasizing the role and power of the first-stage consideration set formation.
To this end, we study a model that is parameterized solely by a distribution
over consideration sets with a bounded rationality interpretation.
Intriguingly, we show that this model is characterized by the axiom of
symmetric demand cannibalization, enabling complete statistical identification.
The latter finding highlights the critical role of consideration sets in the
identifiability of two-stage choice models. We also examine the model's
implications for assortment planning, proving that the optimal assortment is
revenue-ordered within each partition block created by consideration sets.
Despite this compelling structure, we establish that the assortment problem
under this model is NP-hard even to approximate, highlighting how consideration
sets contribute to intractability, even under the simplest uniform
second-stage choice mechanism. Finally, using real-world data, we show that the
model achieves prediction performance comparable to other advanced choice
models. Given the simplicity of the model's second-stage phase, this result
showcases the enormous power of first-stage consideration set formation in
capturing customers' decision-making processes.
arXiv link: http://arxiv.org/abs/2302.04354v4
High-Dimensional Granger Causality for Climatic Attribution
autoregressive models (VARs) to disentangle and interpret the complex causal
chains linking radiative forcings and global temperatures. By allowing for high
dimensionality in the model, we can enrich the information set with relevant
natural and anthropogenic forcing variables to obtain reliable causal
relations. This provides a step forward from existing climatology literature,
which has mostly treated these variables in isolation in small models.
Additionally, our framework allows us to disregard the order of integration of the
variables by directly estimating the VAR in levels, thus avoiding accumulating
biases coming from unit-root and cointegration tests. This is of particular
appeal for climate time series which are well known to contain stochastic
trends and long memory. We are thus able to establish causal networks linking
radiative forcings to global temperatures and to connect radiative forcings
among themselves, thereby allowing for tracing the path of dynamic causal
effects through the system.
arXiv link: http://arxiv.org/abs/2302.03996v2
Reevaluating the Taylor Rule with Machine Learning
nonlinear method, such that its estimated federal funds rates match those
actually previously implemented by the Federal Reserve Bank. In the linear
method, this paper uses an OLS regression model to find more accurate
coefficients within the same Taylor Rule equation in which the dependent
variable is the federal funds rate, and the independent variables are the
inflation rate, the inflation gap, and the output gap. The intercept in the OLS
regression model would capture the constant equilibrium target real interest
rate set at 2. The linear OLS method suggests that the Taylor Rule
overestimates the coefficients on the output gap and the standalone inflation
rate. The coefficients this paper suggests are shown in equation
(2). In the nonlinear method, this paper uses a machine learning system in
which the two inputs are the inflation rate and the output gap and the output
is the federal funds rate. This system utilizes gradient descent error
minimization to create a model that minimizes the error between the estimated
federal funds rate and the actual previously implemented federal funds rate.
Since the machine learning system allows the model to capture the more
realistic nonlinear relationship between the variables, it significantly
increases the estimation accuracy as a result. The actual and estimated federal
funds rates are almost identical besides three recessions caused by bubble
bursts, which the paper addresses in the concluding remarks. Overall, the first
method provides theoretical insight while the second suggests a model with
improved applicability.
arXiv link: http://arxiv.org/abs/2302.08323v1
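The linear exercise described above amounts to an OLS fit of the policy rate on inflation-related terms and the output gap; a minimal sketch on synthetic data (the numbers are assumed, not those of the paper) is:
```python
# Minimal sketch: Taylor-rule coefficients by OLS on synthetic data.
import numpy as np

rng = np.random.default_rng(6)
T = 200
inflation = 2.0 + rng.normal(scale=1.0, size=T)
output_gap = rng.normal(scale=2.0, size=T)
# hypothetical policy rate generated from a Taylor-type rule plus noise
ffr = 1.0 + 1.5 * inflation + 0.5 * output_gap + rng.normal(scale=0.25, size=T)

X = np.column_stack([np.ones(T), inflation, output_gap])
beta, *_ = np.linalg.lstsq(X, ffr, rcond=None)
print(beta)     # [intercept, inflation response, output-gap response]
```
The intercept plays the role of the equilibrium-rate term, and the two slopes are the policy responses that the OLS exercise re-estimates.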
Covariate Adjustment in Stratified Experiments
effect in stratified experiments. We work in a general framework that includes
matched tuples designs, coarse stratification, and complete randomization as
special cases. Regression adjustment with treatment-covariate interactions is
known to weakly improve efficiency for completely randomized designs. By
contrast, we show that for stratified designs such regression estimators are
generically inefficient, potentially even increasing estimator variance
relative to the unadjusted benchmark. Motivated by this result, we derive the
asymptotically optimal linear covariate adjustment for a given stratification.
We construct several feasible estimators that implement this efficient
adjustment in large samples. In the special case of matched pairs, for example,
the regression including treatment, covariates, and pair fixed effects is
asymptotically optimal. We also provide novel asymptotically exact inference
methods that allow researchers to report smaller confidence intervals, fully
reflecting the efficiency gains from both stratification and adjustment.
Simulations and an empirical application demonstrate the value of our proposed
methods.
arXiv link: http://arxiv.org/abs/2302.03687v4
High-Dimensional Conditionally Gaussian State Space Models with Missing Data
patterns and a large number of missing observations in conditionally Gaussian
state space models. Two important examples are dynamic factor models with
unbalanced datasets and large Bayesian VARs with variables in multiple
frequencies. A key insight underlying the proposed approach is that the joint
distribution of the missing data conditional on the observed data is Gaussian.
Moreover, the inverse covariance or precision matrix of this conditional
distribution is sparse, and this special structure can be exploited to
substantially speed up computations. We illustrate the methodology using two
empirical applications. The first application combines quarterly, monthly and
weekly data using a large Bayesian VAR to produce weekly GDP estimates. In the
second application, we extract latent factors from unbalanced datasets
involving over a hundred monthly variables via a dynamic factor model with
stochastic volatility.
arXiv link: http://arxiv.org/abs/2302.03172v1
Extensions for Inference in Difference-in-Differences with Few Treated Clusters
estimators are not consistent, and are not generally asymptotically normal.
This poses relevant challenges for inference. While there are inference methods
that are valid in these settings, some of these alternatives are not readily
available when there is variation in treatment timing and heterogeneous
treatment effects; or for deriving uniform confidence bands for event-study
plots. We present alternatives in settings with few treated units that are
valid with variation in treatment timing and/or that allow for uniform
confidence bands.
arXiv link: http://arxiv.org/abs/2302.03131v1
Asymptotic Representations for Sequential Decisions, Adaptive Experiments, and Batched Bandits
estimation and inference problems, adaptive randomized controlled trials, and
related settings. In batched adaptive settings where the decision at one stage
can affect the observation of variables in later stages, our asymptotic
representation characterizes all limit distributions attainable through a joint
choice of an adaptive design rule and statistics applied to the adaptively
generated data. This facilitates local power analysis of tests, comparison of
adaptive treatment rules, and other analyses of batchwise sequential
statistical decision rules.
arXiv link: http://arxiv.org/abs/2302.03117v2
Asymptotically Optimal Fixed-Budget Best Arm Identification with Variance-Dependent Bounds
minimizing expected simple regret. In an adaptive experiment, a decision maker
draws one of multiple treatment arms based on past observations and observes
the outcome of the drawn arm. After the experiment, the decision maker
recommends the treatment arm with the highest expected outcome. We evaluate the
decision based on the expected simple regret, which is the difference between
the expected outcomes of the best arm and the recommended arm. Due to inherent
uncertainty, we evaluate the regret using the minimax criterion. First, we
derive asymptotic lower bounds for the worst-case expected simple regret, which
are characterized by the variances of potential outcomes (leading factor).
Based on the lower bounds, we propose the Two-Stage (TS)-Hirano-Imbens-Ridder
(HIR) strategy, which utilizes the HIR estimator (Hirano et al., 2003) in
recommending the best arm. Our theoretical analysis shows that the TS-HIR
strategy is asymptotically minimax optimal, meaning that the leading factor of
its worst-case expected simple regret matches our derived worst-case lower
bound. Additionally, we consider extensions of our method, such as the
asymptotic optimality for the probability of misidentification. Finally, we
validate the proposed method's effectiveness through simulations.
arXiv link: http://arxiv.org/abs/2302.02988v2
In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation
applications -- thus, before deploying a model estimating such effects in
practice, one needs to be sure that the best candidate from the ever-growing
machine learning toolbox for this task was chosen. Unfortunately, due to the
absence of counterfactual information in practice, it is usually not possible
to rely on standard validation metrics for doing so, leading to a well-known
model selection dilemma in the treatment effect estimation literature. While
some solutions have recently been investigated, systematic understanding of the
strengths and weaknesses of different model selection criteria is still
lacking. In this paper, instead of attempting to declare a global "winner", we
therefore empirically investigate success- and failure modes of different
selection criteria. We highlight that there is a complex interplay between
selection strategies, candidate estimators and the data used for comparing
them, and provide interesting insights into the relative (dis)advantages of
different criteria alongside desiderata for the design of further illuminating
empirical studies in this context.
arXiv link: http://arxiv.org/abs/2302.02923v2
Penalized Quasi-likelihood Estimation and Model Selection in Time Series Models with Parameters on the Boundary
estimation and model-selection to statistical and econometric models which
allow for non-negativity constraints on some or all of the parameters, as well
as time-series dependence. It differs from classic non-penalized likelihood
estimation, where limiting distributions of likelihood-based estimators and
test-statistics are non-standard, and depend on the unknown number of
parameters on the boundary of the parameter space. Specifically, we establish
that the joint model selection and estimation results in standard asymptotically
Gaussian distributed estimators. The results are applied to the rich class of
autoregressive conditional heteroskedastic (ARCH) models for the modelling of
time-varying volatility. We find from simulations that the penalized estimation
and model-selection works surprisingly well even for a large number of
parameters. A simple empirical illustration for stock-market returns data
confirms the ability of the penalized estimation to select ARCH models which
fit nicely the autocorrelation function, as well as confirms the stylized fact
of long-memory in financial time series data.
arXiv link: http://arxiv.org/abs/2302.02867v1
Out of Sample Predictability in Predictive Regressions with Many Predictor Candidates
predictability in linear predictive regressions with a potentially large set of
candidate predictors. We propose a procedure based on out of sample MSE
comparisons that is implemented in a pairwise manner using one predictor at a
time and resulting in an aggregate test statistic that is standard normally
distributed under the global null hypothesis of no linear predictability.
Predictors can be highly persistent, purely stationary or a combination of
both. Upon rejection of the null hypothesis we subsequently introduce a
predictor screening procedure designed to identify the most active predictors.
An empirical application to key predictors of US economic activity illustrates
the usefulness of our methods and highlights the important forward looking role
played by the series of manufacturing new orders.
arXiv link: http://arxiv.org/abs/2302.02866v2
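The building block of the pairwise procedure is an out-of-sample MSE comparison of a single candidate predictor against a benchmark such as the prevailing mean. The sketch below shows one such comparison with an expanding window; the design, benchmark, and naive studentisation are assumed simplifications, not the paper's aggregate statistic.
```python
# Minimal sketch: expanding-window out-of-sample MSE comparison of one predictor vs the mean.
import numpy as np

rng = np.random.default_rng(8)
T = 400
x = rng.normal(size=T)                            # candidate predictor observed at time t
y_next = 0.3 * x + rng.normal(size=T)             # target one step ahead: y_{t+1} = 0.3 x_t + eps

e_bench, e_model = [], []
for t in range(100, T):                           # expanding estimation window
    bench = y_next[:t].mean()                     # prevailing-mean forecast
    b = np.polyfit(x[:t], y_next[:t], 1)          # candidate predictive regression
    model = np.polyval(b, x[t])
    e_bench.append((y_next[t] - bench) ** 2)
    e_model.append((y_next[t] - model) ** 2)

d = np.array(e_bench) - np.array(e_model)         # positive values favour the candidate predictor
print(d.mean() / (d.std(ddof=1) / np.sqrt(len(d))))
```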
Testing Quantile Forecast Optimality
output of many financial institutions, central banks and international
organisations. This paper proposes misspecification tests for such quantile
forecasts that assess optimality over a set of multiple forecast horizons
and/or quantiles. The tests build on multiple Mincer-Zarnowitz quantile
regressions cast in a moment equality framework. Our main test is for the null
hypothesis of autocalibration, a concept which assesses optimality with respect
to the information contained in the forecasts themselves. We provide an
extension that allows testing for optimality with respect to larger information
sets, as well as a multivariate extension. Importantly, our tests do not just inform
about general violations of optimality, but may also provide useful insights
into specific forms of sub-optimality. A simulation study investigates the
finite sample performance of our tests, and two empirical applications to
financial returns and U.S. macroeconomic series illustrate that our tests can
yield interesting insights into quantile forecast sub-optimality and its
causes.
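A minimal sketch of a single Mincer-Zarnowitz quantile regression check on simulated
data (the paper's joint moment-equality tests across horizons and quantiles are not
reproduced here): regress the realization on the quantile forecast at level tau;
under autocalibration the intercept is 0 and the slope is 1.

```python
# Minimal Mincer-Zarnowitz quantile regression check at a single level tau.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(1)
tau, T = 0.9, 1000
sigma = np.exp(0.3 * rng.standard_normal(T))   # time-varying scale
y = sigma * rng.standard_normal(T)             # realizations
q_forecast = sigma * norm.ppf(tau)             # correctly calibrated tau-quantile forecasts

X = sm.add_constant(q_forecast)                # columns: intercept, forecast
res = QuantReg(y, X).fit(q=tau)
print("estimates (intercept, slope):", res.params)   # compare with (0, 1)
print("95% confidence intervals:\n", res.conf_int())
```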
arXiv link: http://arxiv.org/abs/2302.02747v2
Estimating Time-Varying Networks for High-Dimensional Time Series
series, using the large VAR model framework with both the transition and
(error) precision matrices evolving smoothly over time. Two types of
time-varying graphs are investigated: one containing directed edges of Granger
causality linkages, and the other containing undirected edges of partial
correlation linkages. Under the sparse structural assumption, we propose a
penalised local linear method with time-varying weighted group LASSO to jointly
estimate the transition matrices and identify their significant entries, and a
time-varying CLIME method to estimate the precision matrices. The estimated
transition and precision matrices are then used to determine the time-varying
network structures. Under some mild conditions, we derive the theoretical
properties of the proposed estimates including the consistency and oracle
properties. In addition, we extend the methodology and theory to cover
highly-correlated large-scale time series, for which the sparsity assumption
becomes invalid and we allow for common factors before estimating the
factor-adjusted time-varying networks. We provide extensive simulation studies
and an empirical application to a large U.S. macroeconomic dataset to
illustrate the finite-sample performance of our methods.
arXiv link: http://arxiv.org/abs/2302.02476v1
Testing for Structural Change under Nonstationarity
to the main limit results of the econometric framework for structural break
testing in predictive regression models based on the OLS-Wald and IVX-Wald test
statistics, developed by Katsouris C (2021). In particular, we derive the
asymptotic distributions of the test statistics when the predictive regression
model includes either mildly integrated or persistent regressors. Moreover, we
consider the case in which a model intercept is included vis-a-vis the case in
which the predictive regression has no intercept. In a subsequent version of this
study we reexamine these aspects in more depth using demeaned versions of the
variables in the predictive regression.
arXiv link: http://arxiv.org/abs/2302.02370v1
Using bayesmixedlogit and bayesmixedlogitwtp in Stata
bayesmixedlogitwtp Stata packages. It mirrors closely the helpfile obtainable
in Stata (i.e., through help bayesmixedlogit or help bayesmixedlogitwtp).
Further background for the packages can be found in Baker (2014).
arXiv link: http://arxiv.org/abs/2302.01775v1
Agreed and Disagreed Uncertainty
macroeconomic uncertainty based on the forecast error variance have two
distinct drivers: the variance of the economic shock and the variance of the
information dispersion. The former driver increases uncertainty and reduces
agents' disagreement (agreed uncertainty). The latter increases both
uncertainty and disagreement (disagreed uncertainty). We use these implications
to identify empirically the effects of agreed and disagreed uncertainty shocks,
based on a novel measure of consumer disagreement derived from survey
expectations. Disagreed uncertainty has no discernible economic effects and is
benign for economic activity, but agreed uncertainty exerts significant
depressing effects on a broad spectrum of macroeconomic indicators.
arXiv link: http://arxiv.org/abs/2302.01621v1
Inference in Non-stationary High-Dimensional VARs
high-dimensional non-stationary vector autoregressive (VAR) models. Our method
does not require knowledge of the order of integration of the time series under
consideration. We augment the VAR with at least as many lags as the suspected
maximum order of integration, an approach which has been proven to be robust
against the presence of unit roots in low dimensions. We prove that we can
restrict the augmentation to only the variables of interest for the testing,
thereby making the approach suitable for high dimensions. We combine this lag
augmentation with a post-double-selection procedure in which a set of initial
penalized regressions is performed to select the relevant variables for both
the Granger causing and caused variables. We then establish uniform asymptotic
normality of a second-stage regression involving only the selected variables.
Finite sample simulations show good performance, and an application investigating
the (predictive) causes and effects of economic uncertainty illustrates the need
to allow for unknown orders of integration.
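The following Python sketch conveys the flavour of the procedure on simulated I(1)
data: one augmentation lag, lasso-based double selection of controls, and a Wald
test on the lags of the putative Granger-causing variable. It is a simplified
illustration, not the paper's algorithm (in particular, it uses homoskedastic
standard errors and cross-validated penalties for brevity).

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
T, N, p, aug = 400, 30, 2, 1          # time points, variables, VAR lags, augmentation lags
Y = np.cumsum(rng.standard_normal((T, N)), axis=0)   # toy random walks, no causality

def lagmat(Z, lags):
    """Stack lags 1..lags of all columns of Z; lag l of variable j is column j + (l-1)*N."""
    return np.column_stack([Z[lags - l:-l or None] for l in range(1, lags + 1)])

L = lagmat(Y, p + aug)
y = Y[p + aug:, 0]                                        # "caused" variable
x_lags = L[:, [1 + l * N for l in range(p)]]              # lags 1..p of the candidate cause
x_aug = L[:, [1 + (p + a) * N for a in range(aug)]]       # augmentation lags of the cause
own = L[:, [l * N for l in range(p + aug)]]               # all lags of the caused variable
drop = [l * N for l in range(p + aug)] + [1 + l * N for l in range(p + aug)]
others = np.delete(L, drop, axis=1)                       # lags of the remaining variables

# double selection: lasso of y on the other lags, and of each x-lag on the other lags
sel = set(np.flatnonzero(LassoCV(cv=5).fit(others, y).coef_))
for j in range(x_lags.shape[1]):
    sel |= set(np.flatnonzero(LassoCV(cv=5).fit(others, x_lags[:, j]).coef_))

# post-selection OLS and Wald test that all p coefficients on the x-lags are zero
Xmat = np.column_stack([np.ones(len(y)), x_lags, x_aug, own, others[:, sorted(sel)]])
beta, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
u = y - Xmat @ beta
V = np.linalg.pinv(Xmat.T @ Xmat) * (u @ u) / (len(y) - Xmat.shape[1])
idx = np.arange(1, 1 + p)                                 # positions of the x-lag coefficients
wald = beta[idx] @ np.linalg.solve(V[np.ix_(idx, idx)], beta[idx])
print(f"Wald statistic: {wald:.2f}, p-value: {stats.chi2.sf(wald, df=p):.3f}")
```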
arXiv link: http://arxiv.org/abs/2302.01434v2
A Machine Learning Approach to Measuring Climate Adaptation
short-run and long-run changes in damaging weather. I propose a debiased
machine learning approach to flexibly measure these elasticities in panel
settings. In a simulation exercise, I show that debiased machine learning has
considerable benefits relative to standard machine learning or ordinary least
squares, particularly in high-dimensional settings. I then measure adaptation
to damaging heat exposure in United States corn and soy production. Using rich
sets of temperature and precipitation variation, I find evidence that short-run
impacts from damaging heat are significantly offset in the long run. I show
that this is because the impacts of long-run changes in heat exposure do not
follow the same functional form as short-run shocks to heat exposure.
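A minimal sketch of the debiased (double) machine learning idea in a partially
linear model with cross-fitting, using simulated data and random forests as
stand-ins for the paper's panel of heat exposure and crop outcomes:

```python
# Y = theta*D + g(X) + e; estimate theta by cross-fitted partialling out.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(3)
n, p = 2000, 20
X = rng.standard_normal((n, p))
D = np.sin(X[:, 0]) + 0.5 * rng.standard_normal(n)      # "treatment" (e.g. heat exposure)
Y = 0.8 * D + np.cos(X[:, 1]) + rng.standard_normal(n)  # outcome, true theta = 0.8

res_Y, res_D = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    m_y = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], Y[train])
    m_d = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], D[train])
    res_Y[test] = Y[test] - m_y.predict(X[test])        # partial out g(X) from Y
    res_D[test] = D[test] - m_d.predict(X[test])        # partial out m(X) from D

theta = (res_D @ res_Y) / (res_D @ res_D)               # partialling-out estimator
eps = res_Y - theta * res_D
se = np.sqrt(np.sum(eps ** 2 * res_D ** 2)) / (res_D @ res_D)
print(f"theta_hat = {theta:.3f} (se {se:.3f})")
```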
arXiv link: http://arxiv.org/abs/2302.01236v1
Sparse High-Dimensional Vector Autoregressive Bootstrap
based on capturing dependence through a sparsely estimated vector
autoregressive model. We prove its consistency for inference on
high-dimensional means under two different moment assumptions on the errors,
namely sub-gaussian moments and a finite number of absolute moments. In
establishing these results, we derive a Gaussian approximation for the maximum
mean of a linear process, which may be of independent interest.
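A simplified sketch of the idea, not the paper's exact algorithm: fit a sparse
VAR(1) equation by equation with the lasso (fixed penalty here for brevity),
resample the residuals, iterate the estimated VAR to produce bootstrap samples,
and use them to approximate the distribution of the maximum scaled sample mean.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
T, N = 200, 50
A = np.diag(np.full(N, 0.5))                        # sparse true VAR(1) matrix
X = np.zeros((T, N))
for t in range(1, T):
    X[t] = X[t - 1] @ A.T + rng.standard_normal(N)

# 1. Fit a sparse VAR(1) equation by equation with the lasso.
A_hat = np.vstack([Lasso(alpha=0.1, fit_intercept=False).fit(X[:-1], X[1:, j]).coef_
                   for j in range(N)])
U_hat = X[1:] - X[:-1] @ A_hat.T                    # residuals

# 2. Bootstrap: resample residuals and iterate the estimated VAR.
B = 500
boot_max = np.empty(B)
for b in range(B):
    U_star = U_hat[rng.integers(0, len(U_hat), size=T)]
    X_star = np.zeros((T, N))
    for t in range(1, T):
        X_star[t] = X_star[t - 1] @ A_hat.T + U_star[t]
    boot_max[b] = np.max(np.sqrt(T) * np.abs(X_star.mean(axis=0)))

crit = np.quantile(boot_max, 0.95)                  # bootstrap critical value
obs = np.max(np.sqrt(T) * np.abs(X.mean(axis=0)))
print(f"observed max statistic: {obs:.2f}, bootstrap 95% critical value: {crit:.2f}")
```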
arXiv link: http://arxiv.org/abs/2302.01233v2
Regression Adjustment, Cross-Fitting, and Randomized Experiments with Many Controls
randomized experiments with many covariates, under a design-based framework
with a deterministic number of treated units. We show that a simple yet
powerful cross-fitted regression adjustment achieves bias-correction and leads
to sharper asymptotic properties than existing alternatives. Specifically, we
derive higher-order stochastic expansions, analyze associated inference
procedures, and propose a modified HC3 variance estimator that accounts for terms
up to second order. Our analysis reveals that cross-fitting permits substantially
faster growth in the covariate dimension $p$ relative to sample size $n$, with
asymptotic normality holding under favorable designs when $p = o(n^{3/4}/(\log
n)^{1/2})$, improving on standard rates. We also explain and address the poor
size performance of conventional variance estimators. The methodology extends
naturally to stratified experiments with many strata. Simulations confirm that
the cross-fitted estimator, combined with the modified HC3, delivers accurate
estimation and reliable inference across diverse designs.
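A minimal sketch of cross-fitted regression adjustment for the ATE in a completely
randomized experiment with many covariates, on simulated data; the paper's modified
HC3 variance estimator and higher-order analysis are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(5)
n, p = 500, 50
X = rng.standard_normal((n, p))
D = np.zeros(n, dtype=int)
D[rng.choice(n, size=n // 2, replace=False)] = 1              # fixed number of treated units
Y = 1.0 * D + X[:, :5].sum(axis=1) + rng.standard_normal(n)   # true ATE = 1

mu1, mu0 = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    for d, mu in ((1, mu1), (0, mu0)):
        idx = train[D[train] == d]
        mu[test] = LinearRegression().fit(X[idx], Y[idx]).predict(X[test])

pi = D.mean()                                                 # known assignment share
tau_hat = np.mean(mu1 - mu0 + D * (Y - mu1) / pi - (1 - D) * (Y - mu0) / (1 - pi))
print(f"cross-fitted regression-adjusted ATE estimate: {tau_hat:.3f}")
```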
arXiv link: http://arxiv.org/abs/2302.00469v4
Adaptive hedging horizon and hedging performance estimation
mode decomposition (EMD) method to extract the adaptive hedging horizon and
build a time series cross-validation method for robust hedging performance
estimation. Based on the variance reduction criterion and the value-at-risk (VaR)
criterion, we find that estimated in-sample hedging performance is inconsistent
with out-of-sample hedging performance. The EMD
hedging method family exhibits superior performance on the VaR criterion
compared with the minimum variance hedging method. The matching degree of the
spot and futures contracts at the specific time scale is the key determinant of
the hedging performance in the corresponding hedging horizon.
arXiv link: http://arxiv.org/abs/2302.00251v1
Real Estate Property Valuation using Self-Supervised Vision Transformers
growing in recent years. In this paper, we propose a new method for property
valuation that utilizes self-supervised vision transformers, a recent
breakthrough in computer vision and deep learning. Our proposed algorithm uses
a combination of machine learning, computer vision and hedonic pricing models
trained on real estate data to estimate the value of a given property. We
collected and pre-processed a data set of real estate properties in the city of
Boulder, Colorado and used it to train, validate and test our algorithm. Our
data set consisted of qualitative images (including house interiors, exteriors,
and street views) as well as quantitative features such as the number of
bedrooms, bathrooms, square footage, lot square footage, property age, crime
rates, and proximity to amenities. We evaluated the performance of our model
using metrics such as Root Mean Squared Error (RMSE). Our findings indicate
that these techniques are able to accurately predict the value of properties,
with a low RMSE. The proposed algorithm outperforms traditional appraisal
methods that do not leverage property images and has the potential to be used
in real-world applications.
arXiv link: http://arxiv.org/abs/2302.00117v1
Factor Model of Mixtures
response variable conditioned on observing some factors. The proposed approach
possesses desirable properties of flexibility, interpretability, tractability
and extendability. The conditional quantile function is modeled by a mixture
(weighted sum) of basis quantile functions, with the weights depending on
factors. The calibration problem is formulated as a convex optimization
problem. It can be viewed as conducting quantile regressions for all confidence
levels simultaneously while avoiding quantile crossing by definition. The
calibration problem is equivalent to minimizing the continuous ranked
probability score (CRPS). Based on the canonical polyadic (CP) decomposition of
tensors, we propose a dimensionality reduction method that reduces the rank of
the parameter tensor and propose an alternating algorithm for estimation.
Additionally, based on Risk Quadrangle framework, we generalize the approach to
conditional distributions defined by Conditional Value-at-Risk (CVaR),
expectile and other functions of uncertainty measures. Although this paper
focuses on using splines as the weight functions, it can be extended to neural
networks. Numerical experiments demonstrate the effectiveness of our approach.
arXiv link: http://arxiv.org/abs/2301.13843v2
On Using The Two-Way Cluster-Robust Standard Errors
errors. However, the recent econometrics literature points out the potential
non-Gaussianity of two-way cluster sample means, and thus the invalidity of
inference based on the TWCR standard errors. Fortunately, simulation studies
nonetheless show that Gaussianity is the rule rather than the exception. This
paper provides theoretical support for this encouraging observation.
Specifically, we derive a novel central limit theorem for two-way clustered
triangular arrays that justifies the use of the TWCR under very mild and
interpretable conditions. We, therefore, hope that this paper will provide a
theoretical justification for the legitimacy of most, if not all, of the
thousands of empirical papers that have used the TWCR standard errors. We also
provide practical guidance on when a researcher can employ the TWCR standard
errors.
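For reference, a minimal sketch of how TWCR standard errors are computed for OLS
via the usual inclusion-exclusion formula (no small-sample corrections applied):

```python
# V = bread * (meat_firm + meat_year - meat_intersection) * bread
import numpy as np

rng = np.random.default_rng(6)
n = 1000
firm = rng.integers(0, 50, n)
year = rng.integers(0, 20, n)
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta
bread = np.linalg.inv(X.T @ X)

def cluster_meat(groups):
    """Sum over clusters of (X_g' u_g)(X_g' u_g)'."""
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        s = X[groups == g].T @ u[groups == g]
        meat += np.outer(s, s)
    return meat

pair = firm * 10_000 + year                      # intersection clusters
meat = cluster_meat(firm) + cluster_meat(year) - cluster_meat(pair)
V = bread @ meat @ bread
print("coefficients:", beta, "TWCR standard errors:", np.sqrt(np.diag(V)))
```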
arXiv link: http://arxiv.org/abs/2301.13775v1
Approximate Functional Differencing
fixed effects is a classic example of Neyman and Scott's (1948) incidental
parameter problem (IPP). One solution to this IPP is functional differencing
(Bonhomme 2012), which works when the number of time periods T is fixed (and
may be small), but this solution is not applicable to all panel data models of
interest. Another solution, which applies to a larger class of models, is
"large-T" bias correction (pioneered by Hahn and Kuersteiner 2002 and Hahn and
Newey 2004), but this is only guaranteed to work well when T is sufficiently
large. This paper provides a unified approach that connects those two seemingly
disparate solutions to the IPP. In doing so, we provide an approximate version
of functional differencing, that is, an approximate solution to the IPP that is
applicable to a large class of panel data models even when T is relatively
small.
arXiv link: http://arxiv.org/abs/2301.13736v2
Bridging the Covid-19 Data and the Epidemiological Model using Time-Varying Parameter SIRD Model
allow for time-varying parameters for real-time measurement and prediction of
the trajectory of the Covid-19 pandemic. Time variation in model parameters is
captured using the generalized autoregressive score modeling structure designed
for the typical daily count data related to the pandemic. The resulting
specification permits a flexible yet parsimonious model with a low
computational cost. The model is extended to allow for unreported cases using a
mixed-frequency setting. Results suggest that these cases' effects on the
parameter estimates might be sizeable. Full sample results show that the
flexible framework accurately captures the successive waves of the pandemic. A
real-time exercise indicates that the proposed structure delivers timely and
precise information on the pandemic's current stance. This superior
performance, in turn, transforms into accurate predictions of the confirmed and
death cases.
arXiv link: http://arxiv.org/abs/2301.13692v1
Nonlinearities in Macroeconomic Tail Risk through the Lens of Big Data Quantile Regressions
the selection of appropriate covariates and/or possible forms of nonlinearities
are key in obtaining precise forecasts. In this paper, our focus is on using
large datasets in quantile regression models to forecast the conditional
distribution of US GDP growth. To capture possible non-linearities, we include
several nonlinear specifications. The resulting models are extremely
high-dimensional, and we thus rely on a set of shrinkage priors. Since Markov Chain Monte Carlo
estimation becomes slow in these dimensions, we rely on fast variational Bayes
approximations to the posterior distribution of the coefficients and the latent
states. We find that our proposed set of models produces precise forecasts.
These gains are especially pronounced in the tails. Using Gaussian processes to
approximate the nonlinear component of the model further improves the good
performance, in particular in the right tail.
arXiv link: http://arxiv.org/abs/2301.13604v2
STEEL: Singularity-aware Reinforcement Learning
find an optimal policy that maximizes the expected total rewards in a dynamic
environment. The existing methods require absolutely continuous assumption
(e.g., there do not exist non-overlapping regions) on the distribution induced
by target policies with respect to the data distribution over either the state
or action or both. We propose a new batch RL algorithm that allows for
singularity for both state and action spaces (e.g., existence of
non-overlapping regions between offline data distribution and the distribution
induced by the target policies) in the setting of an infinite-horizon Markov
decision process with continuous states and actions. We call our algorithm
STEEL: SingulariTy-awarE rEinforcement Learning. Our algorithm is motivated by
a new error analysis on off-policy evaluation, where we use maximum mean
discrepancy, together with distributionally robust optimization, to
characterize the error of off-policy evaluation caused by the possible
singularity and to enable model extrapolation. By leveraging the idea of
pessimism and under some technical conditions, we derive a first finite-sample
regret guarantee for our proposed algorithm under singularity. Compared with
existing algorithms, by requiring only a minimal data-coverage assumption, STEEL
improves the applicability and robustness of batch RL. In addition, a two-step
adaptive STEEL, which is nearly tuning-free, is proposed. Extensive simulation
studies and one (semi)-real experiment on personalized pricing demonstrate the
superior performance of our methods in dealing with possible singularity in
batch RL.
arXiv link: http://arxiv.org/abs/2301.13152v5
Prediction of Customer Churn in Banking Industry
follow customer retention strategies while they are trying to increase their
market share by acquiring new customers. This study compares the performance of
six supervised classification techniques to suggest an efficient model to
predict customer churn in banking industry, given 10 demographic and personal
attributes from 10000 customers of European banks. The effect of feature
selection, class imbalance, and outliers will be discussed for ANN and random
forest as the two competing models. As shown, unlike the random forest, the ANN
does not reveal any serious concern regarding overfitting and is also robust to
noise. Therefore, an ANN with five nodes in a single hidden layer is recognized
as the best-performing classifier.
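A minimal sketch of such a comparison on a synthetic, churn-like classification
task (sklearn's make_classification stands in for the study's bank data, and no
feature selection or resampling is applied):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=10_000, n_features=10, weights=[0.8, 0.2],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

ann = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(5,), max_iter=1000, random_state=0))
rf = RandomForestClassifier(n_estimators=300, random_state=0)

for name, model in [("ANN (5 hidden nodes)", ann), ("Random forest", rf)]:
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```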
arXiv link: http://arxiv.org/abs/2301.13099v1
Machine Learning with High-Cardinality Categorical Features in Actuarial Applications
occupation in commercial property insurance). Standard categorical encoding
methods like one-hot encoding are inadequate in these settings.
In this work, we present a novel _Generalised Linear Mixed Model Neural
Network_ ("GLMMNet") approach to the modelling of high-cardinality categorical
features. The GLMMNet integrates a generalised linear mixed model in a deep
learning framework, offering the predictive power of neural networks and the
transparency of random effects estimates, the latter of which cannot be
obtained from the entity embedding models. Further, its flexibility to deal
with any distribution in the exponential dispersion (ED) family makes it widely
applicable to many actuarial contexts and beyond.
We illustrate and compare the GLMMNet against existing approaches in a range
of simulation experiments as well as in a real-life insurance case study.
Notably, we find that the GLMMNet often outperforms or at least performs
comparably with an entity embedded neural network, while providing the
additional benefit of transparency, which is particularly valuable in practical
applications.
Importantly, while our model was motivated by actuarial applications, it can
have wider applicability. The GLMMNet would suit any applications that involve
high-cardinality categorical variables and where the response cannot be
sufficiently modelled by a Gaussian distribution.
arXiv link: http://arxiv.org/abs/2301.12710v1
A Note on the Estimation of Job Amenities and Labor Productivity
amenities and labor productivity in a single matching market based on the
observation of equilibrium matches and wages. The estimation procedure
simultaneously fits both the matching patterns and the wage curve. While our
estimator is suited for a wide range of assignment problems, we provide an
application to the estimation of the Value of a Statistical Life using
compensating wage differentials for the risk of fatal injury on the job. Using
US data for 2017, we estimate the Value of a Statistical Life at $6.3 million
(in 2017 dollars).
arXiv link: http://arxiv.org/abs/2301.12542v1
Multidimensional dynamic factor models
data. In doing so, it develops an interpretable technique to study complex
information sources ranging from repeated surveys with a varying number of
respondents to panels of satellite images. We specialise our results to model
microeconomic data on US households jointly with macroeconomic aggregates. This
results in a powerful tool able to generate localised predictions,
counterfactuals and impulse response functions for individual households,
accounting for traditional time-series complexities depicted in the state-space
literature. The model is also compatible with policymakers' growing focus on
real-time economic analysis, as it is able to process observations online,
while handling missing values and asynchronous data releases.
arXiv link: http://arxiv.org/abs/2301.12499v1
Synthetic Difference In Differences Estimation
difference-in-differences (SDID) estimator of Arkhangelsky et al. (2021) for
Stata. Synthetic difference-in-differences can be used in a wide class of
circumstances where treatment effects on some particular policy or event are
desired, and repeated observations on treated and untreated units are available
over time. We lay out the theory underlying SDID, both when there is a single
treatment adoption date and when adoption is staggered over time, and discuss
estimation and inference in each of these cases. We introduce the sdid command
which implements these methods in Stata, and provide a number of examples of
use, discussing estimation, inference, and visualization of results.
arXiv link: http://arxiv.org/abs/2301.11859v3
Simple Difference-in-Differences Estimation in Fixed-T Panels
when the number of time periods is small, and the parallel trends condition
holds conditional on covariates and unobserved heterogeneity in the form of
interactive fixed effects. The estimator also allows the control variables to be
affected by treatment, and it enables estimation of the resulting indirect
effect on the outcome variable. The asymptotic properties of the estimator are
established and their accuracy in small samples is investigated using Monte
Carlo simulations. The empirical usefulness of the estimator is illustrated
using as an example the effect of increased trade competition on firm markups
in China.
arXiv link: http://arxiv.org/abs/2301.11358v2
Automatic Debiased Estimation with Machine Learning-Generated Regressors
generated regressors. Examples in economics include structural parameters in
models with endogenous variables estimated by control functions and in models
with sample selection, treatment effect estimation with propensity score
matching, and marginal treatment effects. More recently, Machine Learning (ML)
generated regressors are becoming ubiquitous for these and other applications
such as imputation with missing regressors, dimension reduction, including
autoencoders, learned proxies, confounders and treatments, and for feature
engineering with unstructured data, among others. We provide the first general
method for valid inference with regressors generated from ML. Inference with
generated regressors is complicated by the very complex expression for
influence functions and asymptotic variances. Additionally, ML-generated
regressors may lead to large biases in downstream inferences. To address these
problems, we propose Automatic Locally Robust/debiased GMM estimators in a
general three-step setting with ML-generated regressors. We illustrate our
results with treatment effects and counterfactual parameters in the partially
linear and nonparametric models with ML-generated regressors. We provide
sufficient conditions for the asymptotic normality of our debiased GMM
estimators and investigate their finite-sample performance through Monte Carlo
simulations.
arXiv link: http://arxiv.org/abs/2301.10643v3
Hierarchical Regularizers for Reverse Unrestricted Mixed Data Sampling Regressions
model high-frequency responses by means of low-frequency variables. However,
due to the periodic structure of RU-MIDAS regressions, the dimensionality grows
quickly if the frequency mismatch between the high- and low-frequency variables
is large. Additionally the number of high-frequency observations available for
estimation decreases. We propose to counteract this reduction in sample size by
pooling the high-frequency coefficients and further reduce the dimensionality
through a sparsity-inducing convex regularizer that accounts for the temporal
ordering among the different lags. To this end, the regularizer prioritizes the
inclusion of lagged coefficients according to the recency of the information
they contain. We demonstrate the proposed method on two empirical applications,
one on realized volatility forecasting with macroeconomic data and another on
demand forecasting for a bicycle-sharing system with ridership data on other
transportation types.
arXiv link: http://arxiv.org/abs/2301.10592v2
Sequential Bayesian Learning for Hidden Semi-Markov Models
flexible extension of the popular Hidden Markov Model (HMM) that allows the
underlying stochastic process to be a semi-Markov chain. HSMMs are typically
used less frequently than their basic HMM counterpart due to the increased
computational challenges when evaluating the likelihood function. Moreover,
while both models are sequential in nature, parameter estimation is mainly
conducted via batch estimation methods. Thus, a major motivation of this paper
is to provide methods to estimate HSMMs (1) in a computationally feasible time,
(2) in an exact manner, i.e. only subject to Monte Carlo error, and (3) in a
sequential setting. We provide and verify an efficient computational scheme for
Bayesian parameter estimation on HSMMs. Additionally, we explore the
performance of HSMMs on the VIX time series using Autoregressive (AR) models
with hidden semi-Markov states and demonstrate how this algorithm can be used
for regime switching, model selection and clustering purposes.
arXiv link: http://arxiv.org/abs/2301.10494v1
Processes analogous to ecological interactions and dispersal shape the dynamics of economic activities
dynamics of biological communities, and analogous eco-evolutionary processes
acting upon economic entities have been proposed to explain economic change.
This hypothesis is compelling because it explains economic change through
endogenous mechanisms, but it has not been quantitatively tested at the global
economy level. Here, we use an inverse modelling technique and 59 years of
economic data covering 77 countries to test whether the collective dynamics of
national economic activities can be characterised by eco-evolutionary
processes. We estimate the statistical support of dynamic community models in
which the dynamics of economic activities are coupled with positive and
negative interactions between the activities, the spatial dispersal of the
activities, and their transformations into other economic activities. We find
strong support for the models capturing positive interactions between economic
activities and spatial dispersal of the activities across countries. These
results suggest that processes akin to those occurring in ecosystems play a
significant role in the dynamics of economic systems. The strength-of-evidence
obtained for each model varies across countries and may be caused by
differences in the distance between countries, specific institutional contexts,
and historical contingencies. Overall, our study provides a new quantitative,
biologically inspired framework to study the forces shaping economic change.
arXiv link: http://arxiv.org/abs/2301.09486v1
ddml: Double/debiased machine learning in Stata
Stata. Estimators of causal parameters for five different econometric models
are supported, allowing for flexible estimation of causal effects of endogenous
variables in settings with unknown functional forms and/or many exogenous
variables. ddml is compatible with many existing supervised machine learning
programs in Stata. We recommend using DDML in combination with stacking
estimation which combines multiple machine learners into a final predictor. We
provide Monte Carlo evidence to support our recommendation.
arXiv link: http://arxiv.org/abs/2301.09397v3
Revisiting Panel Data Discrete Choice Models with Lagged Dependent Variables
semiparametric (distribution-free) panel data binary choice models with lagged
dependent variables, exogenous covariates, and entity fixed effects. We provide
a novel identification strategy, using an "identification at infinity"
argument. In contrast with the celebrated Honore and Kyriazidou (2000), our
method permits time trends of any form and does not suffer from the "curse of
dimensionality". We propose an easily implementable conditional maximum score
estimator. The asymptotic properties of the proposed estimator are fully
characterized. A small-scale Monte Carlo study demonstrates that our approach
performs satisfactorily in finite samples. We illustrate the usefulness of our
method by presenting an empirical application to enrollment in private hospital
insurance using the Household, Income and Labour Dynamics in Australia (HILDA)
Survey data.
arXiv link: http://arxiv.org/abs/2301.09379v5
Labor Income Risk and the Cross-Section of Expected Returns
sectoral shifts. I proxy for this risk using cross-industry dispersion (CID),
defined as a mean absolute deviation of returns of 49 industry portfolios. CID
peaks during periods of accelerated sectoral reallocation and heightened
uncertainty. I find that expected stock returns are related cross-sectionally
to the sensitivities of returns to innovations in CID. Annualized returns of
the stocks with high sensitivity to CID are 5.9% lower than the returns of the
stocks with low sensitivity. Abnormal returns with respect to the best factor
model are 3.5%, suggesting that common factors can not explain this return
spread. Stocks with high sensitivity to CID are likely to be the stocks, which
benefited from sectoral shifts. CID positively predicts unemployment through
its long-term component, consistent with the hypothesis that CID is a proxy for
unemployment risk from sectoral shifts.
arXiv link: http://arxiv.org/abs/2301.09173v1
Inference for Two-stage Experiments under Covariate-Adaptive Randomization
covariate-adaptive randomization. In the initial stage of this experimental
design, clusters (e.g., households, schools, or graph partitions) are
stratified and randomly assigned to control or treatment groups based on
cluster-level covariates. Subsequently, an independent second-stage design is
carried out, wherein units within each treated cluster are further stratified
and randomly assigned to either control or treatment groups, based on
individual-level covariates. Under the homogeneous partial interference
assumption, I establish conditions under which the proposed
difference-in-“average of averages” estimators are consistent and
asymptotically normal for the corresponding average primary and spillover
effects and develop consistent estimators of their asymptotic variances.
Combining these results establishes the asymptotic validity of tests based on
these estimators. My findings suggest that ignoring covariate information in
the design stage can result in efficiency loss, and commonly used inference
methods that ignore or improperly use covariate information can lead to either
conservative or invalid inference. Then, I apply these results to studying
optimal use of covariate information under covariate-adaptive randomization in
large samples, and demonstrate that a specific generalized matched-pair design
achieves minimum asymptotic variance for each proposed estimator. Finally, I
discuss covariate adjustment, which incorporates additional baseline covariates
not used for treatment assignment. The practical relevance of the theoretical
results is illustrated through a simulation study and an empirical application.
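A simulation sketch of the "average of averages" logic in a two-stage design,
with hypothetical effect sizes; the paper's stratification on covariates and its
variance estimators are omitted.

```python
import numpy as np

rng = np.random.default_rng(7)
G = 40
cluster_treated = rng.permutation(np.repeat([0, 1], G // 2))   # first-stage assignment

avg_treated_units, avg_untreated_units, avg_control_clusters = [], [], []
for g in range(G):
    n_g = rng.integers(20, 60)
    alpha_g = rng.standard_normal()                            # cluster effect
    if cluster_treated[g]:
        d = rng.binomial(1, 0.5, n_g)                          # second-stage assignment
        y = alpha_g + 1.0 * d + 0.3 * (1 - d) + rng.standard_normal(n_g)
        avg_treated_units.append(y[d == 1].mean())             # within-cluster average
        avg_untreated_units.append(y[d == 0].mean())
    else:
        y = alpha_g + rng.standard_normal(n_g)
        avg_control_clusters.append(y.mean())

primary = np.mean(avg_treated_units) - np.mean(avg_control_clusters)     # approx. 1.0
spillover = np.mean(avg_untreated_units) - np.mean(avg_control_clusters) # approx. 0.3
print(f"average primary effect: {primary:.2f}, average spillover effect: {spillover:.2f}")
```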
arXiv link: http://arxiv.org/abs/2301.09016v6
A Practical Introduction to Regression Discontinuity Designs: Extensions
and Titiunik (2020), collects and expands the instructional materials we
prepared for more than $50$ short courses and workshops on Regression
Discontinuity (RD) methodology that we taught between 2014 and 2023. In this
second monograph, we discuss several topics in RD methodology that build on and
extend the analysis of RD designs introduced in Cattaneo, Idrobo and Titiunik
(2020). Our first goal is to present an alternative RD conceptual framework
based on local randomization ideas. This methodological approach can be useful
in RD designs with discretely-valued scores, and can also be used more broadly
as a complement to the continuity-based approach in other settings. Then,
employing both continuity-based and local randomization approaches, we extend
the canonical Sharp RD design in multiple directions: fuzzy RD designs, RD
designs with discrete scores, and multi-dimensional RD designs. The goal of our
two-part monograph is purposely practical and hence we focus on the empirical
analysis of RD designs.
arXiv link: http://arxiv.org/abs/2301.08958v2
Composite distributions in the social sciences: A comparative empirical study of firms' sales distribution for France, Germany, Italy, Japan, South Korea, and Spain
classical and recent literature to describe a relevant variable in the social
sciences and Economics, namely the firms' sales distribution in six countries
over an ample period. We find that the best results are obtained with mixtures
of lognormal (LN), loglogistic (LL), and log Student's $t$ (LSt) distributions.
The single lognormal, in turn, is strongly not selected. We then find that the
whole firm size distribution is better described by a mixture, and there exist
subgroups of firms. Depending on the method of measurement, the best-fitting
distribution is not a single distribution but a mixture of at least three, and
sometimes four or five, components. We conduct a full-sample analysis, an
in-sample and out-of-sample analysis, and a doubly truncated sample analysis.
We also provide the formulation of the preferred models as solutions of the
Fokker--Planck or forward Kolmogorov equation.
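A minimal sketch of fitting a mixture of lognormals to simulated firm sales,
exploiting the fact that a lognormal mixture is a Gaussian mixture on the log
scale (the paper's loglogistic and log Student's t components would require a
dedicated EM routine):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(8)
log_sales = np.concatenate([rng.normal(2.0, 0.6, 4000),    # smaller firms
                            rng.normal(4.5, 0.9, 1500)])   # larger firms
log_sales = log_sales.reshape(-1, 1)

for k in range(1, 5):
    gm = GaussianMixture(n_components=k, random_state=0).fit(log_sales)
    print(f"{k} lognormal component(s): BIC = {gm.bic(log_sales):.1f}")
```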
arXiv link: http://arxiv.org/abs/2301.09438v1
From prosumer to flexumer: Case study on the value of flexibility in decarbonizing the multi-energy system of a manufacturing company
By using the flexibility of their multi-energy system (MES), they reduce costs
and carbon emissions while stabilizing the electricity system. However, to
identify the necessary investments in energy conversion and storage
technologies to leverage demand response (DR) potentials, companies need to
assess the value of flexibility. Therefore, this study quantifies the
flexibility value of a production company's MES by optimizing the synthesis,
design, and operation of a decarbonizing MES considering self-consumption
optimization, peak shaving, and integrated DR based on hourly prices and carbon
emission factors (CEFs). The detailed case study of a beverage company in
northern Germany considers vehicle-to-X of powered industrial trucks,
power-to-heat on multiple temperatures, wind turbines, photovoltaic systems,
and energy storage systems (thermal energy, electricity, and hydrogen). We
propose and apply novel data-driven metrics to evaluate the intensity of
price-based and CEF-based DR. The results reveal that flexibility usage reduces
decarbonization costs (by 19-80% depending on electricity and carbon removal
prices), total annual costs, operating carbon emissions, energy-weighted
average prices and CEFs, and fossil energy dependency. The results also suggest
that a net-zero operational carbon emission MES requires flexibility, which, in
an economic case, is provided by a combination of different flexible
technologies and storage systems that complement each other. While the value of
flexibility depends on various market and consumer-specific factors such as
electricity or carbon removal prices, this study highlights the importance of
demand flexibility for the decarbonization of MESs.
arXiv link: http://arxiv.org/abs/2301.07997v1
Digital Divide: Empirical Study of CIUS 2020
Central Bank Digital Currencies, understanding how demographic and geographic
factors influence public engagement with digital technologies becomes
increasingly important. This paper uses data from the 2020 Canadian Internet
Use Survey and employs survey-adapted Lasso inference methods to identify
individual socio-economic and demographic characteristics determining the
digital divide in Canada. We also introduce a score to measure and compare the
digital literacy of various segments of Canadian population. Our findings
reveal that disparities in the use of e.g. online banking, emailing, and
digital payments exist across different demographic and socio-economic groups.
In addition, we document the effects of COVID-19 pandemic on internet use in
Canada and describe changes in the characteristics of Canadian internet users
over the last decade.
arXiv link: http://arxiv.org/abs/2301.07855v3
An MCMC Approach to Classical Estimation
called the Laplace type estimators (LTE), which include means and quantiles of
Quasi-posterior distributions defined as transformations of general
(non-likelihood-based) statistical criterion functions, such as those in GMM,
nonlinear IV, empirical likelihood, and minimum distance methods. The approach
generates an alternative to classical extremum estimation and also falls
outside the parametric Bayesian approach. For example, it offers a new
attractive estimation method for such important semi-parametric problems as
censored and instrumental quantile, nonlinear GMM and value-at-risk models. The
LTE's are computed using Markov Chain Monte Carlo methods, which help
circumvent the computational curse of dimensionality. A large sample theory is
obtained for regular cases.
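A toy sketch of a Laplace-type estimator for a just-identified IV model: a
random-walk Metropolis sampler targets the quasi-posterior proportional to
exp(-n Q_n(theta)) built from a simple GMM criterion, and the quasi-posterior
mean serves as the point estimate. Quasi-posterior quantiles are reported for
illustration only; valid intervals generally require sandwich-type adjustments.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 1000
z = rng.standard_normal(n)
x = 0.8 * z + rng.standard_normal(n)
y = 1.5 * x + rng.standard_normal(n)               # true coefficient = 1.5

def Q(theta):
    """Simple GMM criterion based on the moment E[z*(y - theta*x)] = 0."""
    g = z * (y - theta * x)
    return 0.5 * g.mean() ** 2 / g.var()

theta, chain = 0.0, []
for _ in range(20_000):
    prop = theta + 0.1 * rng.standard_normal()
    if np.log(rng.uniform()) < n * (Q(theta) - Q(prop)):   # flat prior
        theta = prop
    chain.append(theta)

draws = np.array(chain[5_000:])                    # drop burn-in
print(f"quasi-posterior mean: {draws.mean():.3f}")
print("2.5% / 97.5% quasi-posterior quantiles:", np.quantile(draws, [0.025, 0.975]))
```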
arXiv link: http://arxiv.org/abs/2301.07782v1
Optimal Transport for Counterfactual Estimation: A Method for Causal Inference
"what would have happened if...?" For example, "would the person have had
surgery if he or she had been Black?" To address this kind of questions,
calculating an average treatment effect (ATE) is often uninformative, because
one would like to know how much impact a variable (such as skin color) has on a
specific individual, characterized by certain covariates. Trying to calculate a
conditional ATE (CATE) seems more appropriate. In causal inference, the
propensity score approach assumes that the treatment is influenced by x, a
collection of covariates. Here, we will have the dual view: doing an
intervention, or changing the treatment (even just hypothetically, in a thought
experiment, for example by asking what would have happened if a person had been
Black) can have an impact on the values of x. We will see here that optimal
transport allows us to change certain characteristics that are influenced by
the variable we are trying to quantify the effect of. We propose here a mutatis
mutandis version of the CATE: in dimension one, it amounts to computing the CATE
relative to a probability level, associated with the proportion of x (a single
covariate) in the control population, and looking for the equivalent quantile in
the test population. In higher dimensions, it is necessary to go through optimal
transport, and an
application will be proposed on the impact of some variables on the probability
of having an unnatural birth (the fact that the mother smokes, or that the
mother is Black).
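A minimal sketch of the one-dimensional quantile-matching step on simulated data
(names and numbers are hypothetical): a covariate value in the control group is
transported to the treatment group by matching quantile levels, and the CATE is
then compared at the original value and its transported counterpart.

```python
import numpy as np

rng = np.random.default_rng(10)
x_control = rng.normal(60, 8, 5000)     # covariate in the control population
x_treated = rng.normal(70, 10, 5000)    # same covariate in the treatment population

def transport(x0):
    """Map x0 to the value at the same quantile level in the treatment group."""
    level = np.mean(x_control <= x0)            # quantile level of x0 among controls
    return np.quantile(x_treated, level)

x0 = 65.0
print(f"x = {x0} (control) is transported to x* = {transport(x0):.1f} (treated)")
# A mutatis mutandis CATE would then compare outcomes at x0 in the control group
# with outcomes at the transported value x* in the treatment group.
```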
arXiv link: http://arxiv.org/abs/2301.07755v1
Unconditional Quantile Partial Effects via Conditional Quantile Regression
unconditional quantile partial effects using quantile regression coefficients.
The estimator is based on an identification result showing that, for continuous
covariates, unconditional quantile effects are a weighted average of
conditional ones at particular quantile levels that depend on the covariates.
We propose a two-step estimator for the unconditional effects where in the
first step one estimates a structural quantile regression model, and in the
second step a nonparametric regression is applied to the first step
coefficients. We establish the asymptotic properties of the estimator, namely
consistency and asymptotic normality. Monte Carlo simulations show numerical
evidence that the estimator has very good finite sample performance and is
robust to the selection of bandwidth and kernel. To illustrate the proposed
method, we study the canonical application of the Engel curve, i.e., food
expenditures as a share of income.
arXiv link: http://arxiv.org/abs/2301.07241v4
Noisy, Non-Smooth, Non-Convex Estimation of Moment Condition Models
accurately minimize a sample objective function which is often non-smooth,
non-convex, or both. This paper proposes a simple algorithm designed to find
accurate solutions without performing an exhaustive search. It augments each
iteration from a new Gauss-Newton algorithm with a grid search step. A finite
sample analysis derives its optimization and statistical properties
simultaneously using only econometric assumptions. After a finite number of
iterations, the algorithm automatically transitions from global to fast local
convergence, producing accurate estimates with high probability. Simulated
examples and an empirical application illustrate the results.
arXiv link: http://arxiv.org/abs/2301.07196v3
Testing Firm Conduct
firm behavior. While researchers test conduct via model selection and
assessment, we present advantages of Rivers and Vuong (2002) (RV) model
selection under misspecification. However, degeneracy of RV invalidates
inference. With a novel definition of weak instruments for testing, we connect
degeneracy to instrument strength, derive weak instrument properties of RV, and
provide a diagnostic for weak instruments by extending the framework of Stock
and Yogo (2005) to model selection. We test vertical conduct (Villas-Boas,
2007) using common instrument sets. Some are weak, providing no power. Strong
instruments support manufacturers setting retail prices.
arXiv link: http://arxiv.org/abs/2301.06720v2
Resolving the Conflict on Conduct Parameter Estimation in Homogeneous Goods Markets between Bresnahan (1982) and Perloff and Shen (2012)
resolve the conflict between Bresnahan (1982) and Perloff and Shen (2012)
regarding the identification and the estimation of conduct parameters. We point
out that Perloff and Shen's (2012) proof is incorrect and its simulation
setting is invalid. Our simulation shows that estimation becomes accurate when
demand shifters are properly added to the supply estimation and sample sizes are
increased, supporting Bresnahan (1982).
arXiv link: http://arxiv.org/abs/2301.06665v5
Statistical inference for the logarithmic spatial heteroskedasticity model with exogenous variables
large strand of literature, however, the investigation of spatial dependence in
variance is lagging significantly behind. The existing models for the spatial
dependence in variance are scarce, with neither probabilistic structure nor
statistical inference procedure being explored. To circumvent this deficiency,
this paper proposes a new generalized logarithmic spatial heteroscedasticity
model with exogenous variables (denoted by the log-SHE model) to study the
spatial dependence in variance. For the log-SHE model, its spatial near-epoch
dependence (NED) property is investigated, and a systematic statistical
inference procedure is provided, including the maximum likelihood and
generalized method of moments estimators, the Wald, Lagrange multiplier and
likelihood-ratio-type D tests for model parameter constraints, and the
overidentification test for the model diagnostic checking. Using the tool of
spatial NED, the asymptotics of all proposed estimators and tests are
established under regular conditions. The usefulness of the proposed
methodology is illustrated by simulation results and a real data example on the
house selling price.
arXiv link: http://arxiv.org/abs/2301.06658v1
Robust M-Estimation for Additive Single-Index Cointegrating Time Series Models
(LAD), quantile loss and Huber's loss, to construct its objective function, in
order, for example, to eschew the impact of outliers; the difficulty in analysing
the resultant estimators, however, rests on the nonsmoothness of these losses.
Generalized functions have advantages over ordinary functions in several aspects;
in particular, they possess derivatives of any order. Generalized functions
incorporate locally integrable functions, the so-called regular generalized
functions, while the so-called singular generalized functions (e.g. the Dirac
delta function) can be obtained as the limits of a sequence of sufficiently
smooth functions, a so-called regular sequence in the generalized-function
context. This makes it possible to use these singular
generalized functions through approximation. Nevertheless, a significant
contribution of this paper is to establish the convergence rate of regular
sequence to nonsmooth loss that answers a call from the relevant literature.
For parameter estimation where the objective function may be nonsmooth, this
paper first shows, as a general paradigm, how the generalized-function approach
can be used to tackle nonsmooth loss functions, using a very simple model in
Section 2. This approach is of general interest and applicability. We further use
the approach in robust M-estimation for additive single-index cointegrating
time series models; the asymptotic theory is established for the proposed
estimators. We evaluate the finite-sample performance of the proposed
estimation method and theory by both simulated data and an empirical analysis
of predictive regression of stock returns.
arXiv link: http://arxiv.org/abs/2301.06631v1
When it counts -- Econometric identification of the basic factor model based on GLT structures
attention has been given to formally address identifiability of these models
beyond standard rotation-based identification such as the positive lower
triangular (PLT) constraint. To fill this gap, we review the advantages of
variance identification in sparse factor analysis and introduce the generalized
lower triangular (GLT) structures. We show that the GLT assumption is an
improvement over PLT without compromise: GLT is also unique but, unlike PLT, a
non-restrictive assumption. Furthermore, we provide a simple counting rule for
variance identification under GLT structures, and we demonstrate that within
this model class the unknown number of common factors can be recovered in an
exploratory factor analysis. Our methodology is illustrated for simulated data
in the context of post-processing posterior draws in Bayesian sparse factor
analysis.
arXiv link: http://arxiv.org/abs/2301.06354v1
Doubly-Robust Inference for Conditional Average Treatment Effects with High-Dimensional Controls
rely on controlling for a large number of variables to account for confounding
factors. In these high-dimensional settings, estimation of the CATE requires
estimating first-stage models whose consistency relies on correctly specifying
their parametric forms. While doubly-robust estimators of the CATE exist,
inference procedures based on the second stage CATE estimator are not
doubly-robust. Using the popular augmented inverse propensity weighting signal,
we propose an estimator for the CATE whose resulting Wald-type confidence
intervals are doubly-robust. We assume a logistic model for the propensity
score and a linear model for the outcome regression, and estimate the
parameters of these models using an $\ell_1$ (Lasso) penalty to address the
high dimensional covariates. Our proposed estimator remains consistent at the
nonparametric rate and our proposed pointwise and uniform confidence intervals
remain asymptotically valid even if one of the logistic propensity score or
linear outcome regression models are misspecified. These results are obtained
under similar conditions to existing analyses in the high-dimensional and
nonparametric literatures.
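A minimal sketch of the AIPW-signal construction with L1-penalised nuisance
models on simulated data (cross-fitting and the paper's doubly robust confidence
intervals are omitted):

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression, LogisticRegressionCV

rng = np.random.default_rng(11)
n, p = 2000, 100
X = rng.standard_normal((n, p))
e = 1 / (1 + np.exp(-X[:, 0]))                     # true propensity score
D = rng.binomial(1, e)
tau = 1 + X[:, 1]                                  # true CATE depends on X[:, 1]
Y = X[:, 2] + tau * D + rng.standard_normal(n)

e_hat = (LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5)
         .fit(X, D).predict_proba(X)[:, 1])        # penalised logistic propensity
mu1 = LassoCV(cv=5).fit(X[D == 1], Y[D == 1]).predict(X)   # penalised outcome models
mu0 = LassoCV(cv=5).fit(X[D == 0], Y[D == 0]).predict(X)

psi = mu1 - mu0 + D * (Y - mu1) / e_hat - (1 - D) * (Y - mu0) / (1 - e_hat)
cate_fit = LinearRegression().fit(X[:, [1]], psi)  # second stage on the effect modifier
print("second-stage intercept and slope:", cate_fit.intercept_, cate_fit.coef_)
```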
arXiv link: http://arxiv.org/abs/2301.06283v1
Identification in a Binary Choice Panel Data Model with a Predetermined Covariate
predetermined binary covariate (i.e., a covariate sequentially exogenous
conditional on lagged outcomes and covariates). The choice model is indexed by
a scalar parameter $\theta$, whereas the distribution of unit-specific
heterogeneity, as well as the feedback process that maps lagged outcomes into
future covariate realizations, are left unrestricted. We provide a simple
condition under which $\theta$ is never point-identified, no matter the number
of time periods available. This condition is satisfied in most models,
including the logit one. We also characterize the identified set of $\theta$
and show how to compute it using linear programming techniques. While $\theta$
is not generally point-identified, its identified set is informative in the
examples we analyze numerically, suggesting that meaningful learning about
$\theta$ may be possible even in short panels with feedback. As a complement,
we report calculations of identified sets for an average partial effect, and
find informative sets in this case as well.
arXiv link: http://arxiv.org/abs/2301.05733v2
Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap
on observables). I propose new practical large-sample and finite-sample methods
for estimating and inferring heterogeneous causal effects (under
unconfoundedness) in the empirically relevant context of limited overlap. I
develop a general principle called "Stable Probability Weighting" (SPW) that
can be used as an alternative to the widely used Inverse Probability Weighting
(IPW) technique, which relies on strong overlap. I show that IPW (or its
augmented version), when valid, is a special case of the more general SPW (or
its doubly robust version), which adjusts for the extremeness of the
conditional probabilities of the treatment states. The SPW principle can be
implemented using several existing large-sample parametric, semiparametric, and
nonparametric procedures for conditional moment models. In addition, I provide
new finite-sample results that apply when unconfoundedness is plausible within
fine strata. Since IPW estimation relies on the problematic reciprocal of the
estimated propensity score, I develop a "Finite-Sample Stable Probability
Weighting" (FPW) set-estimator that is unbiased in a sense. I also propose new
finite-sample inference methods for testing a general class of weak null
hypotheses. The associated computationally convenient methods, which can be
used to construct valid confidence sets and to bound the finite-sample
confidence distribution, are of independent interest. My large-sample and
finite-sample frameworks extend to the setting of multivalued treatments.
arXiv link: http://arxiv.org/abs/2301.05703v2
Non-Stochastic CDF Estimation Using Threshold Queries
and fundamental task. In this paper, we tackle the problem of estimating an
empirical distribution in a setting with two challenging features. First, the
algorithm does not directly observe the data; instead, it only asks a limited
number of threshold queries about each sample. Second, the data are not assumed
to be independent and identically distributed; instead, we allow for an
arbitrary process generating the samples, including an adaptive adversary.
These considerations are relevant, for example, when modeling a seller
experimenting with posted prices to estimate the distribution of consumers'
willingness to pay for a product: offering a price and observing a consumer's
purchase decision is equivalent to asking a single threshold query about their
value, and the distribution of consumers' values may be non-stationary over
time, as early adopters may differ markedly from late adopters.
Our main result quantifies, to within a constant factor, the sample
complexity of estimating the empirical CDF of a sequence of elements of $[n]$,
up to $\varepsilon$ additive error, using one threshold query per sample. The
complexity depends only logarithmically on $n$, and our result can be
interpreted as extending the existing logarithmic-complexity results for noisy
binary search to the more challenging setting where noise is non-stochastic.
Along the way to designing our algorithm, we consider a more general model in
which the algorithm is allowed to make a limited number of simultaneous
threshold queries on each sample. We solve this problem using Blackwell's
Approachability Theorem and the exponential weights method. As a side result of
independent interest, we characterize the minimum number of simultaneous
threshold queries required by deterministic CDF estimation algorithms.
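A naive simulation sketch of the access model only, not the paper's algorithm:
each hidden sample receives a single uniformly random threshold query, and the
binary answers are averaged threshold by threshold to estimate the empirical CDF.
The paper's contribution is to handle adaptive, non-stochastic sequences, which
this i.i.d.-query illustration does not.

```python
import numpy as np

rng = np.random.default_rng(13)
n_support, T = 100, 20_000
samples = rng.integers(1, n_support + 1, size=T)        # hidden samples in [n]

thresholds = rng.integers(1, n_support + 1, size=T)     # one random query per sample
answers = (samples <= thresholds).astype(float)         # binary responses observed

grid = np.arange(1, n_support + 1)
F_hat = np.array([answers[thresholds == v].mean() for v in grid])
F_hat = np.maximum.accumulate(F_hat)                    # enforce monotonicity
F_true = np.array([(samples <= v).mean() for v in grid])
print(f"max absolute CDF error: {np.max(np.abs(F_hat - F_true)):.3f}")
```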
arXiv link: http://arxiv.org/abs/2301.05682v1
Randomization Test for the Specification of Interference Structure
inference. We focus on experimental settings in which the treatment assignment
mechanism is known to researchers. We develop a new randomization test
utilizing a hierarchical relationship between different exposures. Compared
with existing approaches, our approach is applicable to essentially any null
exposure specification and produces powerful test statistics without a priori
knowledge of the true interference structure. As empirical illustrations, we
revisit two existing social network experiments: one on farmers' insurance
adoption and the other on anti-conflict education programs.
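A generic randomization-test sketch under a known assignment mechanism, on
simulated data; the paper's hierarchical exposure construction and choice of
focal units are omitted.

```python
import numpy as np

rng = np.random.default_rng(12)
n = 200
Y = rng.standard_normal(n)                       # observed outcomes
D_obs = rng.binomial(1, 0.5, n)                  # observed assignment

def statistic(Y, D):
    return abs(Y[D == 1].mean() - Y[D == 0].mean())

obs = statistic(Y, D_obs)
draws = [statistic(Y, rng.binomial(1, 0.5, n)) for _ in range(2000)]   # re-randomize
p_value = np.mean([s >= obs for s in draws])
print(f"randomization p-value: {p_value:.3f}")
```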
arXiv link: http://arxiv.org/abs/2301.05580v2
Unbiased estimation and asymptotically valid inference in multivariable Mendelian randomization with many weak instrumental variables
infer causal relationships between exposures and outcomes with genome-wide
association studies (GWAS) summary data. However, the multivariable
inverse-variance weighting (IVW) approach, which serves as the foundation for
most MR approaches, cannot yield unbiased causal effect estimates in the
presence of many weak IVs. To address this problem, we proposed the MR using
Bias-corrected Estimating Equation (MRBEE) that can infer unbiased causal
relationships with many weak IVs and account for horizontal pleiotropy
simultaneously. While the practical significance of MRBEE was demonstrated in
our parallel work (Lorincz-Comi (2023)), this paper established the statistical
theories of multivariable IVW and MRBEE with many weak IVs. First, we showed
that the bias of the multivariable IVW estimate is caused by the
error-in-variable bias, whose scale and direction are inflated and influenced
by weak instrument bias and sample overlaps of exposures and outcome GWAS
cohorts, respectively. Second, we investigated the asymptotic properties of
multivariable IVW and MRBEE, showing that MRBEE outperforms multivariable IVW
regarding unbiasedness of causal effect estimation and asymptotic validity of
causal inference. Finally, we applied MRBEE to examine myopia and revealed that
education and outdoor activity are causal to myopia whereas indoor activity is
not.
arXiv link: http://arxiv.org/abs/2301.05130v6
Interacting Treatments with Endogenous Takeup
following a $2\times 2$ factorial design. There are two treatments, denoted $A$
and $B$, and units are randomly assigned to one of four categories: treatment
$A$ alone, treatment $B$ alone, joint treatment, or none. Allowing for
endogenous non-compliance with the two binary instruments representing the
intended assignment, as well as unrestricted interference across the two
treatments, we derive the causal interpretation of various instrumental
variable estimands under more general compliance conditions than in the
literature. In general, if treatment takeup is driven by both instruments for
some units, it becomes difficult to separate treatment interaction from
treatment effect heterogeneity. We provide auxiliary conditions and various
bounding strategies that may help zero in on causally interesting parameters.
As an empirical illustration, we apply our results to a program randomly
offering two different treatments, namely tutoring and financial incentives, to
first year college students, in order to assess the treatments' effects on
academic performance.
arXiv link: http://arxiv.org/abs/2301.04876v2
Testing for Coefficient Randomness in Local-to-Unity Autoregressions
autoregressive models where the autoregressive coefficient is local to unity,
which is empirically relevant given the results of earlier studies. Under this
specification, we theoretically analyze the effect of the correlation between
the random coefficient and disturbance on tests' properties, which remains
largely unexplored in the literature. Our analysis reveals that the correlation
crucially affects the power of tests for coefficient randomness and that tests
proposed by earlier studies can perform poorly when the degree of the
correlation is moderate to large. The test we propose in this paper is designed
to have a power function robust to the correlation. Because the asymptotic null
distribution of our test statistic depends on the correlation $\psi$ between
the disturbance and its square as earlier tests do, we also propose a modified
version of the test statistic such that its asymptotic null distribution is
free from the nuisance parameter $\psi$. The modified test is shown to have
better power properties than existing ones in large and finite samples.
arXiv link: http://arxiv.org/abs/2301.04853v2
A Framework for Generalization and Transportation of Causal Estimates Under Covariate Shift
causal effects with the sample at hand, but their external validity is
frequently debated. While classical results on the estimation of Population
Average Treatment Effects (PATE) implicitly assume random selection into
experiments, this is typically far from true in many medical,
social-scientific, and industry experiments. When the experimental sample is
different from the target sample along observable or unobservable dimensions,
experimental estimates may be of limited use for policy decisions. We begin by
decomposing the extrapolation bias from estimating the Target Average Treatment
Effect (TATE) using the Sample Average Treatment Effect (SATE) into covariate
shift, overlap, and effect modification components, which researchers can
reason about in order to diagnose the severity of extrapolation bias. Next, we
cast covariate shift as a sample selection problem and propose estimators that
re-weight the doubly-robust scores from experimental subjects to estimate
treatment effects in the overall sample (=: generalization) or in an alternate
target sample (=: transportation). We implement these estimators in the
open-source R package causalTransportR, illustrate its performance in a
simulation study, and discuss diagnostics for evaluating it.
arXiv link: http://arxiv.org/abs/2301.04776v1
Inference on quantile processes with a finite number of clusters
quantile processes in the presence of a finite number of large and arbitrarily
heterogeneous clusters. The method asymptotically controls size by generating
statistics that exhibit enough distributional symmetry such that randomization
tests can be applied. The randomization test does not require ex-ante matching
of clusters, is free of user-chosen parameters, and performs well at
conventional significance levels with as few as five clusters. The method tests
standard (non-sharp) hypotheses and can even be asymptotically similar in
empirically relevant situations. The main focus of the paper is inference on
quantile treatment effects but the method applies more broadly. Numerical and
empirical examples are provided.
arXiv link: http://arxiv.org/abs/2301.04687v2
Fast and Reliable Jackknife and Bootstrap Methods for Cluster-Robust Inference
cluster-robust variance matrix estimators (CRVEs) for linear regression models
estimated by least squares. We also propose several new variants of the wild
cluster bootstrap, which involve these CRVEs, jackknife-based bootstrap
data-generating processes, or both. Extensive simulation experiments suggest
that the new methods can provide much more reliable inferences than existing
ones in cases where the latter are not trustworthy, such as when the number of
clusters is small and/or cluster sizes vary substantially. Three empirical
examples illustrate the new methods.
arXiv link: http://arxiv.org/abs/2301.04527v2
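As a point of reference for the bootstrap variants discussed above, here is a
minimal, hedged sketch of the classic restricted wild cluster bootstrap with
Rademacher weights, which the paper's jackknife-based CRVEs and bootstrap
data-generating processes build on. All names are illustrative; this is not the
authors' implementation.

```python
import numpy as np

def wild_cluster_bootstrap_pvalue(y, X, clusters, coef_idx, n_boot=999, seed=0):
    """Restricted (null-imposed) wild cluster bootstrap p-value for
    H0: beta[coef_idx] = 0, with Rademacher weights drawn at the cluster level.
    This is the plain-vanilla procedure, not the paper's new variants."""
    rng = np.random.default_rng(seed)
    groups = [np.flatnonzero(clusters == g) for g in np.unique(clusters)]
    XtX_inv = np.linalg.inv(X.T @ X)

    def cluster_t(yy):
        # OLS estimate and a simple cluster-robust t statistic.
        b = np.linalg.lstsq(X, yy, rcond=None)[0]
        u = yy - X @ b
        meat = sum(np.outer(X[g].T @ u[g], X[g].T @ u[g]) for g in groups)
        V = XtX_inv @ meat @ XtX_inv
        return b[coef_idx] / np.sqrt(V[coef_idx, coef_idx])

    # Impose the null: refit with the coefficient of interest excluded.
    X0 = np.delete(X, coef_idx, axis=1)
    b0 = np.linalg.lstsq(X0, y, rcond=None)[0]
    u0 = y - X0 @ b0

    t_obs = cluster_t(y)
    t_boot = np.empty(n_boot)
    for r in range(n_boot):
        w = rng.choice([-1.0, 1.0], size=len(groups))   # Rademacher draws
        u_star = u0.copy()
        for w_g, g in zip(w, groups):
            u_star[g] = w_g * u0[g]                     # flip whole clusters
        t_boot[r] = cluster_t(X0 @ b0 + u_star)
    return np.mean(np.abs(t_boot) >= np.abs(t_obs))
```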
Testing for the appropriate level of clustering in linear regression models
inference assumes that the clustering structure is known, even though there are
often several possible ways in which a dataset could be clustered. We propose
two tests for the correct level of clustering in regression models. One test
focuses on inference about a single coefficient, and the other on inference
about two or more coefficients. We provide both asymptotic and wild bootstrap
implementations. The proposed tests work for a null hypothesis of either no
clustering or “fine” clustering against alternatives of “coarser”
clustering. We also propose a sequential testing procedure to determine the
appropriate level of clustering. Simulations suggest that the bootstrap tests
perform very well under the null hypothesis and can have excellent power. An
empirical example suggests that using the tests leads to sensible inferences.
arXiv link: http://arxiv.org/abs/2301.04522v2
Uniform Inference in Linear Error-in-Variables Models: Divide-and-Conquer
moments of observables. This moments-based estimator is consistent only when
the coefficient of the latent regressor is assumed to be non-zero. We develop a
new estimator based on the divide-and-conquer principle that is consistent for
any value of the coefficient of the latent regressor. In an application on the
relation between investment, (mismeasured) Tobin's $q$ and cash flow, we find
time periods in which the effect of Tobin's $q$ is not statistically different
from zero. The implausibly large higher-order moment estimates in these periods
disappear when using the proposed estimator.
arXiv link: http://arxiv.org/abs/2301.04439v1
Asymptotic Theory for Two-Way Clustering
two-way dependence and heterogeneity across clusters. Statistical inference for
situations with both two-way dependence and cluster heterogeneity has thus far
been an open issue. The existing theory for two-way clustering inference
requires identical distributions across clusters (implied by the so-called
separate exchangeability assumption). Yet no such homogeneity requirement is
needed in the existing theory for one-way clustering. The new result therefore
theoretically justifies the view that two-way clustering is a more robust
version of one-way clustering, consistent with applied practice. In an
application to linear regression, I show that a standard plug-in variance
estimator is valid for inference.
arXiv link: http://arxiv.org/abs/2301.03805v3
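For concreteness, the plug-in variance estimator referred to in the linear
regression application is commonly computed in the Cameron-Gelbach-Miller form
sketched below; this is a hedged illustration under that assumption, not code
taken from the paper.

```python
import numpy as np

def one_way_cluster_vcov(X, resid, clusters):
    """One-way cluster-robust sandwich for OLS: (X'X)^{-1} M (X'X)^{-1},
    where M sums (X_g' u_g)(X_g' u_g)' over clusters g."""
    XtX_inv = np.linalg.inv(X.T @ X)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(clusters):
        idx = clusters == g
        s = X[idx].T @ resid[idx]
        meat += np.outer(s, s)
    return XtX_inv @ meat @ XtX_inv

def two_way_cluster_vcov(X, y, c1, c2):
    """Plug-in two-way cluster-robust variance: V(c1) + V(c2) - V(c1 and c2)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ beta
    inter = np.array([f"{a}|{b}" for a, b in zip(c1, c2)])  # intersection clusters
    V = (one_way_cluster_vcov(X, u, c1)
         + one_way_cluster_vcov(X, u, c2)
         - one_way_cluster_vcov(X, u, inter))
    return beta, V
```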
Quantile Autoregression-based Non-causality Testing
and Finance for their ability to display nonlinear behaviors such as asymmetric
dynamics, clustering volatility, and local explosiveness. In this paper, we
investigate the statistical properties of empirical conditional quantiles of
non-causal processes. Specifically, we show that the quantile autoregression
(QAR) estimates for non-causal processes do not remain constant across
different quantiles in contrast to their causal counterparts. Furthermore, we
demonstrate that non-causal autoregressive processes admit nonlinear
representations for conditional quantiles given past observations. Exploiting
these properties, we propose three novel testing strategies of non-causality
for non-Gaussian processes within the QAR framework. The tests are constructed
either by verifying the constancy of the slope coefficients or by applying a
misspecification test of the linear QAR model over different quantiles of the
process. Some numerical experiments are included to examine the finite sample
performance of the testing strategies, where we compare different specification
tests for dynamic quantiles with the Kolmogorov-Smirnov constancy test. The new
methodology is applied to some time series from financial markets to
investigate the presence of speculative bubbles. The extension of the approach
based on the specification tests to AR processes driven by innovations with
heteroskedasticity is studied through simulations. The performance of QAR
estimates of non-causal processes at extreme quantiles is also explored.
arXiv link: http://arxiv.org/abs/2301.02937v1
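A minimal diagnostic in the spirit of the first testing strategy described in
the abstract: fit a QAR(1) at several quantiles and inspect whether the slope
on the lagged value is roughly constant across quantiles, as it should be for a
causal process. This is purely illustrative; the paper's formal tests involve
proper standard errors and specification statistics.

```python
import numpy as np
import statsmodels.api as sm

def qar1_slopes(y, quantiles=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Slope of y_t on y_{t-1} in a quantile autoregression, per quantile.
    Marked variation of the slope across quantiles is the pattern the
    non-causality tests exploit."""
    y = np.asarray(y, dtype=float)
    X = sm.add_constant(y[:-1])      # intercept + lagged value
    target = y[1:]
    return {q: sm.QuantReg(target, X).fit(q=q).params[1] for q in quantiles}

# Example: a causal AR(1) with heavy-tailed innovations should display
# roughly constant slopes across quantiles.
rng = np.random.default_rng(0)
e = rng.standard_t(df=3, size=2000)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.7 * y[t - 1] + e[t]
print(qar1_slopes(y))
```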
Climate change heterogeneity: A new quantitative approach
quantitative methodology to characterize, measure, and test the existence of
climate change heterogeneity. It consists of three steps. First, we introduce a
new testable warming typology based on the evolution of the trend of the whole
temperature distribution and not only on the average. Second, we define the
concepts of warming acceleration and warming amplification in a testable
format. And third, we introduce the new testable concept of warming dominance
to determine whether region A is suffering a worse warming process than region
B. Applying this three-step methodology, we find that Spain and the Globe
experience a clear distributional warming process (beyond the standard average)
but of different types. In both cases, this process is accelerating over time
and asymmetrically amplified. Overall, warming in Spain dominates the Globe in
all the quantiles except the lower tail of the global temperature distribution
that corresponds to the Arctic region. Our climate change heterogeneity results
open the door to the need for a non-uniform causal-effect climate analysis that
goes beyond the standard causality in mean as well as for a more efficient
design of the mitigation-adaptation policies. In particular, the heterogeneity
we find suggests that these policies should contain a common global component
and a clear local-regional element. Future climate agreements should take the
whole temperature distribution into account.
arXiv link: http://arxiv.org/abs/2301.02648v1
Relaxing Instrument Exogeneity with Common Confounders
unobserved confounding, under the famous relevance and exogeneity
(unconfoundedness and exclusion) assumptions. As exogeneity is difficult to
justify and to some degree untestable, it often invites criticism in
applications. Hoping to alleviate this problem, we propose a novel
identification approach, which relaxes traditional IV exogeneity to exogeneity
conditional on some unobserved common confounders. We assume there exist some
relevant proxies for the unobserved common confounders. Unlike typical proxies,
our proxies can have a direct effect on the endogenous regressor and the
outcome. We provide point identification results with a linearly separable
outcome model in the disturbance, and alternatively with strict monotonicity in
the first stage. General doubly robust and Neyman orthogonal moments are
derived consecutively to enable the straightforward root-n estimation of
low-dimensional parameters despite the high-dimensionality of nuisances,
themselves non-uniquely defined by Fredholm integral equations. Using this
novel method with NLS97 data, we separate ability bias from general selection
bias in the economic returns to education problem.
arXiv link: http://arxiv.org/abs/2301.02052v3
Measuring tail risk at high-frequency: An $L_1$-regularized extreme value regression approach with unit-root predictors
connection with trading activity and market uncertainty. We introduce a dynamic
extreme value regression model accommodating both stationary and local
unit-root predictors to appropriately capture the time-varying behaviour of the
distribution of high-frequency extreme losses. To characterize trading activity
and market uncertainty, we consider several volatility and liquidity
predictors, and propose a two-step adaptive $L_1$-regularized maximum
likelihood estimator to select the most appropriate ones. We establish the
oracle property of the proposed estimator for selecting both stationary and
local unit-root predictors, and show its good finite sample properties in an
extensive simulation study. Studying the high-frequency extreme losses of nine
large liquid U.S. stocks using 42 liquidity and volatility predictors, we find
the severity of extreme losses to be well predicted by low levels of price
impact in periods of high volatility of liquidity and volatility.
arXiv link: http://arxiv.org/abs/2301.01362v1
On the causality-preservation capabilities of generative modelling
for a wide variety of tasks. The rise and development of machine learning and
deep learning models have created many opportunities to improve our modeling
toolbox. Breakthroughs in these fields often come with the requirement of large
amounts of data. Such large datasets are often not publicly available in
finance and insurance, mainly due to privacy and ethics concerns. This lack of
data is currently one of the main hurdles in developing better models. One
possible option for alleviating this issue is generative modeling. Generative
models are capable of simulating fake but realistic-looking data, also referred
to as synthetic data, that can be shared more freely. Generative Adversarial
Networks (GANs) are one such model class, increasing our capacity to fit very
high-dimensional distributions of data. While research on GANs is an active
topic in fields like computer vision, they have found limited adoption within
the human sciences, like economics and insurance. The reason is that in
these fields, most questions are inherently about identification of causal
effects, while to this day neural networks, which are at the center of the GAN
framework, focus mostly on high-dimensional correlations. In this paper we
study the causal preservation capabilities of GANs and whether the produced
synthetic data can reliably be used to answer causal questions. This is done by
performing causal analyses on the synthetic data, produced by a GAN, with
increasingly more lenient assumptions. We consider the cross-sectional case,
the time series case and the case with a complete structural model. It is shown
that in the simple cross-sectional scenario where correlation equals causation
the GAN preserves causality, but that challenges arise for more advanced
analyses.
arXiv link: http://arxiv.org/abs/2301.01109v1
Fitting mixed logit random regret minimization models using maximum simulated likelihood
randregret command introduced in Guti\'errez-Vargas et al. (2021, The Stata
Journal 21: 626-658) incorporating random coefficients for Random Regret
Minimization models. The newly developed command mixrandregret allows the
inclusion of random coefficients in the regret function of the classical RRM
model introduced in Chorus (2010, European Journal of Transport and
Infrastructure Research 10: 181-196). The command allows the user to specify a
combination of fixed and random coefficients. In addition, the user can specify
normal and log-normal distributions for the random coefficients using the
command's options. The models are fitted by maximum simulated likelihood,
with the choice probabilities approximated by simulation-based numerical
integration.
arXiv link: http://arxiv.org/abs/2301.01091v1
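For readers unfamiliar with the classical RRM model of Chorus (2010) that
mixrandregret extends, the sketch below computes its regret-based choice
probabilities for a single choice set; a mixed (random-coefficient) version
would average these probabilities over simulated draws of beta, which is what
maximum simulated likelihood does inside the Stata command. Names and shapes
here are illustrative assumptions, not the command's internals.

```python
import numpy as np

def rrm_choice_probabilities(X, beta):
    """Classical RRM choice probabilities (Chorus 2010) for one choice set.

    X    : (J, M) attribute matrix for J alternatives and M attributes.
    beta : (M,) taste parameters.

    Regret of alternative i: sum over rivals j and attributes m of
    ln(1 + exp(beta_m * (x_jm - x_im))); choices follow a logit on -regret.
    """
    J = X.shape[0]
    regret = np.empty(J)
    for i in range(J):
        diff = np.delete(X - X[i], i, axis=0)   # rivals' attribute advantages
        regret[i] = np.log1p(np.exp(diff * beta)).sum()
    expneg = np.exp(-regret)
    return expneg / expneg.sum()

# Mixing over random coefficients (e.g. beta drawn from a normal or
# log-normal distribution) would average rrm_choice_probabilities over
# simulated draws of beta.
```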
The Chained Difference-in-Differences
(binary) treatment effect parameters when balanced panel data is not available,
or consists of only a subset of the available data. We develop a new estimator:
the chained difference-in-differences, which leverages the overlapping
structure of many unbalanced panel data sets. This approach consists in
aggregating a collection of short-term treatment effects estimated on multiple
incomplete panels. Our estimator accommodates (1) multiple time periods, (2)
variation in treatment timing, (3) treatment effect heterogeneity, (4) general
missing data patterns, and (5) sample selection on observables. We establish
the asymptotic properties of the proposed estimator and discuss identification
and efficiency gains in comparison to existing methods. Finally, we illustrate
its relevance through (i) numerical simulations, and (ii) an application about
the effects of an innovation policy in France.
arXiv link: http://arxiv.org/abs/2301.01085v4
Time-Varying Coefficient DAR Model and Stability Measures for Stablecoin Prices: An Application to Tether
market capitalization. We show that the distributional and dynamic properties
of Tether/USD rates have been evolving from 2017 to 2021. We use local analysis
methods to detect and describe the local patterns, such as short-lived trends,
time-varying volatility and persistence. To accommodate these patterns, we
consider a time-varying parameter Double Autoregressive (tvDAR(1)) model under
the assumption of local stationarity of Tether/USD rates. We estimate the tvDAR
model non-parametrically and test hypotheses on the functional parameters. In
the application to Tether, the model provides a good fit and reliable
out-of-sample forecasts at short horizons, while being robust to time-varying
persistence and volatility. In addition, the model yields a simple plug-in
measure of stability for Tether and other stablecoins for assessing and
comparing their stability.
arXiv link: http://arxiv.org/abs/2301.00509v1
Inference for Large Panel Data with Many Covariates
covariates that explains a large dimensional panel. Our selection method
provides correct false detection control while having higher power than
existing approaches. We develop the inferential theory for large panels with
many covariates by combining post-selection inference with a novel multiple
testing adjustment. Our data-driven hypotheses are conditional on the sparse
covariate selection. We control for family-wise error rates for covariate
discovery for large cross-sections. As an easy-to-use and practically relevant
procedure, we propose Panel-PoSI, which combines the data-driven adjustment for
panel multiple testing with valid post-selection p-values of a generalized
LASSO, that allows us to incorporate priors. In an empirical study, we select a
small number of asset pricing factors that explain a large cross-section of
investment strategies. Our method dominates the benchmarks out-of-sample due to
its better size and power.
arXiv link: http://arxiv.org/abs/2301.00292v6
Higher-order Refinements of Small Bandwidth Asymptotics for Density-Weighted Average Derivative Estimators
canonical parameter of interest in economics. Classical first-order large
sample distribution theory for kernel-based DWAD estimators relies on tuning
parameter restrictions and model assumptions that imply an asymptotic linear
representation of the point estimator. These conditions can be restrictive, and
the resulting distributional approximation may not be representative of the
actual sampling distribution of the statistic of interest. In particular, the
approximation is not robust to bandwidth choice. Small bandwidth asymptotics
offers an alternative, more general distributional approximation for
kernel-based DWAD estimators that allows for, but does not require, asymptotic
linearity. The resulting inference procedures based on small bandwidth
asymptotics were found to exhibit superior finite sample performance in
simulations, but no formal theory justifying that empirical success is
available in the literature. Employing Edgeworth expansions, this paper shows
that small bandwidth asymptotic approximations lead to inference procedures
with higher-order distributional properties that are demonstrably superior to
those of procedures based on asymptotic linear approximations.
arXiv link: http://arxiv.org/abs/2301.00277v2
Feature Selection for Personalized Policy Analysis
analyzing policy effect heterogeneity in a more flexible and comprehensive
manner than is typically available with conventional methods. In particular,
our method is able to capture policy effect heterogeneity both within and
across subgroups of the population defined by observable characteristics. To
achieve this, we employ partial least squares to identify target components of
the population and causal forests to estimate personalized policy effects
across these components. We show that the method is consistent and leads to
asymptotically normally distributed policy effects. To demonstrate the efficacy
of our approach, we apply it to the data from the Pennsylvania Reemployment
Bonus Experiments, which were conducted in 1988-1989. The analysis reveals that
financial incentives can motivate some young non-white individuals to enter the
labor market. However, these incentives may also provide a temporary financial
cushion for others, dissuading them from actively seeking employment. Our
findings highlight the need for targeted, personalized measures for young
non-white male participants.
arXiv link: http://arxiv.org/abs/2301.00251v3
Inference on Time Series Nonparametric Conditional Moment Restrictions Using General Sieves
approximate nonlinear functions of high dimensional variables much more
flexibly than various linear sieves (or series). This paper considers general
nonlinear sieve quasi-likelihood ratio (GN-QLR) based inference on expectation
functionals of time series data, where the functionals of interest are based on
some nonparametric function that satisfy conditional moment restrictions and
are learned using multilayer neural networks. While the asymptotic normality of
the estimated functionals depends on some unknown Riesz representer of the
functional space, we show that the optimally weighted GN-QLR statistic is
asymptotically chi-square distributed, regardless of whether the expectation
functional is regular (root-$n$ estimable) or not. This holds when the data are
weakly dependent and satisfy a beta-mixing condition. We apply our method to the off-policy
evaluation in reinforcement learning, by formulating the Bellman equation into
the conditional moment restriction framework, so that we can make inference
about the state-specific value functional using the proposed GN-QLR method with
time series data. In addition, estimating the averaged partial means and
averaged partial derivatives of nonparametric instrumental variables and
quantile IV models are also presented as leading examples. Finally, a Monte
Carlo study shows the finite sample performance of the procedure.
arXiv link: http://arxiv.org/abs/2301.00092v2
Identifying causal effects with subjective ordinal outcomes
the meanings of the categories are subjective, leaving each individual free to
apply their own definitions in answering. This paper studies the use of these
responses as an outcome variable in causal inference, accounting for variation
in interpretation of the categories across individuals. I find that when a
continuous treatment variable is statistically independent of both i) potential
outcomes; and ii) heterogeneity in reporting styles, a nonparametric regression
of response category number on that treatment variable recovers a quantity
proportional to an average causal effect among individuals who are on the
margin between successive response categories. The magnitude of a given
regression coefficient is not meaningful on its own, but the ratio of local
regression derivatives with respect to two such treatment variables identifies
the relative magnitudes of convex averages of their effects. These results can
be seen as limiting cases of analogous results for binary treatment variables,
though comparisons of magnitude involving discrete treatments are not as
readily interpretable outside of the limit. I obtain a partial identification
result for comparisons involving discrete treatments under further assumptions.
An empirical application illustrates the results by revisiting the effects of
income comparisons on subjective well-being, without assuming cardinality or
interpersonal comparability of responses.
arXiv link: http://arxiv.org/abs/2212.14622v4
Empirical Bayes When Estimation Precision Predicts Parameters
assumption: The unknown parameters of interest are independent from the known
standard errors of the estimates. This assumption is often theoretically
questionable and empirically rejected. This paper proposes to model the
conditional distribution of the parameter given the standard errors as a
flexibly parametrized location-scale family of distributions, leading to a
family of methods that we call CLOSE. The CLOSE framework unifies and
generalizes several proposals under precision dependence. We argue that the
most flexible member of the CLOSE family is a minimalist and computationally
efficient default for accounting for precision dependence. We analyze this
method and show that it is competitive in terms of the regret of subsequent
decision rules. Empirically, using CLOSE leads to sizable gains for selecting
high-mobility Census tracts.
arXiv link: http://arxiv.org/abs/2212.14444v5
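To fix ideas, the sketch below gives the conjugate normal special case of a
location-scale conditional prior: theta_i | s_i ~ N(m(s_i), v(s_i)) with the
estimate distributed N(theta_i, s_i^2), so the posterior mean shrinks toward a
standard-error-dependent location. The CLOSE family in the paper is more
flexible than this; m_of_s and v_of_s stand in for hypothetical first-stage
estimates.

```python
import numpy as np

def close_normal_posterior_means(est, se, m_of_s, v_of_s):
    """Posterior means under a normal location-scale conditional prior.

    est, se        : arrays of estimates and their standard errors.
    m_of_s, v_of_s : callables returning the conditional prior mean m(s) and
        variance v(s) as functions of the standard error (in practice these
        would be estimated from the data in a first step).
    """
    m = m_of_s(se)
    v = v_of_s(se)
    shrinkage = v / (v + se ** 2)      # weight on the data relative to m(s)
    return m + shrinkage * (est - m)

# Under classical independence-based empirical Bayes, m and v are constants;
# letting them vary with se is what accommodates precision dependence.
```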
Near-Optimal Non-Parametric Sequential Tests and Confidence Sequences with Possibly Dependent Observations
arbitrary stopping times, promise flexible statistical inference and on-the-fly
decision making. However, strong guarantees are limited to parametric
sequential tests that under-cover in practice or concentration-bound-based
sequences that over-cover and have suboptimal rejection times. In this work, we
consider classic delayed-start normal-mixture sequential probability ratio
tests, and we provide the first asymptotic type-I-error and
expected-rejection-time guarantees under general non-parametric data generating
processes, where the asymptotics are indexed by the test's burn-in time. The
type-I-error results primarily leverage a martingale strong invariance
principle and establish that these tests (and their implied confidence
sequences) have type-I error rates asymptotically equivalent to the desired
(possibly varying) $\alpha$-level. The expected-rejection-time results
primarily leverage an identity inspired by It\^o's lemma and imply that, in
certain asymptotic regimes, the expected rejection time is asymptotically
equivalent to the minimum possible among $\alpha$-level tests. We show how to
apply our results to sequential inference on parameters defined by estimating
equations, such as average treatment effects. Together, our results establish
these (ostensibly parametric) tests as general-purpose, non-parametric, and
near-optimal. We illustrate this via numerical simulations and a real-data
application to A/B testing at Netflix.
arXiv link: http://arxiv.org/abs/2212.14411v5
What Estimators Are Unbiased For Linear Models?
the Gauss-Markov theorem continues to hold without the requirement that
competing estimators are linear in the vector of outcomes. Despite the elegant
proof, it was shown by the authors and other researchers that the main result
in the earlier version of Hansen's paper does not extend the classic
Gauss-Markov theorem because no nonlinear unbiased estimator exists under his
conditions. To address the issue, Hansen [2022] added statements in the latest
version with new conditions under which nonlinear unbiased estimators exist.
Motivated by the lively discussion, we study a fundamental problem: what
estimators are unbiased for a given class of linear models? We first review a
line of highly relevant work dating back to the 1960s, which, unfortunately,
has not drawn enough attention. Then, we introduce notation that allows us to
restate and unify results from earlier work and Hansen [2022]. The new
framework also allows us to highlight differences among previous conclusions.
Lastly, we establish new representation theorems for unbiased estimators under
different restrictions on the linear model, allowing the coefficients and
covariance matrix to take only a finite number of values, the higher moments of
the estimator and the dependent variable to exist, and the error distribution
to be discrete, absolutely continuous, or dominated by another probability
measure. Our results substantially generalize the claims of parallel
commentaries on Hansen [2022] and a remarkable result by Koopmann [1982].
arXiv link: http://arxiv.org/abs/2212.14185v1
Supercompliers
supercompliers as the subpopulation whose treatment take-up positively responds
to eligibility and whose outcome positively responds to take-up. Supercompliers
are the only subpopulation to benefit from treatment eligibility and, hence,
are important for policy. We provide tools to characterize supercompliers under
a set of jointly testable assumptions. Specifically, we require standard
assumptions from the local average treatment effect literature plus an outcome
monotonicity assumption. Estimation and inference can be conducted with
instrumental variable regression. In two job-training experiments, we
demonstrate our machinery's utility, particularly in incorporating social
welfare weights into marginal-value-of-public-funds analysis.
arXiv link: http://arxiv.org/abs/2212.14105v3
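Since the abstract notes that estimation can proceed via instrumental variable
regression, here is a hedged reminder sketch of the just-identified Wald/IV
estimand with a binary instrument (eligibility) and binary take-up; it is not
the paper's inference machinery for characterizing supercompliers.

```python
import numpy as np

def wald_late(y, d, z):
    """Wald/IV estimate with a binary instrument z (eligibility) and binary
    take-up d: the intention-to-treat effect on the outcome divided by the
    intention-to-treat effect on take-up."""
    itt_y = y[z == 1].mean() - y[z == 0].mean()
    itt_d = d[z == 1].mean() - d[z == 0].mean()
    return itt_y / itt_d
```

Under the usual LATE assumptions this ratio is the average effect among
compliers, of whom supercompliers are the subset whose outcome also responds
positively to take-up.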
Forward Orthogonal Deviations GMM and the Absence of Large Sample Bias
dynamic panel data regressions can have significant bias when the number of
time periods ($T$) is not small compared to the number of cross-sectional units
($n$). The bias is attributed to the use of many instrumental variables. This
paper shows that if the maximum number of instrumental variables used in a
period increases with $T$ at a rate slower than $T^{1/2}$, then GMM estimators
that exploit the forward orthogonal deviations (FOD) transformation do not have
asymptotic bias, regardless of how fast $T$ increases relative to $n$. This
conclusion is specific to using the FOD transformation. A similar conclusion
does not necessarily apply when other transformations are used to remove fixed
effects. Monte Carlo evidence illustrating the analytical results is provided.
arXiv link: http://arxiv.org/abs/2212.14075v2
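For reference, a minimal sketch of the forward orthogonal deviations
transformation itself (the standard Arellano-Bover form), applied column-wise
to a balanced panel; the GMM estimation and instrument-count analysis in the
paper are not reproduced here.

```python
import numpy as np

def forward_orthogonal_deviations(x):
    """Forward orthogonal deviations (FOD) of an (n, T) panel variable.

    Period t is replaced by the deviation of x_it from the mean of its own
    future values, scaled so that homoskedastic, serially uncorrelated errors
    remain homoskedastic and uncorrelated. Returns an (n, T-1) array.
    """
    n, T = x.shape
    out = np.empty((n, T - 1))
    for t in range(T - 1):
        future_mean = x[:, t + 1:].mean(axis=1)
        scale = np.sqrt((T - t - 1) / (T - t))
        out[:, t] = scale * (x[:, t] - future_mean)
    return out

# Like first differencing, FOD removes unit fixed effects:
# forward_orthogonal_deviations(x + c[:, None]) equals
# forward_orthogonal_deviations(x) for any unit-specific constants c.
```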
Robustifying Markowitz
parameters feature numerous issues in practice. They perform poorly out of
sample due to estimation error, and they exhibit extreme weights together with
high sensitivity to changes in input parameters. The heavy-tail characteristics
of financial time series are in fact the cause of these erratic fluctuations
in weights, which consequently create substantial transaction costs. In
robustifying the weights we present a toolbox for stabilizing costs and weights
for global minimum Markowitz portfolios. Utilizing a projected gradient descent
(PGD) technique, we avoid the estimation and inversion of the covariance
operator as a whole and concentrate on robust estimation of the gradient
descent increment. Using modern tools of robust statistics we construct a
computationally efficient estimator with almost Gaussian properties based on
median-of-means uniformly over weights. This robustified Markowitz approach is
confirmed by empirical studies on equity markets. We demonstrate that
robustified portfolios reach the lowest turnover compared to shrinkage-based
and constrained portfolios while preserving or slightly improving out-of-sample
performance.
arXiv link: http://arxiv.org/abs/2212.13996v1
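A rough sketch of the two ingredients described above, under simplifying
assumptions (centred returns, full-investment constraint only, fixed step
size): a median-of-means estimate of the portfolio-variance gradient and a
projected gradient descent loop. The authors' estimator and tuning are more
elaborate; all names here are illustrative.

```python
import numpy as np

def mom_variance_gradient(returns, w, n_blocks=10):
    """Median-of-means estimate of the gradient 2*E[r_t (r_t' w)] of the
    portfolio variance w' Sigma w (returns assumed centred)."""
    blocks = np.array_split(returns, n_blocks)
    grads = np.stack([2.0 * b.T @ (b @ w) / len(b) for b in blocks])
    return np.median(grads, axis=0)

def robust_gmv_weights(returns, step=0.05, n_iter=500, n_blocks=10):
    """Projected gradient descent for a global-minimum-variance portfolio,
    projecting onto the full-investment constraint sum(w) = 1 and avoiding
    estimation and inversion of the full covariance matrix."""
    n_assets = returns.shape[1]
    w = np.full(n_assets, 1.0 / n_assets)
    for _ in range(n_iter):
        w = w - step * mom_variance_gradient(returns, w, n_blocks)
        w = w - (w.sum() - 1.0) / n_assets   # Euclidean projection onto 1'w = 1
    return w
```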
Spectral and post-spectral estimators for grouped panel data models
panel data models. Both estimators are consistent in the asymptotics where the
number of observations $N$ and the number of time periods $T$ simultaneously
grow large. In addition, the post-spectral estimator is $NT$-consistent
and asymptotically normal with mean zero under the assumption of well-separated
groups even if $T$ is growing much slower than $N$. The post-spectral estimator
has, therefore, theoretical properties that are comparable to those of the
grouped fixed-effect estimator developed by Bonhomme and Manresa (2015). In
contrast to the grouped fixed-effect estimator, however, our post-spectral
estimator is computationally straightforward.
arXiv link: http://arxiv.org/abs/2212.13324v2
An Effective Treatment Approach to Difference-in-Differences with General Treatment Patterns
variable of interest may be non-binary and its value may change in each period.
It is generally difficult to estimate treatment parameters defined with the
potential outcome given the entire path of treatment adoption, because each
treatment path may be experienced by only a small number of observations. We
propose an alternative approach using the concept of effective treatment, which
summarizes the treatment path into an empirically tractable low-dimensional
variable, and develop doubly robust identification, estimation, and inference
methods. We also provide a companion R software package.
arXiv link: http://arxiv.org/abs/2212.13226v3
Orthogonal Series Estimation for the Ratio of Conditional Expectation Functions
estimating the ratio of conditional expectation functions (CEFR). Specifically,
in causal inference problems, it is sometimes natural to consider ratio-based
treatment effects, such as odds ratios and hazard ratios, and even
difference-based treatment effects are identified as CEFR in some empirically
relevant settings. This chapter develops the general framework for estimation
and inference on CEFR, which allows the use of flexible machine learning for
infinite-dimensional nuisance parameters. In the first stage of the framework,
the orthogonal signals are constructed using debiased machine learning
techniques to mitigate the negative impacts of the regularization bias in the
nuisance estimates on the target estimates. The signals are then combined with
a novel series estimator tailored for CEFR. We derive the pointwise and uniform
asymptotic results for estimation and inference on CEFR, including the validity
of the Gaussian bootstrap, and provide low-level sufficient conditions to apply
the proposed framework to some specific examples. We demonstrate the
finite-sample performance of the series estimator constructed under the
proposed framework by numerical simulations. Finally, we apply the proposed
method to estimate the causal effect of the 401(k) program on household assets.
arXiv link: http://arxiv.org/abs/2212.13145v1
Tensor PCA for Factor Models
with non-negligible cross-sectional and time-series correlations. Factor models
are natural for capturing such dependencies. A tensor factor model describes
the $d$-dimensional panel as a sum of a reduced rank component and an
idiosyncratic noise, generalizing traditional factor models for two-dimensional
panels. We consider a tensor factor model corresponding to the notion of a
reduced multilinear rank of a tensor. We show that for a strong factor model, a
simple tensor principal component analysis algorithm is optimal for estimating
factors and loadings. When the factors are weak, the convergence rate of simple
TPCA can be improved with alternating least-squares iterations. We also provide
inferential results for factors and loadings and propose the first test to
select the number of factors. The new tools are applied to the problem of
imputing missing values in a multidimensional panel of firm characteristics.
arXiv link: http://arxiv.org/abs/2212.12981v3
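A simple version of the tensor principal component step described above, under
the assumption that it amounts to taking leading singular vectors of each mode
unfolding (higher-order SVD); the alternating least-squares refinement for weak
factors and the inferential results are not sketched here.

```python
import numpy as np

def tensor_pca(Y, ranks):
    """Estimate loadings and a core factor tensor for a d-dimensional panel Y.

    For each mode k, the loading matrix collects the leading ranks[k] left
    singular vectors of the mode-k unfolding; the core (factor) tensor is
    obtained by projecting Y onto these loadings (HOSVD-style)."""
    loadings = []
    for k, r in enumerate(ranks):
        unfolding = np.moveaxis(Y, k, 0).reshape(Y.shape[k], -1)
        u, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        loadings.append(u[:, :r])
    core = Y
    for U in loadings:
        # Contracting mode 0 with U moves that mode to the end, so after all
        # d contractions the original mode ordering is restored.
        core = np.tensordot(core, U, axes=([0], [0]))
    return core, loadings

# Example: a 3-way panel (firms x characteristics x months) with multilinear
# rank (3, 2, 2) would be decomposed via tensor_pca(Y, ranks=(3, 2, 2)).
```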
Efficient Sampling for Realized Variance Estimation in Time-Changed Diffusion Models
time for the realized variance (RV) estimator. We theoretically show in finite
samples that depending on the permitted sampling information, the RV estimator
is most efficient under either hitting time sampling that samples whenever the
price changes by a pre-determined threshold, or under the new concept of
realized business time that samples according to a combination of observed
trades and estimated tick variance. The analysis builds on the assumption that
asset prices follow a diffusion that is time-changed with a jump process that
separately models the transaction times. This provides a flexible model that
allows for leverage specifications and Hawkes-type jump processes and
separately captures the empirically varying trading intensity and tick variance
processes, which are particularly relevant for disentangling the driving forces
of the sampling schemes. Extensive simulations confirm our theoretical results
and show that for low levels of noise, hitting time sampling remains superior
while for increasing noise levels, realized business time becomes the
empirically most efficient sampling scheme. An application to stock data
provides empirical evidence for the benefits of using these intrinsic sampling
schemes to construct more efficient RV estimators as well as for an improved
forecast performance.
arXiv link: http://arxiv.org/abs/2212.11833v3
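A toy illustration of the hitting-time sampling scheme mentioned above: record
the (log) price whenever it has moved by a fixed threshold since the last
sampled point, then compute realized variance from the sampled returns. It
ignores the tick-variance estimation needed for realized business time and is
only a sketch.

```python
import numpy as np

def hitting_time_rv(prices, threshold):
    """Realized variance from hitting-time sampled log prices.

    prices    : array of observed transaction prices, in time order.
    threshold : absolute log-price move that triggers a new sample.
    """
    logp = np.log(np.asarray(prices, dtype=float))
    sampled = [logp[0]]
    for p in logp[1:]:
        if abs(p - sampled[-1]) >= threshold:
            sampled.append(p)
    returns = np.diff(sampled)
    return float(np.sum(returns ** 2))
```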
A Bootstrap Specification Test for Semiparametric Models with Generated Regressors
nonparametrically generated regressors. Such variables are not observed by the
researcher but are nonparametrically identified and estimable. Applications of
the test include models with endogenous regressors identified by control
functions, semiparametric sample selection models, or binary games with
incomplete information. The statistic is built from the residuals of the
semiparametric model. A novel wild bootstrap procedure is shown to provide
valid critical values. We consider nonparametric estimators with an automatic
bias correction that makes the test implementable without undersmoothing. In
simulations the test exhibits good small sample performances, and an
application to women's labor force participation decisions shows its
implementation in a real data context.
arXiv link: http://arxiv.org/abs/2212.11112v2
Partly Linear Instrumental Variables Regressions without Smoothing on the Instruments
variables. We propose an estimation method that does not smooth on the
instruments and we extend the Landweber-Fridman regularization scheme to the
estimation of this semiparametric model. We then show the asymptotic normality
of the parametric estimator and obtain the convergence rate for the
nonparametric estimator. Our estimator that does not smooth on the instruments
coincides with a typical estimator that does smooth on the instruments but
keeps the respective bandwidth fixed as the sample size increases. We propose a
data driven method for the selection of the regularization parameter, and in a
simulation study we show the attractive performance of our estimators.
arXiv link: http://arxiv.org/abs/2212.11012v2
Inference for Model Misspecification in Interest Rate Term Structure using Functional Principal Component Analysis
in interest rate term structure and are thus widely used in modeling. This
paper characterizes the heterogeneity of how misspecified such models are
through time. Presenting an orthonormal basis for the Nelson-Siegel model that
is interpretable as the three factors, we design two nonparametric tests for
whether the basis is equivalent to the data-driven functional principal
component basis underlying the yield curve dynamics, considering the ordering
of eigenfunctions or not, respectively. Eventually, we discover high dispersion
between the two bases when rare events occur, suggesting occasional
misspecification even if the model is overall expressive.
arXiv link: http://arxiv.org/abs/2212.10790v1
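To make the comparison concrete, the sketch below builds the Nelson-Siegel
loadings and a functional principal component basis from a panel of yields and
measures their closeness through principal angles between the spanned
subspaces. This is a hedged reading of the testing idea, with the decay
parameter fixed at the common Diebold-Li value rather than estimated.

```python
import numpy as np

def nelson_siegel_loadings(maturities, lam=0.0609):
    """Level, slope and curvature loadings of the Nelson-Siegel model
    evaluated at the given maturities (lam is the decay parameter)."""
    m = np.asarray(maturities, dtype=float)
    slope = (1.0 - np.exp(-lam * m)) / (lam * m)
    curvature = slope - np.exp(-lam * m)
    return np.column_stack([np.ones_like(m), slope, curvature])

def fpca_basis(yields, k=3):
    """Leading k principal-component loadings of a (dates x maturities)
    panel of yields, i.e. the data-driven functional basis."""
    centred = yields - yields.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[:k].T

def principal_angle_cosines(A, B):
    """Cosines of the principal angles between the column spaces of A and B;
    values near one indicate the two bases span nearly the same subspace."""
    qa, _ = np.linalg.qr(A)
    qb, _ = np.linalg.qr(B)
    return np.linalg.svd(qa.T @ qb, compute_uv=False)
```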
Probabilistic Quantile Factor Analysis
incorporates regularization and computationally efficient variational
approximations. We establish through synthetic and real data experiments that
the proposed estimator can, in many cases, achieve better accuracy than a
recently proposed loss-based estimator. We contribute to the factor analysis
literature by extracting new indexes of low, medium, and
high economic policy uncertainty, as well as loose,
median, and tight financial conditions. We show that the high
uncertainty and tight financial conditions indexes have superior predictive
ability for various measures of economic activity. In a high-dimensional
exercise involving about 1000 daily financial series, we find that quantile
factors also provide superior out-of-sample information compared to mean or
median factors.
arXiv link: http://arxiv.org/abs/2212.10301v3
Quantifying fairness and discrimination in predictive models
recent years, the literature in computer science and machine learning has
become interested in the subject, offering an interesting re-reading of the
topic. These questions are the consequences of numerous criticisms of
algorithms used to translate texts or to identify people in images. With the
arrival of massive data, and the use of increasingly opaque algorithms, it is
not surprising to have discriminatory algorithms, because it has become easy to
have a proxy of a sensitive variable, by enriching the data indefinitely.
According to Kranzberg (1986), "technology is neither good nor bad, nor is it
neutral", and therefore, "machine learning won't give you anything like gender
neutrality `for free' that you didn't explicitly ask for", as claimed by
Kearns et al. (2019). In this article, we will return to the general context of
predictive models in classification. We will present the main concepts of
fairness, known as group fairness, based on independence between the sensitive
variable and the prediction, possibly conditioned on additional information.
We will then go further by presenting the concepts of individual fairness.
Finally, we will see how to correct potential discrimination in order to
guarantee that a model is more ethical.
arXiv link: http://arxiv.org/abs/2212.09868v1
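As a small companion to the group-fairness notions surveyed above (independence
between the sensitive attribute and the prediction, possibly conditional on the
outcome), the sketch below reports per-group selection rates and true positive
rates for a binary classifier; it is illustrative only.

```python
import numpy as np

def group_fairness_report(y_true, y_pred, sensitive):
    """Per-group diagnostics for a binary classifier.

    Demographic parity compares selection rates across groups; equal
    opportunity compares true positive rates (independence of the prediction
    and the sensitive attribute conditional on y = 1)."""
    report = {}
    for g in np.unique(sensitive):
        in_group = sensitive == g
        report[g] = {
            "selection_rate": float(y_pred[in_group].mean()),
            "true_positive_rate": float(y_pred[in_group & (y_true == 1)].mean()),
        }
    return report
```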
Robust Design and Evaluation of Predictive Algorithms under Unobserved Confounding
outcome is selectively observed given choices made by human decision makers. We
propose a unified framework for the robust design and evaluation of predictive
algorithms in selectively observed data. We impose general assumptions on how
much the outcome may vary on average between unselected and selected units
conditional on observed covariates and identified nuisance parameters,
formalizing popular empirical strategies for imputing missing data such as
proxy outcomes and instrumental variables. We develop debiased machine learning
estimators for the bounds on a large class of predictive performance estimands,
such as the conditional likelihood of the outcome, a predictive algorithm's
mean square error, true/false positive rate, and many others, under these
assumptions. In an administrative dataset from a large Australian financial
institution, we illustrate how varying assumptions on unobserved confounding
leads to meaningful changes in default risk predictions and evaluations of
credit scores across sensitive groups.
arXiv link: http://arxiv.org/abs/2212.09844v5
Simultaneous Inference of a Partially Linear Model in Time Series
nonparametric component in partially linear time series regression models where
the nonparametric part is a multivariate unknown function. In particular, we
construct a simultaneous confidence region (SCR) for the multivariate function
by extending the high-dimensional Gaussian approximation to dependent processes
with continuous index sets. Our results allow for a more general dependence
structure compared to previous works and are widely applicable to a variety of
linear and nonlinear autoregressive processes. We demonstrate the validity of
our proposed methodology by examining the finite-sample performance in the
simulation study. Finally, an application in time series, the forward premium
regression, is presented, where we construct the SCR for the foreign exchange
risk premium from the exchange rate and macroeconomic data.
arXiv link: http://arxiv.org/abs/2212.10359v2
Identification of time-varying counterfactual parameters in nonlinear panel models
parameters in a class of nonlinear semiparametric panel models with fixed
effects and time effects. Our method applies to models for discrete outcomes
(e.g., two-way fixed effects binary choice) or continuous outcomes (e.g.,
censored regression), with discrete or continuous regressors. Our results do
not require parametric assumptions on the error terms or time-homogeneity on
the outcome equation. Our main results focus on static models, with a set of
results applying to models without any exogeneity conditions. We show that the
survival distribution of counterfactual outcomes is identified (point or
partial) in this class of models. This parameter is a building block for most
partial and marginal effects of interest in applied practice that are based on
the average structural function as defined by Blundell and Powell (2003, 2004).
To the best of our knowledge, ours are the first results on average partial and
marginal effects for binary choice and ordered choice models with two-way fixed
effects and non-logistic errors.
arXiv link: http://arxiv.org/abs/2212.09193v2
PAC-Bayesian Treatment Allocation Under Budget Constraints
policy maker faces a general budget or resource constraint. Utilizing the
PAC-Bayesian framework, we propose new treatment assignment rules that allow
for flexible notions of treatment outcome, treatment cost, and a budget
constraint. For example, the constraint setting allows for cost-savings, when
the costs of non-treatment exceed those of treatment for a subpopulation, to be
factored into the budget. It also accommodates simpler settings, such as
quantity constraints, and doesn't require outcome responses and costs to have
the same unit of measurement. Importantly, the approach accounts for settings
where budget or resource limitations may preclude treating all that can
benefit, where costs may vary with individual characteristics, and where there
may be uncertainty regarding the cost of treatment rules of interest. Despite
the nomenclature, our theoretical analysis examines frequentist properties of
the proposed rules. For stochastic rules that typically approach
budget-penalized empirical welfare maximizing policies in larger samples, we
derive non-asymptotic generalization bounds for the target population costs and
sharp oracle-type inequalities that compare the rules' welfare regret to that
of optimal policies in relevant budget categories. A closely related,
non-stochastic, model aggregation treatment assignment rule is shown to inherit
desirable attributes.
arXiv link: http://arxiv.org/abs/2212.09007v2
A smooth transition autoregressive model for matrix-variate time series
Matrix-variate time series modeling is a new branch of econometrics. Although
stylized facts have been documented in several fields, the existing models do
not account for regime switches in the dynamics of matrices that are not
abrupt. In this paper, we extend linear matrix-variate autoregressive models by
introducing a regime-switching model capable of accounting for smooth changes,
the matrix smooth transition autoregressive model. We present the estimation
procedure together with its asymptotic properties, illustrated with simulated
and real data.
arXiv link: http://arxiv.org/abs/2212.08615v1
Moate Simulation of Stochastic Processes
numerical evolution of probability distribution functions represented on grids
arising from stochastic differential processes where initial conditions are
specified. Where the variables of stochastic differential equations may be
transformed via It\^o-Doeblin calculus into stochastic differentials with a
constant diffusion term, the probability distribution function for these
variables can be simulated in discrete time steps. The drift is applied
directly to a volume element of the distribution while the stochastic diffusion
term is applied through the use of convolution techniques such as Fast or
Discrete Fourier Transforms. This allows for highly accurate distributions to
be efficiently simulated to a given time horizon and may be employed in one,
two or higher dimensional expectation integrals, e.g. for pricing of financial
derivatives. The Moate Simulation approach forms a more accurate and
considerably faster alternative to Monte Carlo Simulation for many applications
while retaining the opportunity to alter the distribution in mid-simulation.
arXiv link: http://arxiv.org/abs/2212.08509v1
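A stripped-down sketch of one time step of the scheme described above, for a
one-dimensional diffusion dX = mu dt + sigma dW with constant diffusion
coefficient: the drift shifts the density on the grid and the diffusion is
applied by FFT-based convolution with a Gaussian kernel. The grid handling and
boundary treatment here are simplifying assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def moate_step(density, grid, mu, sigma, dt):
    """Advance a gridded density one time step for dX = mu dt + sigma dW.

    density : density values on a uniform grid (assumed to integrate to one,
              with negligible mass near the grid edges).
    """
    dx = grid[1] - grid[0]
    # Drift: the new density at x equals the old density at x - mu*dt.
    shifted = np.interp(grid - mu * dt, grid, density, left=0.0, right=0.0)
    # Diffusion: convolve with a centred Gaussian kernel of std sigma*sqrt(dt).
    std = sigma * np.sqrt(dt)
    k_grid = np.arange(-6.0 * std, 6.0 * std + dx, dx)
    kernel = np.exp(-0.5 * (k_grid / std) ** 2)
    kernel /= kernel.sum()
    new_density = fftconvolve(shifted, kernel, mode="same")
    return new_density / (new_density.sum() * dx)   # renormalise on the grid
```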
The finite sample performance of instrumental variable-based estimators of the Local Average Treatment Effect when controlling for covariates
parametric, semi-parametric, and non-parametric instrumental variable
estimators when controlling for a fixed set of covariates to evaluate the local
average treatment effect. Our simulation designs are based on empirical labor
market data from the US and vary in several dimensions, including effect
heterogeneity, instrument selectivity, instrument strength, outcome
distribution, and sample size. Among the estimators and simulations considered,
non-parametric estimation based on the random forest (a machine learner
controlling for covariates in a data-driven way) performs competitively in terms
of the average coverage rates of the (bootstrap-based) 95% confidence
intervals, while also being relatively precise. Non-parametric kernel
regression as well as certain versions of semi-parametric radius matching on
the propensity score, pair matching on the covariates, and inverse probability
weighting also have decent coverage, but are less precise than the random
forest-based method. In terms of the average root mean squared error of LATE
estimation, kernel regression performs best, closely followed by the random
forest method, which has the lowest average absolute bias.
arXiv link: http://arxiv.org/abs/2212.07379v1
Smoothing volatility targeting
volatility-managed portfolios based on smoothing the predictive density of an
otherwise standard stochastic volatility model. Specifically, we develop a
novel variational Bayes estimation method that flexibly encompasses different
smoothness assumptions irrespective of the persistence of the underlying latent
state. Using a large set of equity trading strategies, we show that smoothing
volatility targeting helps to regularise the extreme leverage/turnover that
results from commonly used realised variance estimates. This has important
implications for both the risk-adjusted returns and the mean-variance
efficiency of volatility-managed portfolios, once transaction costs are
factored in. An extensive simulation study shows that our variational inference
scheme compares favourably against existing state-of-the-art Bayesian
estimation methods for stochastic volatility models.
arXiv link: http://arxiv.org/abs/2212.07288v1
Robust Estimation of the non-Gaussian Dimension in Structural Linear Models
structural errors: (i) to be an i.i.d. process, (ii) to be mutually independent
across components, and (iii) to be non-Gaussian distributed.
Hence, provided that the first two requisites hold, it is crucial to evaluate the
non-Gaussian identification condition. We address this problem by relating the
non-Gaussian dimension of structural errors vector to the rank of a matrix
built from the higher-order spectrum of reduced-form errors. This makes our
proposal robust to the roots location of the lag polynomials, and generalizes
the current procedures designed for the restricted case of a causal structural
VAR model. Simulation exercises show that our procedure satisfactorily
estimates the number of non-Gaussian components.
arXiv link: http://arxiv.org/abs/2212.07263v2
On LASSO for High Dimensional Predictive Regression
in high dimensional linear predictive regressions, particularly when the number
of potential predictors exceeds the sample size and numerous unit root
regressors are present. The consistency of LASSO is contingent upon two key
components: the deviation bound of the cross product of the regressors and the
error term, and the restricted eigenvalue of the Gram matrix. We present new
probabilistic bounds for these components, suggesting that LASSO's rates of
convergence are different from those typically observed in cross-sectional
cases. When applied to a mixture of stationary, nonstationary, and cointegrated
predictors, LASSO maintains its asymptotic guarantee if predictors are
scale-standardized. Leveraging machine learning and macroeconomic domain
expertise, LASSO demonstrates strong performance in forecasting the
unemployment rate, as evidenced by its application to the FRED-MD database.
arXiv link: http://arxiv.org/abs/2212.07052v2
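A small simulated illustration of the scale-standardization point made above,
mixing stationary and random-walk predictors before running LASSO. The penalty
is chosen here by ordinary cross-validation purely for convenience, which is
not what one would do in a careful time-series forecasting exercise; the data
are entirely synthetic.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
T, p_stationary, p_unit_root = 240, 40, 20

X_stationary = rng.standard_normal((T, p_stationary))
X_unit_root = np.cumsum(rng.standard_normal((T, p_unit_root)), axis=0)  # random walks
X = np.hstack([X_stationary, X_unit_root])
y = 0.5 * X_stationary[:, 0] + 0.1 * X_unit_root[:, 0] + rng.standard_normal(T)

# Scale-standardizing puts stationary and nonstationary regressors on a
# comparable footing before applying the L1 penalty.
X_std = StandardScaler().fit_transform(X)
fit = LassoCV(cv=5).fit(X_std, y)
print("selected predictors:", np.flatnonzero(fit.coef_))
```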
Policy learning for many outcomes of interest: Combining optimal policy trees with multi-objective Bayesian optimisation
create human-interpretable rules for making choices around the allocation of
different policy interventions. However, in realistic policy-making contexts,
decision-makers often care about trade-offs between outcomes, not just
single-mindedly maximising utility for one outcome. This paper proposes an
approach termed Multi-Objective Policy Learning (MOPoL) which combines optimal
decision trees for policy learning with a multi-objective Bayesian optimisation
approach to explore the trade-off between multiple outcomes. It does this by
building a Pareto frontier of non-dominated models for different hyperparameter
settings which govern outcome weighting. The key here is that a low-cost greedy
tree can be an accurate proxy for the very computationally costly optimal tree
for the purposes of making decisions, which means models can be repeatedly fitted
to learn a Pareto frontier. The method is applied to a real-world case-study of
non-price rationing of anti-malarial medication in Kenya.
arXiv link: http://arxiv.org/abs/2212.06312v2
Logs with zeros? Some problems and solutions
earnings), researchers frequently estimate an average treatment effect (ATE)
for a "log-like" transformation that behaves like $\log(Y)$ for large $Y$ but
is defined at zero (e.g. $\log(1+Y)$, $arcsinh(Y)$). We argue that
ATEs for log-like transformations should not be interpreted as approximating
percentage effects, since unlike a percentage, they depend on the units of the
outcome. In fact, we show that if the treatment affects the extensive margin,
one can obtain a treatment effect of any magnitude simply by re-scaling the
units of $Y$ before taking the log-like transformation. This arbitrary
unit-dependence arises because an individual-level percentage effect is not
well-defined for individuals whose outcome changes from zero to non-zero when
receiving treatment, and the units of the outcome implicitly determine how much
weight the ATE for a log-like transformation places on the extensive margin. We
further establish a trilemma: when the outcome can equal zero, there is no
treatment effect parameter that is an average of individual-level treatment
effects, unit-invariant, and point-identified. We discuss several alternative
approaches that may be sensible in settings with an intensive and extensive
margin, including (i) expressing the ATE in levels as a percentage (e.g. using
Poisson regression), (ii) explicitly calibrating the value placed on the
intensive and extensive margins, and (iii) estimating separate effects for the
two margins (e.g. using Lee bounds). We illustrate these approaches in three
empirical applications.
arXiv link: http://arxiv.org/abs/2212.06080v7
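The unit-dependence argument is easy to reproduce numerically. The sketch below
uses a hypothetical data-generating process with an extensive margin and shows
that the ATE for log(1+Y) changes with the units of Y, while a proportional
effect in levels (the Poisson-regression-style comparison of means) does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
d = rng.integers(0, 2, n)                                        # randomized treatment
y0 = np.where(rng.random(n) < 0.5, 0.0, rng.lognormal(3.0, 1.0, n))
y1 = np.where(y0 == 0.0, rng.lognormal(1.0, 1.0, n), 1.1 * y0)   # extensive + intensive margin
y = np.where(d == 1, y1, y0)

def ate_log1p(y, d, scale):
    """ATE of log(1 + scale*Y): it depends on the chosen units (scale) of Y."""
    z = np.log1p(scale * y)
    return z[d == 1].mean() - z[d == 0].mean()

for scale in (0.01, 1.0, 100.0):
    print(f"scale {scale:>6}: ATE of log(1+scale*Y) = {ate_log1p(y, d, scale):.3f}")

# A unit-invariant alternative in levels: the proportional effect exp(beta)-1
# from a Poisson regression of Y on D equals the ratio of group means minus one.
print("proportional effect:", y[d == 1].mean() / y[d == 0].mean() - 1)
```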
Measuring the Driving Forces of Predictive Performance: Application to Credit Scoring
credit scoring models are under growing scrutiny from banking supervisors and
internal model validators. These authorities need to monitor the model
performance and identify its key drivers. To facilitate this, we introduce the
XPER methodology to decompose a performance metric (e.g., AUC, $R^2$) into
specific contributions associated with the various features of a forecasting
model. XPER is theoretically grounded on Shapley values and is both
model-agnostic and performance metric-agnostic. Furthermore, it can be
implemented either at the model level or at the individual level. Using a novel
dataset of car loans, we decompose the AUC of a machine-learning model trained
to forecast the default probability of loan applicants. We show that a small
number of features can explain a surprisingly large part of the model
performance. Notably, the features that contribute the most to the predictive
performance of the model may not be the ones that contribute the most to
individual forecasts (SHAP). Finally, we show how XPER can be used to deal with
heterogeneity issues and improve performance.
arXiv link: http://arxiv.org/abs/2212.05866v4
Dominant Drivers of National Inflation
inflation rates. We propose a novel approach christened D2ML to identify
drivers of national inflation. D2ML combines machine learning for model
selection with time dependent data and graphical models to estimate the inverse
of the covariance matrix, which is then used to identify dominant drivers.
Using a dataset of 33 countries, we find that the US inflation rate and oil
prices are dominant drivers of national inflation rates. For a more general
framework, we carry out Monte Carlo simulations to show that our estimator
correctly identifies dominant drivers.
arXiv link: http://arxiv.org/abs/2212.05841v1
Robust Inference in High Dimensional Linear Model with Cluster Dependence
researchers to account for cluster dependence in linear models. It is well known
that this standard error is biased. We show that the bias does not vanish under
high dimensional asymptotics by revisiting Chesher and Jewitt (1987)'s
approach. An alternative leave-cluster-out crossfit (LCOC) estimator that is
unbiased, consistent and robust to cluster dependence is provided under the
high-dimensional setting introduced by Cattaneo, Jansson and Newey (2018). Since
the LCOC estimator nests the leave-one-out crossfit estimator of Kline, Saggio
and Solvsten (2019), the two papers are unified. Monte Carlo comparisons are
provided to give insights on its finite sample properties. The LCOC estimator
is then applied to Angrist and Lavy's (2009) study of the effects of high
school achievement award and Donohue III and Levitt's (2001) study of the
impact of abortion on crime.
arXiv link: http://arxiv.org/abs/2212.05554v1
On regression-adjusted imputation estimators of the average treatment effect
a natural idea for estimating causal effects. In the literature, estimators
that combine imputation and regression adjustments are believed to be
comparable to augmented inverse probability weighting. Accordingly, it has long
been conjectured that such estimators, while avoiding direct construction of
the weights, are also doubly robust (Imbens, 2004; Stuart, 2010).
Generalizing an earlier result of the authors (Lin et al., 2021), this paper
formalizes this conjecture, showing that a large class of regression-adjusted
imputation methods are indeed doubly robust for estimating the average
treatment effect. In addition, they are provably semiparametrically efficient
as long as both the density and regression models are correctly specified.
Notable examples of imputation methods covered by our theory include kernel
matching, (weighted) nearest neighbor matching, local linear matching, and
(honest) random forests.
arXiv link: http://arxiv.org/abs/2212.05424v2
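The following is a minimal sketch of one regression-adjusted imputation estimator in the spirit of those covered by the result above: nearest-neighbour matching imputes the missing potential outcome and an outcome regression supplies a bias correction. The data and the 1-NN and linear-regression choices are illustrative only.

```python
# A minimal sketch of one regression-adjusted imputation (bias-corrected
# matching) estimator of the ATE: the missing potential outcome is imputed by a
# nearest neighbour from the other arm plus a regression correction.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
n = 2_000
X = rng.normal(size=(n, 3))
D = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))
Y = X @ np.array([1.0, 0.5, -0.5]) + 1.0 * D + rng.normal(size=n)   # true ATE = 1

m1 = LinearRegression().fit(X[D == 1], Y[D == 1])    # outcome regression, treated
m0 = LinearRegression().fit(X[D == 0], Y[D == 0])    # outcome regression, control
nn1 = NearestNeighbors(n_neighbors=1).fit(X[D == 1])
nn0 = NearestNeighbors(n_neighbors=1).fit(X[D == 0])

Y1_hat, Y0_hat = Y.copy(), Y.copy()

# Impute Y(1) for control units: matched treated outcome + regression correction.
idx = nn1.kneighbors(X[D == 0], return_distance=False).ravel()
Y1_hat[D == 0] = Y[D == 1][idx] + m1.predict(X[D == 0]) - m1.predict(X[D == 1][idx])

# Impute Y(0) for treated units symmetrically.
idx = nn0.kneighbors(X[D == 1], return_distance=False).ravel()
Y0_hat[D == 1] = Y[D == 0][idx] + m0.predict(X[D == 1]) - m0.predict(X[D == 0][idx])

print(f"regression-adjusted imputation ATE estimate: {np.mean(Y1_hat - Y0_hat):.3f}")
```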
The Falsification Adaptive Set in Linear Models with Instrumental Variables that Violate the Exclusion or Conditional Exogeneity Restriction
linear models with a single endogenous variable estimated with multiple
correlated instrumental variables (IVs). The FAS reflects the model uncertainty
that arises from falsification of the baseline model. We show that it applies
to cases where a conditional exogeneity assumption holds and invalid
instruments violate the exclusion assumption only. We propose a generalized FAS
that reflects the model uncertainty when some instruments violate the exclusion
assumption and/or some instruments violate the conditional exogeneity
assumption. Under the assumption that invalid instruments are not themselves
endogenous explanatory variables, if there is at least one relevant instrument
that satisfies both the exclusion and conditional exogeneity assumptions then
this generalized FAS is guaranteed to contain the parameter of interest.
arXiv link: http://arxiv.org/abs/2212.04814v2
On the Non-Identification of Revenue Production Functions
proxy for output. I formalize and strengthen this common knowledge by showing
that neither the production function nor Hicks-neutral productivity can be
identified with such a revenue proxy. This result obtains when relaxing the
standard assumptions used in the literature to allow for imperfect competition.
It holds for a large class of production functions, including all commonly used
parametric forms. Among the prevalent approaches to address this issue, only
those that impose assumptions on the underlying demand system can possibly
identify the production function.
arXiv link: http://arxiv.org/abs/2212.04620v3
Optimal Model Selection in RDD and Related Settings Using Placebo Zones
Design, Regression Kink Design, and related IV estimators. Candidate models are
assessed within a 'placebo zone' of the running variable, where the true
effects are known to be zero. The approach yields an optimal combination of
bandwidth, polynomial, and any other choice parameters. It can also inform
choices between classes of models (e.g. RDD versus cohort-IV) and any other
choices, such as covariates, kernel, or other weights. We outline sufficient
conditions under which the approach is asymptotically optimal. The approach
also performs favorably under more general conditions in a series of Monte
Carlo simulations. We demonstrate the approach in an evaluation of changes to
Minimum Supervised Driving Hours in the Australian state of New South Wales. We
also re-evaluate evidence on the effects of Head Start and Minimum Legal
Drinking Age. Our Stata commands implement the procedure and compare its
performance to other approaches.
arXiv link: http://arxiv.org/abs/2212.04043v1
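A stripped-down sketch of the placebo-zone selection idea follows: candidate (bandwidth, polynomial order) pairs are evaluated at placebo cutoffs where the true effect is known to be zero, and the pair with the smallest root mean squared placebo estimate is chosen. This is a simplified illustration, not the authors' Stata implementation; the simulated data and candidate grid are hypothetical.

```python
# A rough sketch of the placebo-zone idea: candidate RDD specifications are
# estimated at placebo cutoffs where the true effect is zero, and the
# specification with the smallest root mean squared placebo "effect" is kept.
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
running = rng.uniform(-1, 1, n)
outcome = np.sin(2 * running) + rng.normal(scale=0.3, size=n)  # no true jump anywhere

def rdd_estimate(x, y, cutoff, bandwidth, order):
    """Local polynomial RDD estimate of the jump at `cutoff` (uniform kernel)."""
    keep = np.abs(x - cutoff) <= bandwidth
    xc, yc = x[keep] - cutoff, y[keep]
    side = (xc >= 0).astype(float)
    # Separate polynomial trends on each side plus a jump indicator.
    cols = [np.ones_like(xc), side]
    for p in range(1, order + 1):
        cols += [xc ** p, side * xc ** p]
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), yc, rcond=None)
    return beta[1]                      # coefficient on the jump indicator

placebo_cutoffs = np.linspace(-0.6, -0.2, 9)   # a zone where no effect exists
candidates = [(bw, order) for bw in (0.05, 0.1, 0.2, 0.4) for order in (1, 2)]

def placebo_rmse(bw, order):
    est = [rdd_estimate(running, outcome, c, bw, order) for c in placebo_cutoffs]
    return float(np.sqrt(np.mean(np.square(est))))

best = min(candidates, key=lambda c: placebo_rmse(*c))
print("selected (bandwidth, polynomial order):", best)
```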
Semiparametric Distribution Regression with Instruments and Monotonicity
regression model in the presence of an endogenous regressor, which are based on
an extension of IV probit estimators. We discuss the causal interpretation of
the estimators and two methods (monotone rearrangement and isotonic regression)
to ensure a monotonically increasing distribution function. Asymptotic
properties and simulation evidence are provided. An application to wage
equations reveals statistically significant and heterogeneous differences to
the inconsistent OLS-based estimator.
arXiv link: http://arxiv.org/abs/2212.03704v1
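The two monotonization devices mentioned above can be illustrated in a few lines: monotone rearrangement sorts the fitted distribution-regression values across the threshold grid, while isotonic regression projects them onto the set of non-decreasing sequences. The fitted values below are made up for illustration.

```python
# Monotone rearrangement vs. isotonic regression applied to a (possibly
# non-monotone) estimated distribution function on a grid of thresholds.
import numpy as np
from sklearn.isotonic import IsotonicRegression

thresholds = np.linspace(-2, 2, 9)
# Hypothetical distribution-regression fits F_hat(y | x) at one covariate value;
# note the non-monotone wiggle in the middle of the grid.
F_hat = np.array([0.05, 0.12, 0.22, 0.35, 0.33, 0.48, 0.61, 0.72, 0.85])

F_rearranged = np.sort(F_hat)                       # monotone rearrangement
F_isotonic = IsotonicRegression(y_min=0.0, y_max=1.0).fit_transform(
    thresholds, F_hat                               # pool-adjacent-violators fit
)

print("rearranged :", np.round(F_rearranged, 2))
print("isotonic   :", np.round(F_isotonic, 2))
```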
Neighborhood Adaptive Estimators for Causal Inference under Network Interference
In this work we consider the violation of the classical no-interference
assumption with units connected by a network. For tractability, we consider a
known network that describes how interference may spread. Unlike previous work
the radius (and intensity) of the interference experienced by a unit is unknown
and can depend on different (local) sub-networks and the assigned treatments.
We study estimators for the average direct treatment effect on the treated in
such a setting under additive treatment effects. We establish rates of
convergence and distributional results. The proposed estimators consider all
possible radii for each (local) treatment assignment pattern. In contrast to
previous work, we approximate the relevant network interference patterns that
lead to good estimates of the interference. To handle feature engineering, a
key innovation is to propose the use of synthetic treatments to decouple the
dependence. We provide simulations, an empirical illustration and insights for
the general study of interference.
arXiv link: http://arxiv.org/abs/2212.03683v2
Bayesian Forecasting in Economics and Finance: A Modern Review
to probabilistic forecasting. Uncertainty about all unknowns that characterize
any forecasting problem -- model, parameters, latent states -- is able to be
quantified explicitly, and factored into the forecast distribution via the
process of integration or averaging. Allied with the elegance of the method,
Bayesian forecasting is now underpinned by the burgeoning field of Bayesian
computation, which enables Bayesian forecasts to be produced for virtually any
problem, no matter how large, or complex. The current state of play in Bayesian
forecasting in economics and finance is the subject of this review. The aim is
to provide the reader with an overview of modern approaches to the field, set
in some historical context; and with sufficient computational detail given to
assist the reader with implementation.
arXiv link: http://arxiv.org/abs/2212.03471v2
The long-term effect of childhood exposure to technology using surrogates
of the parents affects the ability to climb the social ladder in terms of
income at ages 45-49 using the Danish micro data from years 1961-2019. Our
measure of technology exposure covers the degree to which using computers
(hardware and software) is required to perform an occupation, and it is created
by merging occupational codes with detailed data from O*NET. The challenge in
estimating this effect is that the long-term outcome is observed over a different
time horizon than our treatment of interest. We therefore adapt the surrogate
index methodology, linking the effect of our childhood treatment on
intermediate surrogates, such as income and education at ages 25-29, to the
effect on adulthood income. We estimate that a one standard error increase in
exposure to technology increases the income rank by 2 percentage points, which is
economically and statistically significant and robust to cluster-correlation
within families. The derived policy recommendation is to update the educational
curriculum to expose children to computers to a higher degree, which may then
act as a social leveler.
arXiv link: http://arxiv.org/abs/2212.03351v2
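A compact sketch of the surrogate-index logic follows: the long-term outcome is modeled on the intermediate surrogates in a sample where both are observed, the fitted index is then predicted in the treated sample, and the treatment effect on the index proxies the long-term effect. All data and names below are simulated and hypothetical.

```python
# A compact sketch of the surrogate-index idea: fit E[long-term Y | surrogates]
# where both are observed, predict the index where only surrogates are observed,
# and estimate the treatment effect on the predicted index.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)

# Historical sample: surrogates S (e.g. income/education at 25-29) and long-term Y.
n_hist = 10_000
S_hist = rng.normal(size=(n_hist, 2))
Y_hist = S_hist @ np.array([0.6, 0.3]) + rng.normal(scale=0.5, size=n_hist)
index_model = LinearRegression().fit(S_hist, Y_hist)

# "Experimental" sample: treatment W shifts the surrogates; long-term Y unobserved.
n_exp = 5_000
W = rng.binomial(1, 0.5, n_exp)
S_exp = rng.normal(size=(n_exp, 2)) + 0.2 * W[:, None]
Y_index = index_model.predict(S_exp)               # surrogate index

effect = Y_index[W == 1].mean() - Y_index[W == 0].mean()
print(f"estimated long-term effect via surrogate index: {effect:.3f}")
```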
Identification of Unobservables in Observations
interest in an economic model. This paper shows the identification of
unobserved variables in observations at the population level. When the
observables are distinct in each observation, there exists a function mapping
from the observables to the unobservables. Such a function guarantees the
uniqueness of the latent value in each observation. The key lies in the
identification of the joint distribution of observables and unobservables from
the distribution of observables. The joint distribution of observables and
unobservables then reveals the latent value in each observation. Three examples
of this result are discussed.
arXiv link: http://arxiv.org/abs/2212.02585v1
Educational Inequality of Opportunity and Mobility in Europe
intrinsic value for individuals. We study Inequality of Opportunity (IOp) and
intergenerational mobility in the distribution of educational attainment. We
propose to use debiased IOp estimators based on the Gini coefficient and the
Mean Logarithmic Deviation (MLD) which are robust to machine learning biases.
We also measure the effect of each circumstance on IOp, and we provide tests to
compare IOp in two populations and to test the joint significance of a group of
circumstances. We find that circumstances explain between 38% and 74% of
total educational inequality in European countries. Mother's education is the
most important circumstance in most countries. There is high intergenerational
persistence and there is evidence of an educational Great Gatsby curve. We also
construct IOp-aware educational Great Gatsby curves and find that countries with
high income IOp also exhibit high educational IOp and lower mobility.
arXiv link: http://arxiv.org/abs/2212.02407v3
A Data Fusion Approach for Ride-sourcing Demand Estimation: A Discrete Choice Model with Sampling and Endogeneity Corrections
rapidly in the last decade. Understanding the demand for these services is
essential for planning and managing modern transportation systems. Existing
studies develop statistical models for ride-sourcing demand estimation at an
aggregate level due to limited data availability. These models lack foundations
in microeconomic theory, ignore competition of ride-sourcing with other travel
modes, and cannot be seamlessly integrated into existing individual-level
(disaggregate) activity-based models to evaluate system-level impacts of
ride-sourcing services. In this paper, we present and apply an approach for
estimating ride-sourcing demand at a disaggregate level using discrete choice
models and multiple data sources. We first construct a sample of trip-based
mode choices in Chicago, USA, by enriching a household travel survey with publicly
available ride-sourcing and taxi trip records. We then formulate a multivariate
extreme value-based discrete choice model with sampling and endogeneity corrections
to account for the construction of the estimation sample from multiple data
sources and endogeneity biases arising from supply-side constraints and surge
pricing mechanisms in ride-sourcing systems. Our analysis of the constructed
dataset reveals insights into the influence of various socio-economic, land use
and built environment features on ride-sourcing demand. We also derive
elasticities of ride-sourcing demand relative to travel cost and time. Finally,
we illustrate how the developed model can be employed to quantify the welfare
implications of ride-sourcing policies and regulations such as terminating
certain types of services and introducing ride-sourcing taxes.
arXiv link: http://arxiv.org/abs/2212.02178v1
Counterfactual Learning with General Data-generating Policies
counterfactual policies using log data from a different policy. We extend its
applicability by developing an OPE method for a class of both full support and
deficient support logging policies in contextual-bandit settings. This class
includes deterministic bandit (such as Upper Confidence Bound) as well as
deterministic decision-making based on supervised and unsupervised learning. We
prove that our method's prediction converges in probability to the true
performance of a counterfactual policy as the sample size increases. We
validate our method with experiments on partly and entirely deterministic
logging policies. Finally, we apply it to evaluate coupon targeting policies by
a major online platform and show how to improve the existing policy.
arXiv link: http://arxiv.org/abs/2212.01925v1
mCube: Multinomial Micro-level reserving Model
denoted mCube. We propose a unified framework for modelling the time and the
payment process for IBNR and RBNS claims and for modelling IBNR claim counts. We
use multinomial distributions for the time process and spliced mixture models
for the payment process. We illustrate the excellent performance of the
proposed model on a real data set of a major insurance company consisting of
bodily injury claims. It is shown that the proposed model produces a best
estimate distribution that is centered around the true reserve.
arXiv link: http://arxiv.org/abs/2212.00101v1
Incorporating Prior Knowledge of Latent Group Structure in Panel Data Models
models. We develop a constrained Bayesian grouped estimator that exploits
researchers' prior beliefs on groups in the form of pairwise constraints,
indicating whether a pair of units is likely to belong to the same group or to
different groups. We propose a prior to incorporate the pairwise constraints
with varying degrees of confidence. The whole framework is built on the
nonparametric Bayesian method, which implicitly specifies a distribution over
the group partitions, and so the posterior analysis takes the uncertainty of
the latent group structure into account. Monte Carlo experiments reveal that
adding prior knowledge yields more accurate coefficient estimates and scores
predictive gains over alternative estimators. We apply our method to two
empirical applications. In a first application to forecasting U.S. CPI
inflation, we illustrate that prior knowledge of groups improves density
forecasts when the data is not entirely informative. A second application
revisits the relationship between a country's income and its democratic
transition; we identify heterogeneous income effects on democracy with five
distinct groups over ninety countries.
arXiv link: http://arxiv.org/abs/2211.16714v3
Score-based calibration testing for multivariate forecast distributions
routinely used to assess the quality of univariate distributional forecasts.
However, PIT-based calibration tests for multivariate distributional forecasts
face various challenges. We propose two new types of tests based on proper
scoring rules, which overcome these challenges. They arise from a general
framework for calibration testing in the multivariate case, introduced in this
work. The new tests have good size and power properties in simulations and
solve various problems of existing tests. We apply the tests to forecast
distributions for macroeconomic and financial time series data.
arXiv link: http://arxiv.org/abs/2211.16362v3
Double Robust Bayesian Inference on Average Treatment Effects
treatment effect (ATE) under unconfoundedness. For our new Bayesian approach,
we first adjust the prior distributions of the conditional mean functions, and
then correct the posterior distribution of the resulting ATE. Both adjustments
make use of pilot estimators motivated by the semiparametric influence function
for ATE estimation. We prove asymptotic equivalence of our Bayesian procedure
and efficient frequentist ATE estimators by establishing a new semiparametric
Bernstein-von Mises theorem under double robustness; i.e., the lack of
smoothness of conditional mean functions can be compensated by high regularity
of the propensity score and vice versa. Consequently, the resulting Bayesian
credible sets form confidence intervals with asymptotically exact coverage
probability. In simulations, our method provides precise point estimates of the
ATE through the posterior mean and credible intervals that closely align with
the nominal coverage probability. Furthermore, our approach achieves a shorter
interval length in comparison to existing methods. We illustrate our method in
an application to the National Supported Work Demonstration following LaLonde
[1986] and Dehejia and Wahba [1999].
arXiv link: http://arxiv.org/abs/2211.16298v6
Bayesian Multivariate Quantile Regression with alternative Time-varying Volatility Specifications
forecast the tail behavior of energy commodities, where the homoskedasticity
assumption is relaxed to allow for time-varying volatility. In particular, we
exploit the mixture representation of the multivariate asymmetric Laplace
likelihood and the Cholesky-type decomposition of the scale matrix to introduce
stochastic volatility and GARCH processes and then provide an efficient MCMC to
estimate them. The proposed models outperform the homoskedastic benchmark
mainly when predicting the distribution's tails. We provide a model combination
using a quantile score-based weighting scheme, which leads to improved
performances, notably when no single model uniformly outperforms the other
across quantiles, time, or variables.
arXiv link: http://arxiv.org/abs/2211.16121v2
Synthetic Principal Component Design: Fast Covariate Balancing with Synthetic Controls
combinatorial optimization problem. In this paper, we aim to develop a globally
convergent and practically efficient optimization algorithm. Specifically, we
consider a setting where the pre-treatment outcome data is available and the
synthetic control estimator is invoked. The average treatment effect is
estimated via the difference between the weighted average outcomes of the
treated and control units, where the weights are learned from the observed
data. Under this setting, we surprisingly observed that the optimal
experimental design problem could be reduced to a so-called phase
synchronization problem. We solve this problem via a normalized variant of
the generalized power method with spectral initialization. On the theoretical
side, we establish the first global optimality guarantee for experiment design
when pre-treatment data is sampled from certain data-generating processes.
Empirically, we conduct extensive experiments to demonstrate the effectiveness
of our method on both the US Bureau of Labor Statistics and the
Abadie-Diamond-Hainmueller California Smoking Data. In terms of the root mean
square error, our algorithm surpasses the random design by a large margin.
arXiv link: http://arxiv.org/abs/2211.15241v1
Inference in Cluster Randomized Trials with Matched Pairs
status is determined according to a "matched pairs" design. Here, by a cluster
randomized experiment, we mean one in which treatment is assigned at the level
of the cluster; by a "matched pairs" design, we mean that a sample of clusters
is paired according to baseline, cluster-level covariates and, within each
pair, one cluster is selected at random for treatment. We study the
large-sample behavior of a weighted difference-in-means estimator and derive
two distinct sets of results depending on whether the matching procedure does or
does not match on cluster size. We then propose a single variance estimator
which is consistent in either regime. Combining these results establishes the
asymptotic exactness of tests based on these estimators. Next, we consider the
properties of two common testing procedures based on t-tests constructed from
linear regressions, and argue that both are generally conservative in our
framework. We additionally study the behavior of a randomization test which
permutes the treatment status for clusters within pairs, and establish its
finite-sample and asymptotic validity for testing specific null hypotheses.
Finally, we propose a covariate-adjusted estimator which adjusts for additional
baseline covariates not used for treatment assignment, and establish conditions
under which such an estimator leads to strict improvements in precision. A
simulation study confirms the practical relevance of our theoretical results.
arXiv link: http://arxiv.org/abs/2211.14903v5
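The within-pair randomization test described above is straightforward to sketch: treatment labels are flipped independently within each pair and a cluster-size-weighted difference in means is recomputed. The cluster-level data below are simulated, and the weighting shown is one simple choice rather than necessarily the estimator analyzed in the paper.

```python
# A minimal sketch of a randomization test for a matched-pairs cluster design:
# flip treatment labels independently within each pair and recompute the
# cluster-size weighted difference in means. Data are simulated.
import numpy as np

rng = np.random.default_rng(5)
n_pairs = 40
sizes = rng.integers(20, 80, size=(n_pairs, 2))      # cluster sizes within each pair
means = rng.normal(size=(n_pairs, 2))                # cluster-level mean outcomes
treated_first = np.ones(n_pairs, dtype=bool)         # observed: column 0 treated

def weighted_dim(treated_first):
    """Cluster-size weighted difference in means between treated and control clusters."""
    t_col = np.where(treated_first, 0, 1)
    c_col = 1 - t_col
    rows = np.arange(n_pairs)
    t_mean = np.average(means[rows, t_col], weights=sizes[rows, t_col])
    c_mean = np.average(means[rows, c_col], weights=sizes[rows, c_col])
    return t_mean - c_mean

observed = weighted_dim(treated_first)
draws = np.array([
    weighted_dim(rng.random(n_pairs) < 0.5)          # flip labels within pairs
    for _ in range(2_000)
])
p_value = np.mean(np.abs(draws) >= np.abs(observed))
print(f"observed difference: {observed:.3f}, permutation p-value: {p_value:.3f}")
```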
Extreme Changes in Changes
outcomes, such as infants with extremely low birth weights. Existing
changes-in-changes (CIC) estimators are tailored to middle quantiles and do not
work well for such subpopulations. This paper proposes a new CIC estimator to
accurately estimate treatment effects at extreme quantiles. With its asymptotic
normality, we also propose a method of statistical inference, which is simple
to implement. Based on simulation studies, we propose to use our extreme CIC
estimator for extreme quantiles, such as those below 5% or above 95%, while the
conventional CIC estimator should be used for intermediate quantiles. Applying
the proposed method, we study the effects of income gains from the 1993 EITC
reform on infant birth weights for those in the most critical conditions. This
paper is accompanied by a Stata command.
arXiv link: http://arxiv.org/abs/2211.14870v2
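For reference, the conventional changes-in-changes quantile effect of Athey and Imbens (2006), which the paper refines for extreme quantiles, can be computed from empirical distribution functions as sketched below; the extreme-quantile correction itself is not implemented, and the simulated groups are illustrative.

```python
# The conventional changes-in-changes (CIC) quantile treatment effect, computed
# from empirical CDFs and quantiles. Groups (g, t): g=1 treated, t=1 post.
import numpy as np

def cic_qte(y00, y01, y10, y11, q):
    """CIC effect at quantile q: Q_11(q) - F01^{-1}( F00( Q_10(q) ) )."""
    q10 = np.quantile(y10, q)
    rank = np.mean(y00 <= q10)                   # empirical F00 at that point
    counterfactual = np.quantile(y01, np.clip(rank, 0.0, 1.0))
    return np.quantile(y11, q) - counterfactual

rng = np.random.default_rng(6)
y00 = rng.normal(0.0, 1.0, 5_000)                # control, pre
y01 = rng.normal(0.5, 1.0, 5_000)                # control, post
y10 = rng.normal(0.2, 1.0, 5_000)                # treated, pre
y11 = rng.normal(1.0, 1.0, 5_000)                # treated, post (true effect 0.3)

for q in (0.05, 0.5, 0.95):
    print(f"q={q:.2f}: CIC estimate {cic_qte(y00, y01, y10, y11, q):.3f}")
```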
Machine Learning Algorithms for Time Series Analysis and Forecasting
health evolution metrics. The ability to handle such data has become a
necessity, and time series analysis and forecasting serve this purpose. These
are essential tools for any machine learning practitioner, as they deepen the
understanding of the characteristics of the data. Forecasting is used to
predict the future value of a variable based on its past occurrences. This
paper presents a detailed survey of the methods used for forecasting. The
complete forecasting process, from preprocessing to validation, is also
explained thoroughly. Various statistical and deep learning models are
considered, notably ARIMA, Prophet, and LSTMs. Hybrid machine learning models
are also explored and elucidated. Our work can be used to develop a good
understanding of the forecasting process and to identify the state-of-the-art
models in use today.
arXiv link: http://arxiv.org/abs/2211.14387v1
A Design-Based Approach to Spatial Correlation
finite population framework, we identify three channels of spatial correlation:
sampling scheme, assignment design, and model specification. The
Eicker-Huber-White standard error, the cluster-robust standard error, and the
spatial heteroskedasticity and autocorrelation consistent standard error are
compared under different combinations of the three channels. Then, we provide
guidelines for whether standard errors should be adjusted for spatial
correlation for both linear and nonlinear estimators. As it turns out, the
answer to this question also depends on the magnitude of the sampling
probability.
arXiv link: http://arxiv.org/abs/2211.14354v1
Strategyproof Decision-Making in Panel Data Settings and Beyond
decision-maker gets noisy, repeated measurements of multiple units (or agents).
We consider a setup where there is a pre-intervention period, when the
principal observes the outcomes of each unit, after which the principal uses
these observations to assign a treatment to each unit. Unlike this classical
setting, we permit the units generating the panel data to be strategic, i.e.
units may modify their pre-intervention outcomes in order to receive a more
desirable intervention. The principal's goal is to design a strategyproof
intervention policy, i.e. a policy that assigns units to their
utility-maximizing interventions despite their potential strategizing. We first
identify a necessary and sufficient condition under which a strategyproof
intervention policy exists, and provide a strategyproof mechanism with a simple
closed form when one does exist. Along the way, we prove impossibility results
for strategic multiclass classification, which may be of independent interest.
When there are two interventions, we establish that there always exists a
strategyproof mechanism, and provide an algorithm for learning such a
mechanism. For three or more interventions, we provide an algorithm for
learning a strategyproof mechanism if there exists a sufficiently large gap in
the principal's rewards between different interventions. Finally, we
empirically evaluate our model using real-world panel data collected from
product sales over 18 months. We find that our methods compare favorably to
baselines which do not take strategic interactions into consideration, even in
the presence of model misspecification.
arXiv link: http://arxiv.org/abs/2211.14236v4
Spectral estimation for mixed causal-noncausal autoregressive models
noncausal, and mixed causal-noncausal autoregressive models driven by a
non-Gaussian error sequence. We do not assume any parametric distribution
function for the innovations. Instead, we use the information of higher-order
cumulants, combining the spectrum and the bispectrum in a minimum distance
estimation. We show how to circumvent the nonlinearity of the parameters and
the multimodality in the noncausal and mixed models by selecting the
appropriate initial values in the estimation. In addition, we propose a method
of identification using a simple comparison criterion based on the global
minimum of the estimation function. By means of a Monte Carlo study, we find
unbiased estimated parameters and a correct identification as the data depart
from normality. We propose an empirical application on eight monthly commodity
prices, finding noncausal and mixed causal-noncausal dynamics.
arXiv link: http://arxiv.org/abs/2211.13830v1
Cross-Sectional Dynamics Under Network Structure: Theory and Macroeconomic Applications
develop an econometric framework that rationalizes the dynamics of
cross-sectional variables as the innovation transmission along fixed bilateral
links and that can accommodate rich patterns of how network effects of higher
order accumulate over time. The proposed Network-VAR (NVAR) can be used to
estimate dynamic network effects, with the network given or inferred from
dynamic cross-correlations in the data. In the latter case, it also offers a
dimensionality-reduction technique for modeling high-dimensional
(cross-sectional) processes, owing to networks' ability to summarize complex
relations among variables (units) by relatively few bilateral links. In a first
application, I show that sectoral output growth in an RBC economy with lagged
input-output conversion follows an NVAR. I characterize impulse-responses to
TFP shocks in this environment, and I estimate that the lagged transmission of
productivity shocks along supply chains can account for a third of the
persistence in aggregate output growth. The remainder is due to persistence in
the aggregate TFP process, leaving a negligible role for persistence in
sectoral TFP. In a second application, I forecast macroeconomic aggregates
across OECD countries by assuming and estimating a network that underlies the
dynamics. In line with an equivalence result I provide, this reduces
out-of-sample mean squared errors relative to a dynamic factor model. The
reductions range from 12% for quarterly real GDP growth to 68% for monthly
CPI inflation.
arXiv link: http://arxiv.org/abs/2211.13610v6
Simulation-based Forecasting for Intraday Power Markets: Modelling Fundamental Drivers for Location, Shape and Scale of the Price Distribution
for balancing forecast errors due to the rising volumes of intermittent
renewable generation. However, compared to day-ahead markets, the drivers for
the intraday price process are still sparsely researched. In this paper, we
propose a modelling strategy for the location, shape and scale parameters of
the return distribution in intraday markets, based on fundamental variables. We
consider wind and solar forecasts and their intraday updates, outages, price
information and a novel measure for the shape of the merit-order, derived from
spot auction curves as explanatory variables. We validate our modelling by
simulating price paths and compare the probabilistic forecasting performance of
our model to benchmark models in a forecasting study for the German market. The
approach yields significant improvements in the forecasting performance,
especially in the tails of the distribution. At the same time, we are able to
derive the contribution of the driving variables. We find that, apart from the
first lag of the price changes, none of our fundamental variables have
explanatory power for the expected value of the intraday returns. This implies
weak-form market efficiency, as renewable forecast changes and outage
information seem to be priced in by the market. We find that the volatility is
driven by the merit-order regime, the time to delivery and the closure of
cross-border order books. The tail of the distribution is mainly influenced by
past price differences and trading activity. Our approach is directly
transferable to other continuous intraday markets in Europe.
arXiv link: http://arxiv.org/abs/2211.13002v1
Macroeconomic Effects of Active Labour Market Policies: A Novel Instrumental Variables Approach
policies (ALMP) in Germany over the period 2005 to 2018. We propose a novel
identification strategy to overcome the simultaneity of ALMP and labour market
outcomes at the regional level. It exploits the imperfect overlap of local
labour markets and local employment agencies that decide on the local
implementation of policies. Specifically, we instrument for the use of ALMP in
a local labour market with the mix of ALMP implemented outside this market but
in local employment agencies that partially overlap with this market. We find
no effects of short-term activation measures and further vocational training on
aggregate labour market outcomes. In contrast, wage subsidies substantially
increase the share of workers in unsubsidised employment while lowering
long-term unemployment and welfare dependency. Our results suggest that
negative externalities of ALMP partially offset the effects for program
participants and that some segments of the labour market benefit more than
others.
arXiv link: http://arxiv.org/abs/2211.12437v1
Peer Effects in Labor Market Training
market training programs for jobseekers. Using rich administrative data from
Germany and a novel measure of employability, I find that participants benefit
from greater average exposure to highly employable peers through increased
long-term employment and earnings. The effects vary significantly by own
employability: jobseekers with low employability experience larger long-term
gains, whereas highly employable individuals benefit primarily in the short
term through higher entry wages. An analysis of mechanisms suggests that
within-group competition in job search attenuates part of the positive effects
that operate through knowledge spillovers.
arXiv link: http://arxiv.org/abs/2211.12366v3
Asymptotic Properties of the Synthetic Control Method
synthetic control method (SCM). We show that the synthetic control (SC) weight
converges to a limiting weight that minimizes the mean squared prediction risk
of the treatment-effect estimator when the number of pretreatment periods goes
to infinity, and we also quantify the rate of convergence. Observing the link
between the SCM and model averaging, we further establish the asymptotic
optimality of the SC estimator under imperfect pretreatment fit, in the sense
that it achieves the lowest possible squared prediction error among all
possible treatment effect estimators that are based on an average of control
units, such as matching, inverse probability weighting and
difference-in-differences. The asymptotic optimality holds regardless of
whether the number of control units is fixed or divergent. Thus, our results
provide justifications for the SCM in a wide range of applications. The
theoretical results are verified via simulations.
arXiv link: http://arxiv.org/abs/2211.12095v1
Contextual Bandits in a Survey Experiment on Charitable Giving: Within-Experiment Outcomes versus Policy Learning
learn a targeted treatment assignment policy, where the goal is to use a
participant's survey responses to determine which charity to expose them to in
a donation solicitation. The design balances two competing objectives:
optimizing the outcomes for the subjects in the experiment (“cumulative regret
minimization”) and gathering data that will be most useful for policy
learning, that is, for learning an assignment rule that will maximize welfare
if used after the experiment (“simple regret minimization”). We evaluate
alternative experimental designs by collecting pilot data and then conducting a
simulation study. Next, we implement our selected algorithm. Finally, we
perform a second simulation study anchored to the collected data that evaluates
the benefits of the algorithm we chose. Our first result is that the value of a
learned policy in this setting is higher when data are collected via uniform
randomization rather than adaptively using standard cumulative regret
minimization or policy learning algorithms. We propose a simple heuristic for
adaptive experimentation that improves upon uniform randomization from the
perspective of policy learning at the expense of increasing cumulative regret
relative to alternative bandit algorithms. The heuristic modifies an existing
contextual bandit algorithm by (i) imposing a lower bound on assignment
probabilities that decays slowly, so that no arm is discarded too quickly, and
(ii) after adaptively collecting data, restricting policy learning to select
from arms where sufficient data has been gathered.
arXiv link: http://arxiv.org/abs/2211.12004v1
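The two modifications listed above can be mimicked in a toy simulation: a slowly decaying floor on assignment probabilities keeps every arm alive, and policy learning afterwards is restricted to arms with enough observations. The greedy ridge-regression agent below is a deliberately simple stand-in, not the authors' algorithm, and all names and constants are illustrative.

```python
# Toy sketch: (i) a slowly decaying lower bound on assignment probabilities so
# no arm is discarded too early, and (ii) restricting post-experiment policy
# learning to arms with sufficient data. Not the authors' algorithm.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(7)
n_rounds, n_arms, d = 3_000, 3, 4
theta = rng.normal(size=(n_arms, d))                  # unknown reward parameters

X_hist = [[] for _ in range(n_arms)]
y_hist = [[] for _ in range(n_arms)]

def predicted_rewards(x):
    preds = np.zeros(n_arms)
    for a in range(n_arms):
        if len(y_hist[a]) >= 5:
            model = Ridge(alpha=1.0).fit(np.array(X_hist[a]), np.array(y_hist[a]))
            preds[a] = model.predict(x[None, :])[0]
    return preds

for t in range(1, n_rounds + 1):
    x = rng.normal(size=d)
    floor = 0.5 / (n_arms * np.sqrt(t))               # slowly decaying probability floor
    probs = np.full(n_arms, floor)
    probs[np.argmax(predicted_rewards(x))] += 1.0 - n_arms * floor
    arm = rng.choice(n_arms, p=probs)
    reward = theta[arm] @ x + rng.normal(scale=0.5)
    X_hist[arm].append(x)
    y_hist[arm].append(reward)

# Policy learning afterwards is restricted to arms with sufficient support.
eligible = [a for a in range(n_arms) if len(y_hist[a]) >= 100]
print("arms eligible for the learned policy:", eligible)
print("arm pull counts:", [len(y) for y in y_hist])
```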
A Misuse of Specification Tests
Hausman tests and overidentifying restrictions tests, to assess the validity of
estimators rather than that of models. This paper examines the effectiveness of
such specification pretests in detecting invalid estimators. We analyze the
local asymptotic properties of test statistics and estimators and show that
locally unbiased specification tests cannot determine whether asymptotically
efficient estimators are asymptotically biased. In particular, an estimator may
remain valid even when the null hypothesis of correct model specification is
false, and it may be invalid even when the null hypothesis is true. The main
message of the paper is that correct model specification and valid estimation
are distinct issues: correct specification is neither necessary nor sufficient
for asymptotically unbiased estimation.
arXiv link: http://arxiv.org/abs/2211.11915v2
Structural Modelling of Dynamic Networks and Identifying Maximum Likelihood
interest is a nonnegative matrix characterizing the network (contagion)
effects. This network matrix is usually constrained either by assuming a
limited number of nonzero elements (sparsity), or by considering a reduced rank
approach for nonnegative matrix factorization (NMF). We follow the latter
approach and develop a new probabilistic NMF method. We introduce a new
Identifying Maximum Likelihood (IML) method for consistent estimation of the
identified set of admissible NMF's and derive its asymptotic distribution.
Moreover, we propose a maximum likelihood estimator of the parameter matrix for
a given non-negative rank, derive its asymptotic distribution and the
associated efficiency bound.
arXiv link: http://arxiv.org/abs/2211.11876v1
Fractional integration and cointegration
fractional integration and cointegration literature. We do not attempt to give
a complete survey of this enormous literature, but rather a more introductory
treatment suitable for a researcher or graduate student wishing to learn about
this exciting field of research. With this aim, we have surely overlooked many
relevant references for which we apologize in advance. Knowledge of standard
time series methods, and in particular methods related to nonstationary time
series, at the level of a standard graduate course or advanced undergraduate
course is assumed.
arXiv link: http://arxiv.org/abs/2211.10235v1
Cointegration with Occasionally Binding Constraints
relates to how a (nonlinear) vector autoregression, which provides a unified
description of the short- and long-run dynamics of a vector of time series, can
generate 'nonlinear cointegration' in the profound sense of those series
sharing common nonlinear stochastic trends. We consider this problem in the
setting of the censored and kinked structural VAR (CKSVAR), which provides a
flexible yet tractable framework within which to model time series that are
subject to threshold-type nonlinearities, such as those arising due to
occasionally binding constraints, of which the zero lower bound (ZLB) on
short-term nominal interest rates provides a leading example. We provide a
complete characterisation of how common linear and nonlinear stochastic trends
may be generated in this model, via unit roots and appropriate generalisations
of the usual rank conditions, providing the first extension to date of the
Granger-Johansen representation theorem to a nonlinearly cointegrated setting,
and thereby giving the first successful treatment of the open problem. The
limiting common trend processes include regulated, censored and kinked Brownian
motions, none of which have previously appeared in the literature on
cointegrated VARs. Our results and running examples illustrate that the CKSVAR
is capable of supporting a far richer variety of long-run behaviour than is a
linear VAR, in ways that may be particularly useful for the identification of
structural parameters.
arXiv link: http://arxiv.org/abs/2211.09604v4
On the Role of the Zero Conditional Mean Assumption for Causal Inference in Linear Models
regressors and the error term, the OLS parameters have a causal interpretation.
We show that even when this assumption is satisfied, OLS might identify a
pseudo-parameter that does not have a causal interpretation. Even assuming that
the linear model is "structural" creates some ambiguity in what the regression
error represents and whether the OLS estimand is causal. This issue applies
equally to linear IV and panel data models. To give these estimands a causal
interpretation, one needs to impose assumptions on a "causal" model, e.g.,
using the potential outcome framework. This highlights that causal inference
requires causal, and not just stochastic, assumptions.
arXiv link: http://arxiv.org/abs/2211.09502v1
Estimating Dynamic Spillover Effects along Multiple Networks in a Linear Panel Model
distinguishing their separate roles is important in empirical research. For
example, the direction of spillover between two groups (such as banks and
industrial sectors linked in a bipartite graph) has important economic
implications, and a researcher may want to learn which direction is supported
in the data. For this, we need to have an empirical methodology that allows for
both directions of spillover simultaneously. In this paper, we develop a
dynamic linear panel model and asymptotic inference with large $n$ and small
$T$, where both directions of spillover are accommodated through multiple
networks. Using the methodology developed here, we perform an empirical study
of spillovers between bank weakness and zombie-firm congestion in industrial
sectors, using firm-bank matched data from Spain between 2005 and 2012.
Overall, we find that there is positive spillover in both directions between
banks and sectors.
arXiv link: http://arxiv.org/abs/2211.08995v1
Causal Bandits: Online Decision-Making in Endogenous Settings
economic applications. However, regret guarantees for even state-of-the-art
linear bandit algorithms (such as Optimism in the Face of Uncertainty Linear
bandit (OFUL)) make strong exogeneity assumptions w.r.t. arm covariates. This
assumption is very often violated in many economic contexts and using such
algorithms can lead to sub-optimal decisions. Further, in social science
analysis, it is also important to understand the asymptotic distribution of
estimated parameters. To this end, in this paper, we consider the problem of
online learning in linear stochastic contextual bandit problems with endogenous
covariates. We propose an algorithm we term $\epsilon$-BanditIV, that uses
instrumental variables to correct for this bias, and prove an
$\mathcal{O}(k\sqrt{T})$ upper bound for the expected regret of the
algorithm. Further, we demonstrate the asymptotic consistency and normality of
the $\epsilon$-BanditIV estimator. We carry out extensive Monte Carlo
simulations to demonstrate the performance of our algorithms compared to other
methods. We show that $\epsilon$-BanditIV significantly outperforms other
existing methods in endogenous settings. Finally, we use data from real-time
bidding (RTB) system to demonstrate how $\epsilon$-BanditIV can be used to
estimate the causal impact of advertising in such settings and compare its
performance with other existing methods.
arXiv link: http://arxiv.org/abs/2211.08649v2
Robust estimation for Threshold Autoregressive Moving-Average models
series analysis due to their ability to parsimoniously describe several complex
dynamical features. However, neither theory nor estimation methods are
currently available when the data present heavy tails or anomalous
observations, which is often the case in applications. In this paper, we
provide the first theoretical framework for robust M-estimation for TARMA
models and also study its practical relevance. Under mild conditions, we show
that the robust estimator for the threshold parameter is super-consistent,
while the estimators for autoregressive and moving-average parameters are
strongly consistent and asymptotically normal. The Monte Carlo study shows that
the M-estimator is superior, in terms of both bias and variance, to the least
squares estimator, which can be heavily affected by outliers. The findings
suggest that robust M-estimation should be generally preferred to the least
squares method. Finally, we apply our methodology to a set of commodity price
time series; the robust TARMA fit presents smaller standard errors and leads to
superior forecasting accuracy compared to the least squares fit. The results
support the hypothesis of a two-regime, asymmetric nonlinearity around zero,
characterised by slow expansions and fast contractions.
arXiv link: http://arxiv.org/abs/2211.08205v1
Identification and Auto-debiased Machine Learning for Outcome Conditioned Average Structural Derivatives
outcome conditioned average structural derivatives (OASD) in a general
nonseparable model. OASD is the average partial effect of a marginal change in
a continuous treatment on the individuals located at different parts of the
outcome distribution, irrespective of individuals' characteristics. OASD
combines both features of ATE and QTE: it is interpreted as straightforwardly
as ATE while at the same time more granular than ATE by breaking the entire
population up according to the rank of the outcome distribution.
One contribution of this paper is that we establish some close relationships
between the outcome conditioned average partial effects and a class of
parameters measuring the effect of counterfactually changing the distribution
of a single covariate on the unconditional outcome quantiles. By exploiting
such a relationship, we can obtain a root-$n$ consistent estimator and calculate
the semi-parametric efficiency bound for these counterfactual effect
parameters. We illustrate this point by two examples: equivalence between OASD
and the unconditional partial quantile effect (Firpo et al. (2009)), and
equivalence between the marginal partial distribution policy effect (Rothe
(2012)) and a corresponding outcome conditioned parameter.
Because identification of OASD is attained under a conditional exogeneity
assumption, by controlling for a rich information about covariates, a
researcher may ideally use high-dimensional controls in data. We propose for
OASD a novel automatic debiased machine learning estimator, and present
asymptotic statistical guarantees for it. We prove our estimator is root-$n$
consistent, asymptotically normal, and semiparametrically efficient. We also
prove the validity of the bootstrap procedure for uniform inference on the OASD
process.
arXiv link: http://arxiv.org/abs/2211.07903v1
Graph Neural Networks for Causal Inference Under Network Confounding
large network. We consider a nonparametric model with interference in potential
outcomes and selection into treatment. Both stages may be the outcomes of
simultaneous equation models, which allow for endogenous peer effects. This
results in high-dimensional network confounding where the network and
covariates of all units constitute sources of selection bias. In contrast, the
existing literature assumes that confounding can be summarized by a known,
low-dimensional function of these objects. We propose to use graph neural
networks (GNNs) to adjust for network confounding. When interference decays
with network distance, we argue that the model has low-dimensional structure
that makes estimation feasible and justifies the use of shallow GNN
architectures.
arXiv link: http://arxiv.org/abs/2211.07823v4
Type I Tobit Bayesian Additive Regression Trees for Censored Outcome Regression
Methods that do not account for censoring produce biased predictions of the
unobserved outcome. This paper introduces Type I Tobit Bayesian Additive
Regression Tree (TOBART-1) models for censored outcomes. Simulation results and
real data applications demonstrate that TOBART-1 produces accurate predictions
of censored outcomes. TOBART-1 provides posterior intervals for the conditional
expectation and other quantities of interest. The error term distribution can
have a large impact on the expectation of the censored outcome. Therefore, the
error term is flexibly modeled as a Dirichlet process mixture of normal
distributions.
arXiv link: http://arxiv.org/abs/2211.07506v4
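For context, the Type I Tobit building block underlying TOBART-1, in its simplest parametric form (linear conditional mean, Gaussian errors, censoring from below at zero), can be estimated by maximum likelihood as sketched below; the sum-of-trees mean and Dirichlet process error mixture used in the paper are not sketched.

```python
# Type I Tobit with a linear mean, Gaussian errors and censoring from below at
# zero, estimated by maximum likelihood on simulated data.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(8)
n = 3_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true, sigma_true = np.array([-0.5, 1.0, 0.5]), 1.0
y_star = X @ beta_true + sigma_true * rng.normal(size=n)
y = np.maximum(y_star, 0.0)                          # observed, censored at zero
censored = y <= 0.0

def neg_loglik(params):
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    xb = X @ beta
    ll_uncens = stats.norm.logpdf(y[~censored], loc=xb[~censored], scale=sigma)
    ll_cens = stats.norm.logcdf((0.0 - xb[censored]) / sigma)   # P(y* <= 0)
    return -(ll_uncens.sum() + ll_cens.sum())

res = optimize.minimize(neg_loglik, np.zeros(X.shape[1] + 1), method="BFGS")
print("beta_hat :", np.round(res.x[:-1], 3))
print("sigma_hat:", round(float(np.exp(res.x[-1])), 3))
```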
Robust Difference-in-differences Models
effects on the treated (ATT) under mainly the so-called parallel trends (PT)
assumption. The most common and widely used approach to justify the PT
assumption is the pre-treatment period examination. If a null hypothesis of the
same trend in the outcome means for both treatment and control groups in the
pre-treatment periods is rejected, researchers believe less in PT and the DID
results. This paper develops a robust generalized DID method that utilizes all
the information available not only from the pre-treatment periods but also from
multiple data sources. Our approach interprets PT in a different way using a
notion of selection bias, which enables us to generalize the standard DID
estimand by defining an information set that may contain multiple pre-treatment
periods or other baseline covariates. Our main assumption states that the
selection bias in the post-treatment period lies within the convex hull of all
selection biases in the pre-treatment periods. We provide a sufficient
condition for this assumption to hold. Based on the baseline information set we
construct, we provide an identified set for the ATT that always contains the
true ATT under our identifying assumption, and also the standard DID estimand.
We extend our proposed approach to multiple treatment periods DID settings. We
propose a flexible and easy way to implement the method. Finally, we illustrate
our methodology through some numerical and empirical examples.
arXiv link: http://arxiv.org/abs/2211.06710v5
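A simplified numerical reading of the identification idea is sketched below: if the post-period selection bias lies between the minimum and maximum of the pre-period selection biases, subtracting those extremes from the post-period group difference bounds the ATT. The group-by-period means are made up, and the interval shown is only the one-dimensional special case of the convex-hull assumption.

```python
# Simplified illustration: bound the ATT by assuming the post-period selection
# bias lies between the smallest and largest pre-period selection biases.
# Group-by-period means below are made up.
import numpy as np

mean_treated_pre = np.array([2.0, 2.2, 2.5])     # periods t-3, t-2, t-1
mean_control_pre = np.array([1.6, 1.9, 2.0])
mean_treated_post, mean_control_post = 3.4, 2.3

selection_bias_pre = mean_treated_pre - mean_control_pre   # pre-period biases
post_difference = mean_treated_post - mean_control_post

att_lower = post_difference - selection_bias_pre.max()
att_upper = post_difference - selection_bias_pre.min()
standard_did = post_difference - selection_bias_pre[-1]    # uses t-1 only

print(f"identified set for the ATT: [{att_lower:.2f}, {att_upper:.2f}]")
print(f"standard DID point estimate (last pre-period): {standard_did:.2f}")
```

As the abstract notes, the standard DID estimand (here, the version that differences against the last pre-period) always lies inside the resulting interval.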
Multiple Structural Breaks in Interactive Effects Panel Data and the Impact of Quantitative Easing on Bank Lending
panel data models with interactive effects. The toolbox includes tests for the
presence of structural breaks, a break date estimator, and a break date
confidence interval. The new toolbox is applied to a large panel of US banks
for a period characterized by massive quantitative easing programs aimed at
lessening the impact of the global financial crisis and the COVID--19 pandemic.
The question we ask is: Have these programs been successful in spurring bank
lending in the US economy? The short answer turns out to be: “No”.
arXiv link: http://arxiv.org/abs/2211.06707v2
A Residuals-Based Nonparametric Variance Ratio Test for Cointegration
Econometrics 108, 343-363) nonparametric variance ratio unit root test when
applied to regression residuals. The test requires neither the specification of
the correlation structure in the data nor the choice of tuning parameters.
Compared with popular residuals-based no-cointegration tests, the variance
ratio test is less prone to size distortions but has smaller local asymptotic
power. However, this paper shows that local asymptotic power properties do not
serve as a useful indicator for the power of residuals-based no-cointegration
tests in finite samples. In terms of size-corrected power, the variance ratio
test performs relatively well and, in particular, does not suffer from power
reversal problems detected for, e.g., the frequently used augmented
Dickey-Fuller type no-cointegration test. An application to daily prices of
cryptocurrencies illustrates the usefulness of the variance ratio test in
practice.
arXiv link: http://arxiv.org/abs/2211.06288v3
Bayesian Neural Networks for Macroeconomic Analysis
(small T), many time series (big K) but also by featuring temporal dependence.
Neural networks, by contrast, are designed for datasets with millions of
observations and covariates. In this paper, we develop Bayesian neural networks
(BNNs) that are well-suited for handling datasets commonly used for
macroeconomic analysis in policy institutions. Our approach avoids extensive
specification searches through a novel mixture specification for the activation
function that appropriately selects the form of nonlinearities. Shrinkage
priors are used to prune the network and force irrelevant neurons to zero. To
cope with heteroskedasticity, the BNN is augmented with a stochastic volatility
model for the error term. We illustrate how the model can be used in a policy
institution by first showing that our different BNNs produce precise density
forecasts, typically better than those from other machine learning methods.
Finally, we showcase how our model can be used to recover nonlinearities in the
reaction of macroeconomic aggregates to financial shocks.
arXiv link: http://arxiv.org/abs/2211.04752v4
Crises Do Not Cause Lower Short-Term Growth
country during the two-year recession period, which can be reflected by their
post-crisis GDP growth. However, by contrasting a causal model with a standard
prediction model, this paper argues that such a belief is non-causal. To make
causal inferences, we design a two-stage staggered difference-in-differences
model to estimate the average treatment effects. Interpreting the residuals as
the contribution of each crisis to the treatment effects, we conclude,
surprisingly, that cross-sectional crises often provide only limited relevant
causal information to policymakers.
arXiv link: http://arxiv.org/abs/2211.04558v3
On the Past, Present, and Future of the Diebold-Yilmaz Approach to Dynamic Network Connectedness
connectedness research program, combined with personal recollections of its
development. Its centerpiece in many respects is Diebold and Yilmaz (2014),
around which our discussion is organized.
arXiv link: http://arxiv.org/abs/2211.04184v2
Bootstraps for Dynamic Panel Threshold Models
panel threshold regression. We demonstrate that the standard nonparametric
bootstrap is inconsistent for the first-differenced generalized method of
moments (GMM) estimator. The inconsistency arises from an $n^{1/4}$-consistent
non-normal asymptotic distribution of the threshold estimator when the true
parameter lies in the continuity region of the parameter space, which stems
from the rank deficiency of the approximate Jacobian of the sample moment
conditions on the continuity region. To address this, we propose a grid
bootstrap to construct confidence intervals for the threshold and a residual
bootstrap to construct confidence intervals for the coefficients. They are
shown to be valid regardless of the model's continuity. Moreover, we establish
a uniform validity for the grid bootstrap. A set of Monte Carlo experiments
demonstrates that the proposed bootstraps improve upon the standard
nonparametric bootstrap. An empirical application to a firm investment model
illustrates our methods.
arXiv link: http://arxiv.org/abs/2211.04027v4
Fast, Robust Inference for Linear Instrumental Variables Models using Self-Normalized Moments
variables models which is simultaneously robust and computationally tractable.
Inference is based on self-normalization of sample moment conditions, and
allows for (but does not require) many (relative to the sample size), weak,
potentially invalid or potentially endogenous instruments, as well as for many
regressors and conditional heteroskedasticity. Our coverage results are uniform
and can deliver a small sample guarantee. We develop a new computational
approach based on semidefinite programming, which we show can equally be
applied to rapidly invert existing tests (e.g,. AR, LM, CLR, etc.).
arXiv link: http://arxiv.org/abs/2211.02249v3
Boosted p-Values for High-Dimensional Vector Autoregression
step in high-dimensional vector autoregression modeling. Using the
least-squares boosting method, we compute the p-value for each selected
parameter at every boosting step in a linear model. The p-values are
asymptotically valid and also adapt to the iterative nature of the boosting
procedure. Our simulation experiment shows that the p-values can keep the false
positive rate under control in high-dimensional vector autoregressions. In an
application with more than 100 macroeconomic time series, we further show that
the p-values can not only select a sparser model with good prediction
performance but also help control model stability. A companion R package
boostvar is developed.
arXiv link: http://arxiv.org/abs/2211.02215v2
Asymptotic Theory of Principal Component Analysis for High-Dimensional Time Series Data under a Factor Structure
model for a panel of $n$ stationary time series and we provide new derivations
of the asymptotic properties of the estimators, which are derived under a
minimal set of assumptions requiring only the existence of 4th order moments.
To this end, we also review various alternative sets of primitive sufficient
conditions for mean-squared consistency of the sample covariance matrix.
Finally, we discuss in detail the issue of identification of the loadings and
factors as well as its implications for inference.
arXiv link: http://arxiv.org/abs/2211.01921v4
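The principal-component estimator at the heart of this literature can be sketched in a few lines for a simulated T x n panel: loadings come from the leading eigenvectors of the sample covariance matrix and factors from projection. The normalization and the choice of the number of factors r below are illustrative.

```python
# Principal-component estimation of an approximate factor model for a T x n
# panel: loadings from the leading eigenvectors of the sample covariance,
# factors by projection. Simulated data; r is taken as known for illustration.
import numpy as np

rng = np.random.default_rng(9)
T, n, r = 300, 80, 2
F_true = rng.normal(size=(T, r))                    # latent factors
Lambda_true = rng.normal(size=(n, r))               # loadings
X = F_true @ Lambda_true.T + rng.normal(size=(T, n))

Xc = X - X.mean(axis=0)                             # demean each series
S = Xc.T @ Xc / T                                   # n x n sample covariance
eigval, eigvec = np.linalg.eigh(S)                  # eigenvalues in ascending order
Lambda_hat = eigvec[:, -r:][:, ::-1] * np.sqrt(n)   # leading r eigenvectors, scaled
F_hat = Xc @ Lambda_hat / n                         # estimated factors

# Factors and loadings are identified only up to rotation, so compare the
# common component rather than the factors themselves.
common_true = F_true @ Lambda_true.T
common_hat = F_hat @ Lambda_hat.T
rel_err = np.linalg.norm(common_hat - common_true) / np.linalg.norm(common_true)
print(f"relative error of estimated common component: {rel_err:.3f}")
```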
Are Synthetic Control Weights Balancing Score?
Synthetic Control (SC) weights emulates a randomized control trial where the
treatment status is independent of potential outcomes. Specifically, I
demonstrate that if there exist SC weights such that (i) the treatment effects
are exactly identified and (ii) these weights are uniformly and cumulatively
bounded, then SC weights are balancing scores.
arXiv link: http://arxiv.org/abs/2211.01575v1
Estimating interaction effects with panel data
under economically plausible assumptions in linear panel models with a fixed
$T$-dimension. We advocate for a correlated interaction term estimator
(CITE) and show that it is consistent under conditions that are not sufficient
for consistency of the interaction term estimator that is most common in
applied econometric work. Our paper discusses the empirical content of these
conditions, shows that standard inference procedures can be applied to CITE,
and analyzes consistency, relative efficiency, inference, and their finite
sample properties in a simulation study. In an empirical application, we test
whether labor displacement effects of robots are stronger in countries at
higher income levels. The results are in line with our theoretical and
simulation results and indicate that standard interaction term estimation
underestimates the importance of a country's income level in the relationship
between robots and employment and may prematurely reject a null hypothesis
about interaction effects in the presence of misspecification.
arXiv link: http://arxiv.org/abs/2211.01557v2
A Systematic Paradigm for Detecting, Surfacing, and Characterizing Heterogeneous Treatment Effects (HTE)
investigate the heterogeneity of treatment effects. With the wide range of
users being treated over many online controlled experiments, the typical
approach of manually investigating each dimension of heterogeneity becomes
overly cumbersome and prone to subjective human biases. We need an efficient
way to search through thousands of experiments with hundreds of target
covariates and hundreds of breakdown dimensions. In this paper, we propose a
systematic paradigm for detecting, surfacing and characterizing heterogeneous
treatment effects. First, we detect if treatment effect variation is present in
an experiment, prior to specifying any breakdowns. Second, we surface the most
relevant dimensions for heterogeneity. Finally, we characterize the
heterogeneity beyond just the conditional average treatment effects (CATE) by
studying the conditional distributions of the estimated individual treatment
effects. We show the effectiveness of our methods using simulated data and
empirical studies.
arXiv link: http://arxiv.org/abs/2211.01547v1
Stochastic Treatment Choice with Empirical Welfare Updating
assignment rules. The method is designed to find rules that are stochastic,
reflecting uncertainty in estimation of an assignment rule and about its
welfare performance. Our approach is to form a prior distribution over
assignment rules, not over data generating processes, and to update this prior
based upon an empirical welfare criterion, not likelihood. The social planner
then assigns treatment by drawing a policy from the resulting posterior. We
show analytically a welfare-optimal way of updating the prior using empirical
welfare; this posterior is not feasible to compute, so we propose a variational
Bayes approximation for the optimal posterior. We characterise the welfare
regret convergence of the assignment rule based upon this variational Bayes
approximation, showing that it converges to zero at a rate of $\ln(n)/\sqrt{n}$. We
apply our methods to experimental data from the Job Training Partnership Act
Study to illustrate the implementation of our methods.
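The core idea of updating a prior over assignment rules with an empirical welfare criterion rather than a likelihood can be illustrated with a stylized exponential-weighting scheme over a finite class of threshold rules. This is only a sketch under assumed names and a made-up data-generating process; the paper derives the welfare-optimal update and a variational-Bayes approximation to it, which are more involved.

```python
import numpy as np

rng = np.random.default_rng(1)

# stylized experimental data: covariate x, 50/50 randomized treatment d, outcome y
n = 500
x = rng.uniform(0, 1, n)
d = rng.integers(0, 2, n)
y = 0.5 * d * (x > 0.4) + rng.normal(0, 1, n)   # treatment helps only when x > 0.4

# a finite class of threshold assignment rules: treat iff x >= c
thresholds = np.linspace(0, 1, 21)

def empirical_welfare(c):
    """IPW estimate of mean outcome if units with x >= c were treated (propensity 0.5)."""
    assign = (x >= c).astype(float)
    return np.mean(y * d * assign / 0.5 + y * (1 - d) * (1 - assign) / 0.5)

welfare = np.array([empirical_welfare(c) for c in thresholds])

# exponential-weighting (Gibbs-type) update of a uniform prior over rules,
# with empirical welfare playing the role of a log-likelihood; eta is a tuning parameter
eta = 2.0
logw = eta * n * welfare
posterior = np.exp(logw - logw.max())
posterior /= posterior.sum()

# the planner assigns treatment by drawing a rule from the resulting posterior
drawn_threshold = rng.choice(thresholds, p=posterior)
print(drawn_threshold, thresholds[np.argmax(posterior)])
```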
arXiv link: http://arxiv.org/abs/2211.01537v3
A New Test for Market Efficiency and Uncovered Interest Parity
based on a dynamic regression approach. The method provides consistent and
asymptotically efficient parameter estimates, and is not dependent on
assumptions of strict exogeneity. This new approach is asymptotically more
efficient than the common approach of using OLS with HAC robust standard errors
in the static forward premium regression. The coefficient estimates when spot
return changes are regressed on the forward premium are all positive and
remarkably stable across currencies. These estimates are considerably larger
than those of previous studies, which frequently find negative coefficients.
The method also has the advantage of showing dynamic effects of risk premia, or
other events that may lead to rejection of UIP or the efficient markets
hypothesis.
arXiv link: http://arxiv.org/abs/2211.01344v1
Effects of syndication network on specialisation and performance of venture capital firms
financial subsector. Gaining a deeper understanding of the investment
behaviours of VC firms is crucial for the development of a more sustainable and
healthier market and economy. Existing evidence is mixed on whether
specialisation or diversification leads to better investment performance.
However, the impact of the syndication network has been overlooked. The
syndication network strongly influences the propagation of information and
trust. By exploiting an authoritative VC dataset of thirty-five-year investment
information in China, we construct a joint-investment network of VC firms and
analyse the effects of syndication and diversification on specialisation and
investment performance. There is a clear correlation between the syndication
network degree and the specialisation level of VC firms, which implies that
well-connected VC firms are diversified. More connections generally bring about
more information or other resources, and VC firms are more likely to enter a
new stage or industry with some new co-investing VC firms when compared to a
randomised null model. Moreover, autocorrelation analysis of both
specialisation and success rate on the syndication network indicates that
clustering of similar VC firms is roughly limited to the secondary
neighbourhood. When analysing local clustering patterns, we discover that,
contrary to popular beliefs, there is no apparent successful club of investors.
In contrast, investors with low success rates are more likely to cluster. Our
discoveries enrich the understanding of VC investment behaviours and can assist
policymakers in designing better strategies to promote the development of the
VC industry.
arXiv link: http://arxiv.org/abs/2211.00873v1
Cover It Up! Bipartite Graphs Uncover Identifiability in Sparse Factor Analysis
attention has been given to formally address identifiability of these models
beyond standard rotation-based identification such as the positive lower
triangular constraint. To fill this gap, we present a counting rule on the
number of nonzero factor loadings that is sufficient for achieving generic
uniqueness of the variance decomposition in the factor representation. This is
formalized in the framework of sparse matrix spaces and some classical elements
from graph and network theory. Furthermore, we provide a computationally
efficient tool for verifying the counting rule. Our methodology is illustrated
for real data in the context of post-processing posterior draws in Bayesian
sparse factor analysis.
arXiv link: http://arxiv.org/abs/2211.00671v4
Population and Technological Growth: Evidence from Roe v. Wade
Supreme Court, which ruled most abortion restrictions unconstitutional. Our
identifying assumption is that states which had not liberalized their abortion
laws prior to Roe would experience a negative birth shock of greater proportion
than states which had undergone pre-Roe reforms. We estimate the
difference-in-differences in births and use estimated births as an exogenous
treatment variable to predict patents per capita. Our results show that a
one-standard-deviation increase in cohort starting population increases per
capita patents by 0.24 standard deviations. These results suggest that, at the margin,
increasing fertility can increase patent production. Insofar as patent
production is a sufficient proxy for technological growth, increasing births
has a positive impact on technological growth. This paper and its results do
not pertain to the issue of abortion itself.
arXiv link: http://arxiv.org/abs/2211.00410v1
Reservoir Computing for Macroeconomic Forecasting with Mixed Frequency Data
deal with large-scale datasets and series with unequal release periods.
MIxed-DAta Sampling (MIDAS) and Dynamic Factor Models (DFM) are the two main
state-of-the-art approaches that allow modeling series with non-homogeneous
frequencies. We introduce a new framework called the Multi-Frequency Echo State
Network (MFESN) based on a relatively novel machine learning paradigm called
reservoir computing. Echo State Networks (ESN) are recurrent neural networks
formulated as nonlinear state-space systems with random state coefficients
where only the observation map is subject to estimation. MFESNs are
considerably more efficient than DFMs and allow for incorporating many series,
as opposed to MIDAS models, which are prone to the curse of dimensionality. All
methods are compared in extensive multistep forecasting exercises targeting US
GDP growth. We find that our MFESN models achieve superior or comparable
performance over MIDAS and DFMs at a much lower computational cost.
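Echo State Networks are straightforward to sketch: the recurrent state is driven by fixed random weights and only the linear observation map (readout) is estimated, here by ridge regression. The block below is a generic single-frequency ESN, not the authors' MFESN implementation; hyperparameter values and the forecasting target are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def esn_states(u, n_states=100, spectral_radius=0.9, input_scale=0.5, leak=1.0):
    """Run a leaky echo state network over inputs u (T x k); return states (T x n_states).
    Recurrent and input weights are random and never trained."""
    T, k = u.shape
    W = rng.standard_normal((n_states, n_states))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # echo-state scaling
    W_in = input_scale * rng.standard_normal((n_states, k))
    states = np.zeros((T, n_states))
    x = np.zeros(n_states)
    for t in range(T):
        x = (1 - leak) * x + leak * np.tanh(W @ x + W_in @ u[t])
        states[t] = x
    return states

def ridge_readout(states, y, lam=1e-2):
    """Estimate the observation map (the only trained part) by ridge regression."""
    Z = np.hstack([states, np.ones((len(states), 1))])           # add an intercept
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)

# usage: one-step-ahead forecasts of a nonlinear AR(1) series from its own lags
T = 400
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.7 * np.tanh(z[t - 1]) + 0.1 * rng.standard_normal()
u, y = z[:-1, None], z[1:]
S = esn_states(u)
beta = ridge_readout(S[:300], y[:300])
y_hat = np.hstack([S[300:], np.ones((S[300:].shape[0], 1))]) @ beta  # out-of-sample forecasts
```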
arXiv link: http://arxiv.org/abs/2211.00363v3
Weak Identification in Low-Dimensional Factor Models with One or Two Factors
one or two factors to fit weak identification theory developed for generalized
method of moments models. Some identification-robust tests, here called
"plug-in" tests, require a reparameterization to distinguish weakly identified
parameters from strongly identified parameters. The reparameterizations in this
paper make plug-in tests available for subvector hypotheses in low-dimensional
factor models with one or two factors. Simulations show that the plug-in tests
are less conservative than identification-robust tests that use the original
parameterization. An empirical application to a factor model of parental
investments in children is included.
arXiv link: http://arxiv.org/abs/2211.00329v2
Shrinkage Methods for Treatment Choice
based on observed covariates. The most common decision rule is the conditional
empirical success (CES) rule proposed by Manski (2004), which assigns
individuals to treatments that yield the best experimental outcomes conditional
on the observed covariates. Conversely, using shrinkage estimators, which
shrink unbiased but noisy preliminary estimates toward the average of these
estimates, is a common approach in statistical estimation problems because it
is well-known that shrinkage estimators may have smaller mean squared errors
than unshrunk estimators. Inspired by this idea, we propose a computationally
tractable shrinkage rule that selects the shrinkage factor by minimizing an
upper bound of the maximum regret. Then, we compare the maximum regret of the
proposed shrinkage rule with those of the CES and pooling rules when the space
of conditional average treatment effects (CATEs) is correctly specified or
misspecified. Our theoretical results demonstrate that the shrinkage rule
performs well in many cases and these findings are further supported by
numerical experiments. Specifically, we show that the maximum regret of the
shrinkage rule can be strictly smaller than those of the CES and pooling rules
in certain cases when the space of CATEs is correctly specified. In addition,
we find that the shrinkage rule is robust against misspecification of the space
of CATEs. Finally, we apply our method to experimental data from the National
Job Training Partnership Act Study.
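The basic mechanics of shrinking noisy cell-level estimates toward their grand mean are easy to illustrate. The sketch below uses a James-Stein-style plug-in shrinkage factor as a stand-in (an assumption for illustration); the paper instead selects the factor by minimizing an upper bound on maximum regret.

```python
import numpy as np

def shrink_toward_mean(theta_hat, se):
    """Shrink unbiased, noisy cell-level CATE estimates toward their average.

    theta_hat : preliminary estimates, one per covariate cell
    se        : their standard errors
    Uses a positive-part James-Stein-type weight (illustrative plug-in,
    not the regret-minimizing factor derived in the paper)."""
    theta_hat, se = np.asarray(theta_hat, float), np.asarray(se, float)
    center = theta_hat.mean()
    noise = np.mean(se ** 2)
    signal = max(np.var(theta_hat) - noise, 0.0)   # crude estimate of true dispersion
    w = signal / (signal + noise)                   # shrinkage weight in [0, 1]
    return center + w * (theta_hat - center)

# usage: assign treatment in each covariate cell when the shrunk estimate is positive
theta_hat = np.array([0.8, -0.1, 0.3, -0.5, 0.2])
se = np.array([0.4, 0.4, 0.4, 0.4, 0.4])
assign = shrink_toward_mean(theta_hat, se) > 0
```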
arXiv link: http://arxiv.org/abs/2210.17063v4
Non-Robustness of the Cluster-Robust Inference: with a Proposal of a New Robust Method
are vulnerable to data that contain a small number of large clusters. When a
researcher uses the 51 states in the U.S. as clusters, the largest cluster
(California) accounts for about 10% of the total sample. Such a case in fact
violates the assumptions under which the widely used CR methods are guaranteed
to work. We formally show that the conventional CR methods fail if the
distribution of cluster sizes follows a power law with exponent less than two.
Besides the example of 51 state clusters, some examples are drawn from a list
of recent original research articles published in a top journal. In light of
these negative results about the existing CR methods, we propose a weighted CR
(WCR) method as a simple fix. Simulation studies support our arguments that the
WCR method is robust while the conventional CR methods are not.
arXiv link: http://arxiv.org/abs/2210.16991v3
Flexible machine learning estimation of conditional average treatment effects: a blessing and a curse
assumptions. If these assumptions apply, machine learning (ML) methods can be
used to study complex forms of causal effect heterogeneity. Recently, several
ML methods were developed to estimate the conditional average treatment effect
(CATE). If the features at hand cannot explain all heterogeneity, the
individual treatment effects (ITEs) can seriously deviate from the CATE. In
this work, we demonstrate how the distributions of the ITE and the CATE can
differ when a causal random forest (CRF) is applied. We extend the CRF to
estimate the difference in conditional variance between treated and controls.
If the ITE distribution equals the CATE distribution, this estimated difference
in variance should be small. If they differ, an additional causal assumption is
necessary to quantify the heterogeneity not captured by the CATE distribution.
The conditional variance of the ITE can be identified when the individual
effect is independent of the outcome under no treatment given the measured
features. Then, in the cases where the ITE and CATE distributions differ, the
extended CRF can appropriately estimate the variance of the ITE distribution
while the CRF fails to do so.
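The extension described here amounts to comparing conditional second moments of the outcome across treatment arms. The sketch below approximates that idea with off-the-shelf regression forests rather than the authors' extended causal random forest: it estimates Var(Y | X, T=1) - Var(Y | X, T=0), which should be near zero when the ITE distribution coincides with the CATE distribution. All settings are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def conditional_variance_difference(X, y, t, **rf_kwargs):
    """Estimate Var(Y | X, T=1) - Var(Y | X, T=0) with regression forests
    (an illustrative stand-in for the extended causal random forest)."""
    var = {}
    for arm in (0, 1):
        mask = (t == arm)
        m1 = RandomForestRegressor(**rf_kwargs).fit(X[mask], y[mask])       # E[Y | X, T=arm]
        m2 = RandomForestRegressor(**rf_kwargs).fit(X[mask], y[mask] ** 2)  # E[Y^2 | X, T=arm]
        var[arm] = np.clip(m2.predict(X) - m1.predict(X) ** 2, 0, None)     # Var = E[Y^2] - (E[Y])^2
    return var[1] - var[0]

# usage on simulated data where individual effects vary beyond what X explains
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
t = rng.integers(0, 2, n)
tau = 1.0 + rng.normal(0, 1.0, n)            # heterogeneous ITEs not explained by X
y = X[:, 0] + t * tau + rng.normal(0, 1, n)
var_diff = conditional_variance_difference(X, y, t, n_estimators=200, min_samples_leaf=20)
print(var_diff.mean())                       # positive on average, flagging extra heterogeneity
```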
arXiv link: http://arxiv.org/abs/2210.16547v2
Spectral Representation Learning for Conditional Moment Models
framework of conditional moment models, which characterize the target function
through a collection of conditional moment restrictions. For nonparametric
conditional moment models, efficient estimation often relies on preimposed
conditions on various measures of ill-posedness of the hypothesis space, which
are hard to validate when flexible models are used. In this work, we address
this issue by proposing a procedure that automatically learns representations
with controlled measures of ill-posedness. Our method approximates a linear
representation defined by the spectral decomposition of a conditional
expectation operator, which can be used for kernelized estimators and is known
to facilitate minimax optimal estimation in certain settings. We show this
representation can be efficiently estimated from data, and establish L2
consistency for the resulting estimator. We evaluate the proposed method on
proximal causal inference tasks, exhibiting promising performance on
high-dimensional, semi-synthetic data.
arXiv link: http://arxiv.org/abs/2210.16525v2
Eigenvalue tests for the number of latent factors in short panels
cross-sectional factor model with small time dimension. These tests are based
on the eigenvalues of variance-covariance matrices of (possibly weighted) asset
returns, and rely on either the assumption of spherical errors, or instrumental
variables for factor betas. We establish the asymptotic distributional results
using expansion theorems based on perturbation theory for symmetric matrices.
Our framework accommodates semi-strong factors in the systematic components. We
propose a novel statistical test for weak factors against strong or semi-strong
factors. We provide an empirical application to US equity data. Evidence for a
different number of latent factors across market downturns and upturns is
statistically ambiguous in the considered subperiods. In particular, our
results contradict the common wisdom of a single-factor model in bear markets.
arXiv link: http://arxiv.org/abs/2210.16042v1
How to sample and when to stop sampling: The generalized Wald problem and minimax policies
aims to determine the best treatment for full scale implementation by (1)
adaptively allocating units between two possible treatments, and (2) stopping
the experiment when the expected welfare (inclusive of sampling costs) from
implementing the chosen treatment is maximized. Working under a continuous time
limit, we characterize the optimal policies under the minimax regret criterion.
We show that the same policies also remain optimal under both parametric and
non-parametric outcome distributions in an asymptotic regime where sampling
costs approach zero. The minimax optimal sampling rule is just the Neyman
allocation: it is independent of sampling costs and does not adapt to observed
outcomes. The decision-maker halts sampling when the product of the average
treatment difference and the number of observations surpasses a specific
threshold. The results derived also apply to the so-called best-arm
identification problem, where the number of observations is exogenously
specified.
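The sampling and stopping rules described here are easy to state in discrete time: allocate units across arms in proportion to the outcome standard deviations, and stop once the product of the absolute average treatment-effect estimate and the number of observations crosses a threshold. The sketch below is a stylized version; the threshold value is an assumption, whereas the paper derives it from sampling costs in a continuous-time limit.

```python
import numpy as np

rng = np.random.default_rng(0)

def neyman_experiment(draw_outcome, sigma0, sigma1, threshold, max_n=100_000):
    """Stylized sequential experiment with Neyman allocation and a
    |mean difference| * n stopping rule.  draw_outcome(arm) yields one outcome."""
    p1 = sigma1 / (sigma0 + sigma1)            # Neyman allocation: sample arm 1 w.p. p1
    sums, counts = np.zeros(2), np.zeros(2)
    diff = 0.0
    for n in range(1, max_n + 1):
        arm = int(rng.random() < p1)
        sums[arm] += draw_outcome(arm)
        counts[arm] += 1
        if counts.min() > 0:
            diff = sums[1] / counts[1] - sums[0] / counts[0]
            if abs(diff) * n >= threshold:     # stop when |ATE estimate| * n is large enough
                break
    return int(diff > 0), n                    # implement the better-looking arm

# usage: arm 1 is truly better by 0.2
chosen, n_used = neyman_experiment(
    lambda arm: rng.normal(0.2 * arm, 1.0), sigma0=1.0, sigma1=1.0, threshold=200.0)
```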
arXiv link: http://arxiv.org/abs/2210.15841v7
Estimation of Heterogeneous Treatment Effects Using a Conditional Moment Based Approach
linear model (PLM) with multiple exogenous covariates and a potentially
endogenous treatment variable. Our approach integrates a Robinson
transformation to handle the nonparametric component, the Smooth Minimum
Distance (SMD) method to leverage conditional mean independence restrictions,
and a Neyman-Orthogonalized first-order condition (FOC). By employing
regularized model selection techniques like the Lasso method, our estimator
accommodates numerous covariates while exhibiting reduced bias, consistency,
and asymptotic normality. Simulations demonstrate its robust performance with
diverse instrument sets compared to traditional GMM-type estimators. Applying
this method to estimate Medicaid's heterogeneous treatment effects from the
Oregon Health Insurance Experiment reveals more robust and reliable results
than conventional GMM approaches.
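The Robinson transformation with Lasso-based nuisance estimation can be sketched with scikit-learn: partial the covariates out of both the outcome and the treatment, then regress residual on residual. This is a simplified, exogenous-treatment version of the ingredients listed above (no SMD step, no instruments), shown only to fix ideas.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import cross_val_predict

def robinson_lasso(y, d, X, cv=5):
    """Partialling-out estimate of a homogeneous coefficient in the partially
    linear model y = theta*d + g(X) + e, using cross-fitted Lasso predictions
    for the nuisance functions E[y|X] and E[d|X]."""
    y_hat = cross_val_predict(LassoCV(cv=cv), X, y, cv=cv)
    d_hat = cross_val_predict(LassoCV(cv=cv), X, d, cv=cv)
    y_res, d_res = y - y_hat, d - d_hat
    return LinearRegression(fit_intercept=False).fit(d_res[:, None], y_res).coef_[0]

# usage on simulated data with many covariates and a sparse nuisance function
rng = np.random.default_rng(0)
n, p = 1000, 50
X = rng.normal(size=(n, p))
d = X[:, 0] + rng.normal(size=n)              # treatment depends on X
y = 0.5 * d + 2 * X[:, 0] - X[:, 1] + rng.normal(size=n)
print(robinson_lasso(y, d, X))                # should be close to 0.5
```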
arXiv link: http://arxiv.org/abs/2210.15829v4
Unit Averaging for Heterogeneous Panels
unit-specific parameters in a heterogeneous panel model. The procedure consists
in estimating the parameter of a given unit using a weighted average of all the
unit-specific parameter estimators in the panel. The weights of the average are
determined by minimizing an MSE criterion we derive. We analyze the properties
of the resulting minimum MSE unit averaging estimator in a local heterogeneity
framework inspired by the literature on frequentist model averaging, and we
derive the local asymptotic distribution of the estimator and the corresponding
weights. The benefits of the procedure are showcased with an application to
forecasting unemployment rates for a panel of German regions.
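The mechanics of unit averaging can be sketched with a plug-in MSE criterion: weights minimize a quadratic form built from squared discrepancies (a bias proxy) plus estimator variances, subject to summing to one. The plug-in matrix below is an illustrative choice, not the exact criterion derived in the paper; the closed form only shows how such weights are computed.

```python
import numpy as np

def unit_averaging_weights(theta_hat, variances, target):
    """Minimum-MSE-style averaging weights for the parameter of unit `target`.

    Plug-in MSE matrix: outer product of (theta_i - theta_target) as a bias
    proxy plus a diagonal of estimator variances (illustrative, not the
    paper's exact criterion).  Solves min_w w' Psi w subject to sum(w) = 1."""
    theta_hat = np.asarray(theta_hat, float)
    b = theta_hat - theta_hat[target]
    Psi = np.outer(b, b) + np.diag(np.asarray(variances, float))
    w = np.linalg.solve(Psi, np.ones(len(theta_hat)))
    return w / w.sum()

# usage: average noisy unit-specific estimates to improve the estimate for unit 0
theta_hat = np.array([1.0, 1.1, 0.9, 2.5])     # unit 3 looks very different
variances = np.array([0.2, 0.2, 0.2, 0.2])
w = unit_averaging_weights(theta_hat, variances, target=0)
theta_tilde = w @ theta_hat                    # downweights the outlying unit
```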
arXiv link: http://arxiv.org/abs/2210.14205v3
GLS under Monotone Heteroskedasticity
regression analyses. A major issue in implementing the GLS is estimation of the
conditional variance function of the error term, which typically requires a
restrictive functional form assumption for parametric estimation or smoothing
parameters for nonparametric estimation. In this paper, we propose an
alternative approach to estimate the conditional variance function under
nonparametric monotonicity constraints by utilizing the isotonic regression
method. Our GLS estimator is shown to be asymptotically equivalent to the
infeasible GLS estimator with knowledge of the conditional error variance, and
involves only some tuning to trim boundary observations, not only for point
estimation but also for interval estimation or hypothesis testing. Our analysis
extends the scope of the isotonic regression method by showing that the
isotonic estimates, possibly with generated variables, can be employed as first
stage estimates to be plugged in for semiparametric objects. Simulation studies
illustrate excellent finite sample performances of the proposed method. As an
empirical example, we revisit Acemoglu and Restrepo's (2017) study on the
relationship between an aging population and economic growth to illustrate how
our GLS estimator effectively reduces estimation errors.
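Under the monotonicity assumption, the variance function can be estimated by isotonic regression of squared OLS residuals on the conditioning variable, and the fitted values then used as GLS weights. A minimal sketch follows; the paper's boundary trimming and inference refinements are omitted, and the simulated design is illustrative.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def isotonic_gls(y, X, v):
    """Feasible GLS when Var(e | v) is monotone increasing in a scalar v.

    Step 1: OLS residuals.  Step 2: isotonic regression of squared residuals
    on v to estimate the conditional variance.  Step 3: weighted least squares."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta_ols, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta_ols
    var_hat = IsotonicRegression(increasing=True, y_min=1e-6).fit(v, resid ** 2).predict(v)
    w = 1.0 / np.sqrt(var_hat)                     # GLS = OLS on reweighted data
    beta_gls, *_ = np.linalg.lstsq(Xc * w[:, None], y * w, rcond=None)
    return beta_gls

# usage: error standard deviation increases with the regressor
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 2, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.2 + x, n)
print(isotonic_gls(y, x, v=x))
```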
arXiv link: http://arxiv.org/abs/2210.13843v2
Prediction intervals for economic fixed-event forecasts
sequence of forecasts of the same (`fixed') predictand, so that the difficulty
of the forecasting problem decreases over time. Fixed-event point forecasts are
typically published without a quantitative measure of uncertainty. To construct
such a measure, we consider forecast postprocessing techniques tailored to the
fixed-event case. We develop regression methods that impose constraints
motivated by the problem at hand, and use these methods to construct prediction
intervals for gross domestic product (GDP) growth in Germany and the US.
arXiv link: http://arxiv.org/abs/2210.13562v3
Spatio-temporal Event Studies for Air Quality Assessment under Cross-sectional Dependence
event of interest has caused changes in the level of one or more relevant time
series. We are interested in ES applied to multivariate time series
characterized by high spatial (cross-sectional) and temporal dependence. We
pursue two goals. First, we propose to extend the existing taxonomy on ES,
mainly deriving from the financial field, by generalizing the underlying
statistical concepts and then adapting them to the time series analysis of
airborne pollutant concentrations. Second, we address the spatial
cross-sectional dependence by adopting a twofold adjustment. Initially, we use
a linear mixed spatio-temporal regression model (HDGM) to estimate the
relationship between the response variable and a set of exogenous factors,
while accounting for the spatio-temporal dynamics of the observations. Later,
we apply a set of sixteen ES test statistics, both parametric and
nonparametric, some of which directly adjusted for cross-sectional dependence.
We apply ES to evaluate the impact on NO2 concentrations generated by the
lockdown restrictions adopted in the Lombardy region (Italy) during the
COVID-19 pandemic in 2020. The HDGM model distinctly reveals the level shift
caused by the event of interest, while reducing the volatility and isolating
the spatial dependence of the data. Moreover, all the test statistics
unanimously suggest that the lockdown restrictions generated significant
reductions in the average NO2 concentrations.
arXiv link: http://arxiv.org/abs/2210.17529v1
Choosing The Best Incentives for Belief Elicitation with an Application to Political Protests
variable to assess how information affects one's own actions. However, beliefs
are multi-dimensional objects, and experimenters often only elicit a single
response from subjects. In this paper, we discuss how the incentives offered by
experimenters map subjects' true belief distributions to what profit-maximizing
subjects respond in the elicitation task. In particular, we show how slightly
different incentives may induce subjects to report the mean, mode, or median of
their belief distribution. If beliefs are not symmetric and unimodal, then
using an elicitation scheme that is mismatched with the research question may
affect both the magnitude and the sign of identified effects, or may even make
identification impossible. As an example, we revisit Cantoni et al.'s (2019)
study of whether political protests are strategic complements or substitutes.
We show that they elicit modal beliefs, while modal and mean beliefs may be
updated in opposite directions following their experiment. Hence, the sign of
their effects may change, allowing an alternative interpretation of their
results.
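The point that different incentive schemes elicit different functionals of the belief distribution can be checked directly: a quadratic scoring loss is minimized at the mean, an absolute loss at the median, and an all-or-nothing payment at the mode. A quick numerical illustration with a skewed, bimodal belief distribution (the numbers are made up):

```python
import numpy as np

# a skewed, discretized belief distribution over turnout (illustrative numbers only)
support = np.arange(0, 101)
probs = np.exp(-0.5 * ((support - 20) / 10.0) ** 2)
probs[60:] += 0.3 * np.exp(-0.5 * ((support[60:] - 80) / 5.0) ** 2)   # a second, smaller mode
probs /= probs.sum()

reports = support.astype(float)
# expected loss of each possible report under three incentive schemes
quadratic = [(probs * (support - r) ** 2).sum() for r in reports]      # elicits the mean
absolute = [(probs * np.abs(support - r)).sum() for r in reports]      # elicits the median
zero_one = [-(probs[support == int(r)].sum()) for r in reports]        # elicits the mode

print("mean-optimal report  :", reports[np.argmin(quadratic)], "vs mean", (probs * support).sum())
print("median-optimal report:", reports[np.argmin(absolute)])
print("mode-optimal report  :", reports[np.argmin(zero_one)], "vs mode", support[np.argmax(probs)])
```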
arXiv link: http://arxiv.org/abs/2210.12549v1
Allowing for weak identification when testing GARCH-X type models
allow for parameters to be near or at the boundary of the parameter space, to
derive the asymptotic distributions of the two test statistics that are used in
the two-step (testing) procedure proposed by Pedersen and Rahbek (2019). The
latter aims at testing the null hypothesis that a GARCH-X type model, with
exogenous covariates (X), reduces to a standard GARCH type model, while
allowing the "GARCH parameter" to be unidentified. We then provide a
characterization result for the asymptotic size of any test for testing this
null hypothesis before numerically establishing a lower bound on the asymptotic
size of the two-step procedure at the 5% nominal level. This lower bound
exceeds the nominal level, revealing that the two-step procedure does not
control asymptotic size. In a simulation study, we show that this finding is
relevant for finite samples, in that the two-step procedure can suffer from
overrejection in finite samples. We also propose a new test that, by
construction, controls asymptotic size and is found to be more powerful than
the two-step procedure when the "ARCH parameter" is "very small" (in which case
the two-step procedure underrejects).
arXiv link: http://arxiv.org/abs/2210.11398v1
Network Synthetic Interventions: A Causal Framework for Panel Data Under Network Interference
interventions methodology to incorporate network interference. We consider the
estimation of unit-specific potential outcomes from panel data in the presence
of spillover across units and unobserved confounding. Key to our approach is a
novel latent factor model that takes into account network interference and
generalizes the factor models typically used in panel data settings. We propose
an estimator, Network Synthetic Interventions (NSI), and show that it
consistently estimates the mean outcomes for a unit under an arbitrary set of
counterfactual treatments for the network. We further establish that the
estimator is asymptotically normal. We furnish two validity tests for whether
the NSI estimator reliably generalizes to produce accurate counterfactual
estimates. We provide a novel graph-based experiment design that guarantees the
NSI estimator produces accurate counterfactual estimates, and also analyze the
sample complexity of the proposed design. We conclude with simulations that
corroborate our theoretical findings.
arXiv link: http://arxiv.org/abs/2210.11355v2
Low-rank Panel Quantile Regression: Estimation and Inference
models which allow for unobserved slope heterogeneity over both individuals and
time. We estimate the heterogeneous intercept and slope matrices via nuclear
norm regularization followed by sample splitting, row- and column-wise quantile
regressions and debiasing. We show that the estimators of the factors and
factor loadings associated with the intercept and slope matrices are
asymptotically normally distributed. In addition, we develop two specification
tests: one for the null hypothesis that the slope coefficient is a constant
over time and/or individuals under the case that true rank of slope matrix
equals one, and the other for the null hypothesis that the slope coefficient
exhibits an additive structure under the case that the true rank of slope
matrix equals two. We illustrate the finite sample performance of estimation
and inference via Monte Carlo simulations and real datasets.
arXiv link: http://arxiv.org/abs/2210.11062v1
Efficient variational approximations for state space models
state space models. However, existing methods are inaccurate or computationally
infeasible for many state space models. This paper proposes a variational
approximation that is accurate and fast for any model with a closed-form
measurement density function and a state transition distribution within the
exponential family of distributions. We show that our method can accurately and
quickly estimate a multivariate Skellam stochastic volatility model with
high-frequency tick-by-tick discrete price changes of four stocks, and a
time-varying parameter vector autoregression with a stochastic volatility model
using eight macroeconomic variables.
arXiv link: http://arxiv.org/abs/2210.11010v3
Synthetic Blips: Generalizing Synthetic Controls for Dynamic Treatment Effects
methods to the setting with dynamic treatment effects. We consider the
estimation of unit-specific treatment effects from panel data collected under a
general treatment sequence. Here, each unit receives multiple treatments
sequentially, according to an adaptive policy that depends on a latent,
endogenously time-varying confounding state. Under a low-rank latent factor
model assumption, we develop an identification strategy for any unit-specific
mean outcome under any sequence of interventions. The latent factor model we
propose admits linear time-varying and time-invariant dynamical systems as
special cases. Our approach can be viewed as an identification strategy for
structural nested mean models -- a widely used framework for dynamic treatment
effects -- under a low-rank latent factor assumption on the blip effects.
Unlike these models, however, it is more permissive in observational settings,
thereby broadening its applicability. Our method, which we term synthetic blip
effects, is a backwards induction process in which the blip effect of a
treatment at each period and for a target unit is recursively expressed as a
linear combination of the blip effects of a group of other units that received
the designated treatment. This strategy avoids the combinatorial explosion in
the number of units that would otherwise be required by a naive application of
prior synthetic control and intervention methods in dynamic treatment settings.
We provide estimation algorithms that are easy to implement in practice and
yield estimators with desirable properties. Using unique Korean firm-level
panel data, we demonstrate how the proposed framework can be used to estimate
individualized dynamic treatment effects and to derive optimal treatment
allocation rules in the context of financial support for exporting firms.
arXiv link: http://arxiv.org/abs/2210.11003v2
Linear Regression with Centrality Measures
when network data is sparse -- that is, when there are many more agents than
links per agent -- and when they are measured with error. We make three
contributions in this setting: (1) We show that OLS estimators can become
inconsistent under sparsity and characterize the threshold at which this
occurs, with and without measurement error. This threshold depends on the
centrality measure used. Specifically, regression on eigenvector centrality is
less robust to sparsity than regression on degree or diffusion centrality. (2) We develop distributional theory
for OLS estimators under measurement error and sparsity, finding that OLS
estimators are subject to asymptotic bias even when they are consistent.
Moreover, bias can be large relative to their variances, so that bias
correction is necessary for inference. (3) We propose novel bias correction and
inference methods for OLS with sparse noisy networks. Simulation evidence
suggests that our theory and methods perform well, particularly in settings
where the usual OLS estimators and heteroskedasticity-consistent/robust t-tests
are deficient. Finally, we demonstrate the utility of our results in an
application inspired by De Weerdt and Dercon (2006), in which we consider
consumption smoothing and social insurance in Nyakatoke, Tanzania.
arXiv link: http://arxiv.org/abs/2210.10024v1
Modelling Large Dimensional Datasets with Markov Switching Factor Models
changes in the loadings driven by a latent first order Markov process. By
exploiting the equivalent linear representation of the model, we first recover
the latent factors by means of Principal Component Analysis. We then cast the
model in state-space form, and we estimate loadings and transition
probabilities through an EM algorithm based on a modified version of the
Baum-Lindgren-Hamilton-Kim filter and smoother that makes use of the factors
previously estimated. Our approach is appealing as it provides closed form
expressions for all estimators. More importantly, it does not require knowledge
of the true number of factors. We derive the theoretical properties of the
proposed estimation procedure, and we show their good finite sample performance
through a comprehensive set of Monte Carlo experiments. The empirical
usefulness of our approach is illustrated through three applications to large
U.S. datasets of stock returns, macroeconomic variables, and inflation indexes.
arXiv link: http://arxiv.org/abs/2210.09828v5
Party On: The Labor Market Returns to Social Networks in Adolescence
using data from the National Longitudinal Study of Adolescent to Adult Health.
Because both education and friendships are jointly determined in adolescence,
OLS estimates of their returns are likely biased. We implement a novel
procedure to obtain bounds on the causal returns to friendships: we assume that
the returns to schooling range from 5 to 15% (based on prior literature), and
instrument for friendships using similarity in age among peers. Having one more
friend in adolescence increases earnings between 7 and 14%, substantially more
than OLS estimates would suggest.
arXiv link: http://arxiv.org/abs/2210.09426v5
Concentration inequalities of MLE and robust MLE
and machine learning. In this article, for i.i.d. variables, we obtain
constant-specified and sharp concentration inequalities and oracle inequalities
for the MLE only under exponential moment conditions. Furthermore, in a robust
setting, the sub-Gaussian type oracle inequalities of the log-truncated maximum
likelihood estimator are derived under the second-moment condition.
arXiv link: http://arxiv.org/abs/2210.09398v2
Modified Wilcoxon-Mann-Whitney tests of stochastic dominance
Wilcoxon-Mann-Whitney statistics may be used to conduct rank-based tests of
stochastic dominance. We broaden the scope of applicability of such tests by
showing that the bootstrap may be used to conduct valid inference in a matched
pairs sampling framework permitting dependence between the two samples.
Further, we show that a modified bootstrap incorporating an implicit estimate
of a contact set may be used to improve power. Numerical simulations indicate
that our test using the modified bootstrap effectively controls the null
rejection rates and can deliver more or less power than that of the Donald-Hsu
test. In the course of establishing our results we obtain a weak approximation
to the empirical ordinal dominance curve, permitting its population density to
diverge to infinity at zero or one at arbitrary rates.
arXiv link: http://arxiv.org/abs/2210.08892v1
A General Design-Based Framework and Estimator for Randomized Experiments
randomized experiments. Causal effects are defined as linear functionals
evaluated at unit-level potential outcome functions. Assumptions about the
potential outcome functions are encoded as function spaces. This makes the
framework expressive, allowing experimenters to formulate and investigate a
wide range of causal questions, including about interference, that previously
could not be investigated with design-based methods. We describe a class of
estimators for estimands defined using the framework and investigate their
properties. We provide necessary and sufficient conditions for unbiasedness and
consistency. We also describe a class of conservative variance estimators,
which facilitate the construction of confidence intervals. Finally, we provide
several examples of empirical settings that previously could not be examined
with design-based methods to illustrate the use of our approach in practice.
arXiv link: http://arxiv.org/abs/2210.08698v3
Inference on Extreme Quantiles of Unobserved Individual Heterogeneity
unobserved individual heterogeneity (e.g., heterogeneous coefficients,
treatment effects) in panel data and meta-analysis settings. Inference is
challenging in such settings: only noisy estimates of heterogeneity are
available, and central limit approximations perform poorly in the tails. We
derive a necessary and sufficient condition under which noisy estimates are
informative about extreme quantiles, along with sufficient rate and moment
conditions. Under these conditions, we establish an extreme value theorem and
an intermediate order theorem for noisy estimates. These results yield simple
optimization-free confidence intervals for extreme quantiles. Simulations show
that our confidence intervals have favorable coverage and that the rate
conditions matter for the validity of inference. We illustrate the method with
an application to firm productivity differences between denser and less dense
areas.
arXiv link: http://arxiv.org/abs/2210.08524v4
Fair Effect Attribution in Parallel Online Experiments
introduced in online services. It is common for online platforms to run a large
number of simultaneous experiments by splitting incoming user traffic randomly
in treatment and control groups. Despite a perfect randomization between
different groups, simultaneous experiments can interact with each other and
create a negative impact on average population outcomes such as engagement
metrics. These are measured globally and monitored to protect overall user
experience. Therefore, it is crucial to measure these interaction effects and
attribute their overall impact in a fair way to the respective experimenters.
We suggest an approach to measure and disentangle the effect of simultaneous
experiments by providing a cost sharing approach based on Shapley values. We
also provide a counterfactual perspective, that predicts shared impact based on
conditional average treatment effects making use of causal inference
techniques. We illustrate our approach in real world and synthetic data
experiments.
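Shapley-value cost sharing across a small number of simultaneous experiments can be computed exactly by enumerating coalitions. The sketch below takes a user-supplied value function v(S) (for example, the estimated impact on a global engagement metric when only the experiments in S are running) and returns each experiment's attributed share; the value function here is a made-up example.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values for a coalition value function v: frozenset -> float."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[p] += weight * (v(S | {p}) - v(S))   # weighted marginal contribution
    return phi

# illustrative value function: experiments A and B interact negatively when run together
impact = {frozenset(): 0.0, frozenset("A"): -1.0, frozenset("B"): -1.0,
          frozenset("C"): 0.0, frozenset("AB"): -5.0, frozenset("AC"): -1.0,
          frozenset("BC"): -1.0, frozenset("ABC"): -5.0}
v = lambda S: impact[frozenset(S)]
print(shapley_values(["A", "B", "C"], v))   # A and B share the interaction penalty; C gets ~0
```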
arXiv link: http://arxiv.org/abs/2210.08338v1
Distance and Kernel-Based Measures for Global and Local Two-Sample Conditional Distribution Testing
modern applications, including transfer learning and causal inference. Despite
its importance, this fundamental problem has received surprisingly little
attention in the literature, with existing works focusing exclusively on global
two-sample conditional distribution testing. Based on distance and kernel
methods, this paper presents the first unified framework for both global and
local two-sample conditional distribution testing. To this end, we introduce
distance and kernel-based measures that characterize the homogeneity of two
conditional distributions. Drawing from the concept of conditional
U-statistics, we propose consistent estimators for these measures.
Theoretically, we derive the convergence rates and the asymptotic distributions
of the estimators under both the null and alternative hypotheses. Utilizing
these measures, along with a local bootstrap approach, we develop global and
local tests that can detect discrepancies between two conditional distributions
at global and local levels, respectively. Our tests demonstrate reliable
performance through simulations and real data analysis.
arXiv link: http://arxiv.org/abs/2210.08149v3
A New Method for Generating Random Correlation Matrices
it simple to control both location and dispersion. The method is based on a
vector parameterization, $\gamma = g(C)$, which maps any distribution on $R^d$,
$d = n(n-1)/2$, to a distribution on the space of non-singular $n \times n$ correlation
matrices. Correlation matrices with certain properties, such as being
well-conditioned, having block structures, and having strictly positive
elements, are simple to generate. We compare the new method with existing
methods.
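The abstract does not spell out the map $g$, but the construction is in the spirit of the matrix-logarithm parameterization of correlation matrices (Archakov and Hansen, 2021): draw an unrestricted vector $\gamma \in R^{n(n-1)/2}$, place it in the off-diagonal of a symmetric matrix, and solve for the diagonal that makes the matrix exponential a correlation matrix. The sketch below inverts that map with a simple fixed-point iteration; treat the specifics as an assumption, not the authors' exact algorithm.

```python
import numpy as np
from scipy.linalg import expm

def corr_from_gamma(gamma, n, tol=1e-10, max_iter=200):
    """Map a vector gamma in R^{n(n-1)/2} to an n x n correlation matrix via the
    matrix-log parameterization: find the diagonal d such that expm(A(d)) has a
    unit diagonal, where A(d) has off-diagonals gamma and diagonal d."""
    A = np.zeros((n, n))
    iu = np.triu_indices(n, k=1)
    A[iu] = gamma
    A = A + A.T
    d = np.zeros(n)
    for _ in range(max_iter):
        np.fill_diagonal(A, d)
        C = expm(A)
        step = np.log(np.diag(C))
        d = d - step                        # fixed-point update toward a unit diagonal
        if np.max(np.abs(step)) < tol:
            break
    np.fill_diagonal(A, d)
    C = expm(A)
    return C / np.sqrt(np.outer(np.diag(C), np.diag(C)))   # safeguard normalization

# usage: a normal distribution on gamma induces a distribution over 5 x 5 correlation matrices
rng = np.random.default_rng(0)
n = 5
gamma = rng.normal(0.0, 0.5, size=n * (n - 1) // 2)
C = corr_from_gamma(gamma, n)
print(np.diag(C), np.linalg.eigvalsh(C).min() > 0)          # unit diagonal, positive definite
```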
arXiv link: http://arxiv.org/abs/2210.08147v1
Conditional Likelihood Ratio Test with Many Weak Instruments
developed by Moreira (2003) to instrumental variable regression models with
unknown error variance and many weak instruments. In this setting, we argue
that the conventional CLR test with estimated error variance loses exact
similarity and is asymptotically invalid. We propose a modified critical value
function for the likelihood ratio (LR) statistic with estimated error variance,
and prove that this modified test achieves asymptotic validity under many weak
instrument asymptotics. Our critical value function is constructed by
representing the LR using four statistics, instead of two as in Moreira (2003).
A simulation study illustrates the desirable properties of our test.
arXiv link: http://arxiv.org/abs/2210.07680v1
Fast Estimation of Bayesian State Space Models Using Amortized Simulation-Based Inference
state space models. The algorithm is a variation of amortized simulation-based
inference algorithms, where a large number of artificial datasets are generated
at the first stage, and then a flexible model is trained to predict the
variables of interest. In contrast to those proposed earlier, the procedure
described in this paper makes it possible to train estimators for hidden states
by concentrating only on certain characteristics of the marginal posterior
distributions and introducing inductive bias. Illustrations using the examples
of the stochastic volatility model, nonlinear dynamic stochastic general
equilibrium model, and seasonal adjustment procedure with breaks in seasonality
show that the algorithm has sufficient accuracy for practical use. Moreover,
after pretraining, which takes several hours, finding the posterior
distribution for any dataset takes from hundredths to tenths of a second.
arXiv link: http://arxiv.org/abs/2210.07154v1
Robust Estimation and Inference in Panels with Interactive Fixed Effects
with interactive fixed effects (i.e., with a factor structure). We demonstrate
that existing estimators and confidence intervals (CIs) can be heavily biased
and size-distorted when some of the factors are weak. We propose estimators
with improved rates of convergence and bias-aware CIs that remain valid
uniformly, regardless of factor strength. Our approach applies the theory of
minimax linear estimation to form a debiased estimate, using a nuclear norm
bound on the error of an initial estimate of the interactive fixed effects. Our
resulting bias-aware CIs take into account the remaining bias caused by weak
factors. Monte Carlo experiments show substantial improvements over
conventional methods when factors are weak, with minimal costs to estimation
accuracy when factors are strong.
arXiv link: http://arxiv.org/abs/2210.06639v4
Sample Constrained Treatment Effect Estimation
focus on designing efficient randomized controlled trials, to accurately
estimate the effect of some treatment on a population of $n$ individuals. In
particular, we study sample-constrained treatment effect estimation, where we
must select a subset of $s \ll n$ individuals from the population to experiment
on. This subset must be further partitioned into treatment and control groups.
Algorithms for partitioning the entire population into treatment and control
groups, or for choosing a single representative subset, have been well-studied.
The key challenge in our setting is jointly choosing a representative subset
and a partition for that set.
We focus on both individual and average treatment effect estimation, under a
linear effects model. We give provably efficient experimental designs and
corresponding estimators, by identifying connections to discrepancy
minimization and leverage-score-based sampling used in randomized numerical
linear algebra. Our theoretical results obtain a smooth transition to known
guarantees when $s$ equals the population size. We also empirically demonstrate
the performance of our algorithms.
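One ingredient mentioned above, leverage-score-based sampling, is easy to sketch: compute the statistical leverages of the covariate matrix and sample the experimental subset with probabilities proportional to them. This is not the paper's full design (which also handles the treatment/control split via discrepancy-minimization ideas); it illustrates only the sampling ingredient, with made-up data.

```python
import numpy as np

def leverage_score_sample(X, s, rng):
    """Sample s of the n rows of the covariate matrix X with probabilities
    proportional to their leverage scores (squared row norms of the left
    singular vectors of X)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    lev = np.sum(U ** 2, axis=1)          # leverage scores; they sum to rank(X)
    return rng.choice(len(X), size=s, replace=False, p=lev / lev.sum())

# usage: pick 50 of 1000 units, then split the chosen subset into treatment and control
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
idx = rng.permutation(leverage_score_sample(X, s=50, rng=rng))
treated, control = idx[:25], idx[25:]
```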
arXiv link: http://arxiv.org/abs/2210.06594v1
Estimating Option Pricing Models Using a Characteristic Function-Based Linear State Space Representation
pricing models driven by general affine jump-diffusions. Our procedure is based
on the comparison between an option-implied, model-free representation of the
conditional log-characteristic function and the model-implied conditional
log-characteristic function, which is functionally affine in the model's state
vector. We formally derive an associated linear state space representation and
establish the asymptotic properties of the corresponding measurement errors.
The state space representation allows us to use a suitably modified Kalman
filtering technique to learn about the latent state vector and a quasi-maximum
likelihood estimator of the model parameters, which brings important
computational advantages. We analyze the finite-sample behavior of our
procedure in Monte Carlo simulations. The applicability of our procedure is
illustrated in two case studies that analyze S&P 500 option prices and the
impact of exogenous state variables capturing Covid-19 reproduction and
economic policy uncertainty.
arXiv link: http://arxiv.org/abs/2210.06217v1
Bayesian analysis of mixtures of lognormal distribution with an unknown number of components from grouped data
estimating parameters of lognormal distribution mixtures for income. Using
simulated data examples, we examined the proposed algorithm's performance and
the accuracy of posterior distributions of the Gini coefficients. Results
suggest that the parameters are estimated accurately, so the posterior
distributions are close to the true distributions even when a different data
generating process is considered. Moreover, promising results for Gini
coefficients encouraged us to apply our method to real data from Japan. The
empirical examples indicate two subgroups in Japan (2020) and the Gini
coefficients' integrity.
arXiv link: http://arxiv.org/abs/2210.05115v3
Uncertainty Quantification in Synthetic Controls with Staggered Treatment Adoption
large class of synthetic control predictions (or estimators) in settings with
staggered treatment adoption, offering precise non-asymptotic coverage
probability guarantees. From a methodological perspective, we provide a
detailed discussion of different causal quantities to be predicted, which we
call causal predictands, allowing for multiple treated units with treatment
adoption at possibly different points in time. From a theoretical perspective,
our uncertainty quantification methods improve on prior literature by (i)
covering a large class of causal predictands in staggered adoption settings,
(ii) allowing for synthetic control methods with possibly nonlinear
constraints, (iii) proposing scalable robust conic optimization methods and
principled data-driven tuning parameter selection, and (iv) offering valid
uniform inference across post-treatment periods. We illustrate our methodology
with an empirical application studying the effects of economic liberalization
on real GDP per capita for Sub-Saharan African countries. Companion software
packages are provided in Python, R, and Stata.
arXiv link: http://arxiv.org/abs/2210.05026v5
Policy Learning with New Treatments
treatment to a heterogeneous population on the basis of experimental data that
includes only a subset of possible treatment values. The effects of new
treatments are partially identified by shape restrictions on treatment
response. Policies are compared according to the minimax regret criterion, and
I show that the empirical analog of the population decision problem has a
tractable linear- and integer-programming formulation. I prove the maximum
regret of the estimated policy converges to the lowest possible maximum regret
at a rate which is the maximum of $N^{-1/2}$ and the rate at which conditional
average treatment effects are estimated in the experimental data. In an
application to designing targeted subsidies for electrical grid connections in
rural Kenya, I find that nearly the entire population should be given a
treatment not implemented in the experiment, reducing maximum regret by over
60% compared to the policy that restricts to the treatments implemented in the
experiment.
arXiv link: http://arxiv.org/abs/2210.04703v4
An identification and testing strategy for proxy-SVARs with weak proxies
are weak, inference in proxy-SVARs (SVAR-IVs) is nonstandard and the
construction of asymptotically valid confidence sets for the impulse responses
of interest requires weak-instrument robust methods. In the presence of
multiple target shocks, test inversion techniques require extra restrictions on
the proxy-SVAR parameters other than those implied by the proxies, which may be
difficult to interpret and test. We show that frequentist asymptotic inference
in these situations can be conducted through Minimum Distance estimation and
standard asymptotic methods if the proxy-SVAR can be identified by using
`strong' instruments for the non-target shocks; i.e. the shocks which are not
of primary interest in the analysis. The suggested identification strategy
hinges on a novel pre-test for the null of instrument relevance based on
bootstrap resampling which is not subject to pre-testing issues, in the sense
that the validity of post-test asymptotic inferences is not affected by the
outcomes of the test. The test is robust to conditional heteroskedasticity
and/or zero-censored proxies, is computationally straightforward and applicable
regardless of the number of shocks being instrumented. Some illustrative
examples show the empirical usefulness of the suggested identification and
testing strategy.
arXiv link: http://arxiv.org/abs/2210.04523v4
A Structural Equation Modeling Approach to Understand User's Perceptions of Acceptance of Ride-Sharing Services in Dhaka City
users' perceptions of acceptance of ride-sharing services in Dhaka City. A
structured questionnaire is developed based on the users' reported attitudes
and perceived risks. A total of 350 normally distributed responses are
collected from ride-sharing service users and stakeholders of Dhaka City.
Respondents are interviewed to express their experience and opinions on
ride-sharing services through the stated preference questionnaire. Structural
Equation Modeling (SEM) is used to validate the research hypotheses.
Statistical parameters and several trials are used to choose the best SEM. The
responses are also analyzed using the Relative Importance Index (RII) method,
validating the chosen SEM. Inside SEM, the quality of ride-sharing services is
measured by two latent and eighteen observed variables. The latent variable
'safety & security' is more influential than 'service performance' on the
overall quality of service index. Under 'safety & security' the other two
variables, i.e., 'account information' and 'personal information' are found to
be the most significant that impact the decision to share rides with others. In
addition, 'risk of conflict' and 'possibility of accident' are identified using
the perception model as the lowest contributing variables. Factor analysis
reveals the suitability and reliability of the proposed SEM. Identifying the
influential parameters in this study will help service providers understand and
improve the quality of ride-sharing services for users.
arXiv link: http://arxiv.org/abs/2210.04086v2
Empirical Bayes Selection for Value Maximization
/ n \to \alpha \in (0, 1)$, where noisy, heteroskedastic measurements of the
units' true values are available and the decision-maker wishes to maximize the
aggregate true value of the units selected. Given a parametric prior
distribution, the empirical Bayes decision rule incurs $O_p(n^{-1})$ regret
relative to the Bayesian oracle that knows the true prior. More generally, if
the error in the estimated prior is of order $O_p(r_n)$, regret is
$O_p(r_n^2)$. In this sense selection of the best units is fundamentally
easier than estimation of their values. We show this regret bound is
sharp in the parametric case, by giving an example in which it is attained.
Using priors calibrated from a dataset of over four thousand internet
experiments, we confirm that empirical Bayes methods perform well in detecting
the best treatments with only a modest number of experiments.
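A minimal version of the empirical Bayes recipe: fit a normal prior to noisy, heteroskedastic estimates, form posterior means, and select the top share of units by posterior mean. The moment-based prior fit below is a simplification of "given a parametric prior distribution" and is illustrative only; the simulated numbers are made up.

```python
import numpy as np

def empirical_bayes_select(theta_hat, se, alpha):
    """Select the top alpha share of units by normal-prior posterior means.

    theta_hat : noisy estimates, theta_hat_i ~ N(theta_i, se_i^2)
    se        : known standard errors
    A normal prior N(mu, tau^2) is fit by simple moment matching (illustrative)."""
    theta_hat, se = np.asarray(theta_hat, float), np.asarray(se, float)
    mu = theta_hat.mean()
    tau2 = max(theta_hat.var() - np.mean(se ** 2), 1e-12)   # moment-based prior variance
    post_mean = mu + tau2 / (tau2 + se ** 2) * (theta_hat - mu)
    m = int(np.ceil(alpha * len(theta_hat)))
    return np.argsort(post_mean)[::-1][:m]                  # indices of selected units

# usage: precise and noisy measurements are shrunk differently before ranking
rng = np.random.default_rng(0)
n = 4000
theta = rng.normal(0.0, 1.0, n)                 # true unit values
se = rng.uniform(0.2, 3.0, n)                   # heteroskedastic noise levels
theta_hat = theta + se * rng.standard_normal(n)
selected = empirical_bayes_select(theta_hat, se, alpha=0.1)
print(theta[selected].mean())                   # aggregate true value of the selection
```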
arXiv link: http://arxiv.org/abs/2210.03905v3
Order Statistics Approaches to Unobserved Heterogeneity in Auctions
and nonseparable unobserved heterogeneity using three consecutive order
statistics of bids. We then propose sieve maximum likelihood estimators for the
joint distribution of unobserved heterogeneity and the private value, as well
as their conditional and marginal distributions. Lastly, we apply our
methodology to a novel dataset from judicial auctions in China. Our estimates
suggest substantial gains from accounting for unobserved heterogeneity when
setting reserve prices. We propose a simple scheme that achieves nearly optimal
revenue by using the appraisal value as the reserve price.
arXiv link: http://arxiv.org/abs/2210.03547v1
On estimating Armington elasticities for Japan's meat imports
we estimate substitution elasticities of Japan's two-stage import aggregation
functions for beef, chicken and pork. While the regression analysis crucially
depends on the price that consumers face, the post-tariff price of imported
meat depends not only on ad valorem duties but also on tariff rate quotas and
gate price system regimes. The effective tariff rate is consequently evaluated
by utilizing monthly transaction data. To address potential endogeneity
problems, we apply exchange rates that we believe to be independent of the
demand shocks for imported meat. The panel nature of the data allows us to
retrieve the first-stage aggregates via time dummy variables, free of demand
shocks, to be used as part of the explanatory variable and as an instrument in
the second-stage regression.
arXiv link: http://arxiv.org/abs/2210.05358v2
Testing the Number of Components in Finite Mixture Normal Regression Model with Panel Data
an $M_0$-component model against the alternative of an $(M_0 + 1)$-component model in the
normal mixture panel regression by extending the Expectation-Maximization (EM)
test of Chen and Li (2009a) and Kasahara and Shimotsu (2015) to the case of
panel data. We show that, unlike the cross-sectional normal mixture, the
first-order derivative of the density function for the variance parameter in
the panel normal mixture is linearly independent of its second-order
derivatives for the mean parameter. On the other hand, like the cross-sectional
normal mixture, the likelihood ratio test statistic of the panel normal mixture
is unbounded. We consider the Penalized Maximum Likelihood Estimator to deal
with the unboundedness, where we obtain the data-driven penalty function via
computational experiments. We derive the asymptotic distribution of the
Penalized Likelihood Ratio Test (PLRT) and EM test statistics by expanding the
log-likelihood function up to the fifth order in the reparameterized parameters.
The simulation experiment indicates good finite sample performance of the
proposed EM test. We apply our EM test to estimate the number of production
technology types for the finite mixture Cobb-Douglas production function model
studied by Kasahara et al. (2022), using panel data on Japanese and Chilean
manufacturing firms. We find evidence of heterogeneity in
elasticities of output for intermediate goods, suggesting that production
function is heterogeneous across firms beyond their Hicks-neutral productivity
terms.
arXiv link: http://arxiv.org/abs/2210.02824v2
The Local to Unity Dynamic Tobit Model
nonlinearities in the form of censoring or an occasionally binding constraint,
such as are regularly encountered in macroeconomics. A tractable candidate
model for such series is the dynamic Tobit with a root local to unity. We show
that this model generates a process that converges weakly to a non-standard
limiting process, that is constrained (regulated) to be positive. Surprisingly,
despite the presence of censoring, the OLS estimators of the model parameters
are consistent. We show that this allows OLS-based inferences to be drawn on
the overall persistence of the process (as measured by the sum of the
autoregressive coefficients), and for the null of a unit root to be tested in
the presence of censoring. Our simulations illustrate that the conventional ADF
test substantially over-rejects when the data is generated by a dynamic Tobit
with a unit root, whereas our proposed test is correctly sized. We provide an
application of our methods to testing for a unit root in the Swiss franc / euro
exchange rate, during a period when this was subject to an occasionally binding
lower bound.
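The over-rejection finding is easy to reproduce in a stylized way: simulate a censored AR(1) with a unit root (a minimal version of the dynamic Tobit DGP discussed here) and apply a conventional ADF test. The simulation settings below are illustrative, and the rejection frequency should be compared with the 5% nominal level; the paper reports substantial over-rejection in designs of this kind.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)

def censored_random_walk(T, mu=0.0, sigma=1.0):
    """Simulate y_t = max(0, mu + y_{t-1} + e_t): a dynamic Tobit with a unit root."""
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = max(0.0, mu + y[t - 1] + sigma * rng.standard_normal())
    return y

# rejection frequency of the conventional ADF test at the 5% level
reps, T = 500, 200
rejections = 0
for _ in range(reps):
    y = censored_random_walk(T)
    pval = adfuller(y, regression="c", autolag="AIC")[1]
    rejections += pval < 0.05
print(rejections / reps)    # compare with the 0.05 nominal level
```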
arXiv link: http://arxiv.org/abs/2210.02599v3
Regression discontinuity design with right-censored survival data
analysis setting with right-censored data, studied in an intensity based
counting process framework. In particular, a local polynomial regression
version of the Aalen additive hazards estimator is introduced as an estimator
of the difference between two covariate dependent cumulative hazard rate
functions. Large-sample theory for this estimator is developed, including
confidence intervals that take into account the uncertainty associated with
bias correction. As is standard in the causality literature, the models and the
theory are embedded in the potential outcomes framework. Two general results
concerning potential outcomes and the multiplicative hazards model for survival
data are presented.
arXiv link: http://arxiv.org/abs/2210.02548v1
Bikeability and the induced demand for cycling
provision of bicycle infrastructure? In this study, we exploit a large dataset
of observed bicycle trajectories in combination with a fine-grained
representation of the Copenhagen bicycle-relevant network. We apply a novel
model for bicyclists' choice of route from origin to destination that takes the
complete network into account. This enables us to determine bicyclists'
preferences for a range of infrastructure and land-use types. We use the
estimated preferences to compute a subjective cost of bicycle travel, which we
correlate with the number of bicycle trips across a large number of
origin-destination pairs. Simulations suggest that the extensive Copenhagen
bicycle lane network has caused the number of bicycle trips and the bicycle
kilometers traveled to increase by 60% and 90%, respectively, compared with a
counterfactual without the bicycle lane network. This translates into an annual
benefit of EUR 0.4M per km of bicycle lane owing to changes in subjective
travel cost, health, and accidents. Our results thus strongly support the
provision of bicycle infrastructure.
arXiv link: http://arxiv.org/abs/2210.02504v2
Probability of Causation with Sample Selection: A Reanalysis of the Impacts of Jóvenes en Acción on Formality
selection. We show that the probability of causation is partially identified
for individuals who are always observed regardless of treatment status and
derive sharp bounds under three increasingly restrictive sets of assumptions.
The first set imposes an exogenous treatment and a monotone sample selection
mechanism. To tighten these bounds, the second set also imposes the monotone
treatment response assumption, while the third set additionally imposes a
stochastic dominance assumption. Finally, we use experimental data from the
Colombian job training program Jóvenes en Acción to empirically illustrate
our approach's usefulness. We find that, among always-employed women, at least
10.2% and at most 13.4% transitioned to the formal labor market because of the
program. However, our 90%-confidence region does not reject the null hypothesis
that the lower bound is equal to zero.
arXiv link: http://arxiv.org/abs/2210.01938v6
Revealing Unobservables by Deep Learning: Generative Element Extraction Networks (GEEN)
variable, such as effort, ability, and belief, is unobserved in the sample but
needs to be identified. This paper proposes a novel method for estimating
realizations of a latent variable $X^*$ in a random sample that contains its
multiple measurements. With the key assumption that the measurements are
independent conditional on $X^*$, we provide sufficient conditions under which
realizations of $X^*$ in the sample are locally unique in a class of
deviations, which allows us to identify realizations of $X^*$. To the best of
our knowledge, this paper is the first to provide such identification at the
observation level. We then use the Kullback-Leibler distance between the two
probability densities with and without the conditional independence as the loss
function to train a Generative Element Extraction Networks (GEEN) that maps
from the observed measurements to realizations of $X^*$ in the sample. The
simulation results show that the proposed estimator works well and that the
estimated values are highly correlated with the realizations of $X^*$. Our
estimator can be applied to a large class of latent variable models and we
expect it will change how people deal with latent variables.
arXiv link: http://arxiv.org/abs/2210.01300v1
Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees
a human agent based upon the observable history of implemented actions and
visited states. This problem has an inherent nested structure: in the inner
problem, an optimal policy for a given reward function is identified while in
the outer problem, a measure of fit is maximized. Several approaches have been
proposed to alleviate the computational burden of this nested-loop structure,
but these methods still suffer from high complexity when the state space is
either discrete with large cardinality or continuous in high dimensions. Other
approaches in the inverse reinforcement learning (IRL) literature emphasize
policy estimation at the expense of reduced reward estimation accuracy. In this
paper we propose a single-loop estimation algorithm with finite time guarantees
that is equipped to deal with high-dimensional state spaces without
compromising reward estimation accuracy. In the proposed algorithm, each policy
improvement step is followed by a stochastic gradient step for likelihood
maximization. We show that the proposed algorithm converges to a stationary
solution with a finite-time guarantee. Further, if the reward is parameterized
linearly, we show that the algorithm approximates the maximum likelihood
estimator sublinearly. Finally, by using robotics control problems in MuJoCo
and their transfer settings, we show that the proposed algorithm achieves
superior performance compared with other IRL and imitation learning benchmarks.
arXiv link: http://arxiv.org/abs/2210.01282v3
Reconciling econometrics with continuous maximum-entropy network models
traditional Gravity Model specification as the expected link weight coming from
a probability distribution whose functional form can be chosen arbitrarily,
while statistical-physics approaches construct maximum-entropy distributions of
weighted graphs, constrained to satisfy a given set of measurable network
properties. In a recent companion paper, we integrated the two approaches and
applied them to the World Trade Web, i.e. the network of international trade
among world countries. While the companion paper dealt only with
discrete-valued link weights, the present paper extends the theoretical
framework to continuous-valued link weights. In particular, we construct two
broad classes of maximum-entropy models, namely the integrated and the
conditional ones, defined by different criteria to derive and combine the
probabilistic rules for placing links and loading them with weights. In the
integrated models, both rules follow from a single, constrained optimization of
the continuous Kullback-Leibler divergence; in the conditional models, the two
rules are disentangled and the functional form of the weight distribution
follows from a conditional optimization procedure. After deriving the general
functional form of the two classes, we turn each of them into a proper family
of econometric models via a suitable identification of the econometric function
relating the corresponding expected link weights to macroeconomic factors.
After testing the two classes of models on World Trade Web data, we discuss
their strengths and weaknesses.
arXiv link: http://arxiv.org/abs/2210.01179v3
Conditional Distribution Model Specification Testing Using Chi-Square Goodness-of-Fit Tests
conditional distribution model specification. The data is cross-classified
according to the Rosenblatt transform of the dependent variable and the
explanatory variables, resulting in a contingency table with expected joint
frequencies equal to the product of the row and column marginals, which are
independent of the model parameters. The test statistics assess whether the
difference between observed and expected frequencies is due to chance. We
propose three types of test statistics: the classical trinity of tests based on
the likelihood of grouped data, and two statistics based on the efficient raw
data estimator -- namely, a Chernoff-Lehmann and a generalized Wald statistic.
The asymptotic distribution of these statistics is invariant to
sample-dependent partitions. Monte Carlo experiments demonstrate the good
performance of the proposed tests.
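A minimal sketch of the cross-classification idea (using, purely for illustration, a Gaussian linear null model and the classical Pearson statistic on the resulting contingency table rather than the paper's trinity or Wald statistics):

# Sketch: cross-classify the Rosenblatt/PIT transform of y given x against x,
# then apply a Pearson chi-square test of independence. The Gaussian linear
# model and the 4x4 partition are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)      # data-generating process

beta0, beta1, sigma = 1.0, 0.5, 1.0         # parameters of the null model
u = stats.norm.cdf(y, loc=beta0 + beta1 * x, scale=sigma)   # PIT of y given x

# partition the PIT values and the regressor into cells
u_bins = np.digitize(u, [0.25, 0.5, 0.75])
x_bins = np.digitize(x, np.quantile(x, [0.25, 0.5, 0.75]))
table = np.zeros((4, 4))
for i, j in zip(u_bins, x_bins):
    table[i, j] += 1

chi2, pval, dof, expected = stats.chi2_contingency(table)
print(f"Pearson chi-square = {chi2:.2f}, p-value = {pval:.3f}")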
arXiv link: http://arxiv.org/abs/2210.00624v4
AI-Assisted Discovery of Quantitative and Formal Models in Social Science
economic growth and collective action, are used to formulate mechanistic
explanations, provide predictions, and uncover questions about observed
phenomena. Here, we demonstrate the use of a machine learning system to aid the
discovery of symbolic models that capture nonlinear and dynamical relationships
in social science datasets. By extending neuro-symbolic methods to find compact
functions and differential equations in noisy and longitudinal data, we show
that our system can be used to discover interpretable models from real-world
data in economics and sociology. Augmenting existing workflows with symbolic
regression can help uncover novel relationships and explore counterfactual
models during the scientific process. We propose that this AI-assisted
framework can bridge parametric and non-parametric models commonly employed in
social science research by systematically exploring the space of nonlinear
models and enabling fine-grained control over expressivity and
interpretability.
arXiv link: http://arxiv.org/abs/2210.00563v3
Large-Scale Allocation of Personalized Incentives
increasing social welfare by providing incentives to a large population of
individuals.
For that purpose, we formalize and solve the problem of finding an optimal
personalized-incentive policy: optimal in the sense that it maximizes social
welfare under an incentive budget constraint, personalized in the sense that
the incentives proposed depend on the alternatives available to each
individual, as well as her preferences.
We propose a polynomial-time approximation algorithm that computes a policy
within a few seconds, and we analytically prove that it is boundedly close to
the optimum.
We then extend the problem to efficiently calculate the Maximum Social
Welfare Curve, which gives the maximum social welfare achievable for a range of
incentive budgets (not just one value).
This curve is a valuable practical tool for the regulator to determine the
right incentive budget to invest.
Finally, we simulate a large-scale application to mode choice in a French
department (about 200 thousand individuals) and illustrate the effectiveness
of the proposed personalized-incentive policy in reducing CO2 emissions.
arXiv link: http://arxiv.org/abs/2210.00463v1
Yurinskii's Coupling for Martingales
distributional analysis in mathematical statistics and applied probability,
offering a Gaussian strong approximation with an explicit error bound under
easily verifiable conditions. Originally stated in $\ell_2$-norm for sums of
independent random vectors, it has recently been extended both to the
$\ell_p$-norm, for $1 \leq p \leq \infty$, and to vector-valued martingales in
$\ell_2$-norm, under some strong conditions. We present as our main result a
Yurinskii coupling for approximate martingales in $\ell_p$-norm, under
substantially weaker conditions than those previously imposed. Our formulation
further allows for the coupling variable to follow a more general Gaussian
mixture distribution, and we provide a novel third-order coupling method which
gives tighter approximations in certain settings. We specialize our main result
to mixingales, martingales, and independent data, and derive uniform Gaussian
mixture strong approximations for martingale empirical processes. Applications
to nonparametric partitioning-based and local polynomial regression procedures
are provided, alongside central limit theorems for high-dimensional martingale
vectors.
arXiv link: http://arxiv.org/abs/2210.00362v4
A Posteriori Risk Classification and Ratemaking with Random Effects in the Mixture-of-Experts Model
automobile insurance is key to insurers' profitability and risk management,
while also ensuring that policyholders are charged a fair premium according to
their risk profile. In this paper, we propose to adapt a flexible regression
model, called the Mixed LRMoE, to the problem of a posteriori risk
classification and ratemaking, where policyholder-level random effects are
incorporated to better infer their risk profile reflected by the claim history.
We also develop a stochastic variational Expectation-Conditional-Maximization
algorithm for estimating model parameters and inferring the posterior
distribution of random effects, which is numerically efficient and scalable to
large insurance portfolios. We then apply the Mixed LRMoE model to a real,
multiyear automobile insurance dataset, where the proposed framework is shown
to offer better fit to data and produce posterior premium which accurately
reflects policyholders' claim history.
arXiv link: http://arxiv.org/abs/2209.15212v1
Statistical Inference for Fisher Market Equilibrium
increasing attention recently. In this paper we focus on the specific case of
linear Fisher markets. They have been widely used in fair resource allocation of
food/blood donations and budget management in large-scale Internet ad auctions.
In resource allocation, it is crucial to quantify the variability of the
resource received by the agents (such as blood banks and food banks) in
addition to fairness and efficiency properties of the systems. For ad auction
markets, it is important to establish statistical properties of the platform's
revenues in addition to their expected values. To this end, we propose a
statistical framework based on the concept of infinite-dimensional Fisher
markets. In our framework, we observe a market formed by a finite number of
items sampled from an underlying distribution (the "observed market") and aim
to infer several important equilibrium quantities of the underlying long-run
market. These equilibrium quantities include individual utilities, social
welfare, and pacing multipliers. Through the lens of sample average
approximation (SAA), we derive a collection of statistical results and show
that the observed market provides useful statistical information about the
long-run market. In other words, the equilibrium quantities of the observed
market converge to the true ones of the long-run market with strong statistical
guarantees. These include consistency, finite-sample bounds, asymptotic
distributions, and confidence intervals. As an extension, we discuss revenue
inference in quasilinear Fisher
markets.
arXiv link: http://arxiv.org/abs/2209.15422v3
With big data come big problems: pitfalls in measuring basis risk for crop index insurance
yields, showing a great potential for agricultural index insurance. This paper
identifies an important threat to better insurance from these new technologies:
data with many fields and few years can yield downward-biased estimates of
basis risk, a fundamental metric in index insurance. To demonstrate this bias,
we use state-of-the-art satellite-based data on agricultural yields in the US
and in Kenya to estimate and simulate basis risk. We find a substantive
downward bias leading to a systematic overestimation of insurance quality.
In this paper, we argue that big data in crop insurance can lead to a new
situation where the number of variables $N$ largely exceeds the number of
observations $T$. In such a situation where $T\ll N$, conventional asymptotics
break, as evidenced by the large bias we find in simulations. We show how the
high-dimension, low-sample-size (HDLSS) asymptotics, together with the spiked
covariance model, provide a more relevant framework for the $T\ll N$ case
encountered in index insurance. More precisely, we derive the asymptotic
distribution of the relative share of the first eigenvalue of the covariance
matrix, a measure of systematic risk in index insurance. Our formula accurately
approximates the empirical bias simulated from the satellite data, and provides
a useful tool for practitioners to quantify bias in insurance quality.
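A short sketch of the systematic-risk measure discussed above, the relative share of the first eigenvalue of the covariance matrix, computed from a simulated T x N panel of yields with T much smaller than N (the panel and factor strength are illustrative assumptions):

# Sketch: relative share of the first eigenvalue of the covariance matrix of
# N fields observed over T years (T << N), a measure of systematic risk.
import numpy as np

rng = np.random.default_rng(2)
T, N = 15, 500                       # few years, many fields (T << N)
common = rng.normal(size=(T, 1))     # systematic yield shock
yields = 0.8 * common + rng.normal(scale=0.6, size=(T, N))

cov = np.cov(yields, rowvar=False)                     # N x N sample covariance
eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # ascending, clipped at 0
share_first = eigvals[-1] / eigvals.sum()
print(f"Share of first eigenvalue: {share_first:.3f}")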
arXiv link: http://arxiv.org/abs/2209.14611v1
Fast Inference for Quantile Regression with Tens of Millions of Observations
challenge of analyzing datasets with tens of millions of observations is
substantial. Conventional econometric methods based on extreme estimators
require large amounts of computing resources and memory, which are often not
readily available. In this paper, we focus on linear quantile regression
applied to "ultra-large" datasets, such as U.S. decennial censuses. A fast
inference framework is presented, utilizing stochastic subgradient descent
(S-subGD) updates. The inference procedure handles cross-sectional data
sequentially: (i) updating the parameter estimate with each incoming "new
observation", (ii) aggregating it as a Polyak-Ruppert average, and
(iii) computing a pivotal statistic for inference using only a solution path.
The methodology draws from time-series regression to create an asymptotically
pivotal statistic through random scaling. Our proposed test statistic is
calculated in a fully online fashion and critical values are calculated without
resampling. We conduct extensive numerical studies to showcase the
computational merits of our proposed inference. For inference problems as large
as $(n, d) \sim (10^7, 10^3)$, where $n$ is the sample size and $d$ is the
number of regressors, our method generates new insights, surpassing current
inference methods in computational cost. In particular, it reveals trends in
the gender gap in the U.S. college wage premium using millions of
observations, while controlling for over $10^3$ covariates to mitigate
confounding effects.
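A minimal sketch of the S-subGD update with Polyak-Ruppert averaging for linear quantile regression (the simulated data and step-size schedule are assumptions; the random-scaling inference step is not reproduced here):

# Sketch: S-subGD for linear quantile regression with Polyak-Ruppert averaging.
import numpy as np

rng = np.random.default_rng(3)
n, d, tau = 100_000, 5, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
beta_true = np.arange(1.0, d + 1.0)
y = X @ beta_true + rng.standard_t(df=3, size=n)

beta = np.zeros(d)         # running SGD iterate
beta_bar = np.zeros(d)     # Polyak-Ruppert average
for t in range(n):
    xi, yi = X[t], y[t]
    # subgradient of the check loss rho_tau(y - x'beta) with respect to beta
    resid = yi - xi @ beta
    grad = -(tau - (resid < 0)) * xi
    step = 0.5 * (t + 1) ** (-0.51)          # assumed step-size schedule
    beta = beta - step * grad
    beta_bar += (beta - beta_bar) / (t + 1)  # online average of iterates

print("Averaged estimate:", np.round(beta_bar, 3))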
arXiv link: http://arxiv.org/abs/2209.14502v5
Minimax Optimal Kernel Operator Learning via Multilevel Training
empirical success in many disciplines of machine learning, including generative
modeling, functional data analysis, causal inference, and multi-agent
reinforcement learning. In this paper, we study the statistical limit of
learning a Hilbert-Schmidt operator between two infinite-dimensional Sobolev
reproducing kernel Hilbert spaces. We establish the information-theoretic lower
bound in terms of the Sobolev Hilbert-Schmidt norm and show that a
regularization that learns the spectral components below the bias contour and
ignores the ones that are above the variance contour can achieve the optimal
learning rate. At the same time, the spectral components between the bias and
variance contours give us flexibility in designing computationally feasible
machine learning algorithms. Based on this observation, we develop a multilevel
kernel operator learning algorithm that is optimal when learning linear
operators between infinite-dimensional function spaces.
arXiv link: http://arxiv.org/abs/2209.14430v3
The Network Propensity Score: Spillovers, Homophily, and Selection into Treatment
that features heterogeneous treatment effects, spillovers,
selection-on-observables, and network formation. I identify average partial
effects under minimal exchangeability conditions. If social interactions are
also anonymous, I derive a three-dimensional network propensity score,
characterize its support conditions, relate it to recent work on network
pseudo-metrics, and study extensions. I propose a two-step semiparametric
estimator for a random coefficients model which is consistent and
asymptotically normal as the number and size of the networks grow. I apply my
estimator to a political participation intervention in Uganda and a microfinance
application in India.
arXiv link: http://arxiv.org/abs/2209.14391v1
Economic effects of Chile FTAs and an eventual CTPP accession
Jan. 2004) and Chile-China (in force Oct. 2006) FTAs on GDP, consumers, and
producers to conclude that Chile's welfare improved after subscription. From
that point, we extrapolate to show the direct and indirect benefits of CTPP
accession.
arXiv link: http://arxiv.org/abs/2209.14748v1
Linear estimation of global average treatment effects
every member of a population, as opposed to none, using an experiment that
treats only some. We consider settings where spillovers have global support and
decay slowly with (a generalized notion of) distance. We derive the minimax
rate over both estimators and designs, and show that it increases with the
spatial rate of spillover decay. Estimators based on OLS regressions like those
used to analyze recent large-scale experiments are consistent (though only
after de-weighting), achieve the minimax rate when the DGP is linear, and
converge faster than IPW-based alternatives when treatment clusters are small,
providing one justification for OLS's ubiquity. When the DGP is nonlinear they
remain consistent but converge slowly. We further address inference and
bandwidth selection. Applied to the cash transfer experiment studied by Egger
et al. (2022) these methods yield a 20% larger estimated effect on consumption.
arXiv link: http://arxiv.org/abs/2209.14181v6
Sentiment Analysis on Inflation after Covid-19
global tweets from 2017-2022 to build a high-frequency measure of the public's
sentiment index on inflation and analyze its correlation with other online data
sources such as Google Trends and market-based inflation indexes. We use
manually labeled trigrams to test the predictive performance of several
machine learning models (logistic regression, random forest, etc.) and choose
the BERT model for the final demonstration. We then sum the daily tweet
sentiment scores obtained from the BERT model to construct the predicted
inflation sentiment index, and we further analyze the regional and
pre-/post-COVID patterns of these inflation indexes. Lastly, we take other
empirical inflation-related data as references and show that the
Twitter-based inflation sentiment analysis method has an
outstanding capability to predict inflation. The results suggest that Twitter
combined with deep learning methods can be a novel and timely method to utilize
existing abundant data sources on inflation expectations and provide daily
indicators of consumers' perception of inflation.
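A hedged sketch of the tweet-scoring and daily-aggregation step, using the Hugging Face transformers sentiment pipeline as a stand-in for the paper's fine-tuned BERT model; the example tweets and the simple signed-score aggregation are illustrative assumptions:

# Sketch: score tweets with a pretrained sentiment model and aggregate to a
# daily index. The default pipeline model is a stand-in for the paper's BERT.
from collections import defaultdict
from transformers import pipeline

tweets = [
    ("2022-06-01", "prices at the grocery store are getting out of control"),
    ("2022-06-01", "gas is so expensive right now"),
    ("2022-06-02", "glad to see fuel prices easing a bit this week"),
]

classifier = pipeline("sentiment-analysis")
daily_scores = defaultdict(list)
for day, text in tweets:
    result = classifier(text)[0]                   # {'label': ..., 'score': ...}
    signed = result["score"] if result["label"] == "POSITIVE" else -result["score"]
    daily_scores[day].append(signed)

daily_index = {day: sum(scores) for day, scores in daily_scores.items()}
print(daily_index)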
arXiv link: http://arxiv.org/abs/2209.14737v2
Bayesian Modeling of TVP-VARs Using Regression Trees
models, many time-varying parameter (TVP) models have been proposed. This paper
proposes a nonparametric TVP-VAR model using Bayesian additive regression trees
(BART) that models the TVPs as an unknown function of effect modifiers. The
novelty of this model arises from the fact that the law of motion driving the
parameters is treated nonparametrically. This leads to great flexibility in the
nature and extent of parameter change, both in the conditional mean and in the
conditional variance. Parsimony is achieved through adopting nonparametric
factor structures and use of shrinkage priors. In an application to US
macroeconomic data, we illustrate the use of our model in tracking both the
evolving nature of the Phillips curve and how the effects of business cycle
shocks on inflation measures vary nonlinearly with changes in the effect
modifiers.
arXiv link: http://arxiv.org/abs/2209.11970v3
Revisiting the Analysis of Matched-Pair and Stratified Experiments in the Presence of Attrition
of matched-pair and stratified experimental designs in the presence of
attrition. Our main objective is to clarify a number of well-known claims about
the practice of dropping pairs with an attrited unit when analyzing
matched-pair designs. Contradictory advice appears in the literature about
whether dropping pairs is beneficial or harmful, and stratifying into
larger groups has been recommended as a resolution to the issue. To address
these claims, we derive the estimands obtained from the difference-in-means
estimator in a matched-pair design both when the observations from pairs with
an attrited unit are retained and when they are dropped. We find limited
evidence to support the claims that dropping pairs helps recover the average
treatment effect, but we find that it may potentially help in recovering a
convex weighted average of conditional average treatment effects. We report
similar findings for stratified designs when studying the estimands obtained
from a regression of outcomes on treatment with and without strata fixed
effects.
arXiv link: http://arxiv.org/abs/2209.11840v6
Doubly Fair Dynamic Pricing
constraints: a "procedural fairness" which requires the proposed prices to be
equal in expectation among different groups, and a "substantive fairness" which
requires the accepted prices to be equal in expectation among different groups.
A policy that is simultaneously procedural and substantive fair is referred to
as "doubly fair". We show that a doubly fair policy must be random to have
higher revenue than the best trivial policy that assigns the same price to
different groups. In a two-group setting, we propose an online learning
algorithm for the 2-group pricing problems that achieves $O(T)$
regret, zero procedural unfairness and $O(T)$ substantive
unfairness over $T$ rounds of learning. We also prove two lower bounds showing
that these results on regret and unfairness are both information-theoretically
optimal up to iterated logarithmic factors. To the best of our knowledge, this
is the first dynamic pricing algorithm that learns to price while satisfying
two fairness constraints at the same time.
arXiv link: http://arxiv.org/abs/2209.11837v1
Linear Multidimensional Regression with Interactive Fixed-Effects
more dimensions with unobserved interactive fixed-effects. The main estimator
uses double debias methods, and requires two preliminary steps. First, the
model is embedded within a two-dimensional panel framework where factor model
methods in Bai (2009) lead to consistent, but slowly converging, estimates. The
second step develops a weighted-within transformation that is robust to
multidimensional interactive fixed-effects and achieves the parametric rate of
consistency. This is combined with a double debias procedure for asymptotically
normal estimates. The methods are implemented to estimate the demand elasticity
for beer.
arXiv link: http://arxiv.org/abs/2209.11691v6
Treatment Effects with Multidimensional Unobserved Heterogeneity: Identification of the Marginal Treatment Effect
marginal treatment effects with multivalued treatments. Our model is based on a
multinomial choice model with utility maximization. Our MTE generalizes the MTE
defined in Heckman and Vytlacil (2005) in binary treatment models. As in the
binary case, we can interpret the MTE as the treatment effect for persons who
are indifferent between two treatments at a particular level. Our MTE enables
one to obtain the treatment effects of those with specific preference orders
over the choice set. Further, our results can identify other parameters such as
the marginal distribution of potential outcomes.
arXiv link: http://arxiv.org/abs/2209.11444v5
Forecasting Cryptocurrencies Log-Returns: a LASSO-VAR and Sentiment Approach
disruptive potential and reports of unprecedented returns. In addition,
academics increasingly acknowledge the predictive power of Social Media in many
fields and, more specifically, for financial markets and economics. In this
paper, we leverage the predictive power of Twitter and Reddit sentiment
together with Google Trends indexes and volume to forecast the log returns of
ten cryptocurrencies. Specifically, we consider Bitcoin, Ethereum, Tether,
Binance Coin, Litecoin, Enjin Coin, Horizen, Namecoin, Peercoin, and
Feathercoin. We evaluate the performance of LASSO-VAR using
daily data from January 2018 to January 2022. In a 30-day recursive forecast,
we retrieve the correct direction of the actual series more than 50% of the
time. We compare this result with the main benchmarks and see a 10%
improvement in Mean Directional Accuracy (MDA). The use of sentiment and
attention variables as predictors significantly increases the forecast
accuracy in terms of MDA, but not in terms of Root Mean Squared Error. We
perform a
Granger causality test using a post-double LASSO selection for high-dimensional
VARs. The results show no "causality" from social media sentiment to
cryptocurrency returns.
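A minimal sketch of a one-step-ahead LASSO-VAR forecast with an added sentiment predictor (simulated data and penalty level are assumptions; the 30-day recursive scheme and the post-double LASSO causality test are not reproduced):

# Sketch: one-step-ahead LASSO-VAR(1) forecast of multivariate log-returns
# augmented with an exogenous sentiment predictor.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
T, k = 400, 10                       # days, number of cryptocurrencies
returns = rng.normal(scale=0.02, size=(T, k))
sentiment = rng.normal(size=(T, 1))

# stack lag-1 returns and lag-1 sentiment as predictors
X = np.hstack([returns[:-1], sentiment[:-1]])
Y = returns[1:]

models = [Lasso(alpha=0.001).fit(X, Y[:, j]) for j in range(k)]
x_last = np.hstack([returns[-1], sentiment[-1]])
forecast = np.array([m.predict(x_last.reshape(1, -1))[0] for m in models])
print("Next-day forecast of log-returns:", np.round(forecast, 4))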
arXiv link: http://arxiv.org/abs/2210.00883v1
Multiscale Comparison of Nonparametric Trend Curves
trends. In many applications, practitioners are interested in whether the
observed time series all have the same time trend. Moreover, they would often
like to know which trends are different and in which time intervals they
differ. We design a multiscale test to formally approach these questions.
Specifically, we develop a test which allows to make rigorous confidence
statements about which time trends are different and where (that is, in which
time intervals) they differ. Based on our multiscale test, we further develop a
clustering algorithm which allows to cluster the observed time series into
groups with the same trend. We derive asymptotic theory for our test and
clustering methods. The theory is complemented by a simulation study and two
applications to GDP growth data and house pricing data.
arXiv link: http://arxiv.org/abs/2209.10841v1
Modelling the Frequency of Home Deliveries: An Induced Travel Demand Contribution of Aggrandized E-shopping in Toronto during COVID-19 Pandemics
The dramatic growth of e-shopping will undoubtedly cause significant impacts on
travel demand. As a result, transportation modellers' ability to model
e-shopping demand is becoming increasingly important. This study developed
models to predict households' weekly home delivery frequencies. We used both
classical econometric and machine learning techniques to obtain the best model.
It is found that socioeconomic factors such as having an online grocery
membership, household members' average age, the percentage of male household
members, the number of workers in the household and various land use factors
influence home delivery demand. This study also compared the interpretations
and performances of the machine learning models and the classical econometric
model. Agreement is found in the variable's effects identified through the
machine learning and econometric models. However, with similar recall accuracy,
the ordered probit model, a classical econometric model, can accurately predict
the aggregate distribution of household delivery demand. In contrast, both
machine learning models failed to match the observed distribution.
arXiv link: http://arxiv.org/abs/2209.10664v1
Efficient Integrated Volatility Estimation in the Presence of Infinite Variation Jumps via Debiased Truncated Realized Variations
observations has been an active research area for more than two decades. One of
the most well-known and widely studied problems has been the estimation of the
quadratic variation of the continuous component of an It\^o semimartingale with
jumps. Several rate- and variance-efficient estimators have been proposed in
the literature when the jump component is of bounded variation. However, to
date, very few methods can deal with jumps of unbounded variation. By
developing new high-order expansions of the truncated moments of a locally
stable L\'evy process, we propose a new rate- and variance-efficient volatility
estimator for a class of It\^o semimartingales whose jumps behave locally like
those of a stable L\'evy process with Blumenthal-Getoor index $Y\in (1,8/5)$
(hence, of unbounded variation). The proposed method is based on a two-step
debiasing procedure for the truncated realized quadratic variation of the
process and can also cover the case $Y<1$. Our Monte Carlo experiments indicate
that the method outperforms other efficient alternatives in the literature in
the setting covered by our theoretical framework.
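For intuition, a short sketch of the (un-debiased) truncated realized variance that such procedures start from; the simulated path, jump process, and threshold rule are illustrative assumptions, and the paper's two-step bias correction is not included:

# Sketch: truncated realized variance from high-frequency increments.
import numpy as np

rng = np.random.default_rng(5)
n = 23_400                            # e.g. one observation per second
dt = 1.0 / n
sigma = 0.3
dx = sigma * np.sqrt(dt) * rng.normal(size=n)      # continuous part
jump_times = rng.random(n) < 5.0 / n               # a few jumps over the interval
dx[jump_times] += rng.normal(scale=0.05, size=jump_times.sum())

threshold = 4.0 * sigma * dt**0.49                 # u_n ~ c * dt^varpi
trv = np.sum(dx**2 * (np.abs(dx) <= threshold))    # truncated realized variance
print(f"TRV = {trv:.4f}  (target integrated variance = {sigma**2:.4f})")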
arXiv link: http://arxiv.org/abs/2209.10128v3
The boosted HP filter is more general than you might think
concerning trend-cycle discovery in macroeconomic data, and boosting has
recently upgraded the popular HP filter to a modern machine learning device
suited to data-rich and rapid computational environments. This paper extends
boosting's trend determination capability to higher order integrated processes
and time series with roots that are local to unity. The theory is established
by understanding the asymptotic effect of boosting on a simple exponential
function. Given a universe of time series in the FRED databases that exhibit
various dynamic patterns, boosting captures, in a timely manner, the downturns
at crises and the recoveries that follow.
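A minimal sketch of the boosting iteration behind the boosted HP filter: the HP filter is repeatedly applied to the remaining cycle and the extracted trends are accumulated. The smoothing parameter and the fixed number of boosting rounds are illustrative assumptions (the boosted HP literature uses data-driven stopping rules):

# Sketch: boosted HP filter by repeatedly filtering the remaining cycle.
import numpy as np
from statsmodels.tsa.filters.hp_filter import hpfilter

rng = np.random.default_rng(6)
n = 200
y = np.cumsum(rng.normal(size=n)) + 0.05 * np.arange(n)   # persistent series

trend = np.zeros(n)
cycle = y.copy()
for _ in range(5):                       # fixed number of boosting rounds
    c, t = hpfilter(cycle, lamb=1600)    # returns (cycle, trend)
    trend += t                           # accumulate extracted trend
    cycle = c                            # keep filtering the residual cycle

print("Final cycle variance:", cycle.var())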
arXiv link: http://arxiv.org/abs/2209.09810v2
A Dynamic Stochastic Block Model for Multidimensional Networks
functioning of the economy. Nevertheless, modeling the dynamics in network data
with multiple types of relationships is still a challenging issue. Stochastic
block models provide a parsimonious and flexible approach to network analysis.
We propose a new stochastic block model for multidimensional networks, where
layer-specific hidden Markov-chain processes drive the changes in community
formation. The changes in the block membership of a node in a given layer may
be influenced by its own past membership in other layers. This allows for
clustering overlap, clustering decoupling, or more complex relationships
between layers, including settings of unidirectional, or bidirectional,
non-linear Granger block causality. We address the overparameterization issue
of a saturated specification by assuming a Multi-Laplacian prior distribution
within a Bayesian framework. Data augmentation and Gibbs sampling are used to
make the inference problem more tractable. Through simulations, we show that
standard linear models and the pairwise approach are unable to detect block
causality in most scenarios. In contrast, our model can recover the true
Granger causality structure. As an application to international trade, we show
that our model offers a unified framework, encompassing community detection and
Gravity equation modeling. We find new evidence of block Granger causality
between trade agreements and trade flows, as well as a core-periphery
structure in both layers, on a large sample of countries.
arXiv link: http://arxiv.org/abs/2209.09354v2
Statistical Treatment Rules under Social Interaction
interaction. We construct an analytical framework under the anonymous
interaction assumption, where the decision problem becomes choosing a treatment
fraction. We propose a multinomial empirical success (MES) rule that includes
the empirical success rule of Manski (2004) as a special case. We investigate
the non-asymptotic bounds of the expected utility based on the MES rule.
Finally, we prove that the MES rule achieves asymptotic optimality under the
minimax regret criterion.
arXiv link: http://arxiv.org/abs/2209.09077v2
Causal Effect Estimation with Global Probabilistic Forecasting: A Case Study of the Impact of Covid-19 Lockdowns on Energy Demand
improve reliability, availability, security, and efficiency. This
implementation needs technological advancements, the development of standards
and regulations, as well as testing and planning. Smart grid load forecasting
and management are critical for reducing demand volatility and improving the
market mechanism that connects generators, distributors, and retailers. During
policy implementations or external interventions, it is necessary to analyse
the uncertainty of their impact on the electricity demand to enable a more
accurate response of the system to fluctuating demand. This paper analyses the
uncertainties of external intervention impacts on electricity demand. It
implements a framework that combines probabilistic and global forecasting
models using a deep learning approach to estimate the causal impact
distribution of an intervention. The causal effect is assessed by predicting
the counterfactual distribution outcome for the affected instances and then
contrasting it to the real outcomes. We consider the impact of Covid-19
lockdowns on energy usage as a case study to evaluate the non-uniform effect of
this intervention on the electricity demand distribution. We show that during
the initial lockdowns in Australia and some European countries, the troughs of
demand often decreased more than the peaks, while the mean remained almost
unaffected.
arXiv link: http://arxiv.org/abs/2209.08885v2
A Generalized Argmax Theorem with Applications
of estimators in many applications. The conclusion of the argmax theorem states
that the argmax of a sequence of stochastic processes converges in distribution
to the argmax of a limiting stochastic process. This paper generalizes the
argmax theorem to allow the maximization to take place over a sequence of
subsets of the domain. If the sequence of subsets converges to a limiting
subset, then the conclusion of the argmax theorem continues to hold. We
demonstrate the usefulness of this generalization in three applications:
estimating a structural break, estimating a parameter on the boundary of the
parameter space, and estimating a weakly identified parameter. The generalized
argmax theorem simplifies the proofs for existing results and can be used to
prove new results in these literatures.
arXiv link: http://arxiv.org/abs/2209.08793v1
A Structural Model for Detecting Communities in Networks
of a set of players embedded in sub-networks in the context of interaction and
learning. We characterize strategic network formation as a static game of
interactions where players maximize their utility depending on the connections
they establish and on multiple interdependent actions, allowing for
group-specific player parameters. It is challenging to apply this type of model to
real-life scenarios for two reasons: The computation of the Bayesian Nash
Equilibrium is highly demanding and the identification of social influence
requires the use of excluded variables that are oftentimes unavailable. Based
on the theoretical proposal, we propose a set of simulant equations and discuss
the identification of the social interaction effect using a multi-modal
network autoregressive specification.
arXiv link: http://arxiv.org/abs/2209.08380v2
Best Arm Identification with Contextual Information under a Small Gap
contextual (covariate) information. In each round of an adaptive experiment,
after observing contextual information, we choose a treatment arm using past
observations and current context. Our goal is to identify the best treatment
arm, which is a treatment arm with the maximal expected reward marginalized
over the contextual distribution, with a minimal probability of
misidentification. In this study, we consider a class of nonparametric bandit
models that converge to location-shift models when the gaps go to zero. First,
we derive lower bounds of the misidentification probability for a certain class
of strategies and bandit models (probabilistic models of potential outcomes)
under a small-gap regime. A small-gap regime is a situation where gaps of the
expected rewards between the best and suboptimal treatment arms go to zero,
which corresponds to one of the worst cases in identifying the best treatment
arm. We then develop the “Random Sampling (RS)-Augmented Inverse Probability
weighting (AIPW) strategy,” which is asymptotically optimal in the sense that
the probability of misidentification under the strategy matches the lower bound
when the budget goes to infinity in the small-gap regime. The RS-AIPW strategy
consists of the RS rule tracking a target sample allocation ratio and the
recommendation rule using the AIPW estimator.
arXiv link: http://arxiv.org/abs/2209.07330v4
$ρ$-GNF: A Copula-based Sensitivity Analysis to Unobserved Confounding Using Normalizing Flows
observational studies using copulas and normalizing flows. Using the idea of
interventional equivalence of structural causal models, we develop $\rho$-GNF
($\rho$-graphical normalizing flow), where $\rho\in[-1,+1]$ is a bounded
sensitivity parameter. This parameter represents the back-door non-causal
association due to unobserved confounding, which is encoded with a Gaussian
copula. In other words, the $\rho$-GNF enables scholars to estimate the average
causal effect (ACE) as a function of $\rho$, while accounting for various
assumed strengths of the unobserved confounding. The output of the $\rho$-GNF
is what we denote as the $\rho_{curve}$ that provides the bounds for the ACE
given an interval of assumed $\rho$ values. In particular, the $\rho_{curve}$
enables scholars to identify the confounding strength required to nullify the
ACE, similar to other sensitivity analysis methods (e.g., the E-value).
Leveraging experiments on simulated and real-world data, we show the
benefits of $\rho$-GNF. One benefit is that the $\rho$-GNF uses a Gaussian
copula to encode the distribution of the unobserved causes, which is commonly
used in many applied settings. This distributional assumption produces narrower
ACE bounds compared to other popular sensitivity analysis methods.
arXiv link: http://arxiv.org/abs/2209.07111v2
Do shared e-scooter services cause traffic accidents? Evidence from six European countries
accidents by exploiting variation in availability of e-scooter services,
induced by the staggered rollout across 93 cities in six countries.
Police-reported accidents in the average month increased by around 8.2% after
shared e-scooters were introduced. For cities with limited cycling
infrastructure and where mobility relies heavily on cars, estimated effects are
largest. In contrast, no effects are detectable in cities with high bike-lane
density. This heterogeneity suggests that public policy can play a crucial role
in mitigating accidents related to e-scooters and, more generally, to changes
in urban mobility.
arXiv link: http://arxiv.org/abs/2209.06870v2
Sample Fit Reliability
constant and varying the model. We propose methods to test and improve sample
fit by holding a model constant and varying the sample. Much as the bootstrap
is a well-known method to re-sample data and estimate the uncertainty of the
fit of parameters in a model, we develop Sample Fit Reliability (SFR) as a set
of computational methods to re-sample data and estimate the reliability of the
fit of observations in a sample. SFR uses Scoring to assess the reliability of
each observation in a sample, Annealing to check the sensitivity of results to
removing unreliable data, and Fitting to re-weight observations for more robust
analysis. We provide simulation evidence to demonstrate the advantages of using
SFR, and we replicate three empirical studies with treatment effects to
illustrate how SFR reveals new insights about each study.
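As a loose illustration of the Scoring idea (not the authors' exact procedure), one can score each observation by how poorly models fitted on bootstrap resamples predict it out of resample; the linear model and contamination below are assumptions:

# Rough sketch of a resampling-based reliability score for each observation:
# observations that are persistently badly predicted out-of-resample get a
# high (unreliable) score. This is an illustration, not the SFR algorithm.
import numpy as np

rng = np.random.default_rng(7)
n = 300
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)
y[:5] += 8.0                                  # a few contaminated observations

n_rep = 200
score_sum = np.zeros(n)
score_cnt = np.zeros(n)
X = np.column_stack([np.ones(n), x])
for _ in range(n_rep):
    idx = rng.integers(0, n, size=n)          # bootstrap resample
    beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    out = np.setdiff1d(np.arange(n), idx)     # out-of-resample observations
    score_sum[out] += np.abs(y[out] - X[out] @ beta)
    score_cnt[out] += 1

scores = score_sum / np.maximum(score_cnt, 1)
print("Most unreliable observations:", np.argsort(scores)[-5:])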
arXiv link: http://arxiv.org/abs/2209.06631v1
Carbon Monitor-Power: near-real-time monitoring of global power generation on hourly to daily scales
dataset: Carbon Monitor-Power since January, 2016 at national levels with
near-global coverage and hourly-to-daily time resolution. The data presented
here are collected from 37 countries across all continents for eight source
groups, including three types of fossil sources (coal, gas, and oil), nuclear
energy and four groups of renewable energy sources (solar energy, wind energy,
hydro energy and other renewables including biomass, geothermal, etc.). The
global near-real-time power dataset shows the dynamics of the global power
system, including its hourly, daily, weekly and seasonal patterns as influenced
by daily periodical activities, weekends, seasonal cycles, regular and
irregular events (e.g., holidays) and extreme events (e.g., the COVID-19
pandemic). The Carbon Monitor-Power dataset reveals that the COVID-19 pandemic
caused strong disruptions in some countries (e.g., China and India), leading to
a temporary or long-lasting shift to low carbon intensity, while it had little
impact in other countries (e.g., Australia). This dataset offers a
large range of opportunities for power-related scientific research and
policy-making.
arXiv link: http://arxiv.org/abs/2209.06086v1
Estimation of Average Derivatives of Latent Regressors: With an Application to Inference on Buffer-Stock Saving
two noisy measures of a latent regressor. Both measures have classical errors
with possibly asymmetric distributions. We show that the proposed estimator
achieves the root-n rate of convergence, and derive its asymptotic normal
distribution for statistical inference. Simulation studies demonstrate
excellent small-sample performance supporting the root-n asymptotic normality.
Based on the proposed estimator, we construct a formal test on the sub-unity of
the marginal propensity to consume out of permanent income (MPCP) under a
nonparametric consumption model and a permanent-transitory model of income
dynamics with nonparametric distribution. Applying the test to four recent
waves of U.S. Panel Study of Income Dynamics (PSID), we reject the null
hypothesis of the unit MPCP in favor of a sub-unit MPCP, supporting the
buffer-stock model of saving.
arXiv link: http://arxiv.org/abs/2209.05914v1
Bayesian Functional Emulation of CO2 Emissions on Future Climate Change Scenarios
integrated assessment model ensemble, based on a functional regression
framework. Inference on the unknown parameters is carried out through a mixed
effects hierarchical model using a fully Bayesian framework with a prior
distribution on the vector of all parameters. We also suggest an autoregressive
parameterization of the covariance matrix of the error, with matching marginal
prior. In this way, we allow for a functional framework for the discretized
output of the simulators that allows their time continuous evaluation.
arXiv link: http://arxiv.org/abs/2209.05767v1
Testing Endogeneity of Spatial Weights Matrices in Spatial Dynamic Panel Data Models
spatial weights matrices in a spatial dynamic panel data (SDPD) model (Qu, Lee,
and Yu, 2017). I first introduce a bias-corrected score function, since the
score function is not centered at zero due to the two-way fixed effects. I
further adjust the score functions to rectify the over-rejection of the null
hypothesis in the presence of local misspecification in contemporaneous
dependence over space, dependence over time, or spatial-time dependence. I
then derive the explicit form of the test statistic. A Monte Carlo simulation
supports the analytics and shows good finite-sample properties. Finally, an
empirical illustration is provided using data from Penn World Table version
6.1.
arXiv link: http://arxiv.org/abs/2209.05563v1
Evidence and Strategy on Economic Distance in Spatially Augmented Solow-Swan Growth Model
1939; Domar, 1946; Solow, 1956; Swan 1956; Mankiw, Romer, and Weil, 1992).
Recently, starting from the neoclassical growth model, Ertur and Koch (2007)
developed the spatially augmented Solow-Swan growth model with the exogenous
spatial weights matrices ($W$). While the exogenous $W$ assumption could be
true only with the geographical/physical distance, it may not be true when
economic/social distances play a role. Using Penn World Table version 7.1,
which covers 1960-2010, I conduct the robust Rao's score test (Bera, Dogan,
and Taspinar, 2018) to determine whether $W$ is endogenous and use maximum
likelihood estimation (Qu and Lee, 2015). The key finding is that the
significant, positive effects of physical capital externalities and spatial
externalities (technological interdependence) reported in Ertur and Koch
(2007) are no longer found under the exogenous $W$ specification, but remain
under the endogenous $W$ models. I also provide an empirical strategy for
choosing which economic distance to use when the data have recently been
subject to heavy shocks, such as the worldwide financial crises during
1996-2010.
arXiv link: http://arxiv.org/abs/2209.05562v1
Testing the martingale difference hypothesis in high dimension
high-dimensional time series. Our test is built on the sum of squares of the
element-wise max-norm of the proposed matrix-valued nonlinear dependence
measure at different lags. To conduct the inference, we approximate the null
distribution of our test statistic by Gaussian approximation and provide a
simulation-based approach to generate critical values. The asymptotic behavior
of the test statistic under the alternative is also studied. Our approach is
nonparametric, as the null hypothesis only assumes that the time series
concerned is a martingale difference sequence, without specifying any
parametric form for its
conditional moments. As an advantage of Gaussian approximation, our test is
robust to the cross-series dependence of unknown magnitude. To the best of our
knowledge, this is the first valid test for the martingale difference
hypothesis that not only allows for large dimension but also captures nonlinear
serial dependence. The practical usefulness of our test is illustrated via
simulation and a real data analysis. The test is implemented in a user-friendly
R-function.
arXiv link: http://arxiv.org/abs/2209.04770v2
Heterogeneous Treatment Effect Bounds under Sample Selection with an Application to the Effects of Social Media on Political Polarization
causal effect parameters in general sample selection models where the treatment
can affect whether an outcome is observed and no exclusion restrictions are
available. The method provides conditional effect bounds as functions of policy
relevant pre-treatment variables. It allows for conducting valid statistical
inference on the unidentified conditional effects. We use a flexible
debiased/double machine learning approach that can accommodate non-linear
functional forms and high-dimensional confounders. Easily verifiable high-level
conditions for estimation, misspecification robust confidence intervals, and
uniform confidence bands are provided as well. We re-analyze data from a large
scale field experiment on Facebook on counter-attitudinal news subscription
with attrition. Our method yields substantially tighter effect bounds compared
to conventional methods and suggests depolarization effects for younger users.
arXiv link: http://arxiv.org/abs/2209.04329v5
W-Transformers : A Wavelet-based Transformer Framework for Univariate Time Series Forecasting
in many vital areas such as natural language processing, computer vision,
anomaly detection, and recommendation systems, among many others. Among several
merits of transformers, the ability to capture long-range temporal dependencies
and interactions is desirable for time series forecasting, leading to its
progress in various time series applications. In this paper, we build a
transformer model for non-stationary time series. The problem is challenging
yet crucially important. We present a novel framework for univariate time
series representation learning based on the wavelet-based transformer encoder
architecture and call it W-Transformer. The proposed W-Transformer applies a
maximal overlap discrete wavelet transform (MODWT) to the time series data
and builds local transformers on the decomposed series to capture the
nonstationarity and long-range nonlinear dependencies in the time series.
Evaluating our framework on several publicly available benchmark time series
datasets from various domains and with diverse characteristics, we demonstrate
that it performs, on average, significantly better than the baseline
forecasters for short-term and long-term forecasting, even for datasets that
consist of only a few hundred training samples.
arXiv link: http://arxiv.org/abs/2209.03945v1
Modified Causal Forest
decisions at various levels of granularity provides substantial value to
decision makers. This paper develops estimation and inference procedures for
multiple treatment models in a selection-on-observed-variables framework by
modifying the Causal Forest approach (Wager and Athey, 2018) in several
dimensions. The new estimators have desirable theoretical, computational, and
practical properties for various aggregation levels of the causal effects.
While an Empirical Monte Carlo study suggests that they outperform previously
suggested estimators, an application to the evaluation of an active labour
market programme shows their value for applied research.
arXiv link: http://arxiv.org/abs/2209.03744v1
A Ridge-Regularised Jackknifed Anderson-Rubin Test
with few included exogenous covariates but many instruments -- possibly more
than the number of observations. We show that a ridge-regularised version of
the jackknifed Anderson-Rubin (1949, henceforth AR) test controls asymptotic
size in the presence of heteroskedasticity, and when the instruments may be
arbitrarily weak. Asymptotic size control is established under weaker
assumptions than those imposed for recently proposed jackknifed AR tests in the
literature. Furthermore, ridge-regularisation extends the scope of jackknifed
AR tests to situations in which there are more instruments than observations.
Monte-Carlo simulations indicate that our method has favourable finite-sample
size and power properties compared to recently proposed alternative approaches
in the literature. An empirical application on the elasticity of substitution
between immigrants and natives in the US illustrates the usefulness of the
proposed method for practitioners.
arXiv link: http://arxiv.org/abs/2209.03259v2
Local Projection Inference in High Dimensions
high-dimensional settings. We use the desparsified (de-biased) lasso to
estimate the high-dimensional local projections, while leaving the impulse
response parameter of interest unpenalized. We establish the uniform asymptotic
normality of the proposed estimator under general conditions. Finally, we
demonstrate small sample performance through a simulation study and consider
two canonical applications in macroeconomic research on monetary policy and
government spending.
arXiv link: http://arxiv.org/abs/2209.03218v3
An Assessment Tool for Academic Research Managers in the Third World
for identifying talented candidates for promotion and funding. A key tool for
this is the use of the indexes provided by Web of Science and SCOPUS, costly
databases that are sometimes beyond the means of academic institutions in
many parts of the world. We show here how the data in one of the databases can be
used to infer the main index of the other one. Methods of data analysis used in
Machine Learning allow us to select just a few of the hundreds of variables in
a database, which later are used in a panel regression, yielding a good
approximation to the main index in the other database. Since the SCOPUS
information can be freely scraped from the Web, this approach makes it
possible to infer, at no cost, the Impact Factor of publications, the main
index used in research assessments around the globe.
arXiv link: http://arxiv.org/abs/2209.03199v1
Rethinking Generalized Beta Family of Distributions
mean-reverting stochastic differential equation (SDE) for a power of the
variable, whose steady-state (stationary) probability density function (PDF) is
a modified GB (mGB) distribution. The SDE approach allows for a lucid
explanation of Generalized Beta Prime (GB2) and Generalized Beta (GB1) limits
of GB distribution and, further down, of Generalized Inverse Gamma (GIGa) and
Generalized Gamma (GGa) limits, as well as describe the transition between the
latter two. We provide an alternative form of the "traditional" GB PDF to
underscore that much of the usefulness of the GB distribution lies in its
allowing a long-range power-law behavior to be ultimately terminated at a
finite value. We derive the cumulative distribution function (CDF) of the
"traditional" GB, which belongs to the family generated by the regularized beta
function and is crucial for analysis of the tails of the distribution. We
analyze fifty years of historical data on realized market volatility,
specifically for S&P500, as a case study of the use of GB/mGB distributions
and show that its behavior is consistent with that of negative Dragon Kings.
arXiv link: http://arxiv.org/abs/2209.05225v1
Bayesian Mixed-Frequency Quantile Vector Autoregression: Eliciting tail risks of Monthly US GDP
essential role in both economic policy and private sector decisions. However,
the informational content of low-frequency variables and the results from
conditional mean models provide only limited evidence to investigate this
problem. We propose a novel mixed-frequency quantile vector autoregression
(MF-QVAR) model to address this issue. Inspired by the univariate Bayesian
quantile regression literature, the multivariate asymmetric Laplace
distribution is exploited under the Bayesian framework to form the likelihood.
A data augmentation approach coupled with a precision sampler efficiently
estimates the missing low-frequency variables at higher frequencies under the
state-space representation. The proposed methods allow us to nowcast
conditional quantiles for multiple variables of interest and to derive
quantile-related risk measures at high frequency, thus enabling timely policy
interventions. The main application of the model is to nowcast conditional
quantiles of the US GDP, which is strictly related to the quantification of
Value-at-Risk and the Expected Shortfall.
arXiv link: http://arxiv.org/abs/2209.01910v1
Robust Causal Learning for the Estimation of Average Treatment Effects
estimate the average treatment effect (ATE) from observational data. The
Double/Debiased Machine Learning (DML) is one of the prevalent methods to
estimate the ATE in observational studies. However, DML estimators can suffer
from an error-compounding issue and even give extreme estimates when the
propensity scores are misspecified or very close to 0 or 1. Previous studies
have overcome this issue through some empirical tricks such as propensity score
trimming, yet none of the existing literature solves this problem from a
theoretical standpoint. In this paper, we propose a Robust Causal Learning
(RCL) method to offset the deficiencies of the DML estimators. Theoretically,
the RCL estimators i) are as consistent and doubly robust as the DML
estimators, and ii) avoid the error-compounding issue. Empirically,
comprehensive experiments show that i) the RCL estimators give more stable
estimates of the causal parameters than the DML estimators, and ii) the RCL
estimators outperform the traditional estimators and their variants when
different machine learning models are applied on both simulated and benchmark
datasets.
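For context, a compact sketch of the cross-fitted DML/AIPW estimator of the ATE with propensity-score clipping (one of the empirical tricks mentioned above); this is the baseline that the RCL estimators modify, not the RCL method itself, and the data-generating process is an assumption:

# Sketch: cross-fitted AIPW / DML estimator of the ATE with propensity clipping.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(8)
n = 4000
X = rng.normal(size=(n, 5))
p = 1 / (1 + np.exp(-X[:, 0]))               # true propensity
D = rng.binomial(1, p)
Y = 1.0 * D + X[:, 0] + rng.normal(size=n)   # true ATE = 1

psi = np.zeros(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    ps = RandomForestClassifier(n_estimators=200, random_state=0)
    ps.fit(X[train], D[train])
    e = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)  # clipping
    mu1 = RandomForestRegressor(n_estimators=200, random_state=0)
    mu1.fit(X[train][D[train] == 1], Y[train][D[train] == 1])
    mu0 = RandomForestRegressor(n_estimators=200, random_state=0)
    mu0.fit(X[train][D[train] == 0], Y[train][D[train] == 0])
    m1, m0 = mu1.predict(X[test]), mu0.predict(X[test])
    d, y = D[test], Y[test]
    psi[test] = m1 - m0 + d * (y - m1) / e - (1 - d) * (y - m0) / (1 - e)

ate = psi.mean()
se = psi.std(ddof=1) / np.sqrt(n)
print(f"ATE estimate: {ate:.3f} (SE {se:.3f})")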
arXiv link: http://arxiv.org/abs/2209.01805v1
Combining Forecasts under Structural Breaks Using Graphical LASSO
a machine learning algorithm called Graphical LASSO (GL). We visualize forecast
errors from different forecasters as a network of interacting entities and
generalize network inference in the presence of common factor structure and
structural breaks. First, we note that forecasters often use common information
and hence make common mistakes, which makes the forecast errors exhibit common
factor structures. We use the Factor Graphical LASSO (FGL, Lee and Seregina
(2023)) to separate common forecast errors from the idiosyncratic errors and
exploit sparsity of the precision matrix of the latter. Second, since the
network of experts changes over time in response to unstable environments
such as recessions, it is unreasonable to assume constant forecast combination
weights. Hence, we propose Regime-Dependent Factor Graphical LASSO (RD-FGL)
that allows factor loadings and idiosyncratic precision matrix to be
regime-dependent. We develop its scalable implementation using the Alternating
Direction Method of Multipliers (ADMM) to estimate regime-dependent forecast
combination weights. The empirical application to forecasting macroeconomic
series using the data of the European Central Bank's Survey of Professional
Forecasters (ECB SPF) demonstrates superior performance of a combined forecast
using FGL and RD-FGL.
arXiv link: http://arxiv.org/abs/2209.01697v2
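A minimal illustration of the precision-matrix logic behind such combinations (plain Graphical LASSO on forecast errors; the paper's FGL and RD-FGL additionally remove the common factor component and let parameters vary by regime): the weights w = Theta 1 / (1' Theta 1), proportional to the row sums of the estimated precision matrix, are the minimum-variance combination weights.

import numpy as np
from sklearn.covariance import GraphicalLasso

def combination_weights(errors, alpha=0.1):
    # errors: T x K matrix of forecast errors from K forecasters.
    theta = GraphicalLasso(alpha=alpha).fit(errors).precision_
    w = theta.sum(axis=1)
    return w / w.sum()

# combined forecast for a period: forecasts_t @ combination_weights(past_errors)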
Instrumental variable quantile regression under random right censoring
variables and random right censoring. The endogeneity issue is solved using
instrumental variables. It is assumed that the structural quantile of the
logarithm of the outcome variable is linear in the covariates and censoring is
independent. The regressors and instruments can be either continuous or
discrete. The specification generates a continuum of equations of which the
quantile regression coefficients are a solution. Identification is obtained
when this system of equations has a unique solution. Our estimation procedure
solves an empirical analogue of the system of equations. We derive conditions
under which the estimator is asymptotically normal and prove the validity of a
bootstrap procedure for inference. The finite sample performance of the
approach is evaluated through numerical simulations. An application to the
national Job Training Partnership Act study illustrates the method.
arXiv link: http://arxiv.org/abs/2209.01429v2
Instrumental variables with unordered treatments: Theory and evidence from returns to fields of study
how one may combine instruments for multiple unordered treatments with
information about individuals' ranking of these treatments to achieve
identification while allowing for both observed and unobserved heterogeneity in
treatment effects. We show that the key assumptions underlying their
identification argument have testable implications. We also provide a new
characterization of the bias that may arise if these assumptions are violated.
Taken together, these results allow researchers not only to test the underlying
assumptions, but also to argue whether the bias from violations of these
assumptions is likely to be economically meaningful. Guided and motivated by
these results, we estimate and compare the earnings payoffs to post-secondary
fields of study in Norway and Denmark. In each country, we apply the
identification argument of Kirkeboen et al. (2016) to data on individuals'
ranking of fields of study and field-specific instruments from discontinuities
in the admission systems. We empirically examine whether and why the payoffs to
fields of study differ across the two countries. We find strong cross-country
correlation in the payoffs to fields of study, especially after removing fields
with violations of the assumptions underlying the identification argument.
arXiv link: http://arxiv.org/abs/2209.00417v3
A Unified Framework for Estimation of High-dimensional Conditional Factor Models
conditional factor models via nuclear norm regularization. We establish large
sample properties of the estimators, and provide an efficient computing
algorithm for finding the estimators as well as a cross validation procedure
for choosing the regularization parameter. The general framework allows us to
estimate a variety of conditional factor models in a unified way and quickly
deliver new asymptotic results. We apply the method to analyze the cross
section of individual US stock returns, and find that imposing homogeneity may
improve the model's out-of-sample predictability.
arXiv link: http://arxiv.org/abs/2209.00391v1
Switchback Experiments under Geometric Mixing
repeatedly turning an intervention on and off for a whole system. Switchback
experiments are a robust way to overcome cross-unit spillover effects; however,
they are vulnerable to bias from temporal carryovers. In this paper, we
consider properties of switchback experiments in Markovian systems that mix at
a geometric rate. We find that, in this setting, standard switchback designs
suffer considerably from carryover bias: Their estimation error decays as
$T^{-1/3}$ in terms of the experiment horizon $T$, whereas in the absence of
carryovers a faster rate of $T^{-1/2}$ would have been possible. We also show,
however, that judicious use of burn-in periods can considerably improve the
situation, and enables errors that decay as $\log(T)^{1/2}T^{-1/2}$. Our formal
results are mirrored in an empirical evaluation.
arXiv link: http://arxiv.org/abs/2209.00197v3
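A toy version of the burn-in idea, under illustrative assumptions (alternating fixed-length blocks, a scalar outcome, none of the paper's design optimization): discard the first few periods after each switch so that carryover from the previous regime has largely died out before averaging.

import numpy as np

def switchback_estimate(outcomes, assignment, block_len, burn_in):
    # Difference in means using only post-burn-in periods of each block.
    # Blocks of length block_len alternate treatment on/off; the first
    # burn_in periods of each block are dropped to reduce carryover bias.
    T = len(outcomes)
    keep = (np.arange(T) % block_len) >= burn_in
    y, d = np.asarray(outcomes)[keep], np.asarray(assignment)[keep]
    return y[d == 1].mean() - y[d == 0].mean()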
Modeling Volatility and Dependence of European Carbon and Energy Prices
their uncertainty and dependencies on related energy prices (natural gas, coal,
and oil). We propose a probabilistic multivariate conditional time series model
with a VECM-Copula-GARCH structure which exploits key characteristics of the
data. Data are normalized with respect to inflation and carbon emissions to
allow for proper cross-series evaluation. The forecasting performance is
evaluated in an extensive rolling-window forecasting study, covering eight
years out-of-sample. We discuss our findings for both levels- and
log-transformed data, focusing on time-varying correlations, and in view of the
Russian invasion of Ukraine.
arXiv link: http://arxiv.org/abs/2208.14311v4
A Consistent ICM-based $χ^2$ Specification Test
specification tests, they are not commonly used in empirical practice owing to,
e.g., the non-pivotality of the test and the high computational cost of
available bootstrap schemes, especially in large samples. This paper proposes
specification and mean independence tests based on a class of ICM metrics
termed the generalized martingale difference divergence (GMDD). The proposed
tests exhibit consistency, asymptotic $\chi^2$-distribution under the null
hypothesis, and computational efficiency. Moreover, they demonstrate robustness
to heteroskedasticity of unknown form and can be adapted to enhance power
towards specific alternatives. A power comparison with classical
bootstrap-based ICM tests using Bahadur slopes is also provided. Monte Carlo
simulations are conducted to showcase the proposed tests' excellent size
control and competitive power.
arXiv link: http://arxiv.org/abs/2208.13370v2
Safe Policy Learning under Regression Discontinuity Designs with Multiple Cutoffs
evaluation with observational data. The primary focus of the existing
literature has been the estimation of the local average treatment effect at the
existing treatment cutoff. In contrast, we consider policy learning under the
RD design. Because the treatment assignment mechanism is deterministic,
learning better treatment cutoffs requires extrapolation. We develop a robust
optimization approach to finding optimal treatment cutoffs that improve upon
the existing ones. We first decompose the expected utility into
point-identifiable and unidentifiable components. We then propose an efficient
doubly-robust estimator for the identifiable parts. To account for the
unidentifiable components, we leverage the existence of multiple cutoffs that
are common under the RD design. Specifically, we assume that the heterogeneity
in the conditional expectations of potential outcomes across different groups
varies smoothly along the running variable. Under this assumption, we minimize
the worst case utility loss relative to the status quo policy. The resulting
new treatment cutoffs have a safety guarantee that they will not yield a worse
overall outcome than the existing cutoffs. Finally, we establish the asymptotic
regret bounds for the learned policy using semi-parametric efficiency theory.
We apply the proposed methodology to empirical and simulated data sets.
arXiv link: http://arxiv.org/abs/2208.13323v4
Comparing Stochastic Volatility Specifications for Large Bayesian VARs
volatility have become increasingly popular in empirical macroeconomics. One
main difficulty for practitioners is to choose the most suitable stochastic
volatility specification for their particular application. We develop Bayesian
model comparison methods -- based on marginal likelihood estimators that
combine conditional Monte Carlo and adaptive importance sampling -- to choose
among a variety of stochastic volatility specifications. The proposed methods
can also be used to select an appropriate shrinkage prior on the VAR
coefficients, which is a critical component for avoiding over-fitting in
high-dimensional settings. Using US quarterly data of different dimensions, we
find that both the Cholesky stochastic volatility and factor stochastic
volatility outperform the common stochastic volatility specification. Their
superior performance, however, can mostly be attributed to the more flexible
priors that accommodate cross-variable shrinkage.
arXiv link: http://arxiv.org/abs/2208.13255v1
An agent-based modeling approach for real-world economic systems: Example and calibration with a Social Accounting Matrix of Spain
relevance in recent decades. A frequent observation by policy makers is the
lack of tools that help at least to understand, if not predict, economic
crises. Currently, macroeconomic modeling is dominated by Dynamic Stochastic
General Equilibrium (DSGE) models. The limitations of DSGE in coping with the
complexity of today's global economy are often recognized and are the subject
of intense research to find possible solutions. As an alternative or complement
to DSGE, the last two decades have seen the rise of agent-based models (ABM).
An attractive feature of ABM is that it can model very complex systems because
it is a bottom-up approach that can describe the specific behavior of
heterogeneous agents. The main obstacle, however, is the large number of
parameters that need to be known or calibrated. To enable the use of ABM with
data from the real-world economy, this paper describes an agent-based
macroeconomic modeling approach that can read a Social Accounting Matrix (SAM)
and deploy from scratch an economic system (labor, activity sectors operating
as firms, a central bank, the government, external sectors...) whose structure
and activity produce a SAM with values very close to those of the actual SAM
snapshot. This approach paves the way for unleashing the expected high
performance of ABM models to deal with the complexities of current global
macroeconomics, including other layers of interest like ecology, epidemiology,
or social networks among others.
arXiv link: http://arxiv.org/abs/2208.13254v3
A Descriptive Method of Firm Size Transition Dynamics Using Markov Chain
determines the prosperity and stability of a country. As time passes, the
fluctuations of firm employment can reflect the process of creating or
destroying jobs. Therefore, it is instructive to investigate the firm
employment (size) dynamics. Drawing on the firm-level panel data extracted from
the Chinese Industrial Enterprises Database 1998-2013, this paper proposes a
Markov-chain-based descriptive approach to clearly demonstrate the firm size
transfer dynamics between different size categories. With this method, any firm
size transition path in a short time period can be intuitively demonstrated.
Furthermore, by utilizing the properties of Markov transfer matrices, the
definitions of the transition trend and the transition entropy are introduced
and estimated. As a result, the tendency of firm size transfer between the
small, medium, and large categories can be precisely revealed, and the
uncertainty of size change can be quantified. Overall, the evidence of this
paper suggests that small and medium manufacturing firms in China had greater
job creation potential than large firms over this time period.
arXiv link: http://arxiv.org/abs/2208.13012v1
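The basic bookkeeping behind such an exercise is straightforward; the sketch below (the category coding and the exact definitions of trend and entropy are illustrative and may differ from the paper's) estimates a transition matrix by counting observed moves and summarizes the uncertainty of size change with row-wise Shannon entropies.

import numpy as np

def transition_matrix(states, n_categories=3):
    # states: time-ordered integer size categories for one firm
    # (e.g. 0 = small, 1 = medium, 2 = large). Rows are normalized to sum to 1.
    P = np.zeros((n_categories, n_categories))
    for s, s_next in zip(states[:-1], states[1:]):
        P[s, s_next] += 1
    return P / P.sum(axis=1, keepdims=True)

def transition_entropy(P):
    # Shannon entropy of each row: uncertainty about next-period size category.
    with np.errstate(divide="ignore"):
        logP = np.where(P > 0, np.log(P), 0.0)
    return -(P * logP).sum(axis=1)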
A restricted eigenvalue condition for unit-root non-stationary data
non-stationary data and derive its validity under the assumption of independent
Gaussian innovations that may be contemporaneously correlated. The method of
proof relies on matrix concentration inequalities and offers sufficient
flexibility to enable extensions of our results to alternative time series
settings. As an application of this result, we show the consistency of the
lasso estimator on ultra high-dimensional cointegrated data in which the number
of integrated regressors may grow exponentially in relation to the sample size.
arXiv link: http://arxiv.org/abs/2208.12990v1
Large Volatility Matrix Analysis Using Global and National Factor Models
based on the latent factor model. They often assumed that there are a few
common factors, which can account for volatility dynamics. However, several
studies have demonstrated the presence of local factors. In particular, when
analyzing the global stock market, we often observe that nation-specific
factors explain their own country's volatility dynamics. To account for this,
we propose the Double Principal Orthogonal complEment Thresholding
(Double-POET) method, based on multi-level factor models, and also establish
its asymptotic properties. Furthermore, we demonstrate the drawback of using
the regular principal orthogonal component thresholding (POET) when the local
factor structure exists. We also describe the blessing of dimensionality using
Double-POET for local covariance matrix estimation. Finally, we investigate the
performance of the Double-POET estimator in an out-of-sample portfolio
allocation study using international stocks from 20 financial markets.
arXiv link: http://arxiv.org/abs/2208.12323v2
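A rough sketch of the two-level idea (global principal components, then country-specific components from the residuals, then thresholding of the remaining idiosyncratic covariance); the function name, tuning constants, and SVD-based extraction below are illustrative simplifications, not the paper's Double-POET estimator.

import numpy as np

def two_level_factor_cov(returns, countries, k_global=3, k_local=1, thresh=1e-3):
    # returns: T x p matrix of asset returns; countries: length-p array of
    # country labels for the columns.
    R = returns - returns.mean(axis=0)
    U, S, Vt = np.linalg.svd(R, full_matrices=False)
    common = (U[:, :k_global] * S[:k_global]) @ Vt[:k_global]
    resid = R - common
    local = np.zeros_like(R)
    for c in np.unique(countries):
        idx = countries == c
        Uc, Sc, Vct = np.linalg.svd(resid[:, idx], full_matrices=False)
        local[:, idx] = (Uc[:, :k_local] * Sc[:k_local]) @ Vct[:k_local]
    eps = resid - local
    T = len(R)
    Sigma_eps = eps.T @ eps / T
    off = np.sign(Sigma_eps) * np.maximum(np.abs(Sigma_eps) - thresh, 0.0)
    np.fill_diagonal(off, np.diag(Sigma_eps))   # keep the diagonal untouched
    return (common.T @ common + local.T @ local) / T + off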
What Impulse Response Do Instrumental Variables Identify?
that the local projection-IV (LP-IV) estimand aggregates component-wise impulse
responses with potentially negative weights, challenging its causal
interpretation. To address this, we propose identification strategies using
multiple sign-restricted IVs or disaggregated data, which recover structurally
meaningful responses even when individual LP-IV estimands are non-causal. We
also show that, under weak stationarity, the identified sets are sharp and
cannot be further narrowed in some key cases. Applications to fiscal and
monetary policy demonstrate the practical value of our approach.
arXiv link: http://arxiv.org/abs/2208.11828v3
Robust Tests of Model Incompleteness in the Presence of Nuisance Parameters
admit certain policy-relevant features such as strategic interaction,
self-selection, or state dependence. We develop a novel test of model
incompleteness and analyze its asymptotic properties. A key observation is that
one can identify the least-favorable parametric model that represents the most
challenging scenario for detecting local alternatives without knowledge of the
selection mechanism. We build a robust test of incompleteness on a score
function constructed from such a model. The proposed procedure remains
computationally tractable even with nuisance parameters because it suffices to
estimate them only under the null hypothesis of model completeness. We
illustrate the test by applying it to a market entry model and a triangular
model with a set-valued control function.
arXiv link: http://arxiv.org/abs/2208.11281v2
Beta-Sorted Portfolios
covariation to selected risk factors -- are a popular tool in empirical finance
to analyze models of (conditional) expected returns. Despite their widespread
use, little is known of their statistical properties in contrast to comparable
procedures such as two-pass regressions. We formally investigate the properties
of beta-sorted portfolio returns by casting the procedure as a two-step
nonparametric estimator with a nonparametric first step and a beta-adaptive
portfolio construction. Our framework rationalizes the well-known estimation
algorithm with precise economic and statistical assumptions on the general data
generating process and characterizes its key features. We study beta-sorted
portfolios for both a single cross-section as well as for aggregation over time
(e.g., the grand mean), offering conditions that ensure consistency and
asymptotic normality along with new uniform inference procedures allowing for
uncertainty quantification and testing of various relevant hypotheses in
financial applications. We also highlight some limitations of current empirical
practices and discuss what inferences can and cannot be drawn from returns to
beta-sorted portfolios for either a single cross-section or across the whole
sample. Finally, we illustrate the functionality of our new procedures in an
empirical application.
arXiv link: http://arxiv.org/abs/2208.10974v3
pystacked: Stacking generalization and machine learning in Stata
and binary classification via Python's scikit-learn. Stacking combines multiple
supervised machine learners -- the "base" or "level-0" learners -- into a
single learner. The currently supported base learners include regularized
regression, random forest, gradient boosted trees, support vector machines, and
feed-forward neural nets (multi-layer perceptron). pystacked can also be used
as a `regular' machine learning program to fit a single base learner and,
thus, provides an easy-to-use API for scikit-learn's machine learning
algorithms.
arXiv link: http://arxiv.org/abs/2208.10896v2
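Since the command wraps scikit-learn, the pure-Python analogue of a small stack looks roughly like this (scikit-learn called directly, not pystacked's Stata syntax; the particular base learners, meta-learner, and cv value are arbitrary choices for illustration).

from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LassoCV

# Base ("level-0") learners combined by a final meta-learner.
stack = StackingRegressor(
    estimators=[("lasso", LassoCV()),
                ("rf", RandomForestRegressor(n_estimators=200)),
                ("gbt", GradientBoostingRegressor())],
    final_estimator=LassoCV(),
    cv=5,
)
# stack.fit(X, y); stack.predict(X_new)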
Optimal Pre-Analysis Plans: Statistical Decisions Subject to Implementability
We model the interaction between an agent who analyzes data and a principal who
makes a decision based on agent reports. The agent could be the manufacturer of
a new drug, and the principal a regulator deciding whether the drug is
approved. Or the agent could be a researcher submitting a research paper, and
the principal an editor deciding whether it is published. The agent decides
which statistics to report to the principal. The principal cannot verify
whether the analyst reported selectively. Absent a pre-analysis message, if
there are conflicts of interest, then many desirable decision rules cannot be
implemented. Allowing the agent to send a message before seeing the data
increases the set of decision rules that can be implemented, and allows the
principal to leverage agent expertise. The optimal mechanisms that we
characterize require pre-analysis plans. Applying these results to hypothesis
testing, we show that optimal rejection rules pre-register a valid test, and
make worst-case assumptions about unreported statistics. Optimal tests can be
found as a solution to a linear-programming problem.
arXiv link: http://arxiv.org/abs/2208.09638v3
Deep Learning for Choice Modeling
preference or utility across many fields including economics, marketing,
operations research, and psychology. While the vast majority of the literature
on choice models has been devoted to the analytical properties that lead to
managerial and policy-making insights, the existing methods to learn a choice
model from empirical data are often either computationally intractable or
sample inefficient. In this paper, we develop deep learning-based choice models
under two settings of choice modeling: (i) feature-free and (ii) feature-based.
Our model captures both the intrinsic utility for each candidate choice and the
effect that the assortment has on the choice probability. Synthetic and real
data experiments demonstrate the performance of the proposed models in terms of
the recovery of the existing choice models, sample complexity, assortment
effect, architecture design, and model interpretation.
arXiv link: http://arxiv.org/abs/2208.09325v1
Understanding Volatility Spillover Relationship Among G7 Nations And India During Covid-19
to capture the interconnectedness and volatility transmission dynamics. The
nature of change in volatility spillover effects and time-varying conditional
correlation among the G7 countries and India is investigated. Methodology: To
assess the volatility spillover effects, the bivariate BEKK and t-DCC(1,1)
GARCH(1,1) models have been used. Our research shows how the dynamics of
volatility spillover between India and the G7 countries shift before and during
COVID-19. Findings: The findings reveal that the extent of volatility spillover
has altered during COVID compared to the pre-COVID environment. During this
pandemic, a sharp increase in conditional correlation indicates an increase in
systematic risk between countries. Originality: The study contributes to a
better understanding of the dynamics of volatility spillover between G7
countries and India. Asset managers and foreign corporations can use the
changing spillover dynamics to improve investment decisions and implement
effective hedging measures to protect their interests. Furthermore, this
research will assist financial regulators in assessing market risk during
future crises such as COVID-19.
arXiv link: http://arxiv.org/abs/2208.09148v1
On the Estimation of Peer Effects for Sampled Networks
observed networks under the new inferential paradigm of design identification,
which characterizes the missing data challenge arising with sampled networks
through the central idea that two full-data versions that are topologically
compatible with the observed data may give rise to two different probability
distributions. We show that peer effects cannot be identified by design when
network links between sampled and unsampled units are not observed. Under
realistic modeling conditions, and under the assumption that sampled units
report on the size of their network of contacts, the asymptotic bias arising
from estimating peer effects with incomplete network data is characterized, and
a bias-corrected estimator is proposed. The finite sample performance of our
methodology is investigated via simulations.
arXiv link: http://arxiv.org/abs/2208.09102v1
Matrix Quantile Factor Model
with low-rank structure. We estimate the row and column factor spaces via
minimizing the empirical check loss function with orthogonal rotation
constraints. We show that the estimates converge at rate
$(\min\{p_1p_2,p_2T,p_1T\})^{-1/2}$ in the average Frobenius norm, where $p_1$,
$p_2$ and $T$ are the row dimensionality, column dimensionality and length of
the matrix sequence, respectively. This rate is faster than that of the
quantile estimates via "flattening" the matrix model into a large vector
model. To derive the central limit theorem, we introduce a novel augmented
Lagrangian function, which is equivalent to the original constrained empirical
check loss minimization problem. Via the equivalence, we prove that the Hessian
matrix of the augmented Lagrangian function is locally positive definite,
resulting in a locally convex penalized loss function around the true factors
and their loadings. This easily leads to a feasible second-order expansion of
the score function and readily established central limit theorems of the
smoothed estimates of the loadings. We provide three consistent criteria to
determine the pair of row and column factor numbers. Extensive simulation
studies and an empirical study justify our theory.
arXiv link: http://arxiv.org/abs/2208.08693v3
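The objective being minimized can be written compactly; the sketch below (illustrative shapes: X is T x p1 x p2, R is p1 x k1, C is p2 x k2, F is T x k1 x k2) evaluates the empirical check loss for a candidate set of row/column loadings and factors, omitting the paper's rotation constraints and smoothing.

import numpy as np

def check_loss(u, tau):
    # Quantile check (pinball) loss rho_tau(u).
    return u * (tau - (u < 0))

def matrix_quantile_objective(X, R, F, C, tau):
    # Average check loss for the matrix factor approximation X_t ~ R F_t C'.
    total = 0.0
    for t in range(X.shape[0]):
        resid = X[t] - R @ F[t] @ C.T
        total += check_loss(resid, tau).sum()
    return total / X.size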
Inference on Strongly Identified Functionals of Weakly Identified Functions
(NPIV) analysis, proximal causal inference under unmeasured confounding, and
missing-not-at-random data with shadow variables, we are interested in
inference on a continuous linear functional (e.g., average causal effects) of
nuisance function (e.g., NPIV regression) defined by conditional moment
restrictions. These nuisance functions are generally weakly identified, in that
the conditional moment restrictions can be severely ill-posed as well as admit
multiple solutions. This is sometimes resolved by imposing strong conditions
that imply the function can be estimated at rates that make inference on the
functional possible. In this paper, we study a novel condition for the
functional to be strongly identified even when the nuisance function is not;
that is, the functional is amenable to asymptotically-normal estimation at
$\sqrt{n}$-rates. The condition implies the existence of debiasing nuisance
functions, and we propose penalized minimax estimators for both the primary and
debiasing nuisance functions. The proposed nuisance estimators can accommodate
flexible function classes, and importantly they can converge to fixed limits
determined by the penalization regardless of the identifiability of the
nuisances. We use the penalized nuisance estimators to form a debiased
estimator for the functional of interest and prove its asymptotic normality
under generic high-level conditions, which provide for asymptotically valid
confidence intervals. We also illustrate our method in a novel partially linear
proximal causal inference problem and a partially linear instrumental variable
regression problem.
arXiv link: http://arxiv.org/abs/2208.08291v3
Time is limited on the road to asymptopia
(FABMs) is to infer reliable insights using numerical simulations validated by
only a single observed time series. Ergodicity (besides stationarity) is a
strong precondition for any estimation; however, it has not been systematically
explored and is often simply presumed. For finite-sample lengths and limited
computational resources, empirical estimation always takes place in
pre-asymptopia. Thus, broken ergodicity must be considered the rule, but it
remains largely unclear how to deal with the remaining uncertainty in
non-ergodic observables. Here we show how an understanding of the ergodic
properties of moment functions can help to improve the estimation of (F)ABMs.
We run Monte Carlo experiments and study the convergence behaviour of moment
functions of two prototype models. We find infeasibly-long convergence times
for most of them. Choosing an efficient mix of ensemble size and simulated time
length guided our estimation and might help in general.
arXiv link: http://arxiv.org/abs/2208.08169v1
Characterizing M-estimators
general functionals by formally connecting the theory of consistent loss
functions from forecast evaluation with the theory of M-estimation. This novel
characterization result opens up the possibility for theoretical research on
efficient and equivariant M-estimation and, more generally, it makes it
possible to leverage existing results on loss functions from the forecast
evaluation literature in estimation theory.
arXiv link: http://arxiv.org/abs/2208.08108v1
Optimal Recovery for Causal Inference
processing techniques. As an example, it is crucial to successfully quantify
the causal effects of an intervention to determine whether the intervention
achieved desired outcomes. We present a new geometric signal processing
approach to classical synthetic control called ellipsoidal optimal recovery
(EOpR), for estimating the unobservable outcome of a treatment unit. EOpR
provides policy evaluators with both worst-case and typical outcomes to help in
decision making. It is an approximation-theoretic technique that relates to the
theory of principal components, which recovers unknown observations given a
learned signal class and a set of known observations. We show EOpR can improve
pre-treatment fit and mitigate bias of the post-treatment estimate relative to
other methods in causal inference. Beyond recovery of the unit of interest, an
advantage of EOpR is that it produces worst-case limits over the estimates
produced. We assess our approach on artificially-generated data, on datasets
commonly used in the econometrics literature, and in the context of the
COVID-19 pandemic, showing better performance than baseline techniques.
arXiv link: http://arxiv.org/abs/2208.06729v3
From the historical Roman road network to modern infrastructure in Italy
Empire in Italy, plays an important role today in facilitating the construction
of new infrastructure. This paper investigates the historical path of Roman
roads as main determinant of both motorways and railways in the country. The
empirical analysis shows how the modern Italian transport infrastructure
followed the path traced in ancient times by the Romans in constructing their
roads. Being paved and connecting Italy from north to south, the consular
routes endured over time, providing the initial physical capital for
developing the new transport networks.
arXiv link: http://arxiv.org/abs/2208.06675v1
A Nonparametric Approach with Marginals for Modeling Consumer Choice
challenge is to develop parsimonious models that describe and predict consumer
choice behavior while being amenable to prescriptive tasks such as pricing and
assortment optimization. The marginal distribution model (MDM) is one such
model, which requires only the specification of marginal distributions of the
random utilities. This paper aims to establish necessary and sufficient
conditions for given choice data to be consistent with the MDM hypothesis,
inspired by the usefulness of similar characterizations for the random utility
model (RUM). This endeavor leads to an exact characterization of the set of
choice probabilities that the MDM can represent. Verifying the consistency of
choice data with this characterization is equivalent to solving a
polynomial-sized linear program. Since the analogous verification task for RUM
is computationally intractable and neither of these models subsumes the other,
MDM is helpful in striking a balance between tractability and representational
power. The characterization is then used with robust optimization for making
data-driven sales and revenue predictions for new unseen assortments. When the
choice data lacks consistency with the MDM hypothesis, finding the best-fitting
MDM choice probabilities reduces to solving a mixed integer convex program.
Numerical results using real world data and synthetic data demonstrate that MDM
exhibits competitive representational power and prediction performance compared
to RUM and parametric models while being significantly faster in computation
than RUM.
arXiv link: http://arxiv.org/abs/2208.06115v6
Testing for homogeneous treatment effects in linear and nonparametric instrumental variable models
instrumental variables literature. This assumption signifies that treatment
effects are constant across all subjects. It allows instrumental variable
estimates to be interpreted as average treatment effects over the whole population of
the study. When this assumption does not hold, the bias of instrumental
variable estimators can be larger than that of naive estimators ignoring
endogeneity. This paper develops two tests for the assumption of homogeneous
treatment effects when the treatment is endogenous and an instrumental variable
is available. The tests leverage a covariable that is (jointly with the error
terms) independent of a coordinate of the instrument. This covariate does not
need to be exogenous. The first test assumes that the potential outcomes are
linear in the regressors and is computationally simple. The second test is
nonparametric and relies on Tikhonov regularization. The treatment can be
either discrete or continuous. We show that the tests have asymptotically
correct level and asymptotic power equal to one against a range of
alternatives. Simulations demonstrate that the proposed tests attain excellent
finite sample performance. The methodology is also applied to the evaluation
of returns to schooling and the effect of price on demand in a fish market.
arXiv link: http://arxiv.org/abs/2208.05344v4
Selecting Valid Instrumental Variables in Linear Models with Multiple Exposure Variables: Adaptive Lasso and the Median-of-Medians Estimator
effects of multiple confounded exposure/treatment variables on an outcome, we
investigate the adaptive Lasso method for selecting valid instrumental
variables from a set of available instruments that may contain invalid ones. An
instrument is invalid if it fails the exclusion conditions and enters the model
as an explanatory variable. We extend the results developed in Windmeijer et
al. (2019) for the single exposure model to the multiple exposures case. In
particular we propose a median-of-medians estimator and show that the
conditions on the minimum number of valid instruments under which this
estimator is consistent for the causal effects are only moderately stronger
than the simple majority rule that applies to the median estimator for the
single exposure case. The adaptive Lasso method which uses the initial
median-of-medians estimator for the penalty weights achieves consistent
selection with oracle properties of the resulting IV estimator. This is
confirmed by some Monte Carlo simulation results. We apply the method to
estimate the causal effects of educational attainment and cognitive ability on
body mass index (BMI) in a Mendelian Randomization setting.
arXiv link: http://arxiv.org/abs/2208.05278v1
Endogeneity in Weakly Separable Models without Monotonicity
separable with a binary endogenous treatment. Vytlacil and Yildiz (2007)
proposed an identification strategy that exploits the mean of observed
outcomes, but their approach requires a monotonicity condition. In comparison,
we exploit full information in the entire outcome distribution, instead of just
its mean. As a result, our method does not require monotonicity and is also
applicable to general settings with multiple indices. We provide examples where
our approach can identify treatment effect parameters of interest whereas
existing methods would fail. These include models where potential outcomes
depend on multiple unobserved disturbance terms, such as a Roy model, a
multinomial choice model, as well as a model with endogenous random
coefficients. We establish consistency and asymptotic normality of our
estimators.
arXiv link: http://arxiv.org/abs/2208.05047v1
Finite Tests from Functional Characterizations
classes involves two main approaches. The first, known as the functional
approach, assumes access to an entire demand function. The second, the revealed
preference approach, constructs inequalities to test finite demand data. This
paper bridges these methods by using the functional approach to test finite
data through preference learnability results. We develop a computationally
efficient algorithm that generates tests for choice data based on functional
characterizations of preference families. We provide these restrictions for
various applications, including homothetic and weakly separable preferences,
where the latter's revealed preference characterization is provably NP-Hard. We
also address choice under uncertainty, offering tests for betweenness
preferences. Lastly, we perform a simulation exercise demonstrating that our
tests are effective in finite samples and accurately reject demands not
belonging to a specified class.
arXiv link: http://arxiv.org/abs/2208.03737v5
Strategic differences between regional investments into graphene technology and how corporations and universities manage patent portfolios
technology in the prevailing innovation model. To gain strategic advantages in
the technological competitions between regions, nations need to leverage the
investments of public and private funds to diversify over all technologies or
specialize in a small number of technologies. In this paper, we investigated
who the leaders are at the regional and assignee levels, how they attained
their leadership positions, and whether they adopted diversification or
specialization strategies, using a dataset of 176,193 patent records on
graphene between 1986 and 2017 downloaded from Derwent Innovation. By applying
a co-clustering method to the IPC subclasses in the patents and using a z-score
method to extract keywords from their titles and abstracts, we identified seven
graphene technology areas emerging in the sequence synthesis - composites -
sensors - devices - catalyst - batteries - water treatment. We then examined
the top regions in their investment preferences and their changes in rankings
over time and found that they invested in all seven technology areas. In
contrast, at the assignee level, some were diversified while others were
specialized. We found that large entities diversified their portfolios across
multiple technology areas, while small entities specialized around their core
competencies. In addition, we found that universities had higher entropy values
than corporations on average, leading us to the hypothesis that corporations
file, buy, or sell patents to enable product development. In contrast,
universities focus only on licensing their patents. We validated this
hypothesis through an aggregate analysis of reassignment and licensing and a
more detailed analysis of three case studies - SAMSUNG, RICE UNIVERSITY, and
DYSON.
arXiv link: http://arxiv.org/abs/2208.03719v1
Quantile Random-Coefficient Regression with Interactive Fixed Effects: Heterogeneous Group-Level Policy Evaluation
effects to study the effects of group-level policies that are heterogeneous
across individuals. Our approach is the first to use a latent factor structure
to handle the unobservable heterogeneities in the random coefficient. The
asymptotic properties and an inferential method for the policy estimators are
established. The model is applied to evaluate the effect of the minimum wage
policy on earnings between 1967 and 1980 in the United States. Our results
suggest that the minimum wage policy has significant and persistent positive
effects on black workers and female workers up to the median. Our results also
indicate that the policy helps reduce income disparity up to the median between
two groups: black, female workers versus white, male workers. However, the
policy is shown to have little effect on narrowing the income gap between low-
and high-income workers within the subpopulations.
arXiv link: http://arxiv.org/abs/2208.03632v3
Forecasting Algorithms for Causal Inference with Panel Data
science research. We adapt a deep neural architecture for time series
forecasting (the N-BEATS algorithm) to more accurately impute the
counterfactual evolution of a treated unit had treatment not occurred. Across a
range of settings, the resulting estimator (“SyNBEATS”) significantly
outperforms commonly employed methods (synthetic controls, two-way fixed
effects), and attains comparable or more accurate performance compared to
recently proposed methods (synthetic difference-in-differences, matrix
completion). An implementation of this estimator is available for public use.
Our results highlight how advances in the forecasting literature can be
harnessed to improve causal inference in panel data settings.
arXiv link: http://arxiv.org/abs/2208.03489v3
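The underlying recipe is generic and can be sketched with any off-the-shelf learner (ridge regression on donor units below, not N-BEATS; variable names are illustrative): fit on the treated unit's pre-treatment outcomes, forecast its untreated path over the post-treatment window, and read the treatment effect off the gap.

import numpy as np
from sklearn.linear_model import Ridge

def impute_counterfactual(y_treated, X_donors, T0):
    # Train on pre-treatment periods [0, T0); impute the treated unit's
    # untreated path for periods T0 onward from contemporaneous donor outcomes.
    model = Ridge(alpha=1.0).fit(X_donors[:T0], y_treated[:T0])
    y_hat = model.predict(X_donors[T0:])
    effect = y_treated[T0:] - y_hat      # per-period treatment effect estimates
    return y_hat, effect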
Partial Identification of Personalized Treatment Response with Trial-reported Analyses of Binary Subgroups
the usefulness of published trial findings. Medical decision makers commonly
observe many patient covariates and seek to use this information to personalize
treatment choices. Yet standard summaries of trial findings only partition
subjects into broad subgroups, typically into binary categories. Given this
reporting practice, we study the problem of inference on long mean treatment
outcomes $E[y(t)|x]$, where $t$ is a treatment, $y(t)$ is a treatment outcome, and
the covariate vector $x$ has length $K$, each component being a binary variable.
The available data are estimates of $\{E[y(t)|x_k = 0], E[y(t)|x_k = 1], P(x_k)\}$,
$k = 1, \dots, K$, reported in journal articles. We show that reported trial
findings partially identify $\{E[y(t)|x], P(x)\}$. Illustrative computations
demonstrate that the summaries of trial findings in journal articles may imply
only wide bounds on long mean outcomes. One can realistically tighten
inferences if one can combine reported trial findings with credible assumptions
having identifying power, such as bounded-variation assumptions.
arXiv link: http://arxiv.org/abs/2208.03381v2
Factor Network Autoregressions
complex network structures. The coefficients of the model reflect many
different types of connections between economic agents ("multilayer network"),
which are summarized into a smaller number of network matrices ("network
factors") through a novel tensor-based principal component approach. We provide
consistency and asymptotic normality results for the estimation of the factors,
their loadings, and the coefficients of the FNAR, as the number of layers,
nodes and time points diverges to infinity. Our approach combines two different
dimension-reduction techniques and can be applied to high-dimensional datasets.
Simulation results show the goodness of our estimators in finite samples. In an
empirical application, we use the FNAR to investigate the cross-country
interdependence of GDP growth rates based on a variety of international trade
and financial linkages. The model provides a rich characterization of
macroeconomic network effects as well as good forecasts of GDP growth rates.
arXiv link: http://arxiv.org/abs/2208.02925v7
Weak convergence to derivatives of fractional Brownian motion
fractional process with fractional parameter $d$ converges weakly to fractional
Brownian motion for $d>1/2$. We show that, for any non-negative integer $M$,
derivatives of order $m=0,1,\dots,M$ of the normalized fractional process with
respect to the fractional parameter $d$, jointly converge weakly to the
corresponding derivatives of fractional Brownian motion. As an illustration we
apply the results to the asymptotic distribution of the score vectors in the
multifractional vector autoregressive model.
arXiv link: http://arxiv.org/abs/2208.02516v2
Difference-in-Differences with a Misclassified Treatment
effect on the treated (ATT) in difference-in-difference (DID) designs when the
variable that classifies individuals into treatment and control groups
(treatment status, D) is endogenously misclassified. We show that
misclassification in D hampers consistent estimation of ATT because 1) it
prevents us from distinguishing the truly treated from those misclassified as
being treated and 2) differential misclassification in counterfactual trends
may result in parallel trends being violated with D even when they hold with
the true but unobserved D*. We propose a solution to correct for endogenous
one-sided misclassification in the context of a parametric DID regression which
allows for considerable heterogeneity in treatment effects and establish its
asymptotic properties in panel and repeated cross section settings.
Furthermore, we illustrate the method by using it to estimate the insurance
impact of a large-scale in-kind food transfer program in India which is known
to suffer from large targeting errors.
arXiv link: http://arxiv.org/abs/2208.02412v1
The Econometrics of Financial Duration Modeling
models, where events are observed over a given time span, such as a trading
day, or a week. For the classical autoregressive conditional duration (ACD)
models by Engle and Russell (1998, Econometrica 66, 1127-1162), we show that
the large sample behavior of likelihood estimators is highly sensitive to the
tail behavior of the financial durations. In particular, even under
stationarity, asymptotic normality breaks down for tail indices smaller than
one or, equivalently, when the clustering behaviour of the observed events is
such that the unconditional distribution of the durations has no finite mean.
Instead, we find that estimators are mixed Gaussian and have non-standard rates
of convergence. The results are based on exploiting the crucial fact that for
duration data the number of observations within any given time span is random.
Our results apply to general econometric models where the number of observed
events is random.
arXiv link: http://arxiv.org/abs/2208.02098v3
Bayesian ranking and selection with applications to field studies, economic mobility, and forecasting
assemble a team of political forecasters, we might begin by narrowing our
choice set to the candidates we are confident rank among the top 10% in
forecasting ability. Unfortunately, we do not know each candidate's true
ability but observe a noisy estimate of it. This paper develops new Bayesian
algorithms to rank and select candidates based on noisy estimates. Using
simulations based on empirical data, we show that our algorithms often
outperform frequentist ranking and selection algorithms. Our Bayesian ranking
algorithms yield shorter rank confidence intervals while maintaining
approximately correct coverage. Our Bayesian selection algorithms select more
candidates while maintaining correct error rates. We apply our ranking and
selection procedures to field experiments, economic mobility, forecasting, and
similar problems. Finally, we implement our ranking and selection techniques in
a user-friendly Python package documented here:
https://dsbowen-conditional-inference.readthedocs.io/en/latest/.
arXiv link: http://arxiv.org/abs/2208.02038v1
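A minimal normal-normal sketch of the posterior-simulation logic (this is not the package's API; the prior, the independence assumption, and the 10% cutoff are illustrative): draw candidate abilities from their posteriors given the noisy estimates, rank within each draw, and report each candidate's posterior probability of landing in the top decile.

import numpy as np

def top_decile_probs(estimates, std_errors, prior_mean=0.0, prior_var=1.0,
                     n_draws=10_000, seed=0):
    # Posterior probability that each candidate ranks in the top 10%,
    # under independent normal-normal models.
    rng = np.random.default_rng(seed)
    estimates, std_errors = np.asarray(estimates), np.asarray(std_errors)
    post_var = 1.0 / (1.0 / prior_var + 1.0 / std_errors**2)
    post_mean = post_var * (prior_mean / prior_var + estimates / std_errors**2)
    draws = rng.normal(post_mean, np.sqrt(post_var),
                       size=(n_draws, len(estimates)))
    ranks = draws.argsort(axis=1).argsort(axis=1)    # 0 = lowest ability
    cutoff = int(np.ceil(0.9 * len(estimates)))
    return (ranks >= cutoff).mean(axis=0)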
Bootstrap inference in the presence of bias
biased. We show that, even when the bias term cannot be consistently estimated,
valid inference can be obtained by proper implementations of the bootstrap.
Specifically, we show that the prepivoting approach of Beran (1987, 1988),
originally proposed to deliver higher-order refinements, restores bootstrap
validity by transforming the original bootstrap p-value into an asymptotically
uniform random variable. We propose two different implementations of
prepivoting (plug-in and double bootstrap), and provide general high-level
conditions that imply validity of bootstrap inference. To illustrate the
practical relevance and implementation of our results, we discuss five
examples: (i) inference on a target parameter based on model averaging; (ii)
ridge-type regularized estimators; (iii) nonparametric regression; (iv) a
location model for infinite variance data; and (v) dynamic panel data models.
arXiv link: http://arxiv.org/abs/2208.02028v3
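Schematically, the double-bootstrap implementation of prepivoting evaluates the bootstrap distribution function of the p-value at the observed bootstrap p-value, which pushes it toward uniformity. The sketch below uses generic statistic and resample callables with arbitrary replication counts; it illustrates the idea rather than the paper's exact algorithm.

import numpy as np

def prepivoted_pvalue(data, statistic, resample, B1=499, B2=249, seed=0):
    # statistic(sample) -> scalar; resample(sample, rng) -> one bootstrap sample.
    # Returns the double-bootstrap (prepivoted) p-value: the fraction of outer
    # bootstrap p-values that are <= the ordinary bootstrap p-value.
    rng = np.random.default_rng(seed)
    t_hat = statistic(data)
    outer = [resample(data, rng) for _ in range(B1)]
    t_outer = np.array([statistic(s) for s in outer])
    p_hat = np.mean(t_outer >= t_hat)                 # ordinary bootstrap p-value
    p_outer = np.empty(B1)
    for b, s in enumerate(outer):
        t_inner = np.array([statistic(resample(s, rng)) for _ in range(B2)])
        p_outer[b] = np.mean(t_inner >= t_outer[b])
    return np.mean(p_outer <= p_hat)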
Weak Instruments, First-Stage Heteroskedasticity, the Robust F-Test and a GMM Estimator with the Weight Matrix Based on First-Stage Residuals
F-statistic in the Monte Carlo analysis of Andrews (2018), who found in a
heteroskedastic grouped-data design that even for very large values of the
robust F-statistic, the standard 2SLS confidence intervals had large coverage
distortions. This finding appears to discredit the robust F-statistic as a test
for underidentification. However, it is shown here that large values of the
robust F-statistic do imply that there is first-stage information, but this may
not be utilized well by the 2SLS estimator, or the standard GMM estimator. An
estimator that corrects for this is a robust GMM estimator, denoted GMMf, with
the robust weight matrix not based on the structural residuals, but on the
first-stage residuals. For the grouped-data setting of Andrews (2018), this
GMMf estimator gives the weights to the group-specific estimators according to
the group-specific concentration parameters in the same way as 2SLS does under
homoskedasticity, which is formally shown using weak instrument asymptotics.
The GMMf estimator is much better behaved than the 2SLS estimator in the
Andrews (2018) design, behaving well in terms of relative bias and Wald-test
size distortion at more standard values of the robust F-statistic. We show that
the same patterns can occur in a dynamic panel data model when the error
variance is heteroskedastic over time. We further derive the conditions under
which the Stock and Yogo (2005) weak instruments critical values apply to the
robust F-statistic in relation to the behaviour of the GMMf estimator.
arXiv link: http://arxiv.org/abs/2208.01967v1
Multifractal cross-correlations of bitcoin and ether trading characteristics in the post-COVID-19 time
has seldom been a subject of systematic study. In order to fill this gap, we
analyse detrended correlations of the price returns, the average number of
trades in time unit, and the traded volume based on high-frequency data
representing two major cryptocurrencies: bitcoin and ether. We apply the
multifractal detrended cross-correlation analysis, which is considered the most
reliable method for identifying nonlinear correlations in time series. We find
that all the quantities considered in our study show an unambiguous
multifractal structure from both the univariate (auto-correlation) and
bivariate (cross-correlation) perspectives. We looked at the bitcoin--ether
cross-correlations in simultaneously recorded signals, as well as in
time-lagged signals, in which a time series for one of the cryptocurrencies is
shifted with respect to the other. Such a shift suppresses the
cross-correlations partially for short time scales, but does not remove them
completely. We did not observe any qualitative asymmetry in the results for the
two choices of a leading asset. The cross-correlations for the simultaneous and
lagged time series became the same in magnitude for sufficiently long time
scales.
arXiv link: http://arxiv.org/abs/2208.01445v1
Doubly Robust Estimation of Local Average Treatment Effects Using Inverse Probability Weighted Regression Adjustment
(LATE) and the local average treatment effect on the treated (LATT) when
control variables are available, either to render the instrumental variable
(IV) suitably exogenous or to improve precision. Unlike previous approaches,
our doubly robust (DR) estimation procedures use quasi-likelihood methods
weighted by the inverse of the IV propensity score - so-called inverse
probability weighted regression adjustment (IPWRA) estimators. By properly
choosing models for the propensity score and outcome models, fitted values are
ensured to be in the logical range determined by the response variable,
producing DR estimators of LATE and LATT with appealing small sample
properties. Inference is relatively straightforward both analytically and using
the nonparametric bootstrap. Our DR LATE and DR LATT estimators work well in
simulations. We also propose a DR version of the Hausman test that can be used
to assess the unconfoundedness assumption through a comparison of different
estimates of the average treatment effect on the treated (ATT) under one-sided
noncompliance. Unlike the usual test that compares OLS and IV estimates, this
procedure is robust to treatment effect heterogeneity.
arXiv link: http://arxiv.org/abs/2208.01300v2
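At a high level, a doubly robust LATE estimate can be formed as the ratio of two AIPW intention-to-treat effects of the instrument, one on the outcome and one on treatment take-up. The sketch below uses simple logit/linear nuisance models and illustrates that generic ratio construction, not the paper's IPWRA quasi-likelihood estimators.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def dr_late(X, Z, D, Y):
    # Ratio of two AIPW intention-to-treat effects of the binary instrument Z:
    # its effect on the outcome Y divided by its effect on take-up D.
    pz = np.clip(LogisticRegression(max_iter=1000).fit(X, Z).predict_proba(X)[:, 1],
                 1e-3, 1 - 1e-3)

    def aipw_itt(V):
        m1 = LinearRegression().fit(X[Z == 1], V[Z == 1]).predict(X)
        m0 = LinearRegression().fit(X[Z == 0], V[Z == 0]).predict(X)
        return np.mean(m1 - m0 + Z * (V - m1) / pz - (1 - Z) * (V - m0) / (1 - pz))

    return aipw_itt(Y) / aipw_itt(D.astype(float))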
A penalized two-pass regression to predict stock returns with time-varying risk premia
The penalization in the first pass enforces sparsity for the time-variation
drivers while also maintaining compatibility with the no-arbitrage restrictions
by regularizing appropriate groups of coefficients. The second pass delivers
risk premia estimates to predict equity excess returns. Our Monte Carlo results
and our empirical results on a large cross-sectional data set of US individual
stocks show that penalization without grouping can lead to nearly all
estimated time-varying models violating the no-arbitrage restrictions.
Moreover, our results demonstrate that the proposed method reduces the
prediction errors compared to a penalized approach without appropriate grouping
or a time-invariant factor model.
arXiv link: http://arxiv.org/abs/2208.00972v1
The Effect of Omitted Variables on the Sign of Regression Coefficients
it can be substantially easier for omitted variables to flip coefficient signs
than to drive them to zero. This behavior occurs with "Oster's delta" (Oster
2019), a widely reported robustness measure. Consequently, any time this
measure is large -- suggesting that omitted variables may be unimportant -- a
much smaller value reverses the sign of the parameter of interest. We propose a
modified measure of robustness to address this concern. We illustrate our
results in four empirical applications and two meta-analyses. We implement our
methods in the companion Stata module regsensitivity.
arXiv link: http://arxiv.org/abs/2208.00552v4
Interpreting and predicting the economy flows: A time-varying parameter global vector autoregressive integrated the machine learning model
(TVP-GVAR) framework for predicting and analysing developed region economic
variables. We aim to provide an easily accessible approach for applied economic
settings, in which a variety of machine learning models can be incorporated for
out-of-sample prediction. A LASSO-type technique is selected for numerically
efficient model selection based on mean squared errors (MSEs). We show the
convincing in-sample performance of our proposed model for all economic
variables and relatively precise out-of-sample predictions with economic inputs
of different frequencies. Furthermore, the time-varying
orthogonal impulse responses provide novel insights into the connectedness of
economic variables at critical time points across developed regions. We also
derive the corresponding asymptotic bands (the confidence intervals) for
the orthogonal impulse response functions under standard assumptions.
arXiv link: http://arxiv.org/abs/2209.05998v1
Compact representations of structured BFGS matrices
in which recursive quasi-Newton update formulas are represented as compact
matrix factorizations. For problems in which the objective function contains
additional structure, so-called structured quasi-Newton methods exploit
available second-derivative information and approximate unavailable second
derivatives. This article develops the compact representations of two
structured Broyden-Fletcher-Goldfarb-Shanno update formulas. The compact
representations enable efficient limited memory and initialization strategies.
Two limited memory line search algorithms are described and tested on a
collection of problems, including a real world large scale imaging application.
arXiv link: http://arxiv.org/abs/2208.00057v1
Tangential Wasserstein Projections
the geometric properties of the 2-Wasserstein space. It is designed for general
multivariate probability measures, is computationally efficient to implement,
and provides a unique solution in regular settings. The idea is to work on
regular tangent cones of the Wasserstein space using generalized geodesics. Its
structure and computational properties make the method applicable in a variety
of settings, from causal inference to the analysis of object data. An
application to estimating causal effects yields a generalization of the notion
of synthetic controls to multivariate data with individual-level heterogeneity,
as well as a way to estimate optimal weights jointly over all time periods.
arXiv link: http://arxiv.org/abs/2207.14727v2
Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data
policy. One dominant approach is through panel data analysis in which the
behaviors of multiple units are observed over time. The information across time
and space motivates two general approaches: (i) horizontal regression (i.e.,
unconfoundedness), which exploits time series patterns, and (ii) vertical
regression (e.g., synthetic controls), which exploits cross-sectional patterns.
Conventional wisdom states that the two approaches are fundamentally different.
We establish this position to be partly false for estimation but generally true
for inference. In particular, we prove that both approaches yield identical
point estimates under several standard settings. For the same point estimate,
however, each approach quantifies uncertainty with respect to a distinct
estimand. In turn, the confidence interval developed for one estimand may have
incorrect coverage for another. This emphasizes that the source of randomness
that researchers assume has direct implications for the accuracy of inference.
arXiv link: http://arxiv.org/abs/2207.14481v2
Stable Matching with Mistaken Agents
environments, we propose a solution concept -- robust equilibrium -- that
requires only an asymptotically optimal behavior. We use it to study large
random matching markets operated by the applicant-proposing Deferred Acceptance
(DA). Although truth-telling is a dominant strategy, almost all applicants may
be non-truthful in robust equilibrium; however, the outcome must be arbitrarily
close to the stable matching. Our results imply that one can assume truthful
agents to study DA outcomes, theoretically or counterfactually. However, to
estimate the preferences of mistaken agents, one should assume stable matching
but not truth-telling.
arXiv link: http://arxiv.org/abs/2207.13939v4
Identification and Inference with Min-over-max Estimators for the Measurement of Labor Market Fairness
Although the metric is a complex statistic involving min and max computations,
we propose a smooth approximation of those functions and derive its asymptotic
distribution. The limits of these approximations and their gradients converge to
those of the true max and min functions, wherever they exist. More importantly,
when the true max and min functions are not differentiable, the approximations
still are, and they provide valid asymptotic inference everywhere in the
domain. We conclude with some directions on how to compute confidence intervals
for DP, how to test if it is under 0.8 (the U.S. Equal Employment Opportunity
Commission fairness threshold), and how to do inference in an A/B test.
arXiv link: http://arxiv.org/abs/2207.13797v1
Conformal Prediction Bands for Two-Dimensional Functional Time Series
series, exploiting the tools of Functional data analysis. Leveraging this
approach, a forecasting framework for such complex data is developed. The main
focus revolves around Conformal Prediction, a versatile nonparametric paradigm
used to quantify uncertainty in prediction problems. Building upon recent
variations of Conformal Prediction for Functional time series, a probabilistic
forecasting scheme for two-dimensional functional time series is presented,
while providing an extension of Functional Autoregressive Processes of order
one to this setting. Estimation techniques for the latter process are
introduced and their performance is compared in terms of the resulting
prediction regions. Finally, the proposed forecasting procedure and the
uncertainty quantification technique are applied to a real dataset collecting
daily observations of Sea Level Anomalies of the Black Sea.
arXiv link: http://arxiv.org/abs/2207.13656v2
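As generic background to the conformal ideas referenced above, the basic split-conformal recipe (compute nonconformity scores on a calibration set, take their quantile, attach it to the point forecast) is sketched below for a scalar one-step-ahead forecast. Everything here — the random-walk forecaster, the absolute-residual score, the split — is an assumed simplification; the paper works with two-dimensional functional time series and dependence-aware variants.

    # Minimal split-conformal sketch for scalar one-step-ahead forecasts.
    import numpy as np

    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(size=300))            # toy time series

    def naive_forecast(history):
        return history[-1]                          # random-walk forecast (assumption)

    # use the middle of the sample as a calibration set
    calib_idx = range(150, 299)
    scores = np.array([abs(y[t + 1] - naive_forecast(y[: t + 1])) for t in calib_idx])

    alpha = 0.1
    n = len(scores)
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

    point = naive_forecast(y[:299])
    print("90% conformal interval for y[299]:", (point - q, point + q))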
Differentially Private Estimation via Statistical Depth
maximum influence of an observation, which can be difficult in the absence of
exogenous bounds on the input data or the estimator, especially in high
dimensional settings. This paper shows that standard notions of statistical
depth, i.e., halfspace depth and regression depth, are particularly
advantageous in this regard, both in the sense that the maximum influence of a
single observation is easy to analyze and that this value is typically low.
This is used to motivate new approximate DP location and regression estimators
using the maximizers of these two notions of statistical depth. A more
computationally efficient variant of the approximate DP regression estimator is
also provided. Also, to avoid requiring that users specify a priori bounds on
the estimates and/or the observations, variants of these DP mechanisms are
described that satisfy random differential privacy (RDP), which is a relaxation
of differential privacy provided by Hall, Wasserman, and Rinaldo (2013). We
also provide simulations of the two DP regression methods proposed here. The
proposed estimators appear to perform favorably relative to the existing DP
regression methods we consider in these simulations when either the sample size
is at least 100-200 or the privacy-loss budget is sufficiently high.
arXiv link: http://arxiv.org/abs/2207.12602v1
Forecasting euro area inflation using a huge panel of survey expectations
an econometric model which exploits a massive number of time series on survey
expectations for the European Commission's Business and Consumer Survey. To
make estimation of such a huge model tractable, we use recent advances in
computational statistics to carry out posterior simulation and inference. Our
findings suggest that the inclusion of a wide range of firms and consumers'
opinions about future economic developments offers useful information to
forecast prices and assess tail risks to inflation. These predictive
improvements arise not only from surveys related to expected inflation but
also from other questions related to the general economic environment. Finally,
we find that firms' expectations about the future seem to have more predictive
content than consumer expectations.
arXiv link: http://arxiv.org/abs/2207.12225v1
Sparse Bayesian State-Space and Time-Varying Parameter Models
(TVP) models for univariate and multivariate time series within a Bayesian
framework. We show how both continuous as well as discrete spike-and-slab
shrinkage priors can be transferred from variable selection for regression
models to variance selection for TVP models by using a non-centered
parametrization. We discuss efficient MCMC estimation and provide an
application to US inflation modeling.
arXiv link: http://arxiv.org/abs/2207.12147v1
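For readers unfamiliar with the non-centered parametrization mentioned above, one standard formulation for a single time-varying coefficient (notation mine, not quoted from the paper) is
$$y_t = x_t'\beta_t + \varepsilon_t,\qquad \beta_{jt} = \beta_j + \sqrt{\theta_j}\,\tilde\beta_{jt},\qquad \tilde\beta_{jt} = \tilde\beta_{j,t-1} + u_{jt},\quad u_{jt}\sim N(0,1),\ \tilde\beta_{j0}=0,$$
so that a spike-and-slab or continuous shrinkage prior placed on $\pm\sqrt{\theta_j}$ acts as variance selection: $\theta_j = 0$ recovers a constant coefficient, while $\theta_j > 0$ yields a time-varying one.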
Misclassification in Difference-in-differences Models
used in empirical economics research. However, there is almost no work
examining what the DID method identifies in the presence of a misclassified
treatment variable. This paper studies the identification of treatment effects
in DID designs when the treatment is misclassified. Misclassification arises in
various ways, including when the timing of a policy intervention is ambiguous
or when researchers need to infer treatment from auxiliary data. We show that
the DID estimand is biased and recovers a weighted average of the average
treatment effects on the treated (ATT) in two subpopulations -- the correctly
classified and misclassified groups. In some cases, the DID estimand may yield
the wrong sign and is otherwise attenuated. We provide bounds on the ATT when
the researcher has access to information on the extent of misclassification in
the data. We demonstrate our theoretical results using simulations and provide
two empirical applications to guide researchers in performing sensitivity
analysis using our proposed methods.
arXiv link: http://arxiv.org/abs/2207.11890v2
Detecting common bubbles in multivariate mixed causal-noncausal models
observed in individual series are common to various series. We detect the
non-linear dynamics using recent mixed causal-noncausal models. Both a
likelihood ratio test and information criteria are investigated, the former
having better performance in our Monte Carlo simulations. Implementing our
approach on three commodity prices, we do not find evidence of commonalities,
although some series look very similar.
arXiv link: http://arxiv.org/abs/2207.11557v1
A Conditional Linear Combination Test with Many Weak Instruments
Lagrangian multiplier (LM), and orthogonalized jackknife LM tests for inference
in IV regressions with many weak instruments and heteroskedasticity. Following
I. Andrews (2016), we choose the weights in the linear combination based on a
decision-theoretic rule that is adaptive to the identification strength. Under
both weak and strong identification, the proposed test controls asymptotic
size and is admissible among a certain class of tests. Under strong
identification, our linear combination test has optimal power against local
alternatives among the class of invariant or unbiased tests which are
constructed based on jackknife AR and LM tests. Simulations and an empirical
application to Angrist and Krueger's (1991) dataset confirm the good power
properties of our test.
arXiv link: http://arxiv.org/abs/2207.11137v3
Time-Varying Poisson Autoregression
Time-Varying Poisson AutoRegressive with eXogenous covariates (TV-PARX), suited
to model and forecast time series of counts. We show that the score-driven
framework is particularly suitable to recover the evolution of time-varying
parameters and provides the required flexibility to model and forecast time
series of counts characterized by convoluted nonlinear dynamics and structural
breaks. We study the asymptotic properties of the TV-PARX model and prove
that, under mild conditions, maximum likelihood estimation (MLE) yields
strongly consistent and asymptotically normal parameter estimates.
Finite-sample performance and forecasting accuracy are evaluated through Monte
Carlo simulations. The empirical usefulness of the time-varying specification
of the proposed TV-PARX model is shown by analyzing the number of new daily
COVID-19 infections in Italy and the number of corporate defaults in the US.
arXiv link: http://arxiv.org/abs/2207.11003v1
Testing for a Threshold in Models with Endogenous Regressors
endogenous regressors - proposed in Caner and Hansen (2004) - can exhibit
severe size distortions both in small and in moderately large samples,
pertinent to empirical applications. We propose three new tests that rectify
these size distortions. The first test is based on GMM estimators. The other
two are based on unconventional 2SLS estimators, that use additional
information about the linearity (or lack of linearity) of the first stage. Just
like the test in Caner and Hansen (2004), our tests are non-pivotal, and we
prove their bootstrap validity. The empirical application revisits the question
in Ramey and Zubairy (2018) whether government spending multipliers are larger
in recessions, but using tests for an unknown threshold. Consistent with Ramey
and Zubairy (2018), we do not find strong evidence that these multipliers are
larger in recessions.
arXiv link: http://arxiv.org/abs/2207.10076v1
Efficient Bias Correction for Cross-section and Panel Data
estimators. We show that the choice of bias correction method has no effect on
the higher-order variance of semiparametrically efficient parametric
estimators, so long as the estimate of the bias is asymptotically linear. It is
also shown that bootstrap, jackknife, and analytical bias estimates are
asymptotically linear for estimators with higher-order expansions of a standard
form. In particular, we find that for a variety of estimators the
straightforward bootstrap bias correction gives the same higher-order variance
as more complicated analytical or jackknife bias corrections. In contrast, bias
corrections that do not estimate the bias at the parametric rate, such as the
split-sample jackknife, result in larger higher-order variances in the i.i.d.
setting we focus on. For both a cross-sectional MLE and a panel model with
individual fixed effects, we show that the split-sample jackknife has a
higher-order variance term that is twice as large as that of the
`leave-one-out' jackknife.
arXiv link: http://arxiv.org/abs/2207.09943v4
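As a generic illustration of the bootstrap bias correction discussed above (the paper's contribution concerns the higher-order variance of such corrections, which this sketch does not touch), the standard correction subtracts the bootstrap estimate of the bias from the original estimate:

    # Generic bootstrap bias correction for a scalar estimator (illustration only).
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.exponential(scale=2.0, size=200)       # toy i.i.d. sample

    def estimator(sample):
        # a deliberately biased estimator: the 1/n variance estimator
        return np.var(sample)                      # np.var uses ddof=0 by default

    theta_hat = estimator(x)
    B = 2000
    boot = np.array([estimator(rng.choice(x, size=x.size, replace=True))
                     for _ in range(B)])

    bias_hat = boot.mean() - theta_hat             # bootstrap estimate of the bias
    theta_bc = theta_hat - bias_hat                # = 2*theta_hat - mean(bootstrap)
    print(theta_hat, theta_bc)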
Asymptotic Properties of Endogeneity Corrections Using Nonlinear Transformations
which arises from a nonlinear transformation of a latent variable. It is shown
that the corresponding coefficient can be consistently estimated without
external instruments by adding a rank-based transformation of the regressor to
the model and performing standard OLS estimation. In contrast to other
approaches, our nonparametric control function approach does not rely on a
conformably specified copula. Furthermore, the approach allows for the presence
of additional exogenous regressors which may be (linearly) correlated with the
endogenous regressor(s). Consistency and asymptotic normality of the estimator
are proved and the estimator is compared with copula based approaches by means
of Monte Carlo simulations. An empirical application on wage data of the US
Current Population Survey demonstrates the usefulness of our method.
arXiv link: http://arxiv.org/abs/2207.09246v3
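The recipe described above (augment the regression with a rank-based transformation of the endogenous regressor and run OLS) can be illustrated as follows. The specific transform used here (an empirical normal-score transform) and the data-generating process are assumptions for illustration and need not coincide with the paper's exact construction.

    # Sketch of an endogeneity correction via a rank-based control (illustrative).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n = 2000
    u = rng.normal(size=n)                          # latent variable
    x = np.exp(u)                                   # endogenous regressor: nonlinear in u
    y = 1.0 + 2.0 * x + 0.7 * u + rng.normal(scale=0.5, size=n)   # true slope = 2

    ranks = stats.rankdata(x) / (n + 1)             # empirical ranks in (0, 1)
    g = stats.norm.ppf(ranks)                       # normal-score transform (assumption)

    X_naive = np.column_stack([np.ones(n), x])
    X_ctrl = np.column_stack([np.ones(n), x, g])    # add rank-based control

    b_naive = np.linalg.lstsq(X_naive, y, rcond=None)[0]
    b_ctrl = np.linalg.lstsq(X_ctrl, y, rcond=None)[0]
    print("naive OLS slope:", b_naive[1], "rank-augmented slope:", b_ctrl[1])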
The role of the geometric mean in case-control studies
expensive, outcome-dependent sampling is relevant to many modern settings where
data is readily available for a biased sample of the target population, such as
public administrative data. Under outcome-dependent sampling, common effect
measures such as the average risk difference and the average risk ratio are not
identified, but the conditional odds ratio is. Aggregation of the conditional
odds ratio is challenging since summary measures are generally not identified.
Furthermore, the marginal odds ratio can be larger (or smaller) than all
conditional odds ratios. This so-called non-collapsibility of the odds ratio is
avoidable if we use an alternative aggregation to the standard arithmetic mean.
We provide a new definition of collapsibility that makes this choice of
aggregation method explicit, and we demonstrate that the odds ratio is
collapsible under geometric aggregation. We describe how to partially identify,
estimate, and do inference on the geometric odds ratio under outcome-dependent
sampling. Our proposed estimator is based on the efficient influence function
and therefore has doubly robust-style properties.
arXiv link: http://arxiv.org/abs/2207.09016v1
Bias correction and uniform inference for the quantile density function
the quantile function), I show how to perform the boundary bias correction,
establish the rate of strong uniform consistency of the bias-corrected
estimator, and construct the confidence bands that are asymptotically exact
uniformly over the entire domain $[0,1]$. The proposed procedures rely on the
pivotality of the studentized bias-corrected estimator and known
anti-concentration properties of the Gaussian approximation for its supremum.
arXiv link: http://arxiv.org/abs/2207.09004v1
Isotonic propensity score matching
based on propensity scores estimated by isotonic regression. This approach is
predicated on the assumption of monotonicity in the propensity score function,
a condition that can be justified in many economic applications. We show that
the nature of the isotonic estimator can help us to fix many problems of
existing matching methods, including efficiency, choice of the number of
matches, choice of tuning parameters, robustness to propensity score
misspecification, and bootstrap validity. As a by-product, a uniformly
consistent isotonic estimator is developed for our proposed matching method.
arXiv link: http://arxiv.org/abs/2207.08868v3
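To convey the first ingredient above — propensity scores fitted by isotonic regression under a monotonicity assumption — here is a minimal sketch with a scalar covariate, followed by one-to-one nearest-neighbor matching on the fitted scores. The paper's estimator, its choice of the number of matches, and its inference theory go well beyond this.

    # Minimal sketch: isotonic-regression propensity scores (scalar covariate,
    # monotone propensity assumed) plus 1-NN matching on the fitted scores.
    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    rng = np.random.default_rng(3)
    n = 1000
    x = rng.uniform(-2, 2, size=n)
    p = 1 / (1 + np.exp(-x))                        # true monotone propensity
    d = rng.binomial(1, p)                          # treatment indicator
    y = 1.0 + 0.5 * x + 2.0 * d + rng.normal(size=n)   # true ATT = 2

    iso = IsotonicRegression(y_min=1e-3, y_max=1 - 1e-3, out_of_bounds="clip")
    ps = iso.fit_transform(x, d)                    # isotonic propensity estimates

    treated = np.where(d == 1)[0]
    controls = np.where(d == 0)[0]
    # match each treated unit to the control with the closest estimated score
    matches = controls[np.abs(ps[treated][:, None] - ps[controls][None, :]).argmin(axis=1)]
    att_hat = (y[treated] - y[matches]).mean()
    print("matching ATT estimate:", att_hat)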
Estimating Continuous Treatment Effects in Panel Data using Machine Learning with a Climate Application
linear two-way fixed effects models (TWFE). When the treatment-outcome
relationship is nonlinear, TWFE is misspecified and potentially biased for the
average partial derivative (APD). We develop an automatic double/de-biased
machine learning (ADML) estimator that is consistent for the population APD
while allowing additive unit fixed effects, nonlinearities, and high
dimensional heterogeneity. We prove asymptotic normality and add two
refinements - optimization based de-biasing and analytic derivatives - that
reduce bias and remove numerical approximation error. Simulations show that the
proposed method outperforms high order polynomial OLS and standard ML
estimators. Our estimator leads to significantly larger (by 50%), but equally
precise, estimates of the effect of extreme heat on corn yield compared to
standard linear models.
arXiv link: http://arxiv.org/abs/2207.08789v3
Testing for explosive bubbles: a review
series. A large number of recently developed testing methods, derived under
various assumptions about the error innovations, are covered. The review also considers
the methods for dating explosive (bubble) regimes. Special attention is devoted
to time-varying volatility in the errors. Moreover, the modelling of possible
relationships between time series with explosive regimes is discussed.
arXiv link: http://arxiv.org/abs/2207.08249v1
Simultaneity in Binary Outcome Models with an Application to Employment for Couples
introduce a simultaneous logit model for bivariate binary outcomes and to study
estimation of dynamic linear fixed effects panel data models using short
panels. In this paper, we study a dynamic panel data version of the bivariate
model introduced in Schmidt and Strauss (1975) that allows for lagged dependent
variables and fixed effects as in Ahn and Schmidt (1995). We combine a
conditional likelihood approach with a method of moments approach to obtain an
estimation strategy for the resulting model. We apply this estimation strategy
to a simple model for the intra-household relationship in employment. Our main
conclusion is that the within-household dependence in employment differs
significantly by the ethnicity composition of the couple even after one allows
for unobserved household specific heterogeneity.
arXiv link: http://arxiv.org/abs/2207.07343v2
Flexible global forecast combinations
experts or models -- is a proven approach to economic forecasting. To date,
research on economic forecasting has concentrated on local combination methods,
which handle separate but related forecasting tasks in isolation. Yet, it has
been known for over two decades in the machine learning community that global
methods, which exploit task-relatedness, can improve on local methods that
ignore it. Motivated by the possibility for improvement, this paper introduces
a framework for globally combining forecasts while being flexible to the level
of task-relatedness. Through our framework, we develop global versions of
several existing forecast combinations. To evaluate the efficacy of these new
global forecast combinations, we conduct extensive comparisons using synthetic
and real data. Our real data comparisons, which involve forecasts of core
economic indicators in the Eurozone, provide empirical evidence that the
accuracy of global combinations of economic forecasts can surpass local
combinations.
arXiv link: http://arxiv.org/abs/2207.07318v3
High Dimensional Generalised Penalised Least Squares
serially correlated errors. We examine Lasso under the assumption of strong
mixing in the covariates and error process, allowing for fatter tails in their
distribution. While the Lasso estimator performs poorly under such
circumstances, we estimate the parameters of interest via GLS Lasso and extend
the asymptotic properties of the Lasso to more general conditions. Our
theoretical results indicate that the non-asymptotic bounds for stationary
dependent processes are sharper, while the rate of the Lasso under general
conditions appears slower as $T,p\to \infty$. Further, we employ the debiased
Lasso to perform inference uniformly on the parameters of interest. Monte Carlo
results support the proposed estimator, as it has significant efficiency gains
over traditional methods.
arXiv link: http://arxiv.org/abs/2207.07055v4
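The GLS Lasso idea above — pre-whiten serially correlated errors before applying the Lasso — can be sketched with AR(1) errors and a single quasi-differencing pass. This is an assumed simplification; the paper's estimator, tuning, and debiasing step handle far more general dependence and tail behavior.

    # Sketch of a GLS-type Lasso: initial Lasso, estimate error autocorrelation
    # from its residuals, quasi-difference the data, and refit (illustration only).
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(4)
    T, p = 400, 50
    X = rng.normal(size=(T, p))
    beta = np.zeros(p)
    beta[:3] = [1.0, -0.5, 0.8]
    e = np.zeros(T)
    for t in range(1, T):                           # AR(1) errors, rho = 0.7
        e[t] = 0.7 * e[t - 1] + rng.normal()
    y = X @ beta + e

    lasso = Lasso(alpha=0.1).fit(X, y)
    resid = y - lasso.predict(X)
    rho_hat = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)

    # quasi-difference (Cochrane-Orcutt style) and refit
    y_t = y[1:] - rho_hat * y[:-1]
    X_t = X[1:] - rho_hat * X[:-1]
    gls_lasso = Lasso(alpha=0.1).fit(X_t, y_t)
    print("rho_hat:", rho_hat, "nonzero coefs:", np.flatnonzero(gls_lasso.coef_))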
Parallel Trends and Dynamic Choices
effects, and the parallel trends condition is its main identifying assumption:
the trend in mean untreated outcomes is independent of the observed treatment
status. In observational settings, treatment is often a dynamic choice made or
influenced by rational actors, such as policy-makers, firms, or individual
agents. This paper relates parallel trends to economic models of dynamic
choice. We clarify the implications of parallel trends on agent behavior and
study when dynamic selection motives lead to violations of parallel trends.
Finally, we consider identification under alternative assumptions that
accommodate features of dynamic choice.
arXiv link: http://arxiv.org/abs/2207.06564v3
Parametric quantile regression for income data
many areas of economics. Nevertheless, income data have asymmetric behavior and
are best modeled by non-normal distributions. The modeling of income plays an
important role in determining workers' earnings, as well as being an important
research topic in labor economics. Thus, the objective of this work is to
propose parametric quantile regression models based on two important asymmetric
income distributions, namely, Dagum and Singh-Maddala distributions. The
proposed quantile models are based on reparameterizations of the original
distributions by inserting a quantile parameter. We present the
reparameterizations, some properties of the distributions, and the quantile
regression models with their inferential aspects. We proceed with Monte Carlo
simulation studies, considering the maximum likelihood estimation performance
evaluation and an analysis of the empirical distribution of two residuals. The
Monte Carlo results show that both models perform as expected. We apply
the proposed quantile regression models to a household income data set provided
by the National Institute of Statistics of Chile and show that both proposed
models fit the data well. Thus, we conclude that the results favor the use of
the Singh-Maddala and Dagum quantile regression models for positive asymmetric
data, such as income data.
arXiv link: http://arxiv.org/abs/2207.06558v1
Two-stage differences in differences
and average treatment effects vary across groups and over time,
difference-in-differences regression does not identify an easily interpretable
measure of the typical effect of the treatment. In this paper, I extend this
literature in two ways. First, I provide some simple underlying intuition for
why difference-in-differences regression does not identify a
group$\times$period average treatment effect. Second, I propose an alternative
two-stage estimation framework, motivated by this intuition. In this framework,
group and period effects are identified in a first stage from the sample of
untreated observations, and average treatment effects are identified in a
second stage by comparing treated and untreated outcomes, after removing these
group and period effects. The two-stage approach is robust to treatment-effect
heterogeneity under staggered adoption, and can be used to identify a host of
different average treatment effect measures. It is also simple, intuitive, and
easy to implement. I establish the theoretical properties of the two-stage
approach and demonstrate its effectiveness and applicability using Monte-Carlo
evidence and an example from the literature.
arXiv link: http://arxiv.org/abs/2207.05943v1
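The two-stage logic above (estimate group and period effects from untreated observations only, then compare residualized treated and untreated outcomes) can be sketched for a balanced panel with staggered binary adoption. The simulated design, the presence of a never-treated group, and the naive standard errors are assumptions of this illustration; the paper covers many more estimands and the appropriate inference.

    # Sketch of the two-stage difference-in-differences logic (illustration only;
    # valid standard errors must account for the estimated first stage).
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    units, periods = 50, 10
    df = pd.DataFrame([(i, t) for i in range(units) for t in range(periods)],
                      columns=["unit", "time"])
    adopt = rng.integers(4, 9, size=units)          # staggered adoption dates
    adopt[:10] = periods + 1                        # a never-treated group (assumption)
    df["treated"] = (df["time"].values >= adopt[df["unit"].values]).astype(int)
    df["y"] = (df["unit"] * 0.1 + df["time"] * 0.3
               + 2.0 * df["treated"] + rng.normal(size=len(df)))   # true effect = 2

    # stage 1: unit and period effects from untreated observations only
    stage1 = smf.ols("y ~ C(unit) + C(time)", data=df[df["treated"] == 0]).fit()
    df["y_resid"] = df["y"] - stage1.predict(df)

    # stage 2: average treatment effect from residualized outcomes
    stage2 = smf.ols("y_resid ~ treated", data=df).fit()
    print("two-stage DID estimate:", stage2.params["treated"])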
Detecting Grouped Local Average Treatment Effects and Selecting True Instruments
instruments, we propose a two-step procedure for identifying complier groups
with identical local average treatment effects (LATE) despite relying on
distinct instruments, even if several instruments violate the identifying
assumptions. We use the fact that the LATE is homogeneous for instruments which
(i) satisfy the LATE assumptions (instrument validity and treatment
monotonicity in the instrument) and (ii) generate identical complier groups in
terms of treatment propensities given the respective instruments. In the first
step of the procedure we cluster the propensity scores; in the second step we
find groups of IVs with the same reduced-form parameters. Under the plurality
assumption that, within each set of instruments with identical treatment
propensities, the instruments truly satisfying the LATE assumptions form the
largest group, our procedure permits identifying these true instruments in a
data-driven way. We show that our procedure is consistent and
provides consistent and asymptotically normal estimators of underlying LATEs.
We also provide a simulation study investigating the finite sample properties
of our approach and an empirical application investigating the effect of
incarceration on recidivism in the US with judge assignments serving as
instruments.
arXiv link: http://arxiv.org/abs/2207.04481v2
Identification and Inference for Welfare Gains without Unconfoundedness
results from switching from one policy (such as the status quo policy) to
another policy. The welfare gain is not point identified in general when data
are obtained from an observational study or a randomized experiment with
imperfect compliance. I characterize the sharp identified region of the welfare
gain and obtain bounds under various assumptions on the unobservables with and
without instrumental variables. Estimation and inference of the lower and upper
bounds are conducted using orthogonalized moment conditions to deal with the
presence of infinite-dimensional nuisance parameters. I illustrate the analysis
by considering hypothetical policies of assigning individuals to job training
programs using experimental data from the National Job Training Partnership Act
Study. Monte Carlo simulations are conducted to assess the finite sample
performance of the estimators.
arXiv link: http://arxiv.org/abs/2207.04314v1
Model diagnostics of discrete data regression: a unifying framework using functional residuals
it is not well addressed in standard textbooks on generalized linear models.
The lack of exposition is attributed to the fact that when outcome data are
discrete, classical methods (e.g., Pearson/deviance residual analysis and
goodness-of-fit tests) have limited utility in model diagnostics and treatment.
This paper establishes a novel framework for model diagnostics of discrete data
regression. Unlike the literature defining a single-valued quantity as the
residual, we propose to use a function as a vehicle to retain the residual
information. In the presence of discreteness, we show that such a functional
residual is appropriate for summarizing the residual randomness that cannot be
captured by the structural part of the model. We establish its theoretical
properties, which leads to new diagnostic tools, including the
functional-residual-vs-covariate plot and the Function-to-Function (Fn-Fn) plot.
Our numerical studies demonstrate that the use of these tools can reveal a
variety of model misspecifications, such as not properly including a
higher-order term, an explanatory variable, an interaction effect, a dispersion
parameter, or a zero-inflation component. The functional residual yields, as a
byproduct, Liu-Zhang's surrogate residual mainly developed for cumulative link
models for ordinal data (Liu and Zhang, 2018, JASA). As a general notion, it
considerably broadens the diagnostic scope as it applies to virtually all
parametric models for binary, ordinal and count data, all in a unified
diagnostic scheme.
arXiv link: http://arxiv.org/abs/2207.04299v1
Spatial Econometrics for Misaligned Data
of the independent and dependent variables do not coincide, in which case we
speak of misaligned data. We develop and investigate two complementary methods
for regression analysis with misaligned data that circumvent the need to
estimate or specify the covariance of the regression errors. We carry out a
detailed reanalysis of Maccini and Yang (2009) and find economically
significant quantitative differences but sustain most qualitative conclusions.
arXiv link: http://arxiv.org/abs/2207.04082v1
Large Bayesian VARs with Factor Stochastic Volatility: Identification, Order Invariance and Structural Analysis
widely used for structural analysis. Often the structural model identified
through economically meaningful restrictions--e.g., sign restrictions--is
supposed to be independent of how the dependent variables are ordered. But
since the reduced-form model is not order invariant, results from the
structural analysis depend on the order of the variables. We consider a VAR
based on factor stochastic volatility that is constructed to be order
invariant. We show that the presence of multivariate stochastic volatility
allows for statistical identification of the model. We further prove that, with
a suitable set of sign restrictions, the corresponding structural model is
point-identified. An additional appeal of the proposed approach is that it can
easily handle a large number of dependent variables as well as sign
restrictions. We demonstrate the methodology through a structural analysis in
which we use a 20-variable VAR with sign restrictions to identify 5 structural
shocks.
arXiv link: http://arxiv.org/abs/2207.03988v1
On the instrumental variable estimation with many weak and invalid instruments
variable (IV) models with unknown IV validity. With the assumption of the
"sparsest rule", which is equivalent to the plurality rule but becomes
operational in computation algorithms, we investigate and prove the advantages
of non-convex penalized approaches over other IV estimators based on two-step
selections, in terms of selection consistency and accommodation for
individually weak IVs. Furthermore, we propose a surrogate sparsest penalty
that aligns with the identification condition and provides oracle sparse
structure simultaneously. Desirable theoretical properties are derived for the
proposed estimator with weaker IV strength conditions compared to the previous
literature. Finite sample properties are demonstrated using simulations and the
selection and estimation method is applied to an empirical study concerning the
effect of BMI on diastolic blood pressure.
arXiv link: http://arxiv.org/abs/2207.03035v2
Degrees of Freedom and Information Criteria for the Synthetic Control Method
synthetic control method (SCM) in the familiar form of degrees of freedom. We
obtain estimable information criteria. These may be used to circumvent
cross-validation when selecting either the weighting matrix in the SCM with
covariates, or the tuning parameter in model averaging or penalized variants of
SCM. We assess the impact of car license rationing in Tianjin and make a novel
use of SCM; while a natural match is available, it and other donors are noisy,
inviting the use of SCM to average over approximately matching donors. The very
large number of candidate donors calls for model averaging or penalized
variants of SCM and, with short pre-treatment series, model selection per
information criteria outperforms that per cross-validation.
arXiv link: http://arxiv.org/abs/2207.02943v1
csa2sls: A complete subset approach for many instruments using Stata
subset averaging two-stage least squares (CSA2SLS) estimator in Lee and Shin
(2021). The CSA2SLS estimator is an alternative to the two-stage least squares
estimator that remedies the bias issue caused by many correlated instruments.
We conduct Monte Carlo simulations and confirm that the CSA2SLS estimator
reduces both the mean squared error and the estimation bias substantially when
instruments are correlated. We illustrate the usage of $csa2sls$ in
Stata by an empirical application.
arXiv link: http://arxiv.org/abs/2207.01533v2
A Comparison of Methods for Adaptive Experimentation
experimentation: Thompson sampling, Tempered Thompson sampling, and Exploration
sampling. We gauge the performance of each in terms of social welfare and
estimation accuracy, and as a function of the number of experimental waves. We
further construct a set of novel "hybrid" loss measures to identify which
methods are optimal for researchers pursuing a combination of experimental
aims. Our main results are: 1) the relative performance of Thompson sampling
depends on the number of experimental waves, 2) Tempered Thompson sampling
uniquely distributes losses across multiple experimental aims, and 3) in most
cases, Exploration sampling performs similarly to random assignment.
arXiv link: http://arxiv.org/abs/2207.00683v1
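For readers unfamiliar with the first method compared above, a minimal Thompson sampling loop for binary outcomes with Beta posteriors is sketched below. The two-arm setup, success rates, and priors are assumptions of the illustration; the tempering and exploration-sampling variants studied in the paper modify how posterior draws map into assignment probabilities.

    # Minimal Thompson sampling for a two-arm experiment with binary outcomes
    # and Beta(1, 1) priors (background illustration only).
    import numpy as np

    rng = np.random.default_rng(6)
    true_p = np.array([0.45, 0.55])                 # hypothetical success rates
    successes = np.ones(2)                          # Beta alpha parameters
    failures = np.ones(2)                           # Beta beta parameters

    for t in range(1000):
        draws = rng.beta(successes, failures)       # one posterior draw per arm
        arm = int(np.argmax(draws))                 # assign to the winning draw
        reward = rng.random() < true_p[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward

    print("posterior means:", successes / (successes + failures))
    print("assignments per arm:", successes + failures - 2)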
Valid and Unobtrusive Measurement of Returns to Advertising through Asymmetric Budget Split
increase in performance (such as clicks or conversions) can an advertiser
expect in return for additional budget on the platform? Even from the
perspective of the platform, accurately measuring advertising returns is hard.
Selection and omitted variable biases make estimates from observational methods
unreliable, and straightforward experimentation is often costly or infeasible.
We introduce Asymmetric Budget Split, a novel methodology for valid measurement
of ad returns from the perspective of the platform. Asymmetric budget split
creates small asymmetries in ad budget allocation across comparable partitions
of the platform's userbase. By observing performance of the same ad at
different budget levels while holding all other factors constant, the platform
can obtain a valid measure of ad returns. The methodology is unobtrusive and
cost-effective in that it does not require holdout groups or sacrifices in ad
or marketplace performance. We discuss a successful deployment of asymmetric
budget split to LinkedIn's Jobs Marketplace, an ad marketplace where it is used
to measure returns from promotion budgets in terms of incremental job
applicants. We outline operational considerations for practitioners and discuss
further use cases such as budget-aware performance forecasting.
arXiv link: http://arxiv.org/abs/2207.00206v1
Unique futures in China: studys on volatility spillover effects of ferrous metal futures
characteristics. Due to the late listing time, it has received less attention
from scholars. Our research focuses on the volatility spillover effects,
defined as the intensity of price volatility in financial instruments. We use
DCC-GARCH, BEKK-GARCH, and DY(2012) index methods to conduct empirical tests on
the volatility spillover effects of the Chinese ferrous metal futures market
and other parts of the Chinese commodity futures market, as well as industries
related to the steel industry chain in stock markets. It can be seen that there
is a close volatility spillover relationship between ferrous metal futures and
nonferrous metal futures. Energy futures and chemical futures have a
significant transmission effect on the fluctuations of ferrous metals. In
addition, ferrous metal futures have a significant spillover effect on the
stock index of the steel industry, real estate industry, building materials
industry, machinery equipment industry, and household appliance industry.
Studying the volatility spillover effect of the ferrous metal futures market
can reveal the operating laws of this field and provide ideas and theoretical
references for investors to hedge their risks. It shows that the ferrous metal
futures market has an essential role as a "barometer" for the Chinese commodity
futures market and the stock market.
arXiv link: http://arxiv.org/abs/2206.15039v1
Dynamic CoVaR Modeling and Estimation
variants are widely used in economics and finance. In this article, we propose
joint dynamic forecasting models for the Value-at-Risk (VaR) and CoVaR. The
CoVaR version we consider is defined as a large quantile of one variable (e.g.,
losses in the financial system) conditional on some other variable (e.g.,
losses in a bank's shares) being in distress. We introduce a two-step
M-estimator for the model parameters drawing on recently proposed bivariate
scoring functions for the pair (VaR, CoVaR). We prove consistency and
asymptotic normality of our parameter estimator and analyze its finite-sample
properties in simulations. Finally, we apply a specific subclass of our dynamic
forecasting models, which we call CoCAViaR models, to log-returns of large US
banks. A formal forecast comparison shows that our CoCAViaR models generate
CoVaR predictions which are superior to forecasts issued from current benchmark
models.
arXiv link: http://arxiv.org/abs/2206.14275v4
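In one common convention (notation mine; the paper's exact definition and sign conventions may differ), with $X_t$ denoting losses of the conditioning institution and $Y_t$ system losses, the pair (VaR, CoVaR) at tail probabilities $\alpha$ and $\beta$ is defined by
$$\Pr\big(X_t \ge \mathrm{VaR}^{\alpha}_t\big) = \alpha, \qquad \Pr\big(Y_t \ge \mathrm{CoVaR}^{\beta}_t \,\big|\, X_t \ge \mathrm{VaR}^{\alpha}_t\big) = \beta,$$
and the two-step M-estimator in the paper draws on jointly consistent scoring functions for this pair.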
Business Cycle Synchronization in the EU: A Regional-Sectoral Look through Soft-Clustering and Wavelet Decomposition
synchronization in the EU -- a necessary condition for the optimal currency
area. We argue that complete and tidy clustering of the data improves the
decision maker's understanding of the business cycle and, by extension, the
quality of economic decisions. We define the business cycles by applying a
wavelet approach to drift-adjusted gross value added data spanning over 2000Q1
to 2021Q2. For the application of the synchronization analysis, we propose the
novel soft-clustering approach, which adjusts hierarchical clustering in
several aspects. First, the method relies on synchronicity dissimilarity
measures, noting that, for time series data, the feature space is the set of
all points in time. Then, the “soft” part of the approach strengthens the
synchronization signal by using silhouette measures. Finally, we add a
probabilistic sparsity algorithm to drop the most asynchronous “noisy”
data, improving the silhouette scores of the most and least synchronous groups.
The method hence splits the sectoral-regional data into three groups: the
synchronous group, which shapes the EU business cycle; the less synchronous
group, which may contain information relevant for cycle forecasting; and the
asynchronous group, which may help investors diversify through-the-cycle risks of their investment
portfolios. The results support the core-periphery hypothesis.
arXiv link: http://arxiv.org/abs/2206.14128v1
Estimating the Currency Composition of Foreign Exchange Reserves
influencing global exchange rates and asset prices. However, some of the
largest holders of reserves report minimal information about their currency
composition, hindering empirical analysis. I describe a Hidden Markov Model to
estimate the composition of a central bank's reserves by relating the
fluctuation in the portfolio's valuation to the exchange rates of major reserve
currencies. I apply the model to China and Singapore, two countries that
collectively hold about $3.4 trillion in reserves and conceal their
composition. I find that China's reserve composition likely resembles the
global average, while Singapore probably holds fewer US dollars.
arXiv link: http://arxiv.org/abs/2206.13751v4
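The core identification idea above — valuation changes of a USD-reported reserve portfolio load on the exchange-rate movements of the constituent currencies according to their portfolio weights — can be illustrated with a static constrained least-squares fit on synthetic data. The paper's Hidden Markov Model additionally lets the weights evolve over time and handles reporting imprecision, which this sketch ignores.

    # Static, synthetic-data illustration of inferring currency weights from
    # portfolio valuation changes (deliberately simplified relative to the paper).
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(7)
    T, k = 120, 4                                   # months, candidate currencies
    fx_returns = rng.normal(scale=0.02, size=(T, k))   # synthetic USD returns of candidates
    true_w = np.array([0.60, 0.20, 0.15, 0.05])
    valuation_change = fx_returns @ true_w + rng.normal(scale=0.002, size=T)

    def loss(w):
        return np.sum((valuation_change - fx_returns @ w) ** 2)

    cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]    # weights on the simplex
    res = minimize(loss, x0=np.full(k, 1.0 / k), bounds=[(0.0, 1.0)] * k,
                   constraints=cons, method="SLSQP")
    print("estimated weights:", np.round(res.x, 3))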
Misspecification and Weak Identification in Asset Pricing
asset pricing has led to an overstated performance of risk factors. Because the
conventional Fama and MacBeth (1973) methodology is jeopardized by
misspecification and weak identification, we infer risk premia by using a
double robust Lagrange multiplier test that remains reliable in the presence of
these two empirically relevant issues. Moreover, we show how the
identification, and the resulting appropriate interpretation, of the risk
premia is governed by the relative magnitudes of the misspecification
J-statistic and the identification IS-statistic. We revisit several prominent
empirical applications and all specifications with one to six factors from the
factor zoo of Feng, Giglio, and Xiu (2020) to emphasize the widespread
occurrence of misspecification and weak identification.
arXiv link: http://arxiv.org/abs/2206.13600v1
Instrumented Common Confounding
introduce the instrumented common confounding (ICC) approach to
(nonparametrically) identify causal effects with instruments, which are
exogenous only conditional on some unobserved common confounders. The ICC
approach is most useful in rich observational data with multiple sources of
unobserved confounding, where instruments are at most exogenous conditional on
some unobserved common confounders. Suitable examples of this setting are
various identification problems in the social sciences, nonlinear dynamic
panels, and problems with multiple endogenous confounders. The ICC identifying
assumptions are closely related to those in mixture models, negative control
and IV. Compared to mixture models [Bonhomme et al., 2016], we require fewer
conditionally independent variables and do not need to model the unobserved
confounder. Compared to negative control [Cui et al., 2020], we allow for
non-common confounders, with respect to which the instruments are exogenous.
Compared to IV [Newey and Powell, 2003], we allow instruments to be exogenous
conditional on some unobserved common confounders, for which a set of relevant
observed variables exists. We prove point identification under outcome-model
restrictions and, alternatively, under first-stage restrictions. We provide a practical step-by-step
guide to the ICC model assumptions and present the causal effect of education
on income as a motivating example.
arXiv link: http://arxiv.org/abs/2206.12919v2
Estimation and Inference in High-Dimensional Panel Data Models with Interactive Fixed Effects
high-dimensional panel data models with interactive fixed effects. Our approach
can be regarded as a non-trivial extension of the very popular common
correlated effects (CCE) approach. Roughly speaking, we proceed as follows: We
first construct a projection device to eliminate the unobserved factors from
the model by applying a dimensionality reduction transform to the matrix of
cross-sectionally averaged covariates. The unknown parameters are then
estimated by applying lasso techniques to the projected model. For inference
purposes, we derive a desparsified version of our lasso-type estimator. While
the original CCE approach is restricted to the low-dimensional case where the
number of regressors is small and fixed, our methods can deal with both low-
and high-dimensional situations where the number of regressors is large and may
even exceed the overall sample size. We derive theory for our estimation and
inference methods both in the large-T-case, where the time series length T
tends to infinity, and in the small-T-case, where T is a fixed natural number.
Specifically, we derive the convergence rate of our estimator and show that its
desparsified version is asymptotically normal under suitable regularity
conditions. The theoretical analysis of the paper is complemented by a
simulation study and an empirical application to characteristic based asset
pricing.
arXiv link: http://arxiv.org/abs/2206.12152v3
Assessing and Comparing Fixed-Target Forecasts of Arctic Sea Ice: Glide Charts for Feature-Engineered Linear Regression and Machine Learning Models
errors as the target date is approached) to evaluate and compare fixed-target
forecasts of Arctic sea ice. We first use them to evaluate the simple
feature-engineered linear regression (FELR) forecasts of Diebold and Goebel
(2021), and to compare FELR forecasts to naive pure-trend benchmark forecasts.
Then we introduce a much more sophisticated feature-engineered machine learning
(FEML) model, and we use glide charts to evaluate FEML forecasts and compare
them to a FELR benchmark. Our substantive results include the frequent
appearance of predictability thresholds, which differ across months, meaning
that accuracy initially fails to improve as the target date is approached but
then increases progressively once a threshold lead time is crossed. Also, we
find that FEML can improve appreciably over FELR when forecasting "turning
point" months in the annual cycle at horizons of one to three months ahead.
arXiv link: http://arxiv.org/abs/2206.10721v2
New possibilities in identification of binary choice models with fixed effects
propose a condition called sign saturation and show that this condition is
sufficient for identifying the model. In particular, this condition can
guarantee identification even when all the regressors are bounded, including
multiple discrete regressors. We also establish that without this condition,
the model is not identified unless the error distribution belongs to a special
class. Moreover, we show that sign saturation is also essential for identifying
the sign of treatment effects. Finally, we introduce a measure for sign
saturation and develop tools for its estimation and inference.
arXiv link: http://arxiv.org/abs/2206.10475v7
Symmetric generalized Heckman models
correlated with a latent variable, and involves situations in which the
response variable has part of its observations censored. Heckman (1976)
proposed a sample selection model based on the bivariate normal distribution
that fits both the variable of interest and the latent variable. Recently, this
assumption of normality has been relaxed by more flexible models such as the
Student-t distribution (Marchenko and Genton, 2012; Lachos et al., 2021). The
aim of this work is to propose generalized Heckman sample selection models
based on symmetric distributions (Fang et al., 1990). This is a new class of
sample selection models, in which variables are added to the dispersion and
correlation parameters. A Monte Carlo simulation study is performed to assess
the behavior of the parameter estimation method. Two real data sets are
analyzed to illustrate the proposed approach.
arXiv link: http://arxiv.org/abs/2206.10054v1
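For reference, the classical sample selection model that these generalizations build on can be written (in notation that is mine, not the paper's) as
$$y_i^{*} = x_i^{\top}\beta + \varepsilon_i, \qquad s_i^{*} = w_i^{\top}\gamma + u_i, \qquad y_i = y_i^{*} \ \text{observed only if}\ s_i^{*} > 0,$$
where Heckman (1976) takes $(\varepsilon_i, u_i)$ to be bivariate normal with correlation $\rho$. The symmetric generalized class replaces this joint distribution with a symmetric family and lets covariates enter the dispersion and correlation parameters.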
Policy Learning under Endogeneity Using Instrumental Variables
intervention policies in observational data settings characterized by
endogenous treatment selection and the availability of instrumental variables.
We introduce encouragement rules that manipulate an instrument. Incorporating
the marginal treatment effects (MTE) as policy invariant structural parameters,
we establish the identification of the social welfare criterion for the optimal
encouragement rule. Focusing on binary encouragement rules, we propose to
estimate the optimal policy via the Empirical Welfare Maximization (EWM) method
and derive convergence rates of the regret (welfare loss). We consider
extensions to accommodate multiple instruments and budget constraints. Using
data from the Indonesian Family Life Survey, we apply the EWM encouragement
rule to advise on the optimal tuition subsidy assignment. Our framework offers
interpretability regarding why a certain subpopulation is targeted.
arXiv link: http://arxiv.org/abs/2206.09883v3
Unbiased estimation of the OLS covariance matrix when the errors are clustered
estimator of the covariance matrix of the OLS estimator that comes close to
unbiasedness. In this paper we derive an estimator that is unbiased when the
random-effects model holds. We do the same for two more general structures. We
study the usefulness of these estimators against others by simulation, the size
of the $t$-test being the criterion. Our findings suggest that the choice of
estimator hardly matters when the regressor has the same distribution over the
clusters. But when the regressor is a cluster-specific treatment variable, the
choice does matter and the unbiased estimator we propose for the random-effects
model shows excellent performance, even when the clusters are highly
unbalanced.
arXiv link: http://arxiv.org/abs/2206.09644v1
Optimal data-driven hiring with equity for underrepresented groups
by hiring. An employer evaluates a set of applicants based on their observable
attributes. The goal is to hire the best candidates while avoiding bias with
regard to a certain protected attribute. Simply ignoring the protected
attribute will not eliminate bias due to correlations in the data. We present a
hiring policy that depends on the protected attribute functionally, but not
statistically, and we prove that, among all possible fair policies, ours is
optimal with respect to the firm's objective. We test our approach on both
synthetic and real data, and find that it shows great practical potential to
improve equity for underrepresented and historically marginalized groups.
arXiv link: http://arxiv.org/abs/2206.09300v1
Interpretable and Actionable Vehicular Greenhouse Gas Emission Prediction at Road link-level
accurate and precise GHG emission prediction models have become a key focus of
climate research. The appeal is that the predictive models will inform
policymakers and, hopefully, in turn, bring about systematic changes.
Since the transportation sector is consistently among the top GHG emission
contributors, especially in populated urban areas, substantial effort has been
going into building more accurate and informative GHG prediction models to help
create more sustainable urban environments. In this work, we seek to establish
a predictive framework of GHG emissions at the urban road segment or link level
of transportation networks. The key theme of the framework centers around model
interpretability and actionability for high-level decision-makers using
econometric Discrete Choice Modelling (DCM). We illustrate that DCM is capable
of predicting link-level GHG emission levels on urban road networks in a
parsimonious and effective manner. Our results show prediction accuracy of up
to 85.4% for the DCM models. We also argue that since the goal of
most GHG emission prediction models focuses on involving high-level
decision-makers to make changes and curb emissions, the DCM-based GHG emission
prediction framework is the most suitable framework.
arXiv link: http://arxiv.org/abs/2206.09073v1
Semiparametric Single-Index Estimation for Average Treatment Effects
under the assumption of unconfoundedness given observational data. Our
estimation method alleviates misspecification issues of the propensity score
function by estimating the single-index link function involved through Hermite
polynomials. Our approach is computationally tractable and allows for
moderately large dimension covariates. We derive the large-sample properties
of the estimator and establish its validity. Also, the average treatment effect
estimator achieves the parametric rate and asymptotic normality. Our extensive
Monte Carlo study shows that the proposed estimator is valid in finite samples.
Applying our method to maternal smoking and infant health, we find that
conventional estimates of smoking's impact on birth weight may be biased due to
propensity score misspecification, and our analysis of job training programs
reveals earnings effects that are more precisely estimated than in prior work.
These applications demonstrate how addressing model misspecification can
substantively affect our understanding of key policy-relevant treatment
effects.
arXiv link: http://arxiv.org/abs/2206.08503v4
Fast and Accurate Variational Inference for Large Bayesian VARs with Stochastic Volatility
distribution of the log-volatility in the context of large Bayesian VARs. In
contrast to existing approaches that are based on local approximations, the new
proposal provides a global approximation that takes into account the entire
support of the joint distribution. In a Monte Carlo study we show that the new
global approximation is over an order of magnitude more accurate than existing
alternatives. We illustrate the proposed methodology with an application of a
96-variable VAR with stochastic volatility to measure global bank network
connectedness.
arXiv link: http://arxiv.org/abs/2206.08438v1
Likelihood ratio test for structural changes in factor models
equivalent to a model without changes in the loadings but a change in the
variance of its factors. This effectively transforms a structural change
problem of high dimension into a problem of low dimension. This paper considers
the likelihood ratio (LR) test for a variance change in the estimated factors.
The LR test implicitly explores a special feature of the estimated factors: the
pre-break and post-break variances can be a singular matrix under the
alternative hypothesis, making the LR test diverging faster and thus more
powerful than Wald-type tests. The better power property of the LR test is also
confirmed by simulations. We also consider mean changes and multiple breaks. We
apply the procedure to factor modelling and structural change in US
employment using monthly industry-level data.
arXiv link: http://arxiv.org/abs/2206.08052v2
Optimality of Matched-Pair Designs in Randomized Controlled Trials
stratified randomization. I show that among all stratified randomization
schemes which treat all units with probability one half, a certain matched-pair
design achieves the maximum statistical precision for estimating the average
treatment effect (ATE). In an important special case, the optimal design pairs
units according to the baseline outcome. In a simulation study based on
datasets from 10 RCTs, this design lowers the standard error for the estimator
of the ATE by 10% on average, and by up to 34%, relative to the original
designs.
arXiv link: http://arxiv.org/abs/2206.07845v1
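The optimal design in the special case highlighted above — pairing units on the baseline outcome — is easy to implement: sort units by the baseline value, pair adjacent units, and randomize treatment within each pair. The sketch below illustrates the design only (even number of units assumed); it does not reproduce the paper's optimality result or its inference.

    # Minimal matched-pair randomization on a baseline outcome.
    import numpy as np

    rng = np.random.default_rng(8)
    n = 100                                         # assume an even number of units
    baseline = rng.normal(size=n)

    order = np.argsort(baseline)
    pairs = order.reshape(-1, 2)                    # adjacent units form a pair

    treat = np.zeros(n, dtype=int)
    for a, b in pairs:
        if rng.random() < 0.5:                      # randomize within the pair
            treat[a] = 1
        else:
            treat[b] = 1

    print("treated share:", treat.mean())           # exactly 0.5 by construction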
Finite-Sample Guarantees for High-Dimensional DML
treatment effects in observational settings, where identification of causal
parameters requires a conditional independence or unconfoundedness assumption,
since it allows one to control flexibly for a potentially very large number of
covariates. This paper gives novel finite-sample guarantees for joint inference
on high-dimensional DML, bounding how far the finite-sample distribution of the
estimator is from its asymptotic Gaussian approximation. These guarantees are
useful to applied researchers, as they are informative about how far off the
coverage of joint confidence bands can be from the nominal level. There are
many settings where high-dimensional causal parameters may be of interest, such
as the ATE of many treatment profiles, or the ATE of a treatment on many
outcomes. We also cover infinite-dimensional parameters, such as impacts on the
entire marginal distribution of potential outcomes. The finite-sample
guarantees in this paper complement the existing results on consistency and
asymptotic normality of DML estimators, which are either asymptotic or treat
only the one-dimensional case.
arXiv link: http://arxiv.org/abs/2206.07386v1
A new algorithm for structural restrictions in Bayesian vector autoregressions
using sign and other structural restrictions is developed. The reduced-form VAR
disturbances are driven by a few common factors and structural identification
restrictions can be incorporated in their loadings in the form of parametric
restrictions. A Gibbs sampler is derived that allows for reduced-form
parameters and structural restrictions to be sampled efficiently in one step. A
key benefit of the proposed approach is that it allows for treating parameter
estimation and structural inference as a joint problem. An additional benefit
is that the methodology can scale to large VARs with multiple shocks, and it
can be extended to accommodate non-linearities, asymmetries, and numerous other
interesting empirical features. The excellent properties of the new algorithm
for inference are explored using synthetic data experiments, and by revisiting
the role of financial factors in economic fluctuations using identification
based on sign restrictions.
arXiv link: http://arxiv.org/abs/2206.06892v1
Nowcasting the Portuguese GDP with Monthly Data
domestic product (GDP) in each current quarter (nowcasting). It combines bridge
equations of the real GDP on readily available monthly data like the Economic
Sentiment Indicator (ESI), industrial production index, cement sales or exports
and imports, with forecasts for the jagged missing values computed with the
well-known Hodrick and Prescott (HP) filter. As shown, this simple multivariate
approach can perform as well as a Targeted Diffusion Index (TDI) model and
slightly better than the univariate Theta method in terms of out-of-sample mean
errors.
arXiv link: http://arxiv.org/abs/2206.06823v1
A novel reconstruction attack on foreign-trade official statistics, with a Brazilian case study
a novel transaction re-identification attack against official foreign-trade
statistics releases in Brazil. The attack's goal is to re-identify the
importers of foreign-trade transactions (by revealing the identity of the
company performing that transaction), which consequently violates those
importers' fiscal secrecy (by revealing sensitive information: the value and
volume of traded goods). We provide a mathematical formalization of this fiscal
secrecy problem using principles from the framework of quantitative information
flow (QIF), then carefully identify the main sources of imprecision in the
official data releases used as auxiliary information in the attack, and model
transaction re-construction as a linear optimization problem solvable through
integer linear programming (ILP). We show that this problem is NP-complete, and
provide a methodology to identify tractable instances. We exemplify the
feasibility of our attack by performing 2,003 transaction re-identifications
that in total amount to more than $137M, and affect 348 Brazilian companies.
Further, since similar statistics are produced by other statistical agencies,
our attack is of broader concern.
arXiv link: http://arxiv.org/abs/2206.06493v1
Clustering coefficients as measures of the complex interactions in a directed weighted multilayer network
weighted and directed multilayer networks. We extend in the multilayer
theoretical context the clustering coefficients proposed in the literature for
weighted directed monoplex networks. We quantify how deeply a node is involved
in a cohesive structure, focusing on a single node, on a single layer, or on
the entire system. The coefficients convey several characteristics inherent to
the complex topology of the multilayer network. We test their effectiveness by
applying them to a particularly complex structure such as the international
trade network. The trade data integrate different aspects and they can be
described by a directed and weighted multilayer network, where each layer
represents import and export relationships between countries for a given
sector. The proposed coefficients find successful application in describing the
interrelations of the trade network, allowing to disentangle the effects of
countries and sectors and jointly consider the interactions between them.
arXiv link: http://arxiv.org/abs/2206.06309v2
A Constructive GAN-based Approach to Exact Estimate Treatment Effect without Matching
selection bias between sample groups can be largely eliminated. In practice,
however, when estimating the average treatment effect on the treated (ATT) via
matching, a trade-off between estimation accuracy and information loss persists
regardless of the matching method. Attempting to replace the matching process
entirely, this paper proposes the GAN-ATT estimator, which integrates a
generative adversarial network (GAN) into the counterfactual inference
framework. Through GAN machine learning, the probability density functions
(PDFs) of samples in both the treatment and control groups can be approximated.
By contrasting the conditional PDFs of the two groups at identical input
conditions, the conditional average treatment effect (CATE) can be estimated,
and the average of the corresponding CATEs over all treatment-group samples is
the estimate of the ATT. Utilizing GAN-based infinite sample augmentation,
problems arising from insufficient samples or a lack of common support can be
alleviated. Theoretically, if the GAN learns the PDFs perfectly, our estimator
provides an exact estimate of the ATT.
To assess the performance of the GAN-ATT estimator, three data sets are used:
two toy data sets with one- and two-dimensional covariate inputs and constant
or covariate-dependent treatment effects, where the GAN-ATT estimates are close
to the ground truth and better than those of traditional matching approaches;
and a real firm-level data set with high-dimensional inputs, where
applicability to real data is evaluated by comparison with matching approaches.
Based on the evidence from these three tests, we believe that the GAN-ATT
estimator has significant advantages over traditional matching methods in
estimating the ATT.
arXiv link: http://arxiv.org/abs/2206.06116v1
Robust Knockoffs for Controlling False Discoveries With an Application to Bond Recovery Rates
are frequently present in finance and economics, but also in complex natural
systems such as weather. We develop a robustified version of the knockoff
framework, which addresses challenges with high dependence among possibly many
influencing factors and strong time correlation. In particular, the repeated
subsampling strategy tackles the variability of the knockoffs and the
dependency of factors. Simultaneously, we also control the proportion of false
discoveries over a grid of all possible values, which mitigates variability of
selected factors from ad-hoc choices of a specific false discovery level. In
an application to corporate bond recovery rates, we identify new important
groups of relevant factors on top of the known standard drivers. But we also
show that out-of-sample, the resulting sparse model has similar predictive
power to state-of-the-art machine learning models that use the entire set of
predictors.
arXiv link: http://arxiv.org/abs/2206.06026v1
Debiased Machine Learning U-statistics
Learning (ML) first-steps. Standard plug-in estimators often suffer from
regularization and model-selection biases, producing invalid inferences. We
show that Debiased Machine Learning (DML) estimators can be constructed within
a U-statistics framework to correct these biases while preserving desirable
statistical properties. The approach delivers simple, robust estimators with
provable asymptotic normality and good finite-sample performance. We apply our
method to three problems: inference on Inequality of Opportunity (IOp) using
the Gini coefficient of ML-predicted incomes given circumstances, inference on
predictive accuracy via the Area Under the Curve (AUC), and inference on linear
models with ML-based sample-selection corrections. Using European survey data,
we present the first debiased estimates of income IOp. In our empirical
application, commonly employed ML-based plug-in estimators systematically
underestimate IOp, while our debiased estimators are robust across ML methods.
arXiv link: http://arxiv.org/abs/2206.05235v4
Forecasting macroeconomic data with Bayesian VARs: Sparse or dense? It depends!
forecasting macroeconomic variables. In high dimensions, however, they are
prone to overfitting. Bayesian methods, more concretely shrinkage priors, have
been shown to improve prediction performance. In the present
paper, we introduce the semi-global framework, in which we replace the
traditional global shrinkage parameter with group-specific shrinkage
parameters. We show how this framework can be applied to various shrinkage
priors, such as global-local priors and stochastic search variable selection
priors. We demonstrate the virtues of the proposed framework in an extensive
simulation study and in an empirical application forecasting data of the US
economy. Further, we shed more light on the ongoing “Illusion of Sparsity”
debate, finding that forecasting performances under sparse/dense priors vary
across evaluated economic variables and across time frames. Dynamic model
averaging, however, can combine the merits of both worlds.
arXiv link: http://arxiv.org/abs/2206.04902v5
On the Performance of the Neyman Allocation with Small Pilots
typically assume that researchers have access to large pilot studies. This may
be unrealistic. To understand the properties of the Neyman Allocation with
small pilots, we study its behavior in an asymptotic framework that takes pilot
size to be fixed even as the size of the main wave tends to infinity. Our
analysis shows that the Neyman Allocation can lead to estimates of the ATE with
higher asymptotic variance than with (non-adaptive) balanced randomization. In
particular, this happens when the outcome variable is relatively homoskedastic
with respect to treatment status or when it exhibits high kurtosis. We provide
a series of empirical examples showing that such situations can arise in
practice. Our results suggest that researchers with small pilots should not use
the Neyman Allocation if they believe that outcomes are homoskedastic or
heavy-tailed. Finally, we examine some potential methods for improving the
finite sample performance of the FNA via simulations.
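The mechanics can be illustrated with a small simulation (a hypothetical setup, not the paper's design): a fixed-size pilot estimates arm standard deviations, the main wave is allocated in proportion to them, and the variance of the resulting ATE estimator is compared against balanced randomization.

```python
# Hypothetical simulation: Neyman Allocation from a small pilot vs. balanced
# randomization, comparing the variance of the difference-in-means estimator.
import numpy as np

rng = np.random.default_rng(1)
pilot_per_arm, n_main, reps = 10, 1_000, 5_000
sd0, sd1 = 1.0, 1.0          # homoskedastic case highlighted in the abstract

est_neyman, est_balanced = [], []
for _ in range(reps):
    # Small pilot: estimate arm standard deviations.
    s0 = rng.normal(0, sd0, pilot_per_arm).std(ddof=1)
    s1 = rng.normal(0, sd1, pilot_per_arm).std(ddof=1)
    n1 = int(np.clip(round(n_main * s1 / (s0 + s1)), 2, n_main - 2))
    n0 = n_main - n1
    y0 = rng.normal(0, sd0, n0)
    y1 = rng.normal(0, sd1, n1)
    est_neyman.append(y1.mean() - y0.mean())
    # Balanced (non-adaptive) benchmark.
    yb0 = rng.normal(0, sd0, n_main // 2)
    yb1 = rng.normal(0, sd1, n_main // 2)
    est_balanced.append(yb1.mean() - yb0.mean())

print("variance, Neyman with small pilot:", np.var(est_neyman))
print("variance, balanced randomization :", np.var(est_balanced))
```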
arXiv link: http://arxiv.org/abs/2206.04643v4
A Two-Ball Ellsberg Paradox: An Experiment
sample (\(N=708\)) to test whether people prefer to avoid ambiguity even when it
means choosing dominated options. In contrast to the literature, we find that
55% of subjects prefer a risky act to an ambiguous act that always provides a
larger probability of winning. Our experimental design shows that such a
preference is not mainly due to a lack of understanding. We conclude that
subjects avoid ambiguity per se rather than avoiding ambiguity because
it may yield a worse outcome. Such behavior cannot be reconciled with existing
models of ambiguity aversion in a straightforward manner.
arXiv link: http://arxiv.org/abs/2206.04605v6
Inference for Matched Tuples and Fully Blocked Factorial Designs
treatments, where treatment status is determined according to a "matched
tuples" design. Here, by a matched tuples design, we mean an experimental
design where units are sampled i.i.d. from the population of interest, grouped
into "homogeneous" blocks with cardinality equal to the number of treatments,
and finally, within each block, each treatment is assigned exactly once
uniformly at random. We first study estimation and inference for matched tuples
designs in the general setting where the parameter of interest is a vector of
linear contrasts over the collection of average potential outcomes for each
treatment. Parameters of this form include standard average treatment effects
used to compare one treatment relative to another, but also include parameters
which may be of interest in the analysis of factorial designs. We first
establish conditions under which a sample analogue estimator is asymptotically
normal and construct a consistent estimator of its corresponding asymptotic
variance. Combining these results establishes the asymptotic exactness of tests
based on these estimators. In contrast, we show that, for two common testing
procedures based on t-tests constructed from linear regressions, one test is
generally conservative while the other is generally invalid. We go on to apply our
results to study the asymptotic properties of what we call "fully-blocked" 2^K
factorial designs, which are simply matched tuples designs applied to a full
factorial experiment. Leveraging our previous results, we establish that our
estimator achieves a lower asymptotic variance under the fully-blocked design
than that under any stratified factorial design which stratifies the
experimental sample into a finite number of "large" strata. A simulation study
and empirical application illustrate the practical relevance of our results.
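For intuition, here is a minimal sketch (synthetic data, hypothetical parameter values, and none of the paper's variance estimators) of a matched tuples assignment with three treatments and the sample-analogue contrast estimator.

```python
# Synthetic sketch of a "matched tuples" design: block units on a covariate into
# groups whose size equals the number of treatments, assign each treatment once
# per block at random, and estimate a linear contrast of mean potential outcomes.
import numpy as np

rng = np.random.default_rng(2)
n_treat, n_blocks = 3, 400
n = n_treat * n_blocks

x = rng.normal(size=n)                                  # baseline covariate
order = np.argsort(x)                                   # "homogeneous" blocks: adjacent x values
blocks = order.reshape(n_blocks, n_treat)

d = np.empty(n, dtype=int)
for b in blocks:                                        # each treatment exactly once per block
    d[b] = rng.permutation(n_treat)

tau = np.array([0.0, 0.5, 1.0])                         # hypothetical treatment-specific effects
y = x + tau[d] + rng.normal(scale=0.5, size=n)          # outcomes

means = np.array([y[d == k].mean() for k in range(n_treat)])
contrast = np.array([-1.0, 0.0, 1.0])                   # e.g. treatment 2 vs. treatment 0
print("estimated contrast:", contrast @ means, "(truth: 1.0)")
```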
arXiv link: http://arxiv.org/abs/2206.04157v5
Economic activity and climate change
relationship between economic activity and climate change. Due to the critical
relevance of these effects for the well-being of future generations, there is
an explosion of publications devoted to measuring this relationship and its
main channels. The relation between economic activity and climate change is
complex with the possibility of causality running in both directions. Starting
from economic activity, the channels that relate economic activity and climate
change are energy consumption and the consequent pollution. Hence, we first
describe the main econometric contributions on the interactions between
economic activity and energy consumption, and then the contributions on the
interactions between economic activity and pollution.
Finally, we look at the main results on the relationship between climate change
and economic activity. An important consequence of climate change is the
increasing occurrence of extreme weather phenomena. Therefore, we also survey
contributions on the economic effects of catastrophic climate phenomena.
arXiv link: http://arxiv.org/abs/2206.03187v2
The Impact of Sampling Variability on Estimated Combinations of Distributional Forecasts
combinations, with particular attention given to the combination of forecast
distributions. Unknown parameters in the forecast combination are optimized
according to criterion functions based on proper scoring rules, which are
chosen to reward the form of forecast accuracy that matters for the problem at
hand, and forecast performance is measured using the out-of-sample expectation
of said scoring rule. Our results provide novel insights into the behavior of
estimated forecast combinations. Firstly, we show that, asymptotically, the
sampling variability in the performance of standard forecast combinations is
determined solely by estimation of the constituent models, with estimation of
the combination weights contributing no sampling variability whatsoever, at
first order. Secondly, we show that, if computationally feasible, forecast
combinations produced in a single step -- in which the constituent model and
combination function parameters are estimated jointly -- have superior
predictive accuracy and lower sampling variability than standard forecast
combinations -- where constituent model and combination function parameters are
estimated in two steps. These theoretical insights are demonstrated
numerically, both in simulation settings and in an extensive empirical
illustration using a time series of S&P500 returns.
arXiv link: http://arxiv.org/abs/2206.02376v1
Markovian Interference in Experiments
experimental units impact other units through a limiting constraint (such as a
limited inventory). Despite outsize practical importance, the best estimators
for this `Markovian' interference problem are largely heuristic in nature, and
their bias is not well understood. We formalize the problem of inference in
such experiments as one of policy evaluation. Off-policy estimators, while
unbiased, apparently incur a large penalty in variance relative to
state-of-the-art heuristics. We introduce an on-policy estimator: the
Differences-In-Q's (DQ) estimator. We show that the DQ estimator can in general
have exponentially smaller variance than off-policy evaluation. At the same
time, its bias is second order in the impact of the intervention. This yields a
striking bias-variance tradeoff so that the DQ estimator effectively dominates
state-of-the-art alternatives. From a theoretical perspective, we introduce
three separate novel techniques that are of independent interest in the theory
of Reinforcement Learning (RL). Our empirical evaluation includes a set of
experiments on a city-scale ride-hailing simulator.
arXiv link: http://arxiv.org/abs/2206.02371v2
Assessing Omitted Variable Bias when the Controls are Endogenous
of causal effects. Several widely used methods assess the impact of omitted
variables on empirical conclusions by comparing measures of selection on
observables with measures of selection on unobservables. The recent literature
has discussed various limitations of these existing methods, however. This
includes a companion paper of ours which explains issues that arise when the
omitted variables are endogenous, meaning that they are correlated with the
included controls. In the present paper, we develop a new approach to
sensitivity analysis that avoids those limitations, while still allowing
researchers to calibrate sensitivity parameters by comparing the magnitude of
selection on observables with the magnitude of selection on unobservables as in
previous methods. We illustrate our results in an empirical study of the effect
of historical American frontier life on modern cultural beliefs. Finally, we
implement these methods in the companion Stata module regsensitivity for easy
use in practice.
arXiv link: http://arxiv.org/abs/2206.02303v5
Causal impact of severe events on electricity demand: The case of COVID-19 in Japan
global impact on people's lives. Previous studies have reported that COVID-19
decreased the electricity demand in early 2020. However, our study found that
the electricity demand increased in summer and winter even when the infection
was widespread. The fact that the event has continued for over two years
suggests that it is essential to introduce a method that can estimate its
impact over a long period while accounting for seasonal fluctuations. We employed the
Bayesian structural time-series model to estimate the causal impact of COVID-19
on electricity demand in Japan. The results indicate that behavioral
restrictions due to COVID-19 decreased daily electricity demand (-5.1% on
weekdays, -6.1% on holidays) in April and May 2020, as indicated by previous
studies. However, even in 2020, demand increased in the hot summer and cold
winter (+14% from 1 August to 15 September 2020, and +7.6% from 16 December
2020 to 15 January 2021). This study shows that the significant decrease in
demand from the business sector exceeded the increase in demand from the
household sector in April and May 2020, whereas the increase in household
demand exceeded the decrease in business demand during the hot summer and cold
winter periods. Our results also imply that electricity shortages can occur
when people's behavior changes, even if they become less active.
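A minimal sketch of this style of analysis follows, with synthetic daily demand data and statsmodels' UnobservedComponents used as a stand-in for the authors' Bayesian structural time-series model: fit the pre-intervention series, forecast it over the intervention window, and read the gap as the estimated impact.

```python
# Illustrative counterfactual analysis in the spirit of a structural time-series
# causal-impact study (synthetic daily demand data; not the paper's model).
import numpy as np
from statsmodels.tsa.statespace.structural import UnobservedComponents

rng = np.random.default_rng(3)
n_pre, n_post = 200, 60
t = np.arange(n_pre + n_post)
weekly = 2.0 * np.sin(2 * np.pi * t / 7)
demand = 100 + 0.05 * t + weekly + rng.normal(scale=1.0, size=t.size)
demand[n_pre:] -= 5.0                         # a -5 unit "behavioral restriction" effect

# Fit a local linear trend + weekly seasonal model on the pre-period only.
model = UnobservedComponents(demand[:n_pre], level="local linear trend", seasonal=7)
fit = model.fit(disp=False)

forecast = fit.get_forecast(steps=n_post)
counterfactual = forecast.predicted_mean      # what demand "would have been"
impact = demand[n_pre:] - counterfactual
print("average estimated impact over the post period:", impact.mean().round(2))
```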
arXiv link: http://arxiv.org/abs/2206.02122v1
Debiased Machine Learning without Sample-Splitting for Stable Estimators
generalized method of moments problem, which involves auxiliary functions that
correspond to solutions to a regression or classification problem. A recent
line of work on debiased machine learning shows how one can use generic machine
learning estimators for these auxiliary problems while maintaining asymptotic
normality and root-$n$ consistency of the target parameter of interest,
requiring only mean-squared-error guarantees from the auxiliary estimation
algorithms. The literature typically requires that these auxiliary problems are
fitted on a separate sample or in a cross-fitting manner. We show that when
these auxiliary estimation algorithms satisfy natural leave-one-out stability
properties, then sample splitting is not required. This allows for sample
re-use, which can be beneficial in moderately sized sample regimes. For
instance, we show that the stability properties that we propose are satisfied
for ensemble bagged estimators, built via sub-sampling without replacement, a
popular technique in machine learning practice.
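A minimal partially linear model sketch in this spirit is given below: bagged nuisance learners built by sub-sampling without replacement are fit on the full sample, with no cross-fitting. The data-generating process and tuning choices are hypothetical.

```python
# Sketch: debiased (residual-on-residual) estimate of a treatment coefficient in a
# partially linear model, with bagged nuisance learners built by sub-sampling
# without replacement and fit on the full sample (no sample splitting).
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
n, p, theta = 2_000, 5, 1.0
X = rng.normal(size=(n, p))
g = np.sin(X[:, 0]) + X[:, 1] ** 2
m = 0.5 * X[:, 0] - 0.25 * X[:, 2]
D = m + rng.normal(size=n)
Y = theta * D + g + rng.normal(size=n)

def bagged():
    # bootstrap=False with max_samples < 1.0 -> sub-sampling without replacement
    return BaggingRegressor(DecisionTreeRegressor(max_depth=4),
                            n_estimators=200, max_samples=0.5,
                            bootstrap=False, random_state=0)

res_Y = Y - bagged().fit(X, Y).predict(X)     # residualize the outcome
res_D = D - bagged().fit(X, D).predict(X)     # residualize the treatment
theta_hat = (res_D @ res_Y) / (res_D @ res_D)
print("theta estimate without sample splitting:", round(theta_hat, 3))
```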
arXiv link: http://arxiv.org/abs/2206.01825v2
Bayesian and Frequentist Inference for Synthetic Controls
causal effects with observational data. Despite this, inference for synthetic
control methods remains challenging. Often, inferential results rely on linear
factor model data generating processes. In this paper, we characterize the
conditions on the factor model primitives (the factor loadings) for which the
statistical risk minimizers are synthetic controls (in the simplex). Then, we
propose a Bayesian alternative to the synthetic control method that preserves
the main features of the standard method and provides a new way of doing valid
inference. We explore a Bernstein-von Mises style result to link our Bayesian
inference to the frequentist inference. For linear factor model frameworks we
show that a maximum likelihood estimator (MLE) of the synthetic control weights
can consistently estimate the predictive function of the potential outcomes for
the treated unit and that our Bayes estimator is asymptotically close to the
MLE in the total variation sense. Through simulations, we show that there is
convergence between the Bayes and frequentist approach even in sparse settings.
Finally, we apply the method to revisit the study of the economic costs of the
German re-unification and the Catalan secession movement. The Bayesian
synthetic control method is available in the bsynth R-package.
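For reference, the sketch below computes standard synthetic control weights on the simplex by constrained least squares on synthetic pre-period data; it illustrates the frequentist benchmark discussed above rather than the paper's Bayesian procedure or the bsynth package.

```python
# Standard (frequentist) synthetic control weights: minimize pre-treatment fit
# subject to the weights lying in the simplex. Synthetic data for illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
T0, J = 30, 10                               # pre-treatment periods, donor units
Y_donors = rng.normal(size=(T0, J)).cumsum(axis=0)
true_w = np.zeros(J)
true_w[:3] = [0.5, 0.3, 0.2]                 # hypothetical sparse weights
y_treated = Y_donors @ true_w + rng.normal(scale=0.1, size=T0)

def loss(w):
    return np.sum((y_treated - Y_donors @ w) ** 2)

res = minimize(loss, x0=np.full(J, 1 / J), method="SLSQP",
               bounds=[(0.0, 1.0)] * J,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
print("estimated synthetic control weights:", res.x.round(2))
```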
arXiv link: http://arxiv.org/abs/2206.01779v3
Cointegration and ARDL specification between the Dubai crude oil and the US natural gas market
and the price of the US natural gas using an updated monthly dataset from 1992
to 2018, incorporating the latest events in the energy markets. After employing
a variety of unit root and cointegration tests, the long-run relationship is
examined via the autoregressive distributed lag (ARDL) cointegration technique,
along with the Toda-Yamamoto (1995) causality test. Our results indicate that
there is a long-run relationship with a unidirectional causality running from
the Dubai crude oil market to the US natural gas market. A variety of
post-estimation specification tests indicate that the selected ARDL model is well-specified,
and the results of the Toda-Yamamoto approach via impulse response functions,
forecast error variance decompositions, and historical decompositions with
generalized weights, show that the Dubai crude oil price retains a positive
relationship and affects the US natural gas price.
arXiv link: http://arxiv.org/abs/2206.03278v1
Randomization Inference Tests for Shift-Share Designs
choice between existing approaches that allow for unrestricted spatial
correlation involves tradeoffs, varying in terms of their validity when there
are relatively few or concentrated shocks, and in terms of the assumptions on
the shock assignment process and treatment effects heterogeneity. We propose
alternative randomization inference methods that combine the advantages of
different approaches. These methods are valid in finite samples under
relatively stronger assumptions, while remaining asymptotically valid under
weaker assumptions.
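The flavor of such a procedure can be sketched as a generic shock-permutation test on synthetic data (not the authors' specific proposals): permute the sector-level shocks, rebuild the shift-share regressor, and compare the observed coefficient with its permutation distribution.

```python
# Generic randomization-inference sketch for a shift-share regressor: permute the
# sector-level shocks, rebuild the regional shift-share variable, and recompute
# the regression coefficient. Synthetic data for illustration only.
import numpy as np

rng = np.random.default_rng(8)
n_regions, n_sectors = 200, 20
shares = rng.dirichlet(np.ones(n_sectors), size=n_regions)   # regional sector shares
shocks = rng.normal(size=n_sectors)                          # sector-level shocks
x = shares @ shocks                                          # shift-share regressor
y = 0.3 * x + rng.normal(size=n_regions)                     # outcome with beta = 0.3

def coef(x, y):
    x_c = x - x.mean()
    return (x_c @ (y - y.mean())) / (x_c @ x_c)

beta_obs = coef(x, y)
perm = np.array([coef(shares @ rng.permutation(shocks), y) for _ in range(2_000)])
p_value = np.mean(np.abs(perm) >= np.abs(beta_obs))
print(f"observed coefficient {beta_obs:.3f}, permutation p-value {p_value:.3f}")
```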
arXiv link: http://arxiv.org/abs/2206.00999v1
Human Wellbeing and Machine Learning
International organisations and statistical offices are now collecting such
survey data at scale. However, standard regression models explain surprisingly
little of the variation in wellbeing, limiting our ability to predict it. In
response, we here assess the potential of Machine Learning (ML) to help us
better understand wellbeing. We analyse wellbeing data on over a million
respondents from Germany, the UK, and the United States. In terms of predictive
power, our ML approaches do perform better than traditional models. Although
the size of the improvement is small in absolute terms, it turns out to be
substantial when compared to that of key variables like health. We moreover
find that drastically expanding the set of explanatory variables doubles the
predictive power of both OLS and the ML approaches on unseen data. The
variables identified as important by our ML algorithms - i.e., material
conditions, health, and meaningful social relations - are similar to those that
have already been identified in the literature. In that sense, our data-driven
ML results validate the findings from conventional approaches.
arXiv link: http://arxiv.org/abs/2206.00574v1
Time-Varying Multivariate Causal Processes
processes which nests many classic and new examples as special cases. We first
prove the existence of a weakly dependent stationary approximation for our
model which is the foundation to initiate the theoretical development.
Afterwards, we consider the QMLE estimation approach, and provide both
point-wise and simultaneous inferences on the coefficient functions. In
addition, we demonstrate the theoretical findings through both simulated and
real data examples. In particular, we show the empirical relevance of our study
through an application evaluating the conditional correlations between the
stock markets of China and the U.S. We find that the interdependence between the two
stock markets is increasing over time.
arXiv link: http://arxiv.org/abs/2206.00409v1
Predicting Day-Ahead Stock Returns using Search Engine Query Volumes: An Application of Gradient Boosted Decision Trees to the S&P 100
the major modern resource for research, detailed data on internet usage
exhibits vast amounts of behavioral information. This paper aims to answer the
question of whether this information can be used to predict future returns
of stocks on financial capital markets. In an empirical analysis it implements
gradient boosted decision trees to learn relationships between abnormal returns
of stocks within the S&P 100 index and lagged predictors derived from
historical financial data, as well as search term query volumes on the internet
search engine Google. Models predict the occurrence of day-ahead stock returns
in excess of the index median. On a time frame from 2005 to 2017, all disparate
datasets exhibit valuable information. The evaluated models achieve average
areas under the receiver operating characteristic curve between 54.2% and
56.7%, clearly indicating classification better than random guessing.
Implementing a simple statistical arbitrage strategy, the models are used to
create daily trading portfolios of ten stocks and achieve annual performances
of more than 57% before transaction costs. With ensembles of different data
sets topping the performance ranking, the results further question the weak form and semi-strong
form efficiency of modern financial capital markets. Even though transaction
costs are not included, the approach adds to the existing literature. It gives
guidance on how to use and transform data on internet usage behavior for
financial and economic modeling and forecasting.
arXiv link: http://arxiv.org/abs/2205.15853v2
Variable importance without impossible data
box prediction algorithm make use of synthetic inputs that combine predictor
variables from multiple subjects. These inputs can be unlikely, physically
impossible, or even logically impossible. As a result, the predictions for such
cases can be based on data very unlike any the black box was trained on. We
think that users cannot trust an explanation of the decision of a prediction
algorithm when the explanation uses such values. Instead, we advocate a method
called Cohort Shapley that is grounded in economic game theory and, unlike most
other game-theoretic methods, uses only actually observed data to quantify
variable importance. Cohort Shapley works by narrowing the cohort of subjects
judged to be similar to a target subject on one or more features. We illustrate
it on an algorithmic fairness problem where it is essential to attribute
importance to protected variables that the model was not trained on.
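A minimal sketch of the cohort idea follows, using exact matching as the similarity rule, a tiny synthetic dataset, and exhaustive subset enumeration; the actual method allows more general similarity rules and larger feature sets.

```python
# Simplified Cohort Shapley: the value of a coalition S is the mean outcome among
# subjects who match the target subject on every feature in S (exact matching).
import numpy as np
from itertools import combinations
from math import factorial

rng = np.random.default_rng(6)
n, d = 500, 3
X = rng.integers(0, 2, size=(n, d))                  # binary features for simplicity
y = X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=n)
target = 0                                           # subject whose outcome we explain

def cohort_value(S):
    mask = np.ones(n, dtype=bool)
    for j in S:
        mask &= X[:, j] == X[target, j]              # narrow the cohort on feature j
    return y[mask].mean()

phi = np.zeros(d)
for j in range(d):
    for k in range(d):
        for S in combinations([f for f in range(d) if f != j], k):
            w = factorial(k) * factorial(d - k - 1) / factorial(d)
            phi[j] += w * (cohort_value(S + (j,)) - cohort_value(S))

print("cohort Shapley attributions for the target subject:", phi.round(3))
full_cohort = cohort_value(tuple(range(d)))
print("attributions sum to full-cohort mean minus grand mean:",
      round(full_cohort - y.mean(), 3))
```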
arXiv link: http://arxiv.org/abs/2205.15750v3
Estimating spot volatility under infinite variation jumps with dependent market microstructure noise
financial data. It is well known that they introduce bias in the estimation of
volatility (including integrated and spot volatilities) of assets, and many
methods have been proposed to deal with this problem. When the jumps are
intensive with infinite variation, an efficient estimator of spot volatility
under serially dependent noise is not yet available and is thus needed. For this
purpose, we propose a novel estimator of spot volatility with a hybrid use of
the pre-averaging technique and the empirical characteristic function. Under
mild assumptions, the results of consistency and asymptotic normality of our
estimator are established. Furthermore, we show that our estimator achieves an
almost efficient convergence rate with optimal variance when the jumps are
either less active or active with symmetric structure. Simulation studies
verify our theoretical conclusions. We apply our proposed estimator to
empirical analyses, such as estimating the weekly volatility curve using
second-by-second transaction price data.
arXiv link: http://arxiv.org/abs/2205.15738v2
Fast Two-Stage Variational Bayesian Approach to Estimating Panel Spatial Autoregressive Models with Unrestricted Spatial Weights Matrices
estimate unrestricted panel spatial autoregressive models. Using
Dirichlet-Laplace priors, we are able to uncover the spatial relationships
between cross-sectional units without imposing any a priori restrictions. Monte
Carlo experiments show that our approach works well for both long and short
panels. We are also the first in the literature to develop VB methods to
estimate large covariance matrices with unrestricted sparsity patterns, which
are useful for popular large data models such as Bayesian vector
autoregressions. In empirical applications, we examine the spatial
interdependence between euro area sovereign bond ratings and spreads. We find
marked differences between the spillover behaviours of the northern euro area
countries and those of the south.
arXiv link: http://arxiv.org/abs/2205.15420v3
Credible, Strategyproof, Optimal, and Bounded Expected-Round Single-Item Auctions for all Distributions
multiple buyers with i.i.d. valuations. Akbarpour and Li (2020) show that the
only optimal, credible, strategyproof auction is the ascending price auction
with reserves which has unbounded communication complexity. Recent work of
Ferreira and Weinberg (2020) circumvents their impossibility result assuming
the existence of cryptographically secure commitment schemes, and designs a
two-round credible, strategyproof, optimal auction. However, their auction is
only credible when buyers' valuations are MHR or $\alpha$-strongly regular:
they show their auction might not be credible even when there is a single buyer
drawn from a non-MHR distribution. In this work, under the same cryptographic
assumptions, we identify a new single-item auction that is credible,
strategyproof, revenue optimal, and terminates in constant rounds in
expectation for all distributions with finite monopoly price.
arXiv link: http://arxiv.org/abs/2205.14758v1
Provably Auditing Ordinary Least Squares in Low Dimensions
linear regression is critically important, but most metrics either only measure
local stability (i.e. against infinitesimal changes in the data), or are only
interpretable under statistical assumptions. Recent work proposes a simple,
global, finite-sample stability metric: the minimum number of samples that need
to be removed so that rerunning the analysis overturns the conclusion,
specifically meaning that the sign of a particular coefficient of the estimated
regressor changes. However, besides the trivial exponential-time algorithm, the
only approach for computing this metric is a greedy heuristic that lacks
provable guarantees under reasonable, verifiable assumptions; the heuristic
provides a loose upper bound on the stability and also cannot certify lower
bounds on it.
We show that in the low-dimensional regime where the number of covariates is
a constant but the number of samples is large, there are efficient algorithms
for provably estimating (a fractional version of) this metric. Applying our
algorithms to the Boston Housing dataset, we exhibit regression analyses where
we can estimate the stability up to a factor of $3$ better than the greedy
heuristic, and analyses where we can certify stability to dropping even a
majority of the samples.
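For concreteness, the greedy heuristic discussed above can be sketched as follows on synthetic low-dimensional data; the paper's provable estimator is not implemented here.

```python
# Greedy upper bound on the stability metric: the number of samples whose removal
# flips the sign of a chosen OLS coefficient. Synthetic low-dimensional data.
import numpy as np

rng = np.random.default_rng(7)
n, p, target = 300, 3, 0          # audit the sign of covariate 0's coefficient
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, 0.05, 1.0, -1.0])
y = X @ beta + rng.normal(size=n)

def ols_coef(X, y, j):
    return np.linalg.lstsq(X, y, rcond=None)[0][j]

keep = np.ones(n, dtype=bool)
sign0 = np.sign(ols_coef(X, y, 1 + target))
removed = 0
while np.sign(ols_coef(X[keep], y[keep], 1 + target)) == sign0:
    # Remove the single observation whose deletion moves the coefficient
    # furthest toward a sign change (brute-force greedy step).
    idx = np.flatnonzero(keep)
    vals = [sign0 * ols_coef(X[keep & (np.arange(n) != i)],
                             y[keep & (np.arange(n) != i)], 1 + target) for i in idx]
    keep[idx[int(np.argmin(vals))]] = False
    removed += 1

print("greedy upper bound on samples to remove:", removed)
```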
arXiv link: http://arxiv.org/abs/2205.14284v2
Average Adjusted Association: Efficient Estimation with High Dimensional Confounders
association between binary outcome and exposure variables. Despite its
widespread use, there has been limited discussion on how to summarize the log
odds ratio as a function of confounders through averaging. To address this
issue, we propose the Average Adjusted Association (AAA), which is a summary
measure of association in a heterogeneous population, adjusted for observed
confounders. To facilitate its use, we also develop efficient
double/debiased machine learning (DML) estimators of the AAA. Our DML
estimators use two equivalent forms of the efficient influence function, and
are applicable in various sampling scenarios, including random sampling,
outcome-based sampling, and exposure-based sampling. Through real data and
simulations, we demonstrate the practicality and effectiveness of our proposed
estimators in measuring the AAA.
arXiv link: http://arxiv.org/abs/2205.14048v2
Identification of Auction Models Using Order Statistics
opposed to all bids. The usual measurement error approaches to unobserved
heterogeneity are inapplicable due to dependence among order statistics. We
bridge this gap by providing a set of positive identification results. First,
we show that symmetric auctions with discrete unobserved heterogeneity are
identifiable using two consecutive order statistics and an instrument. Second,
we extend the results to ascending auctions with unknown competition and
unobserved heterogeneity.
arXiv link: http://arxiv.org/abs/2205.12917v2
Machine learning method for return direction forecasting of Exchange Traded Funds using classification and regression models
the direction of returns from Exchange Traded Funds (ETFs) using the historical
return data of its components, helping to make investment strategy decisions
through a trading algorithm. Methodologically, regression and classification
models were applied to standard datasets from the Brazilian and American
markets and evaluated with algorithmic error metrics. The results were analyzed
and compared with those of the naïve forecast and with the returns obtained by
the buy & hold strategy over the same period. In terms of risk and return, the
models mostly performed better than the control benchmarks, most notably the
linear regression model and the classification models based on logistic
regression, support vector machines (the LinearSVC model), Gaussian Naive
Bayes, and K-Nearest Neighbors; on certain datasets the returns exceeded those
of the buy & hold control model by a factor of two and the Sharpe ratio by up
to a factor of four.
arXiv link: http://arxiv.org/abs/2205.12746v2
Estimation and Inference for High Dimensional Factor Model with Regime Switching
factor models with regime switching in the loadings. The model parameters are
estimated jointly by the EM (expectation maximization) algorithm, which in the
current context only requires iteratively calculating regime probabilities and
principal components of the weighted sample covariance matrix. When regime
dynamics are taken into account, smoothed regime probabilities are calculated
using a recursive algorithm. Consistency, convergence rates and limit
distributions of the estimated loadings and the estimated factors are
established under weak cross-sectional and temporal dependence as well as
heteroscedasticity. It is worth noting that due to high dimension, regime
switching can be identified consistently after the switching point with only
one observation. Simulation results show good performance of the proposed
method. An application to the FRED-MD dataset illustrates the potential of the
proposed method for detection of business cycle turning points.
arXiv link: http://arxiv.org/abs/2205.12126v2
Subgeometrically ergodic autoregressions with autoregressive conditional heteroskedasticity
of univariate nonlinear autoregressions with autoregressive conditional
heteroskedasticity (ARCH). The notion of subgeometric ergodicity was introduced
in the Markov chain literature in the 1980s and means that the transition
probability measures converge to the stationary measure at a rate slower than
geometric; this rate is also closely related to the convergence rate of
$\beta$-mixing coefficients. While the existing literature on subgeometrically
ergodic autoregressions assumes a homoskedastic error term, this paper provides
an extension to the case of conditionally heteroskedastic ARCH-type errors,
considerably widening the scope of potential applications. Specifically, we
consider suitably defined higher-order nonlinear autoregressions with possibly
nonlinear ARCH errors and show that they are, under appropriate conditions,
subgeometrically ergodic at a polynomial rate. An empirical example using
energy sector volatility index data illustrates the use of subgeometrically
ergodic AR-ARCH models.
arXiv link: http://arxiv.org/abs/2205.11953v2
Quasi Black-Box Variational Inference with Natural Gradients for Bayesian Learning
complex models. Our approach relies on natural gradient updates within a
general black-box framework for efficient training with limited model-specific
derivations. It applies within the class of exponential-family variational
posterior distributions; we discuss the Gaussian case in detail, for which the
updates take a rather simple form. Our Quasi Black-box Variational Inference
(QBVI) framework is readily applicable to a wide class of Bayesian inference
problems and is simple to implement, as the updates of the variational
posterior involve neither gradients with respect to the model parameters nor
the prescription of the Fisher information matrix. We develop
QBVI under different hypotheses for the posterior covariance matrix, discuss
details about its robust and feasible implementation, and provide a number of
real-world applications to demonstrate its effectiveness.
arXiv link: http://arxiv.org/abs/2205.11568v3
Robust and Agnostic Learning of Conditional Distributional Treatment Effects
individual causal effects given baseline covariates. However, the CATE only
captures the (conditional) average, and can overlook risks and tail events,
which are important to treatment choice. In aggregate analyses, this is usually
addressed by measuring the distributional treatment effect (DTE), such as
differences in quantiles or tail expectations between treatment groups.
Hypothetically, one can similarly fit conditional quantile regressions in each
treatment group and take their difference, but this would not be robust to
misspecification or provide agnostic best-in-class predictions. We provide a
new robust and model-agnostic methodology for learning the conditional DTE
(CDTE) for a class of problems that includes conditional quantile treatment
effects, conditional super-quantile treatment effects, and conditional
treatment effects on coherent risk measures given by $f$-divergences. Our
method is based on constructing a special pseudo-outcome and regressing it on
covariates using any regression learner. Our method is model-agnostic in that
it can provide the best projection of CDTE onto the regression model class. Our
method is robust in that even if we learn these nuisances nonparametrically at
very slow rates, we can still learn CDTEs at rates that depend on the class
complexity and even conduct inferences on linear projections of CDTEs. We
investigate the behavior of our proposal in simulations, as well as in a case
study of 401(k) eligibility effects on wealth.
arXiv link: http://arxiv.org/abs/2205.11486v3
Probabilistic forecasting of German electricity imbalance prices
uncertainty to electricity prices and to electricity generation. To address
this challenge, the energy exchanges have been developing further trading
possibilities, especially the intraday and balancing markets. For an energy
trader participating in both markets, the forecasting of imbalance prices is of
particular interest. Therefore, in this manuscript we conduct a very short-term
probabilistic forecasting of imbalance prices, contributing to the scarce
literature on this novel subject. The forecasting is performed 30 minutes
before the delivery, so that the trader might still choose the trading place.
The distribution of the imbalance prices is modelled and forecasted using
methods well-known in the electricity price forecasting literature: lasso with
bootstrap, gamlss, and probabilistic neural networks. The methods are compared
with a naive benchmark in a meaningful rolling window study. The results
provide evidence of efficiency between the intraday and balancing markets, as
the sophisticated methods do not substantially outperform the intraday
continuous price index. On the other hand, they significantly improve the
empirical coverage. The analysis was conducted on the German market; however,
it could easily be applied to any other market with a similar structure.
arXiv link: http://arxiv.org/abs/2205.11439v1
A Novel Control-Oriented Cell Transmission Model Including Service Stations on Highways
evolution on a highway stretch is affected by the presence of a service
station. The presented model enhances the classical CTM dynamics by adding the
dynamics associated with the service stations, where the vehicles may stop
before merging back into the mainstream. We name it CTMs. We discuss its
flexibility in describing different complex scenarios where multiple stations
are characterized by different drivers' average stopping times corresponding to
different services. The model has been developed to help design control
strategies aimed at decreasing traffic congestion. Thus, we discuss how
classical control schemes can interact with the proposed CTMs. Finally,
we validate the proposed model through numerical simulations and assess the
effects of service stations on traffic evolution, which appear to be
beneficial, especially for relatively short congested periods.
arXiv link: http://arxiv.org/abs/2205.15115v3
Graph-Based Methods for Discrete Choice
choose between political candidates to vote for, between social media posts to
share, and between brands to purchase--moreover, data on these choices are
increasingly abundant. Discrete choice models are a key tool for learning
individual preferences from such data. Additionally, social factors like
conformity and contagion influence individual choice. Traditional methods for
incorporating these factors into choice models do not account for the entire
social network and require hand-crafted features. To overcome these
limitations, we use graph learning to study choice in networked contexts. We
identify three ways in which graph learning techniques can be used for discrete
choice: learning chooser representations, regularizing choice model parameters,
and directly constructing predictions from a network. We design methods in each
category and test them on real-world choice datasets, including county-level
2016 US election results and Android app installation and usage data. We show
that incorporating social network structure can improve the predictions of the
standard econometric choice model, the multinomial logit. We provide evidence
that app installations are influenced by social context, but we find no such
effect on app usage among the same participants, which instead is habit-driven.
In the election data, we highlight the additional insights a discrete choice
framework provides over classification or regression, the typical approaches.
On synthetic data, we demonstrate the sample complexity benefit of using social
information in choice models.
arXiv link: http://arxiv.org/abs/2205.11365v2
Regime and Treatment Effects in Duration Models: Decomposing Expectation and Transplant Effects on the Kidney Waitlist
initial regime randomization influences the timing of a treatment duration. The
initial randomization and treatment in turn affect a duration outcome of
interest. Our empirical application considers the survival of individuals on
the kidney transplant waitlist. Upon entering the waitlist, individuals with an
AB blood type, who are universal recipients, are effectively randomized to a
regime with a higher propensity to rapidly receive a kidney transplant. Our
dynamic potential outcomes framework allows us to identify the pre-transplant
effect of the blood type, and the transplant effects depending on blood type.
We further develop dynamic assumptions which build on the LATE framework and
allow researchers to separate effects for different population substrata. Our
main empirical result is that AB blood type candidates display a higher
pre-transplant mortality. We provide evidence that this effect is due to
behavioural changes rather than biological differences.
arXiv link: http://arxiv.org/abs/2205.11189v1
Fast Instrument Learning with Faster Rates
high-dimensional instruments. We propose a simple algorithm which combines
kernelized IV methods and an arbitrary, adaptive regression algorithm, accessed
as a black box. Our algorithm enjoys faster-rate convergence and adapts to the
dimensionality of informative latent features, while avoiding an expensive
minimax optimization procedure, which has been necessary to establish similar
guarantees. It further brings the benefit of flexible machine learning models
to quasi-Bayesian uncertainty quantification, likelihood-based model selection,
and model averaging. Simulation studies demonstrate the competitive performance
of our method.
arXiv link: http://arxiv.org/abs/2205.10772v2
The Effect of Increased Access to IVF on Women's Careers
a method of assisted reproduction that can delay fertility, which results in a
decreased motherhood income penalty. In this research, I estimate the effects
of expanded access to in vitro fertilization (IVF) arising from state insurance
mandates. I use a difference-in-differences model to estimate the effect of
increased IVF accessibility on delaying childbirth and decreasing the
motherhood income penalty. Using the fertility supplement dataset from the
Current Population Survey (CPS), I estimate how outcomes change in states when
they implement their mandates compared to how outcomes change in states that
are not changing their policies. The results indicate that IVF mandates
increase the probability of motherhood by age 38 by 3.1 percentage points (p<0.01).
However, the results provide no evidence that IVF insurance mandates impact
women's earnings.
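A stylized version of such a difference-in-differences specification (two-way fixed effects with state-clustered standard errors, on simulated state-year data with hypothetical variable names) is sketched below.

```python
# Stylized difference-in-differences with state and year fixed effects and
# state-clustered standard errors (simulated data, hypothetical variable names).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n_states, n_years = 40, 15
df = pd.DataFrame([(s, t) for s in range(n_states) for t in range(n_years)],
                  columns=["state", "year"])
adoption_year = {s: (7 if s < 15 else np.inf) for s in range(n_states)}  # 15 mandate states
df["post_mandate"] = (df["year"] >= df["state"].map(adoption_year)).astype(int)
df["motherhood_rate"] = (0.30 + 0.03 * df["post_mandate"]        # assumed effect: +3 p.p.
                         + rng.normal(scale=0.05, size=len(df)))

model = smf.ols("motherhood_rate ~ post_mandate + C(state) + C(year)", data=df)
fit = model.fit(cov_type="cluster", cov_kwds={"groups": df["state"]})
print("DiD estimate:", fit.params["post_mandate"].round(4),
      "SE:", fit.bse["post_mandate"].round(4))
```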
arXiv link: http://arxiv.org/abs/2205.14186v2
The Power of Prognosis: Improving Covariate Balance Tests with Outcome Information
natural experiments and related designs. Unfortunately, when measured
covariates are unrelated to potential outcomes, balance is uninformative about
key identification conditions. We show that balance tests can then lead to
erroneous conclusions. To build stronger tests, researchers should identify
covariates that are jointly predictive of potential outcomes; formally measure
and report covariate prognosis; and prioritize the most individually
informative variables in tests. Building on prior research on "prognostic
scores," we develop bootstrap balance tests that upweight covariates associated
with the outcome. We adapt this approach for regression-discontinuity designs
and use simulations to compare weighting methods based on linear regression and
more flexible methods, including machine learning. The results show how
prognosis weighting can avoid both false negatives and false positives. To
illustrate key points, we study empirical examples from a sample of published
studies, including an important debate over close elections.
arXiv link: http://arxiv.org/abs/2205.10478v2
What's the Harm? Sharp Bounds on the Fraction Negatively Affected by Treatment
counterfactuals -- prevents us from identifying how many might be negatively
affected by a proposed intervention. If, in an A/B test, half of users click
(or buy, or watch, or renew, etc.), whether exposed to the standard experience
A or a new one B, hypothetically it could be because the change affects no one,
because the change positively affects half the user population to go from
no-click to click while negatively affecting the other half, or something in
between. While unknowable, this impact is clearly of material importance to the
decision to implement a change or not, whether due to fairness, long-term,
systemic, or operational considerations. We therefore derive the
tightest-possible (i.e., sharp) bounds on the fraction negatively affected (and
other related estimands) given data with only factual observations, whether
experimental or observational. Naturally, the more we can stratify individuals
by observable covariates, the tighter the sharp bounds. Since these bounds
involve unknown functions that must be learned from data, we develop a robust
inference algorithm that is efficient almost regardless of how and how fast
these functions are learned, remains consistent when some are mislearned, and
still gives valid conservative bounds when most are mislearned. Our methodology
altogether therefore strongly supports credible conclusions: it avoids
spuriously point-identifying this unknowable impact, focusing on the best
bounds instead, and it permits exceedingly robust inference on these. We
demonstrate our method in simulation studies and in a case study of career
counseling for the unemployed.
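The core Fréchet-Hoeffding logic behind such bounds, and how covariate stratification tightens them, can be sketched on synthetic binary-outcome data; this illustration ignores the paper's debiased inference machinery.

```python
# Sharp (Frechet-Hoeffding) bounds on the fraction negatively affected,
# P(Y(1) = 0, Y(0) = 1), with and without stratifying on a binary covariate.
import numpy as np

rng = np.random.default_rng(10)
n = 200_000
x = rng.integers(0, 2, size=n)                       # observed covariate
p0 = np.where(x == 1, 0.70, 0.30)                    # P(click | A, x), hypothetical
p1 = np.where(x == 1, 0.40, 0.60)                    # P(click | B, x), hypothetical

# Marginal bounds use only the overall click rates under A and B.
m0, m1 = p0.mean(), p1.mean()
marginal = (max(0.0, m0 - m1), min(m0, 1.0 - m1))

# Stratified bounds average the conditional bounds over the covariate.
lower_x = np.maximum(0.0, p0 - p1)
upper_x = np.minimum(p0, 1.0 - p1)
stratified = (lower_x.mean(), upper_x.mean())

print("bounds without covariates:", np.round(marginal, 3))
print("bounds stratified on x   :", np.round(stratified, 3))
```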
arXiv link: http://arxiv.org/abs/2205.10327v2
Treatment Effects in Bunching Designs: The Impact of Mandatory Overtime Pay on Hours
researcher does not assume a parametric choice model. I find that in a general
choice model, identifying the average causal response to the policy switch at a
kink amounts to confronting two extrapolation problems, each about the
distribution of a counterfactual choice that is observed only in a censored
manner. I apply this insight to partially identify the effect of overtime pay
regulation on the hours of U.S. workers using administrative payroll data,
assuming that each distribution satisfies a weak non-parametric shape
constraint in the region where it is not observed. The resulting bounds are
informative and indicate a relatively small elasticity of demand for weekly
hours, addressing a long-standing question about the causal effects of the
overtime mandate.
arXiv link: http://arxiv.org/abs/2205.10310v4
The Forecasting performance of the Factor model with Martingale Difference errors
models with martingale difference errors (FMMDE) recently introduced by Lee and
Shao (2018). The FMMDE makes it possible to retrieve a transformation of the
original series so that the resulting variables can be partitioned according to
whether they are conditionally mean-independent with respect to past
information. We contribute to the literature in two respects. First, we propose
a novel methodology for selecting the number of factors in FMMDE. Through
simulation experiments, we show the good finite-sample performance of our
approach across various panel data specifications. Second, we compare the
forecasting performance of FMMDE with alternative factor model specifications
by conducting an extensive forecasting exercise using FRED-MD, a comprehensive
monthly macroeconomic database for the US economy. Our empirical findings
indicate that FMMDE provides an advantage in predicting the evolution of the
real sector of the economy when the novel methodology for factor selection is
adopted. These results are confirmed for key aggregates such as Production and
Income, the Labor Market, and Consumption.
arXiv link: http://arxiv.org/abs/2205.10256v2
A New Central Limit Theorem for the Augmented IPW Estimator: Variance Inflation, Cross-Fit Covariance and Beyond
causal inference. In recent times, inference for the ATE in the presence of
high-dimensional covariates has been extensively studied. Among the diverse
approaches that have been proposed, augmented inverse probability weighting
(AIPW) with cross-fitting has emerged as a popular choice in practice. In this
work, we study this cross-fit AIPW estimator under well-specified outcome
regression and propensity score models in a high-dimensional regime where the
number of features and samples are both large and comparable. Under assumptions
on the covariate distribution, we establish a new central limit theorem for the
suitably scaled cross-fit AIPW that applies without any sparsity assumptions on
the underlying high-dimensional parameters. Our CLT uncovers two crucial
phenomena among others: (i) the AIPW exhibits a substantial variance inflation
that can be precisely quantified in terms of the signal-to-noise ratio and
other problem parameters, (ii) the asymptotic covariance between the
pre-cross-fit estimators is non-negligible even on the root-n scale. These
findings are strikingly different from their classical counterparts. On the
technical front, our work utilizes a novel interplay between three distinct
tools--approximate message passing theory, the theory of deterministic
equivalents, and the leave-one-out approach. We believe our proof techniques
should be useful for analyzing other two-stage estimators in this
high-dimensional regime. Finally, we complement our theoretical results with
simulations that demonstrate both the finite sample efficacy of our CLT and its
robustness to our assumptions.
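For reference, a textbook cross-fit AIPW estimate of the ATE looks like the following, with generic sklearn learners on synthetic data; the paper's contribution is the proportional-dimension theory, not this recipe.

```python
# Textbook cross-fit AIPW estimator of the ATE with generic ML nuisance models.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(11)
n, p, tau = 4_000, 10, 1.0
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-X[:, 0]))                 # true propensity score
D = rng.binomial(1, e)
Y = tau * D + X[:, 1] + rng.normal(size=n)

psi = np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    ps = LogisticRegression(max_iter=1_000).fit(X[train], D[train])
    mu1 = LinearRegression().fit(X[train][D[train] == 1], Y[train][D[train] == 1])
    mu0 = LinearRegression().fit(X[train][D[train] == 0], Y[train][D[train] == 0])
    e_hat = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)
    m1, m0 = mu1.predict(X[test]), mu0.predict(X[test])
    psi[test] = (m1 - m0
                 + D[test] * (Y[test] - m1) / e_hat
                 - (1 - D[test]) * (Y[test] - m0) / (1 - e_hat))

print("cross-fit AIPW ATE estimate:", psi.mean().round(3),
      "+/-", (1.96 * psi.std(ddof=1) / np.sqrt(n)).round(3))
```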
arXiv link: http://arxiv.org/abs/2205.10198v3
Nonlinear Fore(Back)casting and Innovation Filtering for Causal-Noncausal VAR Models
satisfy the Markov property in both calendar and reverse time. Based on that
property, we introduce closed-form formulas of forward and backward predictive
densities for point and interval forecasting and backcasting out-of-sample. The
backcasting formula is used for adjusting the forecast interval to obtain a
desired coverage level when the tail quantiles are difficult to estimate. A
confidence set for the prediction interval is introduced for assessing the
uncertainty due to estimation. We also define new nonlinear past-dependent
innovations of mixed causal-noncausal VAR models for impulse response function
analysis. Our approach is illustrated by simulations and an application to oil
prices and real GDP growth rates.
arXiv link: http://arxiv.org/abs/2205.09922v4
High-dimensional Data Bootstrap
review high-dimensional central limit theorems for distributions of sample mean
vectors over the rectangles, bootstrap consistency results in high dimensions,
and key techniques used to establish those results. We then review selected
applications of high-dimensional bootstrap: construction of simultaneous
confidence sets for high-dimensional vector parameters, multiple hypothesis
testing via stepdown, post-selection inference, intersection bounds for
partially identified parameters, and inference on best policies in policy
evaluation. Finally, we also comment on a couple of future research directions.
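One of the listed applications, simultaneous confidence intervals for a high-dimensional mean vector via the Gaussian multiplier bootstrap, can be sketched in a few lines on synthetic data.

```python
# Gaussian multiplier bootstrap critical value for simultaneous confidence
# intervals on a high-dimensional mean vector (synthetic data, true means zero).
import numpy as np

rng = np.random.default_rng(12)
n, d, B, alpha = 400, 1_000, 2_000, 0.05
X = rng.standard_t(df=8, size=(n, d))          # heavy-ish tailed data

xbar = X.mean(axis=0)
sd = X.std(axis=0, ddof=1)
Z = (X - xbar) / sd                            # centered, studentized data

# Multiplier bootstrap of max_j |n^{-1/2} sum_i e_i Z_ij| with e_i ~ N(0, 1).
stats = np.empty(B)
for b in range(B):
    e = rng.standard_normal(n)
    stats[b] = np.abs(Z.T @ e).max() / np.sqrt(n)
crit = np.quantile(stats, 1 - alpha)

lower = xbar - crit * sd / np.sqrt(n)
upper = xbar + crit * sd / np.sqrt(n)
print("simultaneous critical value:", round(crit, 3))
print("all", d, "intervals cover the true mean 0:",
      bool(np.all((lower <= 0) & (0 <= upper))))
```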
arXiv link: http://arxiv.org/abs/2205.09691v1
Dynamics of a Binary Option Market with Exogenous Information and Price Sensitivity
with exogenous information. The resulting non-linear system has a discontinuous
right hand side, which can be analyzed using zero-dimensional Filippov
surfaces. Under general assumptions on purchasing rules, we show that when
exogenous information is constant in the binary asset market, the price always
converges. We then investigate market prices in the case of changing
information, showing empirically that price sensitivity has a strong effect on
price lag vs. information. We conclude with open questions on general $n$-ary
option markets. As a by-product of the analysis, we show that these markets are
equivalent to a simple recurrent neural network, helping to explain some of the
predictive power associated with prediction markets, which are usually designed
as $n$-ary option markets.
arXiv link: http://arxiv.org/abs/2206.07132v1
Treatment Choice with Nonlinear Regret
undesirable treatment choice due to sensitivity to sampling uncertainty. We
propose to minimize the mean of a nonlinear transformation of regret and show
that singleton rules are not essentially complete for nonlinear regret.
Focusing on mean square regret, we derive closed-form fractions for
finite-sample Bayes and minimax optimal rules. Our approach is grounded in
decision theory and extends to limit experiments. The treatment fractions can
be viewed as the strength of evidence favoring treatment. We apply our
framework to a normal regression model and sample size calculation.
arXiv link: http://arxiv.org/abs/2205.08586v6
The Power of Tests for Detecting $p$-Hacking
based on the distribution of $p$-values across studies. Interpreting results in
this literature requires a careful understanding of the power of methods for
detecting $p$-hacking. We theoretically study the implications of likely forms
of $p$-hacking on the distribution of $p$-values to understand the power of
tests for detecting it. Power can be low and depends crucially on the
$p$-hacking strategy and the distribution of true effects. Combined tests for
upper bounds and monotonicity and tests for continuity of the $p$-curve tend to
have the highest power for detecting $p$-hacking.
arXiv link: http://arxiv.org/abs/2205.07950v4
2SLS with Multiple Treatments
multiple treatments under treatment effect heterogeneity. Two conditions are
shown to be necessary and sufficient for the 2SLS to identify positively
weighted sums of agent-specific effects of each treatment: average conditional
monotonicity and no cross effects. Our identification analysis allows for any
number of treatments, any number of continuous or discrete instruments, and the
inclusion of covariates. We provide testable implications and present
characterizations of choice behavior implied by our identification conditions.
arXiv link: http://arxiv.org/abs/2205.07836v11
HARNet: A Convolutional Neural Network for Realized Volatility Forecasting
areas, neural network models have so far not been widely adopted in the context
of volatility forecasting. In this work, we aim to bridge the conceptual gap
between established time series approaches, such as the Heterogeneous
Autoregressive (HAR) model, and state-of-the-art deep neural network models.
The newly introduced HARNet is based on a hierarchy of dilated convolutional
layers, which facilitates an exponential growth of the receptive field of the
model in the number of model parameters. HARNets allow for an explicit
initialization scheme such that before optimization, a HARNet yields identical
predictions as the respective baseline HAR model. Particularly when considering
the QLIKE error as a loss function, we find that this approach significantly
stabilizes the optimization of HARNets. We evaluate the performance of HARNets
with respect to three different stock market indexes. Based on this evaluation,
we formulate clear guidelines for the optimization of HARNets and show that
HARNets can substantially improve upon the forecasting accuracy of their
respective HAR baseline models. In a qualitative analysis of the filter weights
learnt by a HARNet, we report clear patterns regarding the predictive power of
past information. Among information from the previous week, yesterday, and the
day before, yesterday's volatility contributes by far the most to today's
realized volatility forecast. Moreover, within the previous month, the
importance of individual weeks diminishes almost linearly when moving further
into the past.
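The HAR baseline that HARNet is initialized to replicate is itself a simple regression of realized volatility on its daily, weekly, and monthly averages; a sketch on a synthetic RV series follows (the neural network part is not reproduced here).

```python
# Baseline HAR model: regress realized volatility on its lagged daily value and
# its trailing weekly (5-day) and monthly (22-day) averages. Synthetic RV series.
import numpy as np

rng = np.random.default_rng(13)
T = 2_000
rv = np.empty(T)
rv[0] = 1.0
for t in range(1, T):                                   # a crude persistent RV process
    rv[t] = 0.1 + 0.9 * rv[t - 1] + 0.05 * rng.standard_normal() ** 2

def trailing_mean(x, w):
    # mean of the w values ending at t-1, for every target date t
    return np.array([x[t - w:t].mean() for t in range(22, len(x))])

y = rv[22:]
X = np.column_stack([np.ones_like(y),
                     rv[21:-1],                         # daily lag
                     trailing_mean(rv, 5),              # weekly average
                     trailing_mean(rv, 22)])            # monthly average
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("HAR coefficients (const, daily, weekly, monthly):", beta.round(3))
```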
arXiv link: http://arxiv.org/abs/2205.07719v1
Is climate change time reversible?
stochastic processes by using the properties of mixed causal and noncausal
models. It shows that they can also be used for non-stationary processes when
the trend component is computed with the Hodrick-Prescott filter rendering a
time-reversible closed-form solution. This paper also links the concept of an
environmental tipping point to the statistical property of time irreversibility
and assesses fourteen climate indicators. We find evidence of time
irreversibility in $GHG$ emissions, global temperature, global sea levels, sea
ice area, and some natural oscillation indices. While not conclusive, our
findings urge the implementation of correction policies to avoid the worst
consequences of climate change and not miss the opportunity window, which might
still be available, despite closing quickly.
arXiv link: http://arxiv.org/abs/2205.07579v3
Inference with Imputed Data: The Allure of Making Stuff Up
is no panacea for missing data. What one can learn about a population parameter
depends on the assumptions one finds credible to maintain. The credibility of
assumptions varies with the empirical setting. No specific assumptions can
provide a realistic general solution to the problem of inference with missing
data. Yet Rubin has promoted random multiple imputation (RMI) as a general way
to deal with missing values in public-use data. This recommendation has been
influential to empirical researchers who seek a simple fix to the nuisance of
missing data. This paper adds to my earlier critiques of imputation. It
provides a transparent assessment of the mix of Bayesian and frequentist
thinking used by Rubin to argue for RMI. It evaluates random imputation to
replace missing outcome or covariate data when the objective is to learn a
conditional expectation. It considers steps that might help combat the allure
of making stuff up.
arXiv link: http://arxiv.org/abs/2205.07388v1
Joint Location and Cost Planning in Maximum Capture Facility Location under Multiplicative Random Utility Maximization
market under random utility maximization (RUM) models. The objective is to
locate new facilities and make decisions on the costs (or budgets) to spend on
the new facilities, aiming to maximize an expected captured customer demand,
assuming that customers choose a facility among all available facilities
according to a RUM model. We examine two RUM frameworks in the discrete choice
literature, namely, the additive and multiplicative RUM. While the former has
been widely used in facility location problems, we are the first to explore the
latter in this context. We numerically show that the two RUM frameworks can
approximate each other well in the context of the cost optimization problem. In
addition, we show that, under the additive RUM framework, the resultant cost
optimization problem becomes highly non-convex and may have several local
optima. In contrast, the use of the multiplicative RUM brings several
advantages to the competitive facility location problem. For instance, the cost
optimization problem under the multiplicative RUM can be solved efficiently by
a general convex optimization solver or can be reformulated as a conic
quadratic program and handled by a conic solver available in some off-the-shelf
solvers such as CPLEX or GUROBI. Furthermore, we consider a joint location and
cost optimization problem under the multiplicative RUM and propose three
approaches to solve the problem, namely, an equivalent conic reformulation, a
multi-cut outer-approximation algorithm, and a local search heuristic. We
provide numerical experiments based on synthetic instances of various sizes to
evaluate the performances of the proposed algorithms in solving the cost
optimization, and the joint location and cost optimization problems.
arXiv link: http://arxiv.org/abs/2205.07345v2
How do Bounce Rates vary according to product sold?
based upon the different devices through which traffic share is observed. This
research paper focuses on how the type of products sold by different E-commerce
websites affects the bounce rate obtained through Mobile/Desktop. It tries to
explain the observations which counter the general trend of positive relation
between Mobile traffic share and bounce rate and how this is different for the
Desktop. To estimate how the type of products sold by E-commerce websites
affects the bounce rate, using data observed over different time periods, a
fixed-effects model (the within-group method) is used to isolate the
differences created by these factors. Along with the effect of the type of
products sold on the bounce rate, the effect of the individual website is also
compared to verify the results obtained for product type.
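The within-group (fixed effects) estimator used here is easy to sketch: demean the bounce rate and regressors within each website, then run pooled OLS on the demeaned data. A minimal numpy sketch with made-up variable names, not the paper's dataset.

```python
import numpy as np

def within_estimator(y, X, group):
    """Fixed-effects (within) estimator: demean y and X by group, then OLS."""
    y_d, X_d = y.astype(float).copy(), X.astype(float).copy()
    for g in np.unique(group):
        idx = group == g
        y_d[idx] -= y_d[idx].mean()
        X_d[idx] -= X_d[idx].mean(axis=0)
    beta, *_ = np.linalg.lstsq(X_d, y_d, rcond=None)
    return beta

# toy example: bounce rate on mobile traffic share, grouped by website id
rng = np.random.default_rng(2)
site = np.repeat(np.arange(20), 12)               # 20 websites, 12 periods
mobile_share = rng.uniform(0.2, 0.9, site.size)
bounce = 0.3 + 0.4 * mobile_share + rng.normal(0, 0.05, site.size)
print(within_estimator(bounce, mobile_share[:, None], site))
```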
arXiv link: http://arxiv.org/abs/2205.06866v1
A Robust Permutation Test for Subvector Inference in Linear Regressions
coefficients in linear models. The test is exact when the regressors and the
error terms are independent. Then, we show that the test is asymptotically of
correct level, consistent and has power against local alternatives when the
independence condition is relaxed, under two main conditions. The first is a
slight reinforcement of the usual absence of correlation between the regressors
and the error term. The second is that the number of strata, defined by values
of the regressors not involved in the subvector test, is small compared to the
sample size. The latter implies that the vector of nuisance regressors is
discrete. Simulations and empirical illustrations suggest that the test has
good power in practice if, indeed, the number of strata is small compared to
the sample size.
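A stripped-down version of the stratified-permutation idea: to test a single coefficient, permute the tested regressor within strata defined by the values of the remaining discrete regressors and recompute the statistic. This is only a sketch of that logic with a plain homoskedastic t-statistic, not the authors' exact test.

```python
import numpy as np

def ols_tstat(y, X, j=0):
    """OLS t-statistic of coefficient j (homoskedastic variance, for illustration)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    V = sigma2 * np.linalg.inv(X.T @ X)
    return beta[j] / np.sqrt(V[j, j])

def stratified_permutation_test(y, x_test, strata, n_perm=999, seed=0):
    """p-value from permuting x_test within strata of the nuisance regressors."""
    rng = np.random.default_rng(seed)
    dummies = (strata[:, None] == np.unique(strata)).astype(float)
    t_obs = ols_tstat(y, np.column_stack([x_test, dummies]))
    exceed = 0
    for _ in range(n_perm):
        xp = x_test.astype(float).copy()
        for s in np.unique(strata):
            idx = np.flatnonzero(strata == s)
            xp[idx] = xp[rng.permutation(idx)]
        exceed += abs(ols_tstat(y, np.column_stack([xp, dummies]))) >= abs(t_obs)
    return (1 + exceed) / (1 + n_perm)

rng = np.random.default_rng(1)
n = 300
strata = rng.integers(0, 4, n)                  # discrete nuisance regressor
x_test = rng.normal(size=n) + 0.5 * strata      # tested regressor
y = 1.0 + 0.5 * strata + rng.normal(size=n)     # null hypothesis is true
print(stratified_permutation_test(y, x_test, strata))
```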
arXiv link: http://arxiv.org/abs/2205.06713v4
Causal Estimation of Position Bias in Recommender Systems Using Marketplace Instruments
search engines, are ubiquitous in today's digital society. They facilitate
information discovery by ranking retrieved items on predicted relevance, i.e.
likelihood of interaction (click, share) between users and items. Typically
modeled using past interactions, such rankings have a major drawback:
interaction depends on the attention items receive. A highly-relevant item
placed outside a user's attention could receive little interaction. This
discrepancy between observed interaction and true relevance is termed the
position bias. Position bias degrades relevance estimation and when it
compounds over time, it can silo users into false relevant items, causing
marketplace inefficiencies. Position bias may be identified with randomized
experiments, but such an approach can be prohibitive in cost and feasibility.
Past research has also suggested propensity score methods, which do not
adequately address unobserved confounding; and regression discontinuity
designs, which have poor external validity. In this work, we address these
concerns by leveraging the abundance of A/B tests in ranking evaluations as
instrumental variables. Historical A/B tests give us access to exogenous
variation in rankings without having to introduce it manually, which would harm
user experience and platform revenue. We demonstrate our methodology in two distinct
applications at LinkedIn - feed ads and the People-You-May-Know (PYMK)
recommender. The marketplaces comprise users and campaigns on the ads side, and
invite senders and recipients on PYMK. By leveraging prior experimentation, we
obtain quasi-experimental variation in item rankings that is orthogonal to user
relevance. Our method provides robust position effect estimates that handle
unobserved confounding well, greater generalizability, and easily extends to
other information retrieval systems.
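The estimation idea reduces to instrumental variables: use randomized A/B-test assignment (exogenous ranking variation) as an instrument for the position an item actually received, then relate outcomes to instrumented position. Below is a generic two-stage least squares sketch in numpy with hypothetical variable names; it is not LinkedIn's production pipeline.

```python
import numpy as np

def tsls(y, d, Z, X=None):
    """Two-stage least squares: instrument the endogenous regressor d with Z,
    optionally controlling for exogenous covariates X."""
    n = len(y)
    W = np.ones((n, 1)) if X is None else np.column_stack([np.ones(n), X])
    Zfull = np.column_stack([W, Z])          # instruments incl. controls
    Dfull = np.column_stack([W, d])          # regressors incl. endogenous position
    gamma, *_ = np.linalg.lstsq(Zfull, Dfull, rcond=None)   # first stage
    beta, *_ = np.linalg.lstsq(Zfull @ gamma, y, rcond=None) # second stage
    return beta   # last element = position effect

# toy data: ab_arm shifts position exogenously; position affects clicks
rng = np.random.default_rng(3)
n = 5000
ab_arm = rng.integers(0, 2, n)                 # instrument: ranking experiment arm
u = rng.normal(size=n)                         # unobserved relevance (confounder)
position = 3 - ab_arm + 0.8 * u + rng.normal(size=n)
clicks = 1.0 - 0.3 * position + 1.5 * u + rng.normal(size=n)
print(tsls(clicks, position, ab_arm[:, None]))  # position effect near -0.3
```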
arXiv link: http://arxiv.org/abs/2205.06363v1
A single risk approach to the semiparametric copula competing risks model
only interested in a subset of risks. This paper considers a dependent
competing risks model in which the distribution of one risk follows a parametric
or semi-parametric model, while the model for the other risks is left unspecified.
Identifiability is shown for popular classes of parametric models and the
semiparametric proportional hazards model. The identifiability of the
parametric models does not require a covariate, while the semiparametric model
requires at least one. Estimation approaches are suggested which are shown to
be $\sqrt{n}$-consistent. Applicability and attractive finite sample
performance are demonstrated with the help of simulations and data examples.
arXiv link: http://arxiv.org/abs/2205.06087v1
Multivariate ordered discrete response models
rectangular structures. From the perspective of behavioral economics, these
non-lattice models correspond to broad bracketing in decision making, whereas
lattice models, which researchers typically estimate in practice, correspond to
narrow bracketing. In these models, we specify latent processes as a sum of an
index of covariates and an unobserved error, with unobservables for different
latent processes potentially correlated. We provide conditions that are
sufficient for identification under the independence of errors and covariates
and outline an estimation approach. We present simulations and empirical
examples, with a particular focus on probit specifications.
arXiv link: http://arxiv.org/abs/2205.05779v2
Externally Valid Policy Choice
externally valid or generalizable: they perform well in other target
populations besides the experimental (or training) population from which data
are sampled. We first show that welfare-maximizing policies for the
experimental population are robust to shifts in the distribution of outcomes
(but not characteristics) between the experimental and target populations. We
then develop new methods for learning policies that are robust to shifts in
outcomes and characteristics. In doing so, we highlight how treatment effect
heterogeneity within the experimental population affects the generalizability
of policies. Our methods may be used with experimental or observational data
(where treatment is endogenous). Many of our methods can be implemented with
linear programming.
arXiv link: http://arxiv.org/abs/2205.05561v3
On learning agent-based models from data
of complex systems from micro-level assumptions. However, ABMs typically cannot
estimate agent-specific (or "micro") variables: this is a major limitation
which prevents ABMs from harnessing micro-level data availability and which
greatly limits their predictive power. In this paper, we propose a protocol to
learn the latent micro-variables of an ABM from data. The first step of our
protocol is to reduce an ABM to a probabilistic model, characterized by a
computationally tractable likelihood. This reduction follows two general design
principles: balance of stochasticity and data availability, and replacement of
unobservable discrete choices with differentiable approximations. Then, our
protocol proceeds by maximizing the likelihood of the latent variables via a
gradient-based expectation maximization algorithm. We demonstrate our protocol
by applying it to an ABM of the housing market, in which agents with different
incomes bid higher prices to live in high-income neighborhoods. We demonstrate
that the obtained model allows accurate estimates of the latent variables,
while preserving the general behavior of the ABM. We also show that our
estimates can be used for out-of-sample forecasting. Our protocol can be seen
as an alternative to black-box data assimilation methods, that forces the
modeler to lay bare the assumptions of the model, to think about the
inferential process, and to spot potential identification problems.
arXiv link: http://arxiv.org/abs/2205.05052v2
Estimating Discrete Games of Complete Information: Bringing Logit Back in the Game
difficult due to partial identification and the absence of closed-form moment
characterizations. This paper proposes computationally tractable approaches to
estimation and inference that remove the computational burden associated with
equilibria enumeration, numerical simulation, and grid search. Separately for
unordered and ordered-actions games, I construct an identified set
characterized by a finite set of generalized likelihood-based conditional
moment inequalities that are convex in (a subvector of) structural model
parameters under the standard logit assumption on unobservables. I use
simulation and empirical examples to show that the proposed approaches generate
informative identified sets and can be several orders of magnitude faster than
existing estimation methods.
arXiv link: http://arxiv.org/abs/2205.05002v5
Stable Outcomes and Information in Games: An Empirical Framework
which players' decisions are publicly observed, yet no player takes the
opportunity to deviate. To analyze such situations in the presence of
incomplete information, we build an empirical framework by introducing a novel
solution concept that we call Bayes stable equilibrium. Our framework allows
the researcher to be agnostic about players' information and the equilibrium
selection rule. The Bayes stable equilibrium identified set collapses to the
complete information pure strategy Nash equilibrium identified set under strong
assumptions on players' information. Furthermore, all else equal, it is weakly
tighter than the Bayes correlated equilibrium identified set. We also propose
computationally tractable approaches for estimation and inference. In an
application, we study the strategic entry decisions of McDonald's and Burger
King in the US. Our results highlight the identifying power of informational
assumptions and show that the Bayes stable equilibrium identified set can be
substantially tighter than the Bayes correlated equilibrium identified set. In
a counterfactual experiment, we examine the impact of increasing access to
healthy food on the market structures in Mississippi food deserts.
arXiv link: http://arxiv.org/abs/2205.04990v2
Distributionally Robust Policy Learning with Wasserstein Distance
observable characteristics, and it is necessary to exploit such heterogeneity
to devise individualized treatment rules (ITRs). Existing estimation methods of
such ITRs assume that the available experimental or observational data are
derived from the target population in which the estimated policy is
implemented. However, this assumption often fails in practice because of
limited useful data. In this case, policymakers must rely on the data generated
in the source population, which differs from the target population.
Unfortunately, existing estimation methods do not necessarily work as expected
in the new setting, and strategies that can achieve a reasonable goal in such a
situation are required. This study examines the application of distributionally
robust optimization (DRO), which formalizes an ambiguity about the target
population and adapts to the worst-case scenario in the set. It is shown that
DRO with Wasserstein distance-based characterization of ambiguity provides
simple intuitions and a simple estimation method. I then develop an estimator
for the distributionally robust ITR and evaluate its theoretical performance.
An empirical application shows that the proposed approach outperforms the naive
approach in the target population.
arXiv link: http://arxiv.org/abs/2205.04637v2
Robust Data-Driven Decisions Under Model Uncertainty
possibly non-identical distributions, the data-generating process (DGP) in
general cannot be perfectly identified from the data. For making decisions
facing such uncertainty, this paper presents a novel approach by studying how
the data can best be used to robustly improve decisions. That is, no matter
which DGP governs the uncertainty, one can make a better decision than without
using the data. I show that common inference methods, e.g., maximum likelihood
and Bayesian updating, cannot achieve this goal. To address this, I develop new
updating rules that lead to robustly better decisions either asymptotically
almost surely or in finite samples with a pre-specified probability. In
particular, they are easy to implement, as they are given by simple extensions
of standard statistical procedures in the case where the possible DGPs are all
independent and identically distributed.
lead to more intuitive conclusions in existing economic models such as asset
pricing under ambiguity.
arXiv link: http://arxiv.org/abs/2205.04573v1
A unified test for regression discontinuity designs
problem. We document a massive over-rejection of the diagnostic restriction
among empirical studies in the top five economics journals. At least one
diagnostic test was rejected for 19 out of 59 studies, whereas less than 5% of
the collected 787 tests rejected the null hypotheses. In other words, one-third
of the studies rejected at least one of their diagnostic tests, whereas their
underlying identifying restrictions appear plausible. Multiple testing causes
this problem because the median number of tests per study was as high as 12.
Therefore, we offer unified tests to overcome the size-control problem. Our
procedure is based on the new joint asymptotic normality of local polynomial
mean and density estimates. In simulation studies, our unified tests
outperformed the Bonferroni correction. We implement the procedure as an R
package rdtest with two empirical examples in its vignettes.
arXiv link: http://arxiv.org/abs/2205.04345v5
Policy Choice in Time Series by Empirical Welfare Maximization
where the available data is a multi-variate time series. Building on the
statistical treatment choice framework, we propose Time-series Empirical
Welfare Maximization (T-EWM) methods to estimate an optimal policy rule by
maximizing an empirical welfare criterion constructed using nonparametric
potential outcome time series. We characterize conditions under which T-EWM
consistently learns a policy choice that is optimal in terms of conditional
welfare given the time-series history. We derive a nonasymptotic upper bound
for conditional welfare regret. To illustrate the implementation and uses of
T-EWM, we perform simulation studies and apply the method to estimate optimal
restriction rules against Covid-19.
arXiv link: http://arxiv.org/abs/2205.03970v4
Dynamic demand for differentiated products with fixed-effects unobserved heterogeneity
model of demand for differentiated product using consumer-level panel data with
few purchase events per consumer (i.e., short panel). Consumers are
forward-looking and their preferences incorporate two sources of dynamics: last
choice dependence due to habits and switching costs, and duration dependence
due to inventory, depreciation, or learning. A key distinguishing feature of
the model is that consumer unobserved heterogeneity has a Fixed Effects (FE)
structure -- that is, its probability distribution conditional on the initial
values of endogenous state variables is unrestricted. I apply and extend recent
results to establish the identification of all the structural parameters as
long as the dataset includes four or more purchase events per household. The
parameters can be estimated using a conditional maximum likelihood (CML) method
based on a sufficient statistic. An attractive feature of CML in this model is that the
sufficient statistic controls for the forward-looking value of the consumer's
decision problem such that the method does not require solving dynamic
programming problems or calculating expected present values.
arXiv link: http://arxiv.org/abs/2205.03948v2
Identification and Estimation of Dynamic Games with Unknown Information Structure
underlying information structure is unknown to the analyst. We introduce
Markov correlated equilibrium, a dynamic analog of Bayes correlated
equilibrium, and show that its predictions coincide with the Markov perfect
equilibrium predictions attainable when players observe richer signals than the
analyst assumes. We provide tractable methods for informationally robust
estimation, inference, and counterfactual analysis. We illustrate the framework
with a dynamic entry game between Starbucks and Dunkin' in the US and study the
role of informational assumptions.
arXiv link: http://arxiv.org/abs/2205.03706v5
Benchmarking Econometric and Machine Learning Methodologies in Nowcasting
data published with a significant time lag, such as final GDP figures.
Currently, there is a plethora of methodologies and approaches for
practitioners to choose from. However, a comprehensive comparison of these
disparate approaches in terms of predictive performance and characteristics is
lacking. This paper addresses that deficiency by examining the
performance of 12 different methodologies in nowcasting US quarterly GDP
growth, including all the methods most commonly employed in nowcasting, as well
as some of the most popular traditional machine learning approaches.
Performance was assessed on three different tumultuous periods in US economic
history: the early 1980s recession, the 2008 financial crisis, and the COVID
crisis. The two best performing methodologies in the analysis were long
short-term memory artificial neural networks (LSTM) and Bayesian vector
autoregression (BVAR). To facilitate further application and testing of each of
the examined methodologies, an open-source repository containing boilerplate
code that can be applied to different datasets is published alongside the
paper, available at: github.com/dhopp1/nowcasting_benchmark.
arXiv link: http://arxiv.org/abs/2205.03318v1
Leverage, Influence, and the Jackknife in Clustered Regression Models: Reliable Inference Using summclust
structure of the dataset for linear regression models with clustered
disturbances. The key unit of observation for such a model is the cluster. We
therefore propose cluster-level measures of leverage, partial leverage, and
influence and show how to compute them quickly in most cases. The measures of
leverage and partial leverage can be used as diagnostic tools to identify
datasets and regression designs in which cluster-robust inference is likely to
be challenging. The measures of influence can provide valuable information
about how the results depend on the data in the various clusters. We also show
how to calculate two jackknife variance matrix estimators efficiently as a
byproduct of our other computations. These estimators, which are already
available in Stata, are generally more conservative than conventional variance
matrix estimators. The summclust package computes all the quantities that we
discuss.
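A cluster-level leverage measure of the kind the package reports can be sketched directly: for cluster g with regressor block X_g, one natural measure is the trace of X_g (X'X)^{-1} X_g'. The sketch below follows that definition under my reading of the abstract; the exact quantities computed by summclust may differ.

```python
import numpy as np

def cluster_leverage(X, cluster):
    """Leverage of each cluster: trace of X_g (X'X)^{-1} X_g'."""
    XtX_inv = np.linalg.inv(X.T @ X)
    lev = {}
    for g in np.unique(cluster):
        Xg = X[cluster == g]
        lev[g] = np.trace(Xg @ XtX_inv @ Xg.T)
    return lev   # leverages sum to the number of regressors

rng = np.random.default_rng(4)
cluster = np.repeat(np.arange(10), rng.integers(5, 50, 10))   # unequal cluster sizes
X = np.column_stack([np.ones(cluster.size), rng.normal(size=cluster.size)])
lev = cluster_leverage(X, cluster)
print(lev, sum(lev.values()))   # the sum equals X.shape[1] = 2
```

Unusually large values relative to the others flag clusters that dominate the fit, which is the diagnostic use the abstract describes.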
arXiv link: http://arxiv.org/abs/2205.03288v3
Cluster-Robust Inference: A Guide to Empirical Practice
other disciplines. However, it is only recently that theoretical foundations
for the use of these methods in many empirically relevant situations have been
developed. In this paper, we use these theoretical results to provide a guide
to empirical practice. We do not attempt to present a comprehensive survey of
the (very large) literature. Instead, we bridge theory and practice by
providing a thorough guide on what to do and why, based on recently available
econometric theory and simulation evidence. To practice what we preach, we
include an empirical analysis of the effects of the minimum wage on labor
supply of teenagers using individual data.
arXiv link: http://arxiv.org/abs/2205.03285v1
Estimation and Inference by Stochastic Optimization
bootstrap inference. For complex models, this can be computationally intensive.
This paper combines optimization with resampling: turning stochastic
optimization into a fast resampling device. Two methods are introduced: a
resampled Newton-Raphson (rNR) and a resampled quasi-Newton (rqN) algorithm.
Both produce draws that can be used to compute consistent estimates, confidence
intervals, and standard errors in a single run. The draws are generated by a
gradient and Hessian (or an approximation) computed from batches of data that
are resampled at each iteration. The proposed methods transition quickly from
optimization to resampling when the objective is smooth and strictly convex.
Simulated and empirical applications illustrate the properties of the methods
on large scale and computationally intensive problems. Comparisons with
frequentist and Bayesian methods highlight the features of the algorithms.
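A minimal illustration of the resampled Newton-Raphson idea under my reading of the abstract: at each iteration, resample the data, compute the gradient and Hessian of the objective on that resample, and take a Newton step; post-burn-in iterates are kept as draws. Shown for a simple logistic regression; this is a sketch, not the paper's algorithm with its exact tuning or transition rule.

```python
import numpy as np

def logit_grad_hess(beta, X, y):
    """Gradient and Hessian of the average logistic negative log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (p - y) / len(y)
    hess = (X * (p * (1 - p))[:, None]).T @ X / len(y)
    return grad, hess

def resampled_newton(X, y, n_iter=600, burn=100, seed=0):
    """Newton-Raphson where each step uses a bootstrap resample of the data;
    post-burn-in iterates serve as draws for approximate inference."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    beta, draws = np.zeros(k), []
    for t in range(n_iter):
        idx = rng.integers(0, n, n)            # bootstrap resample
        g, H = logit_grad_hess(beta, X[idx], y[idx])
        beta = beta - np.linalg.solve(H, g)
        if t >= burn:
            draws.append(beta.copy())
    draws = np.array(draws)
    return draws.mean(axis=0), draws.std(axis=0)   # point estimate, rough std. errors

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(2000), rng.normal(size=2000)])
y = (rng.uniform(size=2000) < 1 / (1 + np.exp(-(0.5 + X[:, 1])))).astype(float)
print(resampled_newton(X, y))
```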
arXiv link: http://arxiv.org/abs/2205.03254v1
Choosing Exogeneity Assumptions in Potential Outcome Models
among them? When exogeneity is imposed on an unobservable like a potential
outcome, we argue that the form of exogeneity should be chosen based on the
kind of selection on unobservables it allows. Consequently, researchers can
assess the plausibility of any exogeneity assumption by studying the
distributions of treatment given the unobservables that are consistent with
that assumption. We use this approach to study two common exogeneity
assumptions: quantile and mean independence. We show that both assumptions
require a kind of non-monotonic relationship between treatment and the
potential outcomes. We discuss how to assess the plausibility of this kind of
treatment selection. We also show how to define a new and weaker version of
quantile independence that allows for monotonic treatment selection. We then
show the implications of the choice of exogeneity assumption for
identification. We apply these results in an empirical illustration of the
effect of child soldiering on wages.
arXiv link: http://arxiv.org/abs/2205.02288v1
Reducing Marketplace Interference Bias Via Shadow Prices
the design or operation of their platforms. The workhorse of experimentation is
the randomized controlled trial (RCT), or A/B test, in which users are randomly
assigned to treatment or control groups. However, marketplace interference
causes the Stable Unit Treatment Value Assumption (SUTVA) to be violated,
leading to bias in the standard RCT metric. In this work, we propose techniques
for platforms to run standard RCTs and still obtain meaningful estimates
despite the presence of marketplace interference. We specifically consider a
generalized matching setting, in which the platform explicitly matches supply
with demand via a linear programming algorithm. Our first proposal is for the
platform to estimate the value of global treatment and global control via
optimization. We prove that this approach is unbiased in the fluid limit. Our
second proposal is to compare the average shadow price of the treatment and
control groups rather than the total value accrued by each group. We prove that
this technique corresponds to the correct first-order approximation (in a
Taylor series sense) of the value function of interest even in a finite-size
system. We then use this result to prove that, under reasonable assumptions,
our estimator is less biased than the RCT estimator. At the heart of our result
is the idea that it is relatively easy to model interference in matching-driven
marketplaces since, in such markets, the platform mediates the spillover.
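The second proposal can be mimicked in a toy matching market: solve the platform's allocation LP separately for the treatment and control groups and compare the average dual (shadow) prices on the supply constraints rather than total accrued value. The sketch below uses scipy's linprog with the HiGHS solver and reads duals from its ineqlin marginals; that dual-extraction detail and the toy market structure are my assumptions, not part of the paper.

```python
import numpy as np
from scipy.optimize import linprog

def avg_supply_shadow_price(value, supply, demand):
    """Solve a transportation-style matching LP and return the mean shadow
    price on the supply (capacity) constraints."""
    n_u, n_i = value.shape
    c = -value.ravel()                                   # maximize total match value
    A = np.zeros((n_u + n_i, n_u * n_i))
    for u in range(n_u):                                 # each user matched <= demand
        A[u, u * n_i:(u + 1) * n_i] = 1.0
    for i in range(n_i):                                 # each item used <= supply
        A[n_u + i, i::n_i] = 1.0
    b = np.concatenate([demand, supply])
    res = linprog(c, A_ub=A, b_ub=b, bounds=(0, None), method="highs")
    duals = -res.ineqlin.marginals                       # sign flip: we minimized -value
    return duals[n_u:].mean()

rng = np.random.default_rng(6)
v_treat = rng.uniform(0, 1, (50, 10)) + 0.1              # treatment lifts match values
v_ctrl = rng.uniform(0, 1, (50, 10))
supply, demand = np.full(10, 5.0), np.ones(50)
print(avg_supply_shadow_price(v_treat, supply, demand),
      avg_supply_shadow_price(v_ctrl, supply, demand))
```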
arXiv link: http://arxiv.org/abs/2205.02274v4
Approximating Choice Data by Discrete Choice Models
discrete choice models, such as mixed-logit models, are rich enough to
approximate any nonparametric random utility models arbitrarily well across
choice sets. The condition turns out to be the affine-independence of the set
of characteristic vectors. When the condition fails, resulting in some random
utility models that cannot be closely approximated, we identify preferences and
substitution patterns that are challenging to approximate accurately. We also
propose algorithms to quantify the magnitude of approximation errors.
arXiv link: http://arxiv.org/abs/2205.01882v4
Machine Learning based Framework for Robust Price-Sensitivity Estimation with Application to Airline Pricing
feature-dependent price sensitivity. Many industries face the challenge of
developing practical algorithms that can robustly estimate price elasticities
to drive such automated pricing systems, especially when information about
non-purchases (losses) is unavailable. Based on the Poisson semi-parametric
approach, we construct a flexible yet interpretable demand model where the
price related part is parametric while the remaining (nuisance) part of the
model is non-parametric and can be modeled via sophisticated machine learning
(ML) techniques. The estimation of price-sensitivity parameters of this model
via direct one-stage regression techniques may lead to biased estimates due to
regularization. To address this concern, we propose a two-stage estimation
methodology which makes the estimation of the price-sensitivity parameters
robust to biases in the estimators of the nuisance parameters of the model. In
the first-stage we construct estimators of observed purchases and prices given
the feature vector using sophisticated ML estimators such as deep neural
networks. Utilizing the estimators from the first-stage, in the second-stage we
leverage a Bayesian dynamic generalized linear model to estimate the
price-sensitivity parameters. We test the performance of the proposed
estimation schemes on simulated and real sales transaction data from the
Airline industry. Our numerical studies demonstrate that our proposed two-stage
approach reduces the estimation error in price-sensitivity parameters from 25%
to 4% in realistic simulation settings. The two-stage estimation techniques
proposed in this work allow practitioners to leverage modern ML techniques to
robustly estimate price sensitivities while maintaining interpretability and
ease of validation of the model's various constituent parts.
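The two-stage logic is close in spirit to double/debiased machine learning for a partially linear model: first predict sales and prices from features with a flexible ML learner, then estimate the price coefficient from the residuals. The sketch below uses scikit-learn gradient boosting and a plain residual-on-residual regression; the paper's second stage is a Bayesian dynamic GLM, so treat this only as a simplified stand-in (cross-fitting could be added).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def two_stage_price_sensitivity(log_q, log_p, features, seed=0):
    """Stage 1: flexible ML predictions of log-sales and log-price from features.
    Stage 2: regression of residualized log-sales on residualized log-price."""
    m_q = GradientBoostingRegressor(random_state=seed).fit(features, log_q)
    m_p = GradientBoostingRegressor(random_state=seed).fit(features, log_p)
    rq = log_q - m_q.predict(features)
    rp = log_p - m_p.predict(features)
    return (rp @ rq) / (rp @ rp)          # price-sensitivity estimate

rng = np.random.default_rng(7)
n = 3000
features = rng.normal(size=(n, 5))
log_p = 0.5 * features[:, 0] + rng.normal(0, 0.3, n)
log_q = 2.0 + np.sin(features[:, 1]) - 1.2 * log_p + rng.normal(0, 0.3, n)
print(two_stage_price_sensitivity(log_q, log_p, features))   # near -1.2
```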
arXiv link: http://arxiv.org/abs/2205.01875v2
Efficient Score Computation and Expectation-Maximization Algorithm in Regime-Switching Models
regime-switching models and, derived from it, an efficient
expectation-maximization (EM) algorithm. Unlike existing algorithms, this
algorithm does not rely on forward-backward filtering for smoothed regime
probabilities and only involves forward computation. Moreover, the algorithm
for computing the score is readily extended to compute the Hessian matrix.
arXiv link: http://arxiv.org/abs/2205.01565v1
Heterogeneous Treatment Effects for Networks, Panels, and other Outcome Matrices
where units are randomized to a treatment but outcomes are measured for pairs
of units. For example, we might measure risk sharing links between households
enrolled in a microfinance program, employment relationships between workers
and firms exposed to a trade shock, or bids from bidders to items assigned to
an auction format. Such a double randomized experimental design may be
appropriate when there are social interactions, market externalities, or other
spillovers across units assigned to the same treatment. Or it may describe a
natural or quasi experiment given to the researcher. In this paper, we propose
a new empirical strategy that compares the eigenvalues of the outcome matrices
associated with each treatment. Our proposal is based on a new matrix analog of
the Fréchet-Hoeffding bounds that play a key role in the standard theory. We
first use this result to bound the distribution of treatment effects. We then
propose a new matrix analog of quantile treatment effects that is given by a
difference in the eigenvalues. We call this analog spectral treatment effects.
arXiv link: http://arxiv.org/abs/2205.01246v2
A short term credibility index for central banks under inflation targeting: an application to Brazil
autoregressive models to evaluate the statistical sustainability of the
Brazilian inflation-targeting system within its tolerance bounds. The
probabilities give an indication of the short-term credibility of the targeting
system without requiring a model of people's beliefs. We employ receiver
operating characteristic curves to determine the optimal probability threshold
above which the bank is predicted to be credible. We also investigate the added
value of including experts' predictions of key macroeconomic variables.
arXiv link: http://arxiv.org/abs/2205.00924v2
A Note on "A survey of preference estimation with unobserved choice set heterogeneity" by Gregory S. Crawford, Rachel Griffith, and Alessandro Iaria
unobserved or latent consideration sets, presents a unified framework to
address the problem in practice by using "sufficient sets", defined as a
combination of past observed choices. The proposed approach rests on a
re-interpretation of a consistency result by McFadden (1978) for the problem of
sampling of alternatives, but the use of that result in Crawford et al.
(2021) is imprecise in an important respect. It is stated that consistency would
be attained if any subset of the true consideration set is used for estimation,
but McFadden (1978) shows that, in general, one needs to do a sampling
correction that depends on the protocol used to draw the choice set. This note
derives the sampling correction that is required when the choice set for
estimation is built from past choices. Then, it formalizes the conditions under
which such correction would fulfill the uniform condition property and can
therefore be ignored when building practical estimators, such as the ones
analyzed by Crawford et al. (2021).
arXiv link: http://arxiv.org/abs/2205.00852v1
Higher-order Expansions and Inference for Panel Data Models
panel data models, with a focus on cases that exhibit both serial correlation
and cross-sectional dependence. In order to establish an asymptotic theory to
support the inferential method, we develop some new and useful higher-order
expansions, such as Berry-Esseen bound and Edgeworth Expansion, under a set of
simple and general conditions. We further demonstrate the usefulness of these
theoretical results by explicitly investigating a panel data model with
interactive effects which nests many traditional panel data models as special
cases. Finally, we show the superiority of our approach over several natural
competitors using extensive numerical studies.
arXiv link: http://arxiv.org/abs/2205.00577v2
Greenhouse Gas Emissions and its Main Drivers: a Panel Assessment for EU-27 Member States
over the period 2010-2019, using a Panel EGLS model with period fixed effects.
In particular, we focused our research on the effects of GDP, renewable energy,
households' energy consumption, and waste on greenhouse gas emissions. In this
regard, we found a positive relationship between three independent variables
(real GDP per capita, households' final consumption per capita, and waste
generation per capita) and greenhouse gas emissions per capita, while the
effect of the share of renewable energy in gross final energy consumption on
the dependent variable proved to be negative, but quite small. In addition, we
demonstrate that the main challenge affecting greenhouse gas emissions is the
structure of households' energy consumption, which is generally composed of
environmentally harmful fuels. This suggests the need for greater efforts to
support the shift to a green economy based on higher energy efficiency.
arXiv link: http://arxiv.org/abs/2205.00295v1
A Heteroskedasticity-Robust Overidentifying Restriction Test with High-Dimensional Covariates
linear instrumental variable models. The novelty of the proposed test is that
it allows the number of covariates and instruments to be larger than the sample
size. The test is scale-invariant and is robust to heteroskedastic errors. To
construct the final test statistic, we first introduce a test based on the
maximum norm of multiple parameters that could be high-dimensional. The
theoretical power based on the maximum norm is higher than that in the modified
Cragg-Donald test (Kolesár, 2018), the only existing test allowing for
large-dimensional covariates. Second, following the principle of power
enhancement (Fan et al., 2015), we introduce the power-enhanced test, with an
asymptotically zero component used to enhance the power to detect some extreme
alternatives with many locally invalid instruments. Finally, an empirical
example of the trade and economic growth nexus demonstrates the usefulness of
the proposed test.
arXiv link: http://arxiv.org/abs/2205.00171v3
Controlling for Latent Confounding with Triple Proxies
using noisy proxies for unobserved confounders. Our approach builds on the
results of Hu (2008), who tackles the problem of general measurement error.
We call this the `triple proxy' approach because it requires three proxies that
are jointly independent conditional on unobservables. We consider three
different choices for the third proxy: it may be an outcome, a vector of
treatments, or a collection of auxiliary variables. We compare to an
alternative identification strategy introduced by Miao et al. (2018), in which
causal effects are identified using two conditionally independent proxies. We
refer to this as the `double proxy' approach. The triple proxy approach
identifies objects that are not identified by the double proxy approach,
including some that capture the variation in average treatment effects between
strata of the unobservables. Moreover, the conditional independence assumptions
in the double and triple proxy approaches are non-nested.
arXiv link: http://arxiv.org/abs/2204.13815v2
Efficient Estimation of Structural Models via Sieves
(SEES), which approximate the solution using a linear combination of basis
functions and impose equilibrium conditions as a penalty to determine the
best-fitting coefficients. Our estimators avoid the need to repeatedly solve
the model, apply to a broad class of models, and are consistent, asymptotically
normal, and asymptotically efficient. Moreover, they solve unconstrained
optimization problems with fewer unknowns and offer convenient standard error
calculations. As an illustration, we apply our method to an entry game between
Walmart and Kmart.
arXiv link: http://arxiv.org/abs/2204.13488v2
From prediction markets to interpretable collective intelligence
from an arbitrary group of experts, the probability of the truth of an
arbitrary logical proposition together with collective information that has an
explicit form and interprets this probability. Namely, we provide strong
arguments for the possibility of the development of a self-resolving prediction
market with play money that incentivizes direct information exchange between
experts. Such a system could, in particular, motivate simultaneously many
experts to collectively solve scientific or medical problems in a very
efficient manner. We also note that in our considerations, experts are not
assumed to be Bayesian.
arXiv link: http://arxiv.org/abs/2204.13424v3
Impulse response estimation via flexible local projections
by Jordà (2005) to a non-parametric setting using Bayesian Additive
Regression Trees. Monte Carlo experiments show that our BART-LP model is able
to capture non-linearities in the impulse responses. Our first application
shows that the fiscal multiplier is stronger in recession than in expansion
only in response to contractionary fiscal shocks, but not in response to
expansionary fiscal shocks. We then show that financial shocks generate effects
on the economy that increase more than proportionately in the size of the shock
when the shock is negative, but not when the shock is positive.
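For reference, the linear local-projection baseline that the BART version generalizes regresses the outcome at each horizon h on the shock (plus a few controls), so the impulse response is traced out by a sequence of separate OLS coefficients. A minimal numpy sketch of that linear baseline, not the BART-LP estimator itself; the lag length and horizons are illustrative.

```python
import numpy as np

def local_projection_irf(y, shock, horizons=12, n_lags=2):
    """Linear local projections: for each horizon h, regress y_{t+h} on shock_t
    and a few lags of y; collect the shock coefficients as the impulse response."""
    irf, T = [], len(y)
    for h in range(horizons + 1):
        rows, targets = [], []
        for t in range(n_lags, T - h):
            rows.append([1.0, shock[t]] + [y[t - l] for l in range(1, n_lags + 1)])
            targets.append(y[t + h])
        beta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
        irf.append(beta[1])               # coefficient on the shock at horizon h
    return np.array(irf)

rng = np.random.default_rng(12)
T = 400
shock = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.7 * y[t - 1] + shock[t] + 0.2 * rng.normal()
print(local_projection_irf(y, shock, horizons=6).round(2))   # roughly 0.7 ** h
```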
arXiv link: http://arxiv.org/abs/2204.13150v1
Estimation of Recursive Route Choice Models with Incomplete Trip Observations
situation that the trip observations are incomplete, i.e., there are
unconnected links (or nodes) in the observations. A direct approach to handle
this issue would be intractable because enumerating all paths between
unconnected links (or nodes) in a real network is typically not possible. We
exploit an expectation-maximization (EM) method that allows to deal with the
missing-data issue by alternatively performing two steps of sampling the
missing segments in the observations and solving maximum likelihood estimation
problems. Moreover, observing that the EM method would be expensive, we propose
a new estimation method based on the idea that the choice probabilities of
unconnected link observations can be exactly computed by solving systems of
linear equations. We further design a new algorithm, called
decomposition-composition (DC), that helps reduce the number of systems of
linear equations to be solved and speed up the estimation. We compare our
proposed algorithms with some standard baselines using a dataset from a real
network and show that the DC algorithm outperforms the other approaches in
recovering missing information in the observations. Our methods work with most
of the recursive route choice models proposed in the literature, including the
recursive logit, nested recursive logit, and discounted recursive models.
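The computational fact the estimation method exploits is that, in recursive logit models, the exponentiated value functions solve a linear system: with z(k) = exp(V(k)), one has z = Mz + b, where M collects exp(instantaneous utilities) over the network's arcs and b marks the destination. A small numpy sketch of that value computation under these standard recursive-logit conventions; it is not the authors' DC algorithm.

```python
import numpy as np

def recursive_logit_values(utilities, dest):
    """Solve z = M z + b with z = exp(V), M[k, a] = exp(v(a|k)) on arcs and b the
    destination indicator; link-choice probabilities are P(a|k) = exp(v(a|k)) z[a] / z[k]."""
    n = utilities.shape[0]
    M = np.exp(utilities)                  # -inf (missing arc) maps to 0
    M[dest, :] = 0.0                       # absorbing destination
    b = np.zeros(n)
    b[dest] = 1.0
    z = np.linalg.solve(np.eye(n) - M, b)
    V = np.log(z)
    P = M * z[None, :] / z[:, None]
    return V, P

# toy 4-node network; -inf marks missing arcs, entries are deterministic link utilities
ninf = -np.inf
u = np.array([[ninf, -1.0, -1.5, ninf],
              [ninf, ninf, -1.0, -1.0],
              [ninf, ninf, ninf, -0.5],
              [ninf, ninf, ninf, ninf]])
V, P = recursive_logit_values(u, dest=3)
print(V.round(3))
print(P.round(3))   # rows of P sum to one for non-destination nodes
```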
arXiv link: http://arxiv.org/abs/2204.12992v1
A Multivariate Spatial and Spatiotemporal ARCH Model
conditional heteroscedasticity (ARCH) model based on a vec-representation. The
model includes instantaneous spatial autoregressive spill-over effects in the
conditional variance, as they are usually present in spatial econometric
applications. Furthermore, spatial and temporal cross-variable effects are
explicitly modelled. We transform the model to a multivariate spatiotemporal
autoregressive model using a log-squared transformation and derive a consistent
quasi-maximum-likelihood estimator (QMLE). For finite samples and different
error distributions, the performance of the QMLE is analysed in a series of
Monte-Carlo simulations. In addition, we illustrate the practical usage of the
new model with a real-world example. We analyse the monthly real-estate price
returns for three different property types in Berlin from 2002 to 2014. We find
weak (instantaneous) spatial interactions, while the temporal autoregressive
structure in the market risks is of higher importance. Interactions between the
different property types only occur in the temporally lagged variables. Thus,
we see mainly temporal volatility clusters and weak spatial volatility
spill-overs.
arXiv link: http://arxiv.org/abs/2204.12472v1
GMM is Inadmissible Under Weak Identification
bound on identification strength, asymptotically admissible (i.e. undominated)
estimators in a wide class of estimation problems must be uniformly continuous
in the sample moment function. GMM estimators are in general discontinuous in
the sample moments, and are thus inadmissible. We show, by contrast, that
bagged, or bootstrap aggregated, GMM estimators as well as quasi-Bayes
posterior means have superior continuity properties, while results in the
literature imply that they are equivalent to GMM when identification is strong.
In simulations calibrated to published instrumental variables specifications,
we find that these alternatives often outperform GMM.
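Bootstrap aggregation of a GMM estimator is straightforward to sketch: re-estimate on bootstrap resamples and average the estimates, which smooths the estimator's discontinuous dependence on the sample moments. Below is a linear IV example in numpy using the (Z'Z)^{-1} weight matrix (i.e., 2SLS); the data-generating process is a toy illustration, not the paper's simulation design.

```python
import numpy as np

def gmm_iv(y, X, Z):
    """Linear GMM with weight matrix (Z'Z)^{-1}, i.e., the 2SLS estimator."""
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    return np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)

def bagged_gmm_iv(y, X, Z, B=200, seed=0):
    """Bootstrap-aggregated GMM: average the estimator over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    est = [gmm_iv(y[idx], X[idx], Z[idx])
           for idx in (rng.integers(0, n, n) for _ in range(B))]
    return np.mean(est, axis=0)

rng = np.random.default_rng(8)
n = 500
z = rng.normal(size=(n, 3))
u = rng.normal(size=n)
x = z @ np.array([0.3, 0.3, 0.3]) + 0.8 * u + rng.normal(size=n)  # moderately weak first stage
y = 1.0 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
print(gmm_iv(y, X, Z), bagged_gmm_iv(y, X, Z))
```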
arXiv link: http://arxiv.org/abs/2204.12462v3
A One-Covariate-at-a-Time Method for Nonparametric Additive Models
approach to choose significant variables in high-dimensional nonparametric
additive regression models. Similarly to Chudik, Kapetanios and Pesaran (2018),
we consider the statistical significance of individual nonparametric additive
components one at a time and take into account the multiple testing nature of
the problem. One-stage and multiple-stage procedures are both considered. The
former works well in terms of the true positive rate only if the marginal
effects of all signals are strong enough; the latter helps to pick up hidden
signals that have weak marginal effects. Simulations demonstrate the good
finite sample performance of the proposed procedures. As an empirical
application, we use the OCMT procedure on a dataset we extracted from the
Longitudinal Survey on Rural Urban Migration in China. We find that our
procedure works well in terms of the out-of-sample forecast root mean square
errors, compared with competing methods.
arXiv link: http://arxiv.org/abs/2204.12023v3
Optimal Decision Rules when Payoffs are Partially Identified
choice problems when payoffs depend on a partially-identified parameter
$\theta$ and the decision maker can use a point-identified parameter $\mu$ to
deduce restrictions on $\theta$. Examples include treatment choice under
partial identification and pricing with rich unobserved heterogeneity. Our
notion of optimality combines a minimax approach to handle the ambiguity from
partial identification of $\theta$ given $\mu$ with an average risk
minimization approach for $\mu$. We show how to implement optimal decision
rules using the bootstrap and (quasi-)Bayesian methods in both parametric and
semiparametric settings. We provide detailed applications to treatment choice
and optimal pricing. Our asymptotic approach is well suited for realistic
empirical settings in which the derivation of finite-sample optimal rules is
intractable.
arXiv link: http://arxiv.org/abs/2204.11748v3
Identification and Statistical Decision Theory
identification and statistical components. Identification analysis, which
assumes knowledge of the probability distribution generating observable data,
places an upper bound on what may be learned about population parameters of
interest with finite sample data. Yet Wald's statistical decision theory
studies decision making with sample data without reference to identification,
indeed without reference to estimation. This paper asks if identification
analysis is useful to statistical decision theory. The answer is positive, as
it can yield an informative and tractable upper bound on the achievable finite
sample performance of decision criteria. The reasoning is simple when the
decision relevant parameter is point identified. It is more delicate when the
true state is partially identified and a decision must be made under ambiguity.
Then the performance of some criteria, such as minimax regret, is enhanced by
randomizing choice of an action. This may be accomplished by making choice a
function of sample data. I find it useful to recast choice of a statistical
decision function as selection of choice probabilities for the elements of the
choice set. Using sample data to randomize choice conceptually differs from and
is complementary to its traditional use to estimate population parameters.
arXiv link: http://arxiv.org/abs/2204.11318v2
Local Gaussian process extrapolation for BART models with applications to causal inference
model offering state-of-the-art performance on out-of-sample prediction.
Despite this success, standard implementations of BART typically provide
inaccurate prediction and overly narrow prediction intervals at points outside
the range of the training data. This paper proposes a novel extrapolation
strategy that grafts Gaussian processes to the leaf nodes in BART for
predicting points outside the range of the observed data. The new method is
compared to standard BART implementations and recent frequentist
resampling-based methods for predictive inference. We apply the new approach to
a challenging problem from causal inference, wherein for some regions of
predictor space, only treated or untreated units are observed (but not both).
In simulation studies, the new approach boasts superior performance compared to
popular alternatives, such as Jackknife+.
arXiv link: http://arxiv.org/abs/2204.10963v2
Adversarial Estimators
They generalize maximum-likelihood-type estimators ('M-estimators') as their
average objective is maximized by some parameters and minimized by others. This
class subsumes the continuous-updating Generalized Method of Moments,
Generative Adversarial Networks and more recent proposals in machine learning
and econometrics. In these examples, researchers state which aspects of the
problem may in principle be used for estimation, and an adversary learns how to
emphasize them optimally. We derive the convergence rates of A-estimators under
pointwise and partial identification, and the normality of functionals of their
parameters. Unknown functions may be approximated via sieves such as deep
neural networks, for which we provide simplified low-level conditions. As a
corollary, we obtain the normality of neural-net M-estimators, overcoming
technical issues previously identified by the literature. Our theory yields
novel results about a variety of A-estimators, providing intuition and formal
justification for their success in recent applications.
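The defining structure of an A-estimator can be illustrated with a toy saddle-point problem: the average objective is maximized over an adversary's parameter and minimized over the parameter of interest. The gradient descent-ascent sketch below estimates a mean adversarially; it mirrors the minimax structure only and is not any specific estimator from the paper.

```python
import numpy as np

def adversarial_mean(x, lr=0.05, n_iter=5000):
    """Estimate a mean via the saddle point of
    f(theta, lam) = mean(lam * (x - theta) - lam**2 / 2):
    the adversary (lam) maximizes, the parameter of interest (theta) minimizes."""
    theta, lam = 0.0, 0.0
    for _ in range(n_iter):
        grad_theta = -lam                      # d f / d theta
        grad_lam = (x - theta).mean() - lam    # d f / d lam
        theta -= lr * grad_theta               # descent step for theta
        lam += lr * grad_lam                   # ascent step for the adversary
    return theta

x = np.random.default_rng(9).normal(loc=2.5, size=1000)
print(adversarial_mean(x))   # converges to the sample mean (about 2.5)
```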
arXiv link: http://arxiv.org/abs/2204.10495v3
MTE with Misspecification
responding to the instrument when selecting into treatment. We show that, in
general, the presence of non-responders biases the Marginal Treatment Effect
(MTE) curve and many of its functionals. Yet, we show that, when the propensity
score is fully supported on the unit interval, it is still possible to restore
identification of the MTE curve and its functionals with an appropriate
re-weighting.
arXiv link: http://arxiv.org/abs/2204.10445v1
Boundary Adaptive Local Polynomial Conditional Density Estimators
local polynomial techniques. The estimators are boundary adaptive and easy to
implement. We then study the (pointwise and) uniform statistical properties of
the estimators, offering characterizations of both probability concentration
and distributional approximation. In particular, we establish uniform
convergence rates in probability and valid Gaussian distributional
approximations for the Studentized t-statistic process. We also discuss
implementation issues such as consistent estimation of the covariance function
for the Gaussian approximation, optimal integrated mean squared error bandwidth
selection, and valid robust bias-corrected inference. We illustrate the
applicability of our results by constructing valid confidence bands and
hypothesis tests for both parametric specification and shape constraints,
explicitly characterizing their approximation errors. A companion R software
package implementing our main results is provided.
arXiv link: http://arxiv.org/abs/2204.10359v4
Do t-Statistic Hurdles Need to be Raised?
false discoveries in academic publications. I show these calls may be difficult
to justify empirically. Published data exhibit bias: results that fail to meet
existing hurdles are often unobserved. These unobserved results must be
extrapolated, which can lead to weak identification of revised hurdles. In
contrast, statistics that can target only published findings (e.g. empirical
Bayes shrinkage and the FDR) can be strongly identified, as data on published
findings is plentiful. I demonstrate these results theoretically and in an
empirical analysis of the cross-sectional return predictability literature.
arXiv link: http://arxiv.org/abs/2204.10275v4
From point forecasts to multivariate probabilistic forecasts: The Schaake shuffle for day-ahead electricity price forecasting
markets. Besides the risk of a single price, the dependence structure of
multiple prices is often relevant. We therefore propose a generic and
easy-to-implement method for creating multivariate probabilistic forecasts
based on univariate point forecasts of day-ahead electricity prices. While each
univariate point forecast refers to one of the day's 24 hours, the multivariate
forecast distribution models dependencies across hours. The proposed method is
based on simple copula techniques and an optional time series component. We
illustrate the method for five benchmark data sets recently provided by Lago et
al. (2020). Furthermore, we demonstrate an example for constructing realistic
prediction intervals for the weighted sum of consecutive electricity prices,
as, e.g., needed for pricing individual load profiles.
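The copula step can be as simple as the classical Schaake shuffle: draw samples independently for each hour around its point forecast, then reorder each hour's samples so their ranks match the rank pattern of historically observed price trajectories, which imposes a realistic cross-hour dependence structure. A minimal numpy sketch; the Gaussian margins and toy historical covariance are purely illustrative and not taken from the paper.

```python
import numpy as np

def schaake_shuffle(samples, historical):
    """Reorder independent per-hour samples so their rank pattern across hours
    matches that of historical multivariate observations (rows = trajectories)."""
    out = np.empty_like(samples)
    for h in range(samples.shape[1]):
        order = np.argsort(historical[:, h])
        out[order, h] = np.sort(samples[:, h])
    return out

rng = np.random.default_rng(10)
n_draws, n_hours = 200, 24
hours = np.arange(n_hours)
point_fc = 40 + 10 * np.sin(np.linspace(0, 2 * np.pi, n_hours))   # toy day-ahead prices
samples = point_fc + rng.normal(0, 5, size=(n_draws, n_hours))    # independent margins
cov = 20 * np.exp(-np.abs(np.subtract.outer(hours, hours)) / 4)   # correlated past days
historical = 40 + rng.multivariate_normal(np.zeros(n_hours), cov, size=n_draws)
dependent = schaake_shuffle(samples, historical)
print(np.corrcoef(dependent[:, 0], dependent[:, 1])[0, 1])        # now positively correlated
```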
arXiv link: http://arxiv.org/abs/2204.10154v1
Optimal reconciliation with immutable forecasts
has inspired many studies on forecast reconciliation. Under this approach,
so-called base forecasts are produced for every series in the hierarchy and are
subsequently adjusted to be coherent in a second reconciliation step.
Reconciliation methods have been shown to improve forecast accuracy, but will,
in general, adjust the base forecast of every series. However, in an
operational context, it is sometimes necessary or beneficial to keep forecasts
of some variables unchanged after forecast reconciliation. In this paper, we
formulate reconciliation methodology that keeps forecasts of a pre-specified
subset of variables unchanged or "immutable". In contrast to existing
approaches, these immutable forecasts need not all come from the same level of
a hierarchy, and our method can also be applied to grouped hierarchies. We
prove that our approach preserves unbiasedness in base forecasts. Our method
can also account for correlations between base forecasting errors and ensure
non-negativity of forecasts. We also perform empirical experiments, including
an application to sales of a large scale online retailer, to assess the impacts
of our proposed methodology.
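One way to see the mechanics: reconciliation can be cast as a constrained least-squares adjustment of the base forecasts, where aggregation constraints enforce coherence and extra equality constraints pin the immutable series at their base values, and the resulting KKT system is linear. A small numpy sketch of that formulation under my reading of the abstract; the weighted objective, error correlations, and non-negativity that the paper also handles are omitted.

```python
import numpy as np

def reconcile_with_immutables(y_hat, C, d):
    """Minimize ||y - y_hat||^2 subject to C y = d (coherence + immutability)
    by solving the KKT linear system."""
    n, m = len(y_hat), C.shape[0]
    K = np.block([[np.eye(n), C.T], [C, np.zeros((m, m))]])
    rhs = np.concatenate([y_hat, d])
    return np.linalg.solve(K, rhs)[:n]

# base forecasts [total, bottom1, bottom2, bottom3] -- incoherent on purpose
y_hat = np.array([100.0, 35.0, 40.0, 30.0])              # bottoms sum to 105, not 100
coherence = np.array([[1.0, -1.0, -1.0, -1.0]])          # total = sum of bottoms
immutable = np.array([[1.0, 0.0, 0.0, 0.0]])             # keep the total unchanged
C = np.vstack([coherence, immutable])
d = np.array([0.0, y_hat[0]])
print(reconcile_with_immutables(y_hat, C, d))            # bottoms adjusted to sum to 100
```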
arXiv link: http://arxiv.org/abs/2204.09231v1
The 2020 Census Disclosure Avoidance System TopDown Algorithm
differential privacy for privacy-loss accounting. The algorithm ingests the
final, edited version of the 2020 Census data and the final tabulation
geographic definitions. The algorithm then creates noisy versions of key
queries on the data, referred to as measurements, using zero-Concentrated
Differential Privacy. Another key aspect of the TDA is its invariants, statistics
that the Census Bureau has determined, as a matter of policy, to exclude from the
privacy-loss accounting. The TDA post-processes the measurements together with
the invariants to produce a Microdata Detail File (MDF) that contains one
record for each person and one record for each housing unit enumerated in the
2020 Census. The MDF is passed to the 2020 Census tabulation system to produce
the 2020 Census Redistricting Data (P.L. 94-171) Summary File. This paper
describes the mathematics and testing of the TDA for this purpose.
arXiv link: http://arxiv.org/abs/2204.08986v1
Inference for Cluster Randomized Experiments with Non-ignorable Cluster Sizes
experiments when cluster sizes are non-ignorable. Here, by a cluster randomized
experiment, we mean one in which treatment is assigned at the cluster level. By
non-ignorable cluster sizes, we refer to the possibility that the treatment
effects may depend non-trivially on the cluster sizes. We frame our analysis in
a super-population framework in which cluster sizes are random. In this way,
our analysis departs from earlier analyses of cluster randomized experiments in
which cluster sizes are treated as non-random. We distinguish between two
different parameters of interest: the equally-weighted cluster-level average
treatment effect, and the size-weighted cluster-level average treatment effect.
For each parameter, we provide methods for inference in an asymptotic framework
where the number of clusters tends to infinity and treatment is assigned using
a covariate-adaptive stratified randomization procedure. We additionally permit
the experimenter to sample only a subset of the units within each cluster
rather than the entire cluster and demonstrate the implications of such
sampling for some commonly used estimators. A small simulation study and
empirical demonstration show the practical relevance of our theoretical
results.
arXiv link: http://arxiv.org/abs/2204.08356v7
Feature-based intermittent demand forecast combinations: bias, accuracy and inventory implications
production systems and supply chain management. In recent years, there has been
a growing focus on developing forecasting approaches for intermittent demand
from academic and practical perspectives. However, limited attention has been
given to forecast combination methods, which have achieved competitive
performance in forecasting fast-moving time series. The current study aims to
examine the empirical outcomes of some existing forecast combination methods
and propose a generalized feature-based framework for intermittent demand
forecasting. The proposed framework has been shown to improve the accuracy of
point and quantile forecasts based on two real data sets. Further, some
analysis of features, forecasting pools and computational efficiency is also
provided. The findings indicate the intelligibility and flexibility of the
proposed approach in intermittent demand forecasting and offer insights
regarding inventory decisions.
arXiv link: http://arxiv.org/abs/2204.08283v2
Nonlinear and Nonseparable Structural Functions in Fuzzy Regression Discontinuity Designs
continuous treatment variable, but the theoretical aspects of such models are
less studied. This study examines the identification and estimation of the
structural function in fuzzy RD designs with a continuous treatment variable.
The structural function fully describes the causal impact of the treatment on
the outcome. We show that the nonlinear and nonseparable structural function
can be nonparametrically identified at the RD cutoff under shape restrictions,
including monotonicity and smoothness conditions. Based on the nonparametric
identification equation, we propose a three-step semiparametric estimation
procedure and establish the asymptotic normality of the estimator. The
semiparametric estimator achieves the same convergence rate as in the case of a
binary treatment variable. As an application of the method, we estimate the
causal effect of sleep time on health status by using the discontinuity in
natural light timing at time zone boundaries.
arXiv link: http://arxiv.org/abs/2204.08168v2
Abadie's Kappa and Weighting Estimators of the Local Average Treatment Effect
covariates in instrumental variables estimation. In this paper we study the
finite sample and asymptotic properties of various weighting estimators of the
local average treatment effect (LATE), motivated by Abadie's (2003) kappa
theorem and offering the requisite flexibility relative to standard practice.
We argue that two of the estimators under consideration, which are weight
normalized, are generally preferable. Several other estimators, which are
unnormalized, do not satisfy the properties of scale invariance with respect to
the natural logarithm and translation invariance, thereby exhibiting
sensitivity to the units of measurement when estimating the LATE in logs and
the centering of the outcome variable more generally. We also demonstrate that,
when noncompliance is one-sided, certain weighting estimators have the
advantage of being based on a denominator that is strictly greater than zero by
construction. This is the case for only one of the two normalized estimators,
and we recommend this estimator for wider use. We illustrate our findings with
a simulation study and three empirical applications, which clearly document the
sensitivity of unnormalized estimators to how the outcome variable is coded. We
implement the proposed estimators in the Stata package kappalate.
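The normalization at issue can be made concrete: a standard weighting estimator of the LATE divides an inverse-propensity contrast for the outcome by the same contrast for the treatment, and a normalized variant rescales each weighted average by the sum of its weights. A numpy sketch of an unnormalized and a normalized version, following the generic IPW-LATE form rather than the exact estimators in the kappalate package.

```python
import numpy as np

def late_weighting(y, d, z, pscore):
    """Unnormalized and normalized IPW estimators of the LATE, using the
    instrument propensity score pscore = P(Z = 1 | X)."""
    w1, w0 = z / pscore, (1 - z) / (1 - pscore)
    # unnormalized: plain sample-mean contrasts
    tau_u = ((w1 - w0) * y).mean() / ((w1 - w0) * d).mean()
    # normalized: each weighted mean rescaled by its own weight sum
    num = (w1 * y).sum() / w1.sum() - (w0 * y).sum() / w0.sum()
    den = (w1 * d).sum() / w1.sum() - (w0 * d).sum() / w0.sum()
    return tau_u, num / den

rng = np.random.default_rng(11)
n = 5000
x = rng.normal(size=n)
pscore = 1 / (1 + np.exp(-x))                 # instrument assignment depends on x
z = (rng.uniform(size=n) < pscore).astype(float)
complier = rng.uniform(size=n) < 0.6
d = np.where(complier, z, 0.0)                # one-sided noncompliance
y = 1.0 + 2.0 * d + 0.5 * x + rng.normal(size=n)
print(late_weighting(y, d, z, pscore))        # both close to the true LATE of 2
```

Adding a constant to y leaves the normalized estimate unchanged but shifts the unnormalized one, which is the translation-invariance point the abstract emphasizes.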
arXiv link: http://arxiv.org/abs/2204.07672v4
Option Pricing with Time-Varying Volatility Risk Aversion
explain observed time variations in the shape of the pricing kernel. When
combined with the Heston-Nandi GARCH model, this framework yields a tractable
option pricing model in which the variance risk ratio (VRR) emerges as a key
variable. We show that the VRR is closely linked to economic fundamentals, as
well as sentiment and uncertainty measures. A novel approximation method
provides analytical option pricing formulas, and we demonstrate substantial
reductions in pricing errors through an empirical application to the S&P 500
index, the CBOE VIX, and option prices.
arXiv link: http://arxiv.org/abs/2204.06943v4
Nonparametric Identification of Differentiated Products Demand Using Micro Data
"micro data" linking individual consumers' characteristics and choices. Our
model nests standard specifications featuring rich observed and unobserved
consumer heterogeneity as well as product/market-level unobservables that
introduce the problem of econometric endogeneity. Previous work establishes
identification of such models using market-level data and instruments for all
prices and quantities. Micro data provides a panel structure that facilitates
richer demand specifications and reduces requirements on both the number and
types of instrumental variables. We address identification of demand in the
standard case in which non-price product characteristics are assumed exogenous,
but also cover identification of demand elasticities and other key features
when product characteristics are endogenous. We discuss implications of these
results for applied work.
arXiv link: http://arxiv.org/abs/2204.06637v2
Integrating Distributed Energy Resources: Optimal Prosumer Decisions and Impacts of Net Metering Tariffs
to initiatives to reform the net energy metering (NEM) policies to address
pressing concerns of rising electricity bills, fairness of cost allocation, and
the long-term growth of distributed energy resources. This article presents an
analytical framework for the optimal prosumer consumption decision using an
inclusive NEM X tariff model that covers existing and proposed NEM tariff
designs. The structure of the optimal consumption policy lends itself to near
closed-form optimal solutions suitable for practical energy management systems
that are responsive to stochastic BTM generation and dynamic pricing. The short
and long-run performance of NEM and feed-in tariffs (FiT) are considered under
a sequential rate-setting decision process. Also presented are numerical
results that characterize social welfare distributions, cross-subsidies, and
long-run solar adoption performance for selected NEM and FiT policy designs.
arXiv link: http://arxiv.org/abs/2204.06115v3
Retrieval from Mixed Sampling Frequency: Generic Identifiability in the Unit Root VAR
developed in Anderson et al. (2016a) is concerned with retrieving an underlying
high frequency model from mixed frequency observations. In this paper we
investigate parameter-identifiability in the Johansen (1995) vector error
correction model for mixed frequency data. We prove that from the second
moments of the blocked process after taking differences at lag N (N is the slow
sampling rate), the parameters of the high frequency system are generically
identified. We treat the stock and the flow case as well as deterministic
terms.
arXiv link: http://arxiv.org/abs/2204.05952v2
Coarse Personalization
personalize and target individuals at a granular level. However, feasibility
constraints limit full personalization. In practice, firms choose segments of
individuals and assign a treatment to each segment to maximize profits: We call
this the coarse personalization problem. We propose a two-step solution that
simultaneously makes segmentation and targeting decisions. First, the firm
personalizes by estimating conditional average treatment effects. Second, the
firm discretizes using treatment effects to choose which treatments to offer
and their segments. We show that a combination of available machine learning
tools for estimating heterogeneous treatment effects and a novel application of
optimal transport methods provides a viable and efficient solution. With data
from a large-scale field experiment in promotions management, we find our
methodology outperforms extant approaches that segment on consumer
characteristics, consumer preferences, or those that only search over a
prespecified grid. Using our procedure, the firm recoups over $99.5%$ of its
expected incremental profits under full personalization while offering only
five segments. We conclude by discussing how coarse personalization arises in
other domains.
arXiv link: http://arxiv.org/abs/2204.05793v4
Portfolio Optimization Using a Consistent Vector-Based MSE Estimation Approach
portfolio's (GMVP) weights in high-dimensional settings where both observation
and population dimensions grow at a bounded ratio. Optimizing the GMVP weights
is highly influenced by the data covariance matrix estimation. In a
high-dimensional setting, it is well known that the sample covariance matrix is
not a proper estimator of the true covariance matrix since it is not invertible
when we have fewer observations than the data dimension. Even with more
observations, the sample covariance matrix may not be well-conditioned. This
paper determines the GMVP weights based on a regularized covariance matrix
estimator to overcome the aforementioned difficulties. Unlike other methods,
the proper selection of the regularization parameter is achieved by minimizing
the mean-squared error of an estimate of the noise vector that accounts for the
uncertainty in the data mean estimation. Using random-matrix-theory tools, we
derive a consistent estimator of the achievable mean-squared error that allows
us to find the optimal regularization parameter using a simple line search.
Simulation results demonstrate the effectiveness of the proposed method when
the data dimension is larger than the number of data samples or of the same
order.
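For reference, once a regularized covariance estimate $\Sigma$ is chosen, the GMVP weights take the closed form $w = \Sigma^{-1}1/(1'\Sigma^{-1}1)$. A minimal numpy sketch that uses plain ridge shrinkage as a stand-in for the paper's RMT-calibrated regularization (the MSE-based line search for the regularization parameter is not reproduced):

    import numpy as np

    def gmvp_weights(returns, alpha):
        """GMVP weights from a ridge-regularized sample covariance matrix
        (illustrative stand-in for the paper's RMT-tuned regularization)."""
        X = returns - returns.mean(axis=0)        # demean: the data mean is estimated
        n, p = X.shape
        sigma = X.T @ X / (n - 1)                 # sample covariance
        sigma_reg = sigma + alpha * np.eye(p)     # ridge regularization
        ones = np.ones(p)
        w = np.linalg.solve(sigma_reg, ones)
        return w / (ones @ w)                     # normalize so weights sum to 1

    rng = np.random.default_rng(0)
    R = rng.normal(0.0, 0.02, size=(60, 100))     # fewer observations than assets
    w = gmvp_weights(R, alpha=0.01)
    print(w.sum(), float(w @ (np.cov(R.T) @ w)))  # weights sum to 1; resulting portfolio variance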
arXiv link: http://arxiv.org/abs/2204.05611v1
Neyman allocation is minimax optimal for best arm identification with two arms
asymptotic minimax regret criterion, for best arm identification when there are
only two treatments. It is shown that the optimal sampling rule is the Neyman
allocation, which allocates a constant fraction of units to each treatment in a
manner that is proportional to the standard deviation of the treatment
outcomes. When the variances are equal, the optimal ratio is one-half. This
policy is independent of the data, so there is no adaptation to previous
outcomes. At the end of the experiment, the policy maker adopts the treatment
with higher average outcomes.
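Concretely, with two arms the Neyman shares are $\sigma_1/(\sigma_1+\sigma_2)$ and $\sigma_2/(\sigma_1+\sigma_2)$; a small Python illustration (function and variable names are mine):

    def neyman_shares(sigma1, sigma2):
        """Fraction of units assigned to each of two arms under Neyman allocation:
        proportional to the standard deviation of the treatment outcomes."""
        total = sigma1 + sigma2
        return sigma1 / total, sigma2 / total

    print(neyman_shares(2.0, 1.0))   # (0.667, 0.333): the noisier arm gets more units
    print(neyman_shares(1.0, 1.0))   # equal variances -> one-half each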
arXiv link: http://arxiv.org/abs/2204.05527v7
Tuning Parameter-Free Nonparametric Density Estimation from Tabulated Summary Data
the original format due to confidentiality concerns. Motivated by this
practical feature, we propose a novel nonparametric density estimation method
from tabulated summary data based on maximum entropy and prove its strong
uniform consistency. Unlike existing kernel-based estimators, our estimator is
free from tuning parameters and admits a closed-form density that is convenient
for post-estimation analysis. We apply the proposed method to the tabulated
summary data of the U.S. tax returns to estimate the income distribution.
arXiv link: http://arxiv.org/abs/2204.05480v3
Two-step estimation in linear regressions with adaptive learning
estimator in a linear regression with adaptive learning is derived when the
crucial, so-called, `gain' parameter is estimated in a first step by nonlinear
least squares from an auxiliary model. The singular limiting distribution of
the two-step estimator is normal and in general affected by the sampling
uncertainty from the first step. However, this `generated-regressor' issue
disappears for certain parameter combinations.
arXiv link: http://arxiv.org/abs/2204.05298v3
Partially Linear Models under Data Combination
covariates are observed in two different datasets that cannot be linked. This
type of data combination problem arises very frequently in empirical
microeconomics. Using recent tools from optimal transport theory, we derive a
constructive characterization of the sharp identified set. We then build on
this result and develop a novel inference method that exploits the specific
geometric properties of the identified set. Our method exhibits good
performances in finite samples, while remaining very tractable. We apply our
approach to study intergenerational income mobility over the period 1850-1930
in the United States. Our method allows us to relax the exclusion restrictions
used in earlier work, while delivering confidence regions that are informative.
arXiv link: http://arxiv.org/abs/2204.05175v3
Bootstrap Cointegration Tests in ARDL Models
bound tests in a conditional equilibrium correction model with the aim of
overcoming some typical drawbacks of the latter, such as inconclusive inference
and distortion in size. The bootstrap tests are worked out under several data
generating processes, including degenerate cases. Monte Carlo simulations
confirm the better performance of the bootstrap tests relative to the bound
tests and to the asymptotic F test on the independent variables of the ARDL
model. It is also proved that any inference carried out in misspecified models,
such as unconditional ARDLs, may be misleading. Empirical applications
highlight the importance of employing the appropriate specification and provide
definitive answers to the inconclusive inference of the bound tests when
exploring the long-term equilibrium relationship between economic variables.
arXiv link: http://arxiv.org/abs/2204.04939v1
State capital involvement, managerial sentiment and firm innovation performance: Evidence from China
embedded in the restructuring and governance of private enterprises through
equity participation, providing a more advantageous environment for private
enterprises in financing and innovation. However, little is known about the
underlying mechanisms through which SOE intervention affects corporate
innovation performance. Hence, in this study, we investigated the association
of state capital intervention with innovation performance, and further examined
the potential mediating and moderating roles of managerial sentiment and
financing constraints, respectively, using all listed non-ST firms from 2010 to
2020 as the sample. The results revealed two main findings: 1) state capital
intervention increases innovation performance through managerial sentiment; 2)
financing constraints moderate the effect of state capital intervention on
firms' innovation performance.
arXiv link: http://arxiv.org/abs/2204.04860v1
Revenue Management Under the Markov Chain Choice Model with Joint Price and Assortment Decisions
problems in revenue management. Usually, a seller needs to jointly determine
the prices and assortment while managing a network of resources with limited
capacity. However, there is not yet a tractable method to efficiently solve
such a problem. Existing papers studying static joint optimization of price and
assortment cannot incorporate resource constraints. We therefore study the
revenue management problem with resource constraints and price bounds, where
the prices and the product assortments need to be jointly determined over time.
We show that under the Markov chain (MC) choice model (which subsumes the
multinomial logit (MNL) model), the choice-based joint optimization problem can
be reformulated as a tractable convex conic optimization problem. We also prove
that
an optimal solution with a constant price vector exists even with constraints
on resources. In addition, a solution with both constant assortment and price
vector can be optimal when there is no resource constraint.
arXiv link: http://arxiv.org/abs/2204.04774v1
Super-linear Scaling Behavior for Electric Vehicle Chargers and Road Map to Addressing the Infrastructure Gap
build-out of charging infrastructure in the coming decade. We formulate the
charging infrastructure needs as a scaling analysis problem and use it to
estimate the EV infrastructure needs of the US at a county-level resolution.
Surprisingly, we find that the current EV infrastructure deployment scales
super-linearly with population, deviating from the sub-linear scaling of
gasoline stations and other infrastructure. We discuss how this demonstrates
the infancy of EV station abundance compared to other mature transportation
infrastructures. By considering the power delivery of existing gasoline
stations, and appropriate EV efficiencies, we estimate the EV infrastructure
gap at the county level, providing a road map for future EV infrastructure
expansion. Our reliance on scaling analysis allows us to make a unique forecast
in this domain.
arXiv link: http://arxiv.org/abs/2204.03094v1
Risk budget portfolios with convex Non-negative Matrix Factorization
convex Nonnegative Matrix Factorization (NMF). Unlike classical factor
analysis, PCA, or ICA, NMF ensures positive factor loadings to obtain
interpretable long-only portfolios. As the NMF factors represent separate
sources of risk, they have a quasi-diagonal correlation matrix, promoting
diversified portfolio allocations. We evaluate our method in the context of
volatility targeting on two long-only global portfolios of cryptocurrencies and
traditional assets. Our method outperforms classical portfolio allocations
regarding diversification and presents a better risk profile than hierarchical
risk parity (HRP). We assess the robustness of our findings using Monte Carlo
simulation.
arXiv link: http://arxiv.org/abs/2204.02757v2
Finitely Heterogeneous Treatment Effect in Event-study
average evolution of untreated potential outcomes is the same across different
treatment cohorts: a parallel trends assumption. In this paper, we relax the
parallel trend assumption by assuming a latent type variable and developing a
type-specific parallel trend assumption. With a finite support assumption on
the latent type variable and long pretreatment time periods, we show that an
extremum classifier consistently estimates the type assignment. Based on the
classification result, we propose a type-specific diff-in-diff estimator for
type-specific ATT. By estimating the type-specific ATT, we study heterogeneity
in treatment effect, in addition to heterogeneity in baseline outcomes.
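A stylized sketch of the idea on synthetic data, using k-means on pretreatment outcome paths as a simple stand-in for the paper's extremum classifier and a two-period diff-in-diff within each estimated type:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    n, T0 = 400, 8                                   # units, pretreatment periods
    latent = rng.integers(0, 2, n)                   # unobserved type
    treated = rng.random(n) < 0.5
    pre = rng.normal(latent[:, None] * 2.0, 1.0, (n, T0))          # type-specific baselines
    post = pre.mean(1) + 1.5 * treated + 0.5 * latent * treated + rng.normal(0, 1, n)

    # Step 1: classify latent types from the long pretreatment histories.
    types = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pre)

    # Step 2: type-specific diff-in-diff (change from the pretreatment mean to the post period).
    for g in range(2):
        m = types == g
        delta = post[m] - pre[m].mean(1)
        att_g = delta[treated[m]].mean() - delta[~treated[m]].mean()
        print(f"type {g}: estimated ATT = {att_g:.2f}")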
arXiv link: http://arxiv.org/abs/2204.02346v5
Asymptotic Theory for Unit Root Moderate Deviations in Quantile Autoregressions and Predictive Regressions
parameter is specified with respect to moderate deviations from the unit
boundary of the form (1 + c / k) with a convergence sequence that diverges at a
rate slower than the sample size n. Then, extending the framework proposed by
Phillips and Magdalinos (2007), we consider the limit theory for the
near-stationary and the near-explosive cases when the model is estimated with a
conditional quantile specification function and model parameters are
quantile-dependent. Additionally, a Bahadur-type representation and limiting
distributions based on the M-estimators of the model parameters are derived.
Specifically, we show that the serial correlation coefficient converges in
distribution to a ratio of two independent random variables. Monte Carlo
simulations illustrate the finite-sample performance of the estimation
procedure under investigation.
arXiv link: http://arxiv.org/abs/2204.02073v2
Microtransit adoption in the wake of the COVID-19 pandemic: evidence from a choice experiment with transit and car commuters
mobility systems. Impacts are still debated, as these platforms supply
personalized and optimized services, while also contributing to existing
sustainability challenges. Recently, microtransit services have emerged,
promising to combine advantages of pooled on-demand rides with more sustainable
fixed-route public transit services. Understanding traveler behavior becomes a
primary focus to analyze adoption likelihood and perceptions of different
microtransit attributes. The COVID-19 pandemic context adds an additional layer
of complexity to analyzing mobility innovation acceptance. This study
investigates the potential demand for microtransit options against the
background of the pandemic. We use a stated choice experiment to study the
decision-making of Israeli public transit and car commuters when offered to use
novel microtransit options (sedan vs. passenger van). We investigate the
tradeoffs related to traditional fare and travel time attributes, along with
microtransit features; namely walking time to pickup location, vehicle sharing,
waiting time, minimum advanced reservation time, and shelter at designated
boarding locations. Additionally, we analyze two latent constructs: attitudes
towards sharing, as well as experiences and risk-perceptions related to the
COVID-19 pandemic. We develop Integrated Choice and Latent Variable models to
compare the two commuter groups in terms of the likelihood to switch to
microtransit, attribute trade-offs, sharing preferences and pandemic impacts.
The results reveal high elasticities of several time and COVID effects for car
commuters compared to relative insensitivity of transit commuters to the risk
of COVID contraction. Moreover, for car commuters, those with strong sharing
identities were more likely to be comfortable in COVID risk situations, and to
accept microtransit.
arXiv link: http://arxiv.org/abs/2204.01974v1
Policy Learning with Competing Agents
capacity constraint on the number of agents that they can treat. When agents
can respond strategically to such policies, competition arises, complicating
estimation of the optimal policy. In this paper, we study capacity-constrained
treatment assignment in the presence of such interference. We consider a
dynamic model where the decision maker allocates treatments at each time step
and heterogeneous agents myopically best respond to the previous treatment
assignment policy. When the number of agents is large but finite, we show that
the threshold for receiving treatment under a given policy converges to the
policy's mean-field equilibrium threshold. Based on this result, we develop a
consistent estimator for the policy gradient. In a semi-synthetic experiment
with data from the National Education Longitudinal Study of 1988, we
demonstrate that this estimator can be used for learning capacity-constrained
policies in the presence of strategic behavior.
arXiv link: http://arxiv.org/abs/2204.01884v5
Kernel-weighted specification testing under general distributions
settings including non-stationary regression, inference on propensity score and
panel data models. We develop the limit theory for a kernel-based specification
test of a parametric conditional mean when the law of the regressors may not be
absolutely continuous with respect to the Lebesgue measure and is contaminated with singular
components. This result is of independent interest and may be useful in other
applications that utilize kernel smoothed U-statistics. Simulations illustrate
the non-trivial impact of the distribution of the conditioning variables on the
power properties of the test statistic.
arXiv link: http://arxiv.org/abs/2204.01683v3
A Bootstrap-Assisted Self-Normalization Approach to Inference in Cointegrating Regressions
choices to estimate a long-run variance parameter. Even when these choices are
"optimal", the tests are severely size distorted. We propose a novel
self-normalization approach, which leads to a nuisance parameter free limiting
distribution without estimating the long-run variance parameter directly. This
makes our self-normalized test tuning parameter free and considerably less
prone to size distortions at the cost of only small power losses. In
combination with an asymptotically justified vector autoregressive sieve
bootstrap to construct critical values, the self-normalization approach shows
further improvement in small to medium samples when the level of error serial
correlation or regressor endogeneity is large. We illustrate the usefulness of
the bootstrap-assisted self-normalized test in empirical applications by
analyzing the validity of the Fisher effect in Germany and the United States.
arXiv link: http://arxiv.org/abs/2204.01373v1
Capturing positive network attributes during the estimation of recursive logit models: A prism-based approach
to many applications and extensions, an important numerical issue with respect
to the computation of value functions remains unsolved. This issue is
particularly significant for model estimation, during which the parameters are
updated every iteration and may violate the feasibility condition of the value
function. To solve this numerical issue of the value function in the model
estimation, this study performs an extensive analysis of a prism-constrained RL
(Prism-RL) model proposed by Oyama and Hato (2019), which has a path set
constrained by the prism defined based upon a state-extended network
representation. The numerical experiments have shown two important properties
of the Prism-RL model for parameter estimation. First, the prism-based approach
enables estimation regardless of the initial and true parameter values, even in
cases where the original RL model cannot be estimated due to the numerical
problem. We also successfully captured a positive effect of the presence of
street green on pedestrian route choice in a real application. Second, the
Prism-RL model achieved better fit and prediction performance than the RL
model, by implicitly restricting paths with large detour or many loops.
Defining the prism-based path set in a data-oriented manner, we demonstrated
the possibility of the Prism-RL model describing more realistic route choice
behavior. The capture of positive network attributes while retaining the
diversity of path alternatives is important in many applications such as
pedestrian route choice and sequential destination choice behavior, and thus
the prism-based approach significantly extends the practical applicability of
the RL model.
arXiv link: http://arxiv.org/abs/2204.01215v3
Robust Estimation of Conditional Factor Models
factor models. We first introduce a simple sieve estimation, and establish
asymptotic properties of the estimators under large $N$. We then provide a
bootstrap procedure for estimating the distributions of the estimators. We also
provide two consistent estimators for the number of factors. The methods allow
us not only to estimate conditional factor structures of distributions of asset
returns utilizing characteristics, but also to conduct robust inference in
conditional factor models, which enables us to analyze the cross section of
asset returns with heavy tails. We apply the methods to analyze the cross
section of individual US stock returns.
arXiv link: http://arxiv.org/abs/2204.00801v2
Decomposition of Differences in Distribution under Sample Selection and the Gender Wage Gap
outcomes of two groups when individuals self-select themselves into
participation. I differentiate between the decomposition for participants and
the entire population, highlighting how the primitive components of the model
affect each of the distributions of outcomes. Additionally, I introduce two
ancillary decompositions that help uncover the sources of differences in the
distribution of unobservables and participation between the two groups. The
estimation is done using existing quantile regression methods, for which I show
how to perform uniformly valid inference. I illustrate these methods by
revisiting the gender wage gap, finding that changes in female participation
and self-selection have been the main drivers for reducing the gap.
arXiv link: http://arxiv.org/abs/2204.00551v2
Finite Sample Inference in Incomplete Models
exact coverage of the true parameter in finite samples. Our confidence region
inverts a test, which generalizes Monte Carlo tests to incomplete models. The
test statistic is a discrete analogue of a new optimal transport
characterization of the sharp identified region. Both test statistic and
critical values rely on simulation drawn from the distribution of latent
variables and are computed using solutions to discrete optimal transport, hence
linear programming problems. We also propose a fast preliminary search in the
parameter space with an alternative, more conservative yet consistent test,
based on a parameter free critical value.
arXiv link: http://arxiv.org/abs/2204.00473v3
Estimating Separable Matching Models
with transferable and separable utility introduced in Galichon and Salanié
(2022). The first method is a minimum distance estimator that relies on the
generalized entropy of matching. The second relies on a reformulation of the
more special but popular Choo and Siow (2006) model; it uses generalized linear
models (GLMs) with two-way fixed effects.
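In the Choo and Siow case, the second method amounts to a Poisson regression of match counts on the two type fixed effects plus surplus covariates. A minimal statsmodels sketch on synthetic counts (the "affinity" covariate and type labels are illustrative, not the paper's data):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    df = pd.DataFrame(
        [(x, y) for x in range(5) for y in range(5)],
        columns=["x_type", "y_type"],
    )
    df["affinity"] = -(df.x_type - df.y_type) ** 2 / 4.0        # illustrative surplus index
    mu = np.exp(0.5 + 0.2 * df.x_type + 0.1 * df.y_type + df.affinity)
    df["matches"] = rng.poisson(mu)                              # synthetic match counts

    # Poisson GLM with two-way (x-type and y-type) fixed effects.
    model = smf.glm(
        "matches ~ C(x_type) + C(y_type) + affinity",
        data=df,
        family=sm.families.Poisson(),
    ).fit()
    print(model.params["affinity"])                              # recovers the surplus coefficient (about 1)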
arXiv link: http://arxiv.org/abs/2204.00362v1
Measuring Diagnostic Test Performance Using Imperfect Reference Tests: A Partial Identification Approach
performance use knowledge of the true health status, measured with a reference
diagnostic test. Researchers commonly assume that the reference test is
perfect, which is often not the case in practice. When the assumption fails,
conventional studies identify "apparent" performance or performance with
respect to the reference, but not true performance. This paper provides the
smallest possible bounds on the measures of true performance - sensitivity
(true positive rate) and specificity (true negative rate), or equivalently
false positive and negative rates, in standard settings. Implied bounds on
policy-relevant parameters are derived: 1) Prevalence in screened populations;
2) Predictive values. Methods for inference based on moment inequalities are
used to construct uniformly consistent confidence sets in level over a relevant
family of data distributions. Emergency Use Authorization (EUA) and independent
study data for the BinaxNOW COVID-19 antigen test demonstrate that the bounds
can be very informative. Analysis reveals that the estimated false negative
rates for symptomatic and asymptomatic patients are up to 3.17 and 4.59 times
higher than the frequently cited "apparent" false negative rate. Further
applicability of the results in the context of imperfect proxies such as survey
responses and imputed protected classes is indicated.
arXiv link: http://arxiv.org/abs/2204.00180v4
Testing the identification of causal effects in observational data
identification of the causal effect of a treatment on an outcome in
observational data, which relies on two sets of variables: observed covariates
to be controlled for and a suspected instrument. Under a causal structure
commonly found in empirical applications, the testable conditional independence
of the suspected instrument and the outcome given the treatment and the
covariates has two implications. First, the instrument is valid, i.e. it does
not directly affect the outcome (other than through the treatment) and is
unconfounded conditional on the covariates. Second, the treatment is
unconfounded conditional on the covariates such that the treatment effect is
identified. We suggest tests of this conditional independence based on machine
learning methods that account for covariates in a data-driven way and
investigate their asymptotic behavior and finite sample performance in a
simulation study. We also apply our testing approach to evaluating the impact
of fertility on female labor supply when using the sibling sex ratio of the
first two children as supposed instrument, which by and large points to a
violation of our testable implication for the moderate set of socio-economic
covariates considered.
arXiv link: http://arxiv.org/abs/2203.15890v4
Difference-in-Differences for Policy Evaluation
in empirical work in economics. This chapter reviews a number of important,
recent developments related to difference-in-differences. First, this chapter
reviews recent work pointing out limitations of two-way fixed effects
regressions (these are panel data regressions that have been the dominant
approach to implementing difference-in-differences identification strategies)
that arise in empirically relevant settings where there are more than two time
periods, variation in treatment timing across units, and treatment effect
heterogeneity. Second, this chapter reviews recently proposed alternative
approaches that are able to circumvent these issues without being substantially
more complicated to implement. Third, this chapter covers a number of
extensions to these results, paying particular attention to (i) parallel trends
assumptions that hold only after conditioning on observed covariates and (ii)
strategies to partially identify causal effect parameters in
difference-in-differences applications in cases where the parallel trends
assumption may be violated.
arXiv link: http://arxiv.org/abs/2203.15646v1
Estimating Nonlinear Network Data Models with Fixed Effects
agent-specific fixed effects, including the dyadic link formation model with
homophily and degree heterogeneity. The proposed approach uses a jackknife
procedure to deal with the incidental parameters problem. The method can be
applied to both directed and undirected networks, allows for non-binary outcome
variables, and can be used to bias correct estimates of average effects and
counterfactual outcomes. I also show how the jackknife can be used to bias
correct fixed-effect averages over functions that depend on multiple nodes,
e.g. triads or tetrads in the network. As an example, I implement specification
tests for dependence across dyads, such as reciprocity or transitivity.
Finally, I demonstrate the usefulness of the estimator in an application to a
gravity model for import/export relationships across countries.
arXiv link: http://arxiv.org/abs/2203.15603v3
Network structure and fragmentation of the Argentinean interbank markets
interbank market. Both the unsecured (CALL) and the secured (REPO) markets are
examined, applying complex network analysis. Results indicate that, although
the secured market has less participants, its nodes are more densely connected
than in the unsecured market. The interrelationships in the unsecured market
are less stable, making its structure more volatile and vulnerable to negative
shocks. The analysis identifies two 'hidden' underlying sub-networks within the
REPO market: one based on the transactions collateralized by Treasury bonds
(REPO-T) and the other based on the operations collateralized by Central Bank (CB)
securities (REPO-CB). The changes in monetary policy stance and monetary
conditions seem to have a substantially smaller impact in the former than in
the latter 'sub-market'. The connectivity levels within the REPO-T market and
its structure remain relatively unaffected by the (in some period pronounced)
swings in the other segment of the market. Hence, the REPO market shows signs
of fragmentation in its inner structure, according to the type of collateral
asset involved in the transactions, so the average REPO interest rate reflects
the interplay between these two partially fragmented sub-markets. This mixed
structure of the REPO market entails one of the main sources of differentiation
with respect to the CALL market.
arXiv link: http://arxiv.org/abs/2203.14488v1
Automatic Debiased Machine Learning for Dynamic Treatment Effects and General Nested Functionals
treatment regime and more generally to nested functionals. We show that the
multiply robust formula for the dynamic treatment regime with discrete
treatments can be re-stated in terms of a recursive Riesz representer
characterization of nested mean regressions. We then apply a recursive Riesz
representer estimation learning algorithm that estimates de-biasing corrections
without the need to characterize what the correction terms look like, such as
for instance, products of inverse probability weighting terms, as is done in
prior work on doubly robust estimation in the dynamic regime. Our approach
defines a sequence of loss minimization problems, whose minimizers are the
multipliers of the de-biasing correction, hence circumventing the need for
solving auxiliary propensity models and directly optimizing for the mean
squared error of the target de-biasing correction. We provide further
applications of our approach to estimation of dynamic discrete choice models
and estimation of long-term effects with surrogates.
arXiv link: http://arxiv.org/abs/2203.13887v5
The application of techniques derived from artificial intelligence to the prediction of the solvency of bank customers: case of the application of the cart type decision tree (dt)
from artificial intelligence techniques to the prediction of the solvency of
bank customers, using historical data on bank customers. We followed a data
mining process. First, we preprocessed the data: we cleaned it by deleting all
rows with outliers, missing values, or empty columns. We then fixed the
variable to be explained (the dependent variable, or target) and eliminated the
explanatory (independent) variables that were not significant, using univariate
analysis and the correlation matrix. Finally, we applied the CART decision tree
method using the SPSS tool. After building the model (AD-CART), we evaluated
and tested its performance: the accuracy and precision of the model are 71%,
corresponding to an error rate of 29%. We conclude that the model performs at a
fairly good level in terms of precision and predictive ability for the solvency
of banking customers.
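A comparable CART workflow can be reproduced outside SPSS; a minimal scikit-learn sketch on synthetic customer features (the feature names and data are illustrative, not the study's bank data):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    n = 1000
    X = np.column_stack([
        rng.normal(3000, 800, n),     # income (synthetic)
        rng.normal(0.4, 0.15, n),     # debt ratio (synthetic)
        rng.integers(0, 30, n),       # years with the bank (synthetic)
    ])
    y = (X[:, 0] / 1000 - 5 * X[:, 1] + rng.normal(0, 1, n) > 0).astype(int)   # solvent or not

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    tree = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=0)  # CART-style tree
    tree.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, tree.predict(X_te))
    print(f"accuracy: {acc:.2f}, error rate: {1 - acc:.2f}")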
arXiv link: http://arxiv.org/abs/2203.13001v1
Correcting Attrition Bias using Changes-in-Changes
in treatment effect studies. We extend the changes-in-changes approach to
identify the average treatment effect for respondents and the entire study
population in the presence of attrition. Our method, which exploits baseline
outcome data, can be applied to randomized experiments as well as
quasi-experimental difference-in-difference designs. A formal comparison
highlights that while widely used corrections typically impose restrictions on
whether or how response depends on treatment, our proposed attrition correction
exploits restrictions on the outcome model. We further show that the conditions
required for our correction can accommodate a broad class of response models
that depend on treatment in an arbitrary way. We illustrate the implementation
of the proposed corrections in an application to a large-scale randomized
experiment.
arXiv link: http://arxiv.org/abs/2203.12740v5
Bounds for Bias-Adjusted Treatment Effect in Linear Econometric Models
omitted variable bias in estimated treatment effects are real roots of a cubic
equation involving estimated parameters from a short and intermediate
regression. The roots of the cubic are functions of $\delta$, the degree of
selection on unobservables, and $R_{max}$, the R-squared in a hypothetical long
regression that includes the unobservable confounder and all observable
controls. In this paper I propose and implement a novel algorithm to compute
roots of the cubic equation over relevant regions of the $\delta$-$R_{max}$
plane and use the roots to construct bounding sets for the true treatment
effect. The algorithm is based on two well-known mathematical results: (a) the
discriminant of the cubic equation can be used to demarcate regions of unique
real roots from regions of three real roots, and (b) a small change in the
coefficients of a polynomial equation will lead to small change in its roots
because the latter are continuous functions of the former. I illustrate my
method by applying it to the analysis of maternal behavior on child outcomes.
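A generic numpy sketch of the two ingredients, classifying the number of real roots by the cubic discriminant and collecting the real roots over a $(\delta, R_{max})$ grid; the coefficient function below is hypothetical, whereas the paper derives the actual coefficients from the short and intermediate regressions:

    import numpy as np

    def real_roots(coeffs):
        """Real roots of the cubic a3 x^3 + a2 x^2 + a1 x + a0 = 0."""
        r = np.roots(coeffs)
        return np.sort(r[np.abs(r.imag) < 1e-9].real)

    def cubic_discriminant(a3, a2, a1, a0):
        """Positive: three distinct real roots; negative: a single real root."""
        return (18 * a3 * a2 * a1 * a0 - 4 * a2**3 * a0 + a2**2 * a1**2
                - 4 * a3 * a1**3 - 27 * a3**2 * a0**2)

    def coeffs(delta, r_max):
        # Hypothetical placeholder coefficients, used only to illustrate the grid search.
        return (1.0, -2.0 * delta, r_max, delta * r_max - 0.5)

    for delta in (0.5, 1.0, 1.5):
        for r_max in (0.6, 0.9):
            c = coeffs(delta, r_max)
            print(delta, r_max, cubic_discriminant(*c) > 0, real_roots(c))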
arXiv link: http://arxiv.org/abs/2203.12431v1
Exabel's Factor Model
risks associated with an investing strategy. In this report we describe
Exabel's factor model, we quantify the fraction of the variability of the
returns explained by the different factors, and we show some examples of annual
returns of portfolios with different factor exposure.
arXiv link: http://arxiv.org/abs/2203.12408v1
Performance evaluation of volatility estimation methods for Exabel
management. This note presents and compares strategies for volatility
estimation in an estimation universe consisting of 28,629 unique companies from
February 2010 to April 2021, with 858 different portfolios.
estimation methods are compared in terms of how they rank the volatility of the
different subsets of portfolios. The overall best performing approach estimates
volatility from direct entity returns using a GARCH model for variance
estimation.
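A GARCH(1,1) fit to entity returns, the kind the best-performing approach relies on, can be obtained with the arch package; a minimal sketch on simulated returns (not the Exabel estimation universe):

    import numpy as np
    from arch import arch_model

    rng = np.random.default_rng(0)
    returns = 100 * rng.normal(0.0, 0.01, 1000)        # daily returns in percent (synthetic)

    # GARCH(1,1) with a constant mean; annualize the one-step-ahead variance forecast.
    res = arch_model(returns, mean="Constant", vol="Garch", p=1, q=1).fit(disp="off")
    one_step_var = res.forecast(horizon=1).variance.iloc[-1, 0]
    print("annualized volatility (%):", float(np.sqrt(252 * one_step_var)))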
arXiv link: http://arxiv.org/abs/2203.12402v1
Bivariate Distribution Regression with Application to Insurance Data
properties given a set of covariates, provides the mathematical foundation in
practical operations management such as risk analysis and decision-making given
observed circumstances. This article presents an estimation method for modeling
the conditional joint distribution of bivariate outcomes based on the
distribution regression and factorization methods. This method is considered
semiparametric in that it allows for flexible modeling of both the marginal and
joint distributions conditional on covariates without imposing global
parametric assumptions across the entire distribution. In contrast to existing
parametric approaches, our method can accommodate discrete, continuous, or
mixed variables, and provides a simple yet effective way to capture
distributional dependence structures between bivariate outcomes and covariates.
Various simulation results confirm that our method can perform similarly or
better in finite samples compared to the alternative methods. In an application
to the study of a motor third-party liability insurance portfolio, the proposed
method effectively estimates risk measures such as the conditional
Value-at-Risk and Expected Shortfall. This result suggests that this
semiparametric approach can serve as an alternative in insurance risk
management.
arXiv link: http://arxiv.org/abs/2203.12228v3
Performance of long short-term memory artificial neural networks in nowcasting during the COVID-19 crisis
for timely estimates of macroeconomic variables. A prior UNCTAD research paper
examined the suitability of long short-term memory artificial neural networks
(LSTM) for performing economic nowcasting of this nature. Here, the LSTM's
performance during the COVID-19 pandemic is compared and contrasted with that
of the dynamic factor model (DFM), a commonly used methodology in the field.
Three separate variables (global merchandise export values, global merchandise
export volumes, and global services exports) were nowcast with actual data
vintages, and performance was evaluated for the second, third, and fourth
quarters of 2020 and the first and
second quarters of 2021. In terms of both mean absolute error and root mean
square error, the LSTM obtained better performance in two-thirds of
variable/quarter combinations, as well as displayed more gradual forecast
evolutions with more consistent narratives and smaller revisions. Additionally,
a methodology to introduce interpretability to LSTMs is introduced and made
available in the accompanying nowcast_lstm Python library, which is now also
available in R, MATLAB, and Julia.
arXiv link: http://arxiv.org/abs/2203.11872v1
Dealing with Logs and Zeros in Regression Models
coefficients are interpretable as proportional effects. Yet this practice has
fundamental limitations, most notably that the log is undefined at zero,
creating an identification problem. We propose a new estimator, iterated OLS
(iOLS), which targets the normalized average treatment effect, preserving the
percentage-change interpretation while addressing these limitations. Our
procedure is the theoretically justified analogue of the ad-hoc log(1+Y)
transformation and delivers a consistent and asymptotically normal estimator of
the parameters of the exponential conditional mean model. iOLS is
computationally efficient, globally convergent, and free of the
incidental-parameter bias, while extending naturally to endogenous regressors
through iterated 2SLS. We illustrate the methods with simulations and revisit
three influential publications.
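iOLS itself is not available in standard libraries; for comparison, the same exponential conditional mean model can also be estimated by Poisson pseudo-maximum likelihood, which likewise accommodates zeros. A minimal statsmodels sketch on synthetic data (this is not the paper's iOLS estimator):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 5000
    x = rng.normal(size=n)
    d = rng.integers(0, 2, n)                              # binary treatment
    y = rng.poisson(np.exp(0.5 + 0.3 * d + 0.8 * x))       # outcome with many zeros

    X = sm.add_constant(np.column_stack([d, x]))
    ppml = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC1")
    print(ppml.params[1])      # about 0.3: proportional (semi-elasticity) effect of d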
arXiv link: http://arxiv.org/abs/2203.11820v3
Predictor Selection for Synthetic Controls
characteristics (called predictors) of the treated unit. The choice of
predictors and how they are weighted plays a key role in the performance and
interpretability of synthetic control estimators. This paper proposes the use
of a sparse synthetic control procedure that penalizes the number of predictors
used in generating the counterfactual to select the most important predictors.
We derive, in a linear factor model framework, a new model selection
consistency result and show that the penalized procedure has a faster mean
squared error convergence rate. Through a simulation study, we then show that
the sparse synthetic control achieves lower bias and has better post-treatment
performance than the un-penalized synthetic control. Finally, we apply the
method to revisit the study of the passage of Proposition 99 in California in
an augmented setting with a large number of predictors available.
arXiv link: http://arxiv.org/abs/2203.11576v2
Indirect Inference for Nonlinear Panel Models with Fixed Effects
incidental parameter problem. This leads to two undesirable consequences in
applied research: (1) point estimates are subject to large biases, and (2)
confidence intervals have incorrect coverages. This paper proposes a
simulation-based method for bias reduction. The method simulates data using the
model with estimated individual effects, and finds values of parameters by
equating fixed effect estimates obtained from observed and simulated data. The
asymptotic framework provides consistency, bias correction, and asymptotic
normality results. An application to female labor force participation and
accompanying simulations illustrate the finite-sample performance of the
method.
arXiv link: http://arxiv.org/abs/2203.10683v2
GAM(L)A: An econometric model for interpretable Machine Learning
boosting are often considered as black boxes or uninterpretable models which
has raised concerns from practitioners and regulators. As an alternative, we
propose in this paper to use partial linear models that are inherently
interpretable. Specifically, this article introduces GAM-lasso (GAMLA) and
GAM-autometrics (GAMA), denoted as GAM(L)A in short. GAM(L)A combines
parametric and non-parametric functions to accurately capture linearities and
non-linearities prevailing between dependent and explanatory variables, and a
variable selection procedure to control for overfitting issues. Estimation
relies on a two-step procedure building upon the double residual method. We
illustrate the predictive performance and interpretability of GAM(L)A on a
regression and a classification problem. The results show that GAM(L)A
outperforms parametric models augmented by quadratic, cubic and interaction
effects. Moreover, the results also suggest that the performance of GAM(L)A is
not significantly different from that of random forest and gradient boosting.
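The partial linear structure can be sketched with off-the-shelf GAM software; a minimal pygam example with one linear and one smooth term (synthetic data; the paper's lasso/Autometrics variable-selection step is not reproduced):

    import numpy as np
    from pygam import LinearGAM, l, s

    rng = np.random.default_rng(0)
    n = 2000
    x1 = rng.normal(size=n)                        # enters linearly
    x2 = rng.uniform(-3, 3, n)                     # enters non-linearly
    y = 2.0 * x1 + np.sin(x2) + rng.normal(0, 0.3, n)

    X = np.column_stack([x1, x2])
    gam = LinearGAM(l(0) + s(1)).fit(X, y)         # partial linear model: linear in x1, smooth in x2
    gam.summary()                                  # prints term-by-term fit statistics
    print(gam.predict(X[:5]))                      # fitted values from the partial linear fit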
arXiv link: http://arxiv.org/abs/2203.11691v1
Selection and parallel trends
(DiD) designs. We derive necessary and sufficient conditions for parallel
trends assumptions under general classes of selection mechanisms. These
conditions characterize the empirical content of parallel trends. We use the
necessary conditions to provide a selection-based decomposition of the bias of
DiD and provide easy-to-implement strategies for benchmarking its components.
We also provide templates for justifying DiD in applications with and without
covariates. A reanalysis of the causal effect of NSW training programs
demonstrates the usefulness of our selection-based approach to benchmarking the
bias of DiD.
arXiv link: http://arxiv.org/abs/2203.09001v12
Lorenz map, inequality ordering and curves based on multidimensional rearrangements
rearrangements of optimal transport theory. We define a vector Lorenz map as
the integral of the vector quantile map associated with a multivariate resource
allocation. Each component of the Lorenz map is the cumulative share of each
resource, as in the traditional univariate case. The pointwise ordering of such
Lorenz maps defines a new multivariate majorization order, which is equivalent
to preference by any social planner with inequality averse multivariate rank
dependent social evaluation functional. We define a family of multi-attribute
Gini index and complete ordering based on the Lorenz map. We propose the level
sets of an Inverse Lorenz Function as a practical tool to visualize and compare
inequality in two dimensions, and apply it to income-wealth inequality in the
United States between 1989 and 2022.
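In the univariate special case the Lorenz map reduces to the familiar Lorenz curve, the integral of the quantile function normalized by the mean; a small numpy illustration on synthetic incomes, with the Gini index recovered as the area between the curve and the diagonal:

    import numpy as np

    def lorenz_curve(x):
        """Cumulative resource share L(p) of the poorest fraction p, i.e. the
        integral of the quantile function up to p, normalized by the mean."""
        x = np.sort(np.asarray(x, dtype=float))
        p = np.arange(1, x.size + 1) / x.size
        L = np.cumsum(x) / x.sum()
        return np.insert(p, 0, 0.0), np.insert(L, 0, 0.0)

    incomes = np.random.default_rng(0).lognormal(mean=10.0, sigma=1.0, size=10_000)
    p, L = lorenz_curve(incomes)
    gini = 1.0 - 2.0 * np.sum((L[1:] + L[:-1]) / 2 * np.diff(p))   # trapezoid area under L
    print(round(gini, 3))                                          # roughly 0.52 for sigma = 1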
arXiv link: http://arxiv.org/abs/2203.09000v4
A Simple and Computationally Trivial Estimator for Grouped Fixed Effects Models
models with clustered time patterns of unobserved heterogeneity. The method
avoids non-convex and combinatorial optimization by combining a preliminary
consistent estimator of the slope coefficient, an agglomerative
pairwise-differencing clustering of cross-sectional units, and a pooled
ordinary least squares regression. Asymptotic guarantees are established in a
framework where $T$ can grow at any power of $N$, as both $N$ and $T$ approach
infinity. Unlike most existing approaches, the proposed estimator is
computationally straightforward and does not require a known upper bound on the
number of groups. As existing approaches, this method leads to a consistent
estimation of well-separated groups and an estimator of common parameters
asymptotically equivalent to the infeasible regression controlling for the true
groups. An application revisits the statistical association between income and
democracy.
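A stylized sketch of the three ingredients (a preliminary within-estimator slope, clustering of units on their residual time profiles, and pooled OLS with group-by-time dummies) on synthetic panel data; Ward hierarchical clustering is used here as a simple stand-in for the paper's agglomerative pairwise-differencing step:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    N, T, G = 200, 40, 3
    groups = rng.integers(0, G, N)
    alpha = np.array([0.0, 2.0, -2.0])[groups][:, None] * np.linspace(1, 2, T)  # group time patterns
    x = rng.normal(size=(N, T))
    y = 1.0 * x + alpha + rng.normal(0, 0.5, (N, T))

    # Step 1: preliminary slope from the within (unit-demeaned) estimator.
    xd, yd = x - x.mean(1, keepdims=True), y - y.mean(1, keepdims=True)
    beta0 = (xd * yd).sum() / (xd ** 2).sum()

    # Step 2: cluster units on their residual time profiles.
    profiles = y - beta0 * x
    labels = fcluster(linkage(profiles, method="ward"), t=G, criterion="maxclust")

    # Step 3: pooled OLS with estimated group-by-time effects.
    D = np.zeros((N * T, G * T))
    for i in range(N):
        D[i * T:(i + 1) * T, (labels[i] - 1) * T:labels[i] * T] = np.eye(T)
    Z = np.column_stack([x.reshape(-1, 1), D])
    beta_hat = np.linalg.lstsq(Z, y.reshape(-1), rcond=None)[0][0]
    print(round(beta0, 3), round(beta_hat, 3))   # both close to the true slope of 1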
arXiv link: http://arxiv.org/abs/2203.08879v5
Measurability of functionals and of ideal point forecasts
information set $F$ is the conditional distribution of $Y$ given
$F$. In the context of point forecasts aiming to specify a functional
$T$ such as the mean, a quantile or a risk measure, the ideal point forecast is
the respective functional applied to the conditional distribution. This paper
provides a theoretical justification why this ideal forecast is actually a
forecast, that is, an $F$-measurable random variable. To that end,
the appropriate notion of measurability of $T$ is clarified and this
measurability is established for a large class of practically relevant
functionals, including elicitable ones. More generally, the measurability of
$T$ implies the measurability of any point forecast which arises by applying
$T$ to a probabilistic forecast. Similar measurability results are established
for proper scoring rules, the main tool to evaluate the predictive accuracy of
probabilistic forecasts.
arXiv link: http://arxiv.org/abs/2203.08635v1
Pairwise Valid Instruments
Variable (VSIV) estimation, a method for estimating local average treatment
effects (LATEs) in heterogeneous causal effect models when the instruments are
partially invalid. We consider settings with pairwise valid instruments, that
is, instruments that are valid for a subset of instrument value pairs. VSIV
estimation exploits testable implications of instrument validity to remove
invalid pairs and provides estimates of the LATEs for all remaining pairs,
which can be aggregated into a single parameter of interest using
researcher-specified weights. We show that the proposed VSIV estimators are
asymptotically normal under weak conditions and remove or reduce the asymptotic
bias relative to standard LATE estimators (that is, LATE estimators that do not
use testable implications to remove invalid variation). We evaluate the finite
sample properties of VSIV estimation in application-based simulations and apply
our method to estimate the returns to college education using parental
education as an instrument.
arXiv link: http://arxiv.org/abs/2203.08050v5
Non-Existent Moments of Earnings Growth
variance, skewness, and kurtosis. However, under heavy-tailed distributions,
these moments may not exist in the population. Our empirical analysis reveals
that population kurtosis, skewness, and variance often do not exist for the
conditional distribution of earnings growth. This challenges moment-based
analyses. We propose robust conditional Pareto exponents as novel earnings risk
measures, developing estimation and inference methods. Using the UK New
Earnings Survey Panel Dataset (NESPD) and US Panel Study of Income Dynamics
(PSID), we find: 1) Moments often fail to exist; 2) Earnings risk increases
over the life cycle; 3) Job stayers face higher earnings risk; 4) These
patterns persist during the 2007--2008 recession and the 2015--2016 positive
growth period.
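The paper develops its own robust conditional estimators; for intuition, the unconditional tail exponent is often gauged with a Hill-type estimator, sketched here on synthetic Pareto-tailed data whose variance does not exist:

    import numpy as np

    def hill_estimator(x, k):
        """Hill estimator of the Pareto tail exponent from the k largest observations;
        only moments of order below the exponent exist."""
        xs = np.sort(np.asarray(x, dtype=float))
        threshold = xs[-k - 1]                       # (k+1)-th largest order statistic
        return k / np.log(xs[-k:] / threshold).sum()

    rng = np.random.default_rng(0)
    growth = rng.pareto(a=1.8, size=50_000) + 1      # exponent 1.8: mean exists, variance does not
    print(round(hill_estimator(growth, k=1000), 2))  # close to the true exponent of 1.8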
arXiv link: http://arxiv.org/abs/2203.08014v3
Encompassing Tests for Nonparametric Regressions
models through the L2 distance. We contrast it to previous literature on the
comparison of nonparametric regression models. We then develop testing
procedures for the encompassing hypothesis that are fully nonparametric. Our
test statistics depend on kernel regression, raising the issue of bandwidth's
choice. We investigate two alternative approaches to obtain a "small bias
property" for our test statistics. We show the validity of a wild bootstrap
method. We empirically study the use of a data-driven bandwidth and illustrate
the attractive features of our tests for small and moderate samples.
arXiv link: http://arxiv.org/abs/2203.06685v3
Measuring anomalies in cigarette sales by using official data from Spanish provinces: Are there only the anomalies detected by the Empty Pack Surveys (EPS) used by Transnational Tobacco Companies (TTCs)?
by the transnational tobacco companies (TTC) to measure the illicit tobacco
trade. Furthermore, there are studies that indicate that the Empty Pack Surveys
(EPS) ordered by the TTCs are oversized. The novelty of this study is that, in
addition to detecting the anomalies analyzed in the EPSs, there are provinces
in which cigarette sales are higher than reasonable values, something that the
TTCs ignore. This study analyzed, first, whether the EPSs established in each
of the 47 Spanish provinces were fulfilled. Second, the anomalies observed in
provinces where sales exceed expected values are
measured. To achieve the objective of the paper, provincial data on cigarette
sales, price and GDP per capita are used. These data are modeled with machine
learning techniques widely used to detect anomalies in other areas. The results
reveal that the provinces in which sales below reasonable values are observed
(as detected by the EPSs) present a clear geographical pattern. Furthermore,
the values provided by the EPSs in Spain, as indicated in the previous
literature, are slightly oversized. Finally, there are regions bordering other
countries or with a high tourist influence in which the observed sales are
higher than the expected values.
arXiv link: http://arxiv.org/abs/2203.06640v1
Synthetic Controls in Action
practice in synthetic control studies. The proposed principles follow from
formal properties of synthetic control estimators, and pertain to the nature,
implications, and prevention of over-fitting biases within a synthetic control
framework, to the interpretability of the results, and to the availability of
validation exercises. We discuss and visually demonstrate the relevance of the
proposed principles under a variety of data configurations.
arXiv link: http://arxiv.org/abs/2203.06279v1
Explainable Machine Learning for Predicting Homicide Clearance in the United States
prediction and detection of drivers of cleared homicides at the national- and
state-levels in the United States.
Methods: First, nine algorithmic approaches are compared to assess the best
performance in predicting cleared homicides country-wise, using data from the
Murder Accountability Project. The most accurate algorithm among all (XGBoost)
is then used for predicting clearance outcomes state-wise. Second, SHAP, a
framework for Explainable Artificial Intelligence, is employed to capture the
most important features in explaining clearance patterns both at the national
and state levels.
Results: At the national level, XGBoost achieves the best performance overall.
Substantial predictive variability is detected state-wise.
In terms of explainability, SHAP highlights the relevance of several features
in consistently predicting investigation outcomes. These include homicide
circumstances, weapons, victims' sex and race, as well as number of involved
offenders and victims.
Conclusions: Explainable Machine Learning proves to be a helpful framework for
predicting homicide clearance. SHAP outcomes suggest a more
organic integration of the two theoretical perspectives emerged in the
literature. Furthermore, jurisdictional heterogeneity highlights the importance
of developing ad hoc state-level strategies to improve police performance in
clearing homicides.
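The XGBoost-plus-SHAP pipeline can be reproduced in a few lines; a minimal sketch on synthetic case-level data (the feature names are illustrative, not the Murder Accountability Project variables):

    import numpy as np
    import pandas as pd
    import shap
    import xgboost as xgb

    rng = np.random.default_rng(0)
    n = 2000
    X = pd.DataFrame({
        "victim_age": rng.integers(15, 80, n),
        "firearm": rng.integers(0, 2, n),
        "n_offenders": rng.integers(1, 4, n),
    })
    cleared = (0.02 * X.victim_age - 0.8 * X.firearm + 0.5 * X.n_offenders
               + rng.normal(0, 1, n) > 0.8).astype(int)

    model = xgb.XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
    model.fit(X, cleared)

    # SHAP values attribute each prediction to the input features.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)
    print(pd.Series(np.abs(shap_values).mean(0), index=X.columns))  # global feature importance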
arXiv link: http://arxiv.org/abs/2203.04768v1
On Robust Inference in Time Series Regression
("OLS-HC regression") has proved very useful in cross section environments.
However, several major difficulties, which are generally overlooked, must be
confronted when transferring the HC technology to time series environments via
heteroskedasticity and autocorrelation consistent standard errors ("OLS-HAC
regression"). First, in plausible time-series environments, OLS parameter
estimates can be inconsistent, so that OLS-HAC inference fails even
asymptotically. Second, most economic time series have autocorrelation, which
renders OLS parameter estimates inefficient. Third, autocorrelation similarly
renders conditional predictions based on OLS parameter estimates inefficient.
Finally, the structure of popular HAC covariance matrix estimators is
ill-suited for capturing the autoregressive autocorrelation typically present
in economic time series, which produces large size distortions and reduced
power in HAC-based hypothesis testing, in all but the largest samples. We show
that all four problems are largely avoided by the use of a simple and
easily-implemented dynamic regression procedure, which we call DURBIN. We
demonstrate the advantages of DURBIN with detailed simulations covering a range
of practical issues.
arXiv link: http://arxiv.org/abs/2203.04080v3
Honest calibration assessment for binary outcome predictions
ought to be calibrated: If an event is predicted to occur with probability $x$,
it should materialize with approximately that frequency, which means that the
so-called calibration curve $p(\cdot)$ should equal the identity, $p(x) = x$
for all $x$ in the unit interval. We propose honest calibration assessment
based on novel confidence bands for the calibration curve, which are valid only
subject to the natural assumption of isotonicity. Besides testing the classical
goodness-of-fit null hypothesis of perfect calibration, our bands facilitate
inverted goodness-of-fit tests whose rejection allows for the sought-after
conclusion of a sufficiently well specified model. We show that our bands have
a finite sample coverage guarantee, are narrower than existing approaches, and
adapt to the local smoothness of the calibration curve $p$ and the local
variance of the binary observations. In an application to model predictions of
an infant having a low birth weight, the bounds give informative insights on
model calibration.
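The calibration curve itself can be estimated under the same isotonicity assumption by isotonic regression of outcomes on predicted probabilities; a minimal scikit-learn sketch (the paper's actual contribution, the confidence bands, is not reproduced here):

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    rng = np.random.default_rng(0)
    n = 5000
    p_hat = rng.uniform(0, 1, n)                      # model's predicted probabilities
    y = rng.binomial(1, p_hat ** 1.3)                 # outcomes from a miscalibrated model

    iso = IsotonicRegression(y_min=0, y_max=1, out_of_bounds="clip")
    iso.fit(p_hat, y)                                 # isotonic estimate of the calibration curve

    grid = np.linspace(0.05, 0.95, 10)
    print(np.round(iso.predict(grid), 2))             # compare with the identity: perfect calibration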
arXiv link: http://arxiv.org/abs/2203.04065v2
When Will Arctic Sea Ice Disappear? Projections of Area, Extent, Thickness, and Volume
global climate change. We provide point, interval, and density forecasts for
four measures of Arctic sea ice: area, extent, thickness, and volume.
Importantly, we enforce the joint constraint that these measures must
simultaneously arrive at an ice-free Arctic. We apply this constrained joint
forecast procedure to models relating sea ice to atmospheric carbon dioxide
concentration and models relating sea ice directly to time. The resulting
"carbon-trend" and "time-trend" projections are mutually consistent and predict
a nearly ice-free summer Arctic Ocean by the mid-2030s with an 80% probability.
Moreover, the carbon-trend projections show that global adoption of a lower
carbon path would likely delay the arrival of a seasonally ice-free Arctic by
only a few years.
arXiv link: http://arxiv.org/abs/2203.04040v3
Bayesian Bilinear Neural Network for Predicting the Mid-price Dynamics in Limit-Order Book Markets
modern electronically-driven markets, traditional time-series econometric
methods often appear incapable of capturing the true complexity of the
multi-level interactions driving the price dynamics. While recent research has
established the effectiveness of traditional machine learning (ML) models in
financial applications, their intrinsic inability to deal with uncertainties,
which is a great concern in econometrics research and real business
applications, constitutes a major drawback. Bayesian methods naturally appear
as a suitable remedy conveying the predictive ability of ML methods with the
probabilistically-oriented practice of econometric research. By adopting a
state-of-the-art second-order optimization algorithm, we train a Bayesian
bilinear neural network with temporal attention, suitable for the challenging
time-series task of predicting mid-price movements in ultra-high-frequency
limit-order book markets. We thoroughly compare our Bayesian model with
traditional ML alternatives by addressing the use of predictive distributions
to analyze errors and uncertainties associated with the estimated parameters
and model forecasts. Our results underline the feasibility of the Bayesian
deep-learning approach and its predictive and decisional advantages in complex
econometric tasks, prompting future research in this direction.
arXiv link: http://arxiv.org/abs/2203.03613v2
Inference in Linear Dyadic Data Models with Network Spillovers
typically assume a linear model, estimate it using Ordinary Least Squares and
conduct inference using “dyadic-robust” variance estimators. The latter
assumes that dyads are uncorrelated if they do not share a common unit (e.g.,
if the same individual is not present in both pairs of data). We show that this
assumption does not hold in many empirical applications because indirect links
may exist due to network connections, generating correlated outcomes. Hence,
“dyadic-robust” estimators can be biased in such situations. We develop a
consistent variance estimator for such contexts by leveraging results in
network statistics. Our estimator has good finite sample properties in
simulations, while allowing for decay in spillover effects. We illustrate our
message with an application to politicians' voting behavior when they are
seating neighbors in the European Parliament.
arXiv link: http://arxiv.org/abs/2203.03497v5
High-Resolution Peak Demand Estimation Using Generalized Additive Models and Deep Neural Networks
given lower-resolution data. This is a relevant setup as it answers whether
limited higher-resolution monitoring helps to estimate future high-resolution
peak loads when the high-resolution data is no longer available. That question
is particularly interesting for network operators considering replacing
high-resolution monitoring with predictive models for economic reasons. We
propose models to predict half-hourly minima and maxima of high-resolution
(every minute) electricity load data while model inputs are of a lower
resolution (30 minutes). We combine predictions of generalized additive models
(GAM) and deep artificial neural networks (DNN), which are popular in load
forecasting. We extensively analyze the prediction models, including the input
parameters' importance, focusing on load, weather, and seasonal effects. The
proposed method won a data competition organized by Western Power Distribution,
a British distribution network operator. In addition, we provide a rigorous
evaluation study that goes beyond the competition frame to analyze the models'
robustness. The results show that the proposed methods are superior to the
competition benchmark concerning the out-of-sample root mean squared error
(RMSE). This holds regarding the competition month and the supplementary
evaluation study, which covers an additional eleven months. Overall, our
proposed model combination reduces the out-of-sample RMSE by 57.4% compared to
the benchmark.
arXiv link: http://arxiv.org/abs/2203.03342v2
Estimation of a Factor-Augmented Linear Model with Applications Using Student Achievement Data
on the type of restrictions that must be imposed to solve the rotational
indeterminacy of factor-augmented linear models. We study this problem and
offer several novel results on identification using internally generated
instruments. We propose a new class of estimators and establish large sample
results using recent developments on clustered samples and high-dimensional
models. We carry out simulation studies which show that the proposed approaches
improve the performance of existing methods on the estimation of unknown
factors. Lastly, we consider three empirical applications using administrative
data of students clustered in different subjects in elementary school, high
school and college.
arXiv link: http://arxiv.org/abs/2203.03051v1
Modelplasticity and Abductive Decision Making
find those useful ones starting from an imperfect model? How to make informed
data-driven decisions equipped with an imperfect model? These fundamental
questions appear to be pervasive in virtually all empirical fields -- including
economics, finance, marketing, healthcare, climate change, defense planning,
and operations research. This article presents a modern approach (built on two
core ideas: abductive thinking and the density-sharpening principle) and practical
guidelines to tackle these issues in a systematic manner.
arXiv link: http://arxiv.org/abs/2203.03040v3
Weighted-average quantile regression
framework, $\int_0^1 q_{Y|X}(u)\psi(u)du = X'\beta$, where $Y$ is a dependent
variable, $X$ is a vector of covariates, $q_{Y|X}$ is the quantile function of
the conditional distribution of $Y$ given $X$, $\psi$ is a weighting function,
and $\beta$ is a vector of parameters. We argue that this framework is of
interest in many applied settings and develop an estimator of the vector of
parameters $\beta$. We show that our estimator is $\sqrt T$-consistent and
asymptotically normal with mean zero and easily estimable covariance matrix,
where $T$ is the size of available sample. We demonstrate the usefulness of our
estimator by applying it in two empirical settings. In the first setting, we
focus on financial data and study the factor structures of the expected
shortfalls of the industry portfolios. In the second setting, we focus on wage
data and study inequality and social welfare dependence on commonly used
individual characteristics.
arXiv link: http://arxiv.org/abs/2203.03032v1
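As a rough illustration of the estimand $\int_0^1 q_{Y|X}(u)\psi(u)du = X'\beta$ above, the Python sketch below uses a naive plug-in that additionally assumes linear conditional quantiles, $q_{Y|X}(u) = X'\beta(u)$, and approximates the integral on a grid; this is not the estimator developed in the paper, and the weighting function $\psi$ is chosen arbitrarily.

# Naive illustration of the weighted-average quantile regression target,
# assuming (unlike the paper, which does not need this) that the conditional
# quantile function is linear in X for each quantile level u.
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(2)
n = 2000
X = sm.add_constant(rng.normal(size=(n, 2)))
Y = X @ np.array([1.0, 0.5, -0.25]) + rng.standard_t(df=5, size=n)

grid = np.linspace(0.05, 0.95, 19)
psi = grid * (1 - grid)                 # example weighting function
w = psi / psi.sum()                     # discrete approximation of psi(u)du

betas = np.array([QuantReg(Y, X).fit(q=u).params for u in grid])
beta_wavg = (w[:, None] * betas).sum(axis=0)   # weighted average over quantiles
print(beta_wavg)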
Latent Unbalancedness in Three-Way Gravity Models
models are implicitly unbalanced because uninformative observations are
redundant for the estimation. We show with real data as well as simulations
that this phenomenon, which we call latent unbalancedness, amplifies the
inference problem recently studied by Weidner and Zylkin (2021).
arXiv link: http://arxiv.org/abs/2203.02235v1
A Classifier-Lasso Approach for Estimating Production Functions with Latent Group Structures
group structures. I consider production functions that are heterogeneous across
groups but time-homogeneous within groups, and where the group membership of
the firms is unknown. My estimation procedure is fully data-driven and embeds
recent identification strategies from the production function literature into
the classifier-Lasso. Simulation experiments demonstrate that firms are
assigned to their correct latent group with probability close to one. I apply
my estimation procedure to a panel of Chilean firms and find sizable
differences in the estimates compared to the standard approach of
classification by industry.
arXiv link: http://arxiv.org/abs/2203.02220v1
A Modern Gauss-Markov Theorem? Really?
Econometrica), except for one, are not new as they coincide with classical
theorems like the good old Gauss-Markov or Aitken Theorem, respectively; the
exceptional theorem is incorrect. Hansen (2021b) corrects this theorem. As a
result, all theorems in the latter version coincide with the above mentioned
classical theorems. Furthermore, we also show that the theorems in Hansen
(2022) (the version published in Econometrica) either coincide with the
classical theorems just mentioned, or contain extra assumptions that are alien
to the Gauss-Markov or Aitken Theorem.
arXiv link: http://arxiv.org/abs/2203.01425v5
Minimax Risk in Estimating Kink Threshold and Testing Continuity
knowing whether the threshold regression model is continuous or not. The bound
goes to zero as the sample size $ n $ grows only at the cube root rate.
Motivated by this finding, we develop a continuity test for the threshold
regression model and a bootstrap to compute its p-values. The validity
of the bootstrap is established, and its finite sample property is explored
through Monte Carlo simulations.
arXiv link: http://arxiv.org/abs/2203.00349v1
Estimating causal effects with optimization-based methods: A review and empirical comparison
necessary to balance the distributions of (observable) covariates of the
treated and control groups in order to obtain an unbiased estimate of a causal
effect of interest; otherwise, a different effect size may be estimated, and
incorrect recommendations may be given. To achieve this balance, there exist a
wide variety of methods. In particular, several methods based on optimization
models have been recently proposed in the causal inference literature. While
these optimization-based methods empirically showed an improvement over a
limited number of other causal inference methods in their relative ability to
balance the distributions of covariates and to estimate causal effects, they
have not been thoroughly compared to each other and to other noteworthy causal
inference methods. In addition, we believe that there exist several unaddressed
opportunities to which operational researchers could contribute with their advanced
knowledge of optimization, for the benefit of the applied researchers who use
causal inference tools. In this review paper, we present an overview of the
causal inference literature and describe in more detail the optimization-based
causal inference methods, provide a comparative analysis of the prevailing
optimization-based methods, and discuss opportunities for new methods.
arXiv link: http://arxiv.org/abs/2203.00097v1
Dynamic Spatiotemporal ARCH Models
to the geographical proximity. In this paper, we introduce a dynamic
spatiotemporal autoregressive conditional heteroscedasticity (ARCH) process to
describe the effects of (i) the log-squared time-lagged outcome variable, i.e.,
the temporal effect, (ii) the spatial lag of the log-squared outcome variable,
i.e., the spatial effect, and (iii) the spatial lag of the log-squared
time-lagged outcome variable, i.e., the spatiotemporal effect, on the
volatility of an outcome variable. Furthermore, our suggested process allows
for the fixed effects over time and space to account for the unobserved
heterogeneity. For this dynamic spatiotemporal ARCH model, we derive a
generalized method of moments (GMM) estimator based on the linear and quadratic
moment conditions of a specific transformation. We show the consistency and
asymptotic normality of the GMM estimator, and determine the best set of moment
functions. We investigate the finite-sample properties of the proposed GMM
estimator in a series of Monte-Carlo simulations with different model
specifications and error distributions. Our simulation results show that our
suggested GMM estimator has good finite sample properties. In an empirical
application, we use monthly log-returns of the average condominium prices of
each postcode of Berlin from 1995 to 2015 (190 spatial units, 240 time points)
to demonstrate the use of our suggested model. Our estimation results show that
the temporal, spatial and spatiotemporal lags of the log-squared returns have
statistically significant effects on the volatility of the log-returns.
arXiv link: http://arxiv.org/abs/2202.13856v1
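A schematic reading of the three volatility effects listed in the abstract above, with notation assumed here rather than taken from the paper ($w_{ij}$ spatial weights, $\mu_i$ and $\alpha_t$ space and time fixed effects), is
\[
\log Y_{it}^{2} = \gamma \log Y_{i,t-1}^{2} + \rho \sum_{j} w_{ij} \log Y_{jt}^{2} + \lambda \sum_{j} w_{ij} \log Y_{j,t-1}^{2} + \mu_i + \alpha_t + \varepsilon_{it},
\]
where $\gamma$, $\rho$ and $\lambda$ capture the temporal, spatial and spatiotemporal effects, respectively; the paper's exact specification and error structure may differ.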
Forecasting US Inflation Using Bayesian Nonparametric Models
potentially nonlinear with a strength that varies over time, and prediction
errors may be subject to large, asymmetric shocks. Inspired by these
concerns, we develop a model for inflation forecasting that is nonparametric
both in the conditional mean and in the error using Gaussian and Dirichlet
processes, respectively. We discuss how both these features may be important in
producing accurate forecasts of inflation. In a forecasting exercise involving
CPI inflation, we find that our approach has substantial benefits, both overall
and in the left tail, with nonparametric modeling of the conditional mean being
of particular importance.
arXiv link: http://arxiv.org/abs/2202.13793v1
Personalized Subsidy Rules
long-term benefits. Typical examples include subsidized job training programs
and provisions of preventive health products, in which both behavioral
responses and associated gains can exhibit heterogeneity. This study uses the
marginal treatment effect (MTE) framework to study personalized assignments of
subsidies based on individual characteristics. First, we derive the optimality
condition for a welfare-maximizing subsidy rule by showing that the welfare can
be represented as a function of the MTE. Next, we show that subsidies generally
result in better welfare than directly mandating the encouraged behavior
because subsidy rules implicitly target individuals through unobserved
heterogeneity in the behavioral response. When there is positive selection,
that is, when individuals with higher returns are more likely to select the
encouraged behavior, the optimal subsidy rule achieves the first-best welfare,
which is the optimal welfare if a policy-maker can observe individuals' private
information. We then provide methods to (partially) identify the optimal
subsidy rule both when the MTE is identified and when it is not. In particular,
positive selection allows for the point identification of the optimal subsidy
rule even when the MTE curve is not. As an empirical application, we study the
optimal wage subsidy using the experimental data from the Jordan New
Opportunities for Women pilot study.
arXiv link: http://arxiv.org/abs/2202.13545v2
Variational inference for large Bayesian vector autoregressions
vector autoregression (VAR) models with hierarchical shrinkage priors. Our
approach does not rely on a conventional structural VAR representation of the
parameter space for posterior inference. Instead, we elicit hierarchical
shrinkage priors directly on the matrix of regression coefficients so that (1)
the prior structure directly maps into posterior inference on the reduced-form
transition matrix, and (2) posterior estimates are more robust to variables
permutation. An extensive simulation study provides evidence that our approach
compares favourably against existing linear and non-linear Markov Chain Monte
Carlo and variational Bayes methods. We investigate both the statistical and
economic value of the forecasts from our variational inference approach within
the context of a mean-variance investor allocating her wealth in a large set of
different industry portfolios. The results show that more accurate estimates
translate into substantial statistical and economic out-of-sample gains. The
results hold across different hierarchical shrinkage priors and model
dimensions.
arXiv link: http://arxiv.org/abs/2202.12644v3
A general characterization of optimal tie-breaker designs
gain from preferentially assigning a binary treatment to those with high values
of a running variable $x$. The design objective is any continuous function of
the expected information matrix in a two-line regression model, and short-term
gain is expressed as the covariance between the running variable and the
treatment indicator. We investigate how to specify design functions indicating
treatment probabilities as a function of $x$ to optimize these competing
objectives, under external constraints on the number of subjects receiving
treatment. Our results include sharp existence and uniqueness guarantees, while
accommodating the ethically appealing requirement that treatment probabilities
are non-decreasing in $x$. Under such a constraint, there always exists an
optimal design function that is constant below and above a single
discontinuity. When the running variable distribution is not symmetric or the
fraction of subjects receiving the treatment is not $1/2$, our optimal designs
improve upon a $D$-optimality objective without sacrificing short-term gain,
compared to the three level tie-breaker designs of Owen and Varian (2020) that
fix treatment probabilities at $0$, $1/2$, and $1$. We illustrate our optimal
designs with data from Head Start, an early childhood government intervention
program.
arXiv link: http://arxiv.org/abs/2202.12511v2
Fast variational Bayes methods for multinomial probit models
However, estimation with existing Markov chain Monte Carlo (MCMC) methods is
computationally costly, which limits its applicability to large choice data
sets. This paper proposes a variational Bayes method that is accurate and fast,
even when a large number of choice alternatives and observations are
considered. Variational methods usually require an analytical expression for
the unnormalized posterior density and an adequate choice of variational
family. Both are challenging to specify in a multinomial probit, which has a
posterior that requires identifying restrictions and is augmented with a large
set of latent utilities. We employ a spherical transformation on the covariance
matrix of the latent utilities to construct an unnormalized augmented posterior
that identifies the parameters, and use the conditional posterior of the latent
utilities as part of the variational family. The proposed method is faster than
MCMC, and can be made scalable to both a large number of choice alternatives
and a large number of observations. The accuracy and scalability of our method
is illustrated in numerical experiments and real purchase data with one million
observations.
arXiv link: http://arxiv.org/abs/2202.12495v2
Confidence Intervals of Treatment Effects in Panel Data Models with Interactive Fixed Effects
estimated using panel models with interactive fixed effects. We first use the
factor-based matrix completion technique proposed by Bai and Ng (2021) to
estimate the treatment effects, and then use bootstrap method to construct
confidence intervals of the treatment effects for treated units at each
post-treatment period. Our construction of confidence intervals requires
neither specific distributional assumptions on the error terms nor large number
of post-treatment periods. We also establish the validity of the proposed
bootstrap procedure, showing that these confidence intervals have asymptotically correct
coverage probabilities. Simulation studies show that these confidence intervals
have satisfactory finite sample performances, and empirical applications using
classical datasets yield treatment effect estimates of similar magnitudes and
reliable confidence intervals.
arXiv link: http://arxiv.org/abs/2202.12078v1
Semiparametric Estimation of Dynamic Binary Choice Panel Data Models
choice models with fixed effects and dynamics (lagged dependent variables). The
model we consider has the same random utility framework as in Honore and
Kyriazidou (2000). We demonstrate that, with additional serial dependence
conditions on the process of deterministic utility and tail restrictions on the
error distribution, the (point) identification of the model can proceed in two
steps, and only requires matching the value of an index function of explanatory
variables over time, as opposed to that of each explanatory variable. Our
identification approach motivates an easily implementable, two-step maximum
score (2SMS) procedure -- producing estimators whose rates of convergence, in
contrast to Honore and Kyriazidou's (2000) methods, are independent of the
model dimension. We then derive the asymptotic properties of the 2SMS procedure
and propose bootstrap-based distributional approximations for inference. Monte
Carlo evidence indicates that our procedure performs adequately in finite
samples.
arXiv link: http://arxiv.org/abs/2202.12062v4
Distributional Counterfactual Analysis in High-Dimensional Setup
methodology to recover the counterfactual distribution when there is a single
(or a few) treated unit and possibly a high-dimensional number of potential
controls observed in a panel structure. The methodology accommodates, although
it does not require, a number of units larger than the number of time
periods (high-dimensional setup). As opposed to modeling only the conditional
mean, we propose to model the entire conditional quantile function (CQF)
without intervention and estimate it using the pre-intervention period by a
l1-penalized regression. We derive non-asymptotic bounds for the estimated CQF
valid uniformly over the quantiles. The bounds are explicit in terms of the
number of time periods, the number of control units, the weak dependence
coefficient (beta-mixing), and the tail decay of the random variables. The
results allow practitioners to re-construct the entire counterfactual
distribution. Moreover, we bound the probability coverage of this estimated
CQF, which can be used to construct valid confidence intervals for the
(possibly random) treatment effect for every post-intervention period. We also
propose a new hypothesis test for the sharp null of no-effect based on the Lp
norm of the deviation of the estimated CQF from the population one. Interestingly,
the null distribution is quasi-pivotal in the sense that it only depends on the
estimated CQF, Lp norm, and the number of post-intervention periods, but not on
the size of the post-intervention period. For that reason, critical values can
then be easily simulated. We illustrate the methodology by revisiting the
empirical study in Acemoglu, Johnson, Kermani, Kwak and Mitton (2016).
arXiv link: http://arxiv.org/abs/2202.11671v2
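The Python sketch below illustrates only the pre-intervention step described above: fitting an l1-penalised linear conditional quantile function of the treated unit's outcome given the control units over a grid of quantiles, and using it to build counterfactual quantiles after the intervention. Tuning, the uniform bounds, and the paper's inferential procedures are omitted; all data and parameter choices are made up.

# Sketch: l1-penalised conditional quantile function estimated on the
# pre-intervention sample, then evaluated post-intervention.
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(3)
T0, T1, J = 120, 24, 40                 # pre/post periods, control units
Xpre = rng.normal(size=(T0, J))
ypre = Xpre[:, :3] @ np.array([0.5, 0.3, 0.2]) + rng.normal(scale=0.5, size=T0)
Xpost = rng.normal(size=(T1, J))

taus = np.linspace(0.1, 0.9, 9)
cf_quantiles = np.column_stack([
    QuantileRegressor(quantile=t, alpha=0.05, solver="highs")
    .fit(Xpre, ypre).predict(Xpost)
    for t in taus
])                                       # (T1 x len(taus)) counterfactual CQF
print(cf_quantiles.shape)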
Differentially Private Estimation of Heterogeneous Causal Effects
social science often involves sensitive data where protecting privacy is
important. We introduce a general meta-algorithm for estimating conditional
average treatment effects (CATE) with differential privacy (DP) guarantees. Our
meta-algorithm can work with simple, single-stage CATE estimators such as
S-learner and more complex multi-stage estimators such as DR and R-learner. We
perform a tight privacy analysis by taking advantage of sample splitting in our
meta-algorithm and the parallel composition property of differential privacy.
In this paper, we implement our approach using DP-EBMs as the base learner.
DP-EBMs are interpretable, high-accuracy models with privacy guarantees, which
allow us to directly observe the impact of DP noise on the learned causal
model. Our experiments show that multi-stage CATE estimators incur larger
accuracy loss than single-stage CATE or ATE estimators and that most of the
accuracy loss from differential privacy is due to an increase in variance, not
biased estimates of treatment effects.
arXiv link: http://arxiv.org/abs/2202.11043v1
Multivariate Tie-breaker Designs
variable are given some (usually desirable) treatment, subjects with low values
are not, and subjects in the middle are randomized. TBDs are intermediate
between regression discontinuity designs (RDDs) and randomized controlled
trials (RCTs). TBDs allow a tradeoff between the resource allocation efficiency
of an RDD and the statistical efficiency of an RCT. We study a model where the
expected response is one multivariate regression for treated subjects and
another for control subjects. We propose a prospective D-optimality, analogous
to Bayesian optimal design, to understand design tradeoffs without reference to
a specific data set. For given covariates, we show how to use convex
optimization to choose treatment probabilities that optimize this criterion. We
can incorporate a variety of constraints motivated by economic and ethical
considerations. In our model, D-optimality for the treatment effect coincides
with D-optimality for the whole regression, and, without constraints, an RCT is
globally optimal. We show that a monotonicity constraint favoring more
deserving subjects induces sparsity in the number of distinct treatment
probabilities. We apply the convex optimization solution to a semi-synthetic
example involving triage data from the MIMIC-IV-ED database.
arXiv link: http://arxiv.org/abs/2202.10030v5
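A hedged sketch of the convex-optimisation step described above: choose treatment probabilities to maximise a D-optimality criterion for a two-regression (treated/control) model, subject to a treatment budget and monotonicity in the running variable. The coding of the treatment as +/-1, the exact criterion, and the constraint set follow my own simplification, not the paper's implementation.

# D-optimal treatment probabilities for a tie-breaker design (illustrative).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
n = 60
running = np.sort(rng.normal(size=n))             # running variable, sorted
X = np.column_stack([np.ones(n), running, rng.normal(size=n)])

A = X.T @ X                                        # constant diagonal block
p = cp.Variable(n)
# off-diagonal block is affine in p when z_i = +/-1 with P(z_i = 1) = p_i
B = sum((2 * p[i] - 1) * np.outer(X[i], X[i]) for i in range(n))
M = cp.bmat([[A, B], [B, A]])                      # expected information matrix
M = (M + M.T) / 2                                  # symmetrise for the solver

constraints = [p >= 0, p <= 1,
               cp.sum(p) == n / 3,                 # treatment budget
               cp.diff(p) >= 0]                    # monotone in running variable
prob = cp.Problem(cp.Maximize(cp.log_det(M)), constraints)
prob.solve(solver=cp.SCS)
print(np.round(p.value, 3))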
Score Driven Generalized Fitness Model for Sparse and Weighted Temporal Networks
focuses on binary graphs, often one can associate a weight to each link. In
such cases the data are better described by a weighted, or valued, network. An
important well known fact is that real world weighted networks are typically
sparse. We propose a novel time varying parameter model for sparse and weighted
temporal networks as a combination of the fitness model, appropriately
extended, and the score driven framework. We consider a zero augmented
generalized linear model to handle the weights and an observation driven
approach to describe time varying parameters. The result is a flexible approach
where the probability that a link exists is independent of its expected
weight. This represents a crucial difference with alternative specifications
proposed in the recent literature, with relevant implications for the
flexibility of the model.
Our approach also accommodates dependence of the network dynamics on
external variables. We present a link forecasting analysis to data describing
the overnight exposures in the Euro interbank market and investigate whether
the influence of EONIA rates on the interbank network dynamics has changed over
time.
arXiv link: http://arxiv.org/abs/2202.09854v2
A Unified Nonparametric Test of Transformations on Distribution Functions with Nuisance Parameters
cumulative distribution functions (CDFs) in the presence of nuisance
parameters. The proposed test is constructed based on a new characterization
that avoids the estimation of nuisance parameters. The critical values are
obtained through a numerical bootstrap method which can easily be implemented
in practice. Under suitable conditions, the proposed test is shown to be
asymptotically size controlled and consistent. The local power property of the
test is established. Finally, Monte Carlo simulations and an empirical study
show that the test performs well on finite samples.
arXiv link: http://arxiv.org/abs/2202.11031v2
Long Run Risk in Stationary Structural Vector Autoregressive Models
time series with strong persistence and non-negligible long run risk. This
process represents the stationary long run component in an unobserved short-
and long-run components model involving different time scales. More
specifically, the short run component evolves in calendar time and the long
run component evolves on an ultra long time scale. We develop methods of
estimation and long run prediction for the univariate and multivariate
Structural VAR (SVAR) models with unobserved components and reveal the
impossibility to consistently estimate some of the long run parameters. The
approach is illustrated by a Monte-Carlo study and an application to
macroeconomic data.
arXiv link: http://arxiv.org/abs/2202.09473v1
A multivariate extension of the Misspecification-Resistant Information Criterion
[H.-L. Hsu, C.-K. Ing, H. Tong: On model selection from a finite family of
possibly misspecified time series models. The Annals of Statistics. 47 (2),
1061--1087 (2019)] is a model selection criterion for univariate parametric
time series that enjoys both the property of consistency and asymptotic
efficiency. In this article we extend the MRIC to the case where the response
is a multivariate time series and the predictor is univariate. The extension
requires novel derivations based upon random matrix theory. We obtain an
asymptotic expression for the mean squared prediction error matrix, the
vectorial MRIC and prove the consistency of its method-of-moments estimator.
Moreover, we prove its asymptotic efficiency. Finally, we show with an example
that, in presence of misspecification, the vectorial MRIC identifies the best
predictive model whereas traditional information criteria like AIC or BIC fail
to achieve the task.
arXiv link: http://arxiv.org/abs/2202.09225v1
Counterfactual Analysis of the Impact of the IMF Program on Child Poverty in the Global-South Region using Causal-Graphical Normalizing Flows
inference and deep learning models: causal-Graphical Normalizing Flows
(c-GNFs). In a recent contribution, scholars showed that normalizing flows
carry certain properties, making them particularly suitable for causal and
counterfactual analysis. However, c-GNFs have only been tested in a simulated
data setting and no contribution to date have evaluated the application of
c-GNFs on large-scale real-world data. Focusing on the AI for social
good, our study provides a counterfactual analysis of the impact of the
International Monetary Fund (IMF) program on child poverty using c-GNFs. The
analysis relies on large-scale real-world observational data: 1,941,734
children under the age of 18, cared for by 567,344 families residing in the 67
countries from the Global-South. While the primary objective of the IMF is to
support governments in achieving economic stability, our results find that an
IMF program reduces child poverty as a positive side-effect by about
1.2$\pm$0.24 degree (`0' equals no poverty and `7' is maximum poverty). Thus,
our article shows how c-GNFs further the use of deep learning and causal
inference in AI for social good. It shows how learning algorithms can be used
for addressing the untapped potential for a significant social impact through
counterfactual inference at population level (ACE), sub-population level
(CACE), and individual level (ICE). In contrast to most works that model ACE or
CACE but not ICE, c-GNFs enable personalization using `The First Law of
Causal Inference'.
arXiv link: http://arxiv.org/abs/2202.09391v1
Synthetic Control As Online Linear Regression
learning. Specifically, we recognize synthetic control as an instance of
Follow-The-Leader (FTL). Standard results in online convex optimization then
imply that, even when outcomes are chosen by an adversary, synthetic control
predictions of counterfactual outcomes for the treated unit perform almost as
well as an oracle weighted average of control units' outcomes. Synthetic
control on differenced data performs almost as well as oracle weighted
difference-in-differences, potentially making it an attractive choice in
practice. We argue that this observation further supports the use of synthetic
control estimators in comparative case studies.
arXiv link: http://arxiv.org/abs/2202.08426v2
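The Python sketch below illustrates the Follow-The-Leader reading described above: at each period, weights over control units minimise the cumulative past squared error on the simplex, and the next period's counterfactual is predicted with those weights. This bare, outcomes-only version (no covariate matching, invented data) is only meant to show the mechanics, not to reproduce the paper's results.

# Follow-The-Leader weights for a synthetic-control-style prediction task.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
T, J = 80, 10
Y0 = rng.normal(size=(T, J)).cumsum(axis=0)           # control outcomes
y1 = Y0[:, :3] @ np.array([0.5, 0.3, 0.2]) + rng.normal(scale=0.2, size=T)

def ftl_weights(Y_past, y_past):
    J = Y_past.shape[1]
    loss = lambda w: np.sum((y_past - Y_past @ w) ** 2)   # cumulative past loss
    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    res = minimize(loss, np.full(J, 1.0 / J), bounds=[(0, 1)] * J,
                   constraints=cons, method="SLSQP")
    return res.x

preds = [Y0[t] @ ftl_weights(Y0[:t], y1[:t]) for t in range(5, T)]
print(np.mean((np.array(preds) - y1[5:]) ** 2))        # online prediction error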
CAREER: A Foundation Model for Labor Sequence Data
models to small, carefully constructed longitudinal survey datasets. Although
machine learning methods offer promise for such problems, these survey datasets
are too small to take advantage of them. In recent years large datasets of
online resumes have also become available, providing data about the career
trajectories of millions of individuals. However, standard econometric models
cannot take advantage of their scale or incorporate them into the analysis of
survey data. To this end we develop CAREER, a foundation model for job
sequences. CAREER is first fit to large, passively-collected resume data and
then fine-tuned to smaller, better-curated datasets for economic inferences. We
fit CAREER to a dataset of 24 million job sequences from resumes, and adjust it
on small longitudinal survey datasets. We find that CAREER forms accurate
predictions of job sequences, outperforming econometric baselines on three
widely-used economics datasets. We further find that CAREER can be used to form
good predictions of other downstream variables. For example, incorporating
CAREER into a wage model provides better predictions than the econometric
models currently in use.
arXiv link: http://arxiv.org/abs/2202.08370v4
Fairness constraint in Structural Econometrics and Application to fair estimation using Instrumental Variables
sample that will be used to predict new observations. To this end, it
aggregates individual characteristics of the observations of the learning
sample. But this information aggregation does not consider any potential
selection on unobservables or any status-quo bias that may be contained in
the training sample. The latter bias has raised concerns around the so-called
fairness of machine learning algorithms, especially towards
disadvantaged groups. In this chapter, we review the issue of fairness in
machine learning through the lenses of structural econometrics models in which
the unknown index is the solution of a functional equation and issues of
endogeneity are explicitly accounted for. We model fairness as a linear
operator whose null space contains the set of strictly {\it fair} indexes. A
{\it fair} solution is obtained by projecting the unconstrained index into the
null space of this operator or by directly finding the closest solution of the
functional equation into this null space. We also acknowledge that policymakers
may incur a cost when moving away from the status quo. Approximate fairness is
then achieved by introducing a fairness penalty into the learning procedure and
balancing, more or less heavily, the influence of the status quo against that of
a fully fair solution.
arXiv link: http://arxiv.org/abs/2202.08977v1
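A minimal Python sketch of the projection step described above: given a linear fairness operator represented here, for illustration only, by a finite matrix A (so that "strictly fair" indexes satisfy A beta = 0), project an unconstrained index onto the null space of A. In the chapter the operator acts on functions, so this finite-dimensional version is merely a caricature.

# Project an unconstrained index onto the null space of a fairness operator.
import numpy as np

rng = np.random.default_rng(6)
k = 5
beta = rng.normal(size=k)            # unconstrained index coefficients
A = rng.normal(size=(2, k))          # fairness operator (illustrative)

P_null = np.eye(k) - np.linalg.pinv(A) @ A   # orthogonal projector onto null(A)
beta_fair = P_null @ beta

print(np.allclose(A @ beta_fair, 0))         # True: projected index is fair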
An Equilibrium Model of the First-Price Auction with Strategic Uncertainty: Theory and Empirics
uncertainty: They cannot perfectly anticipate the other bidders' bidding
behavior. We propose a model in which bidders do not know the entire
distribution of opponent bids but only the expected (winning) bid and lower and
upper bounds on the opponent bids. We characterize the optimal bidding
strategies and prove the existence of equilibrium beliefs. Finally, we apply
the model to estimate the cost distribution in highway procurement auctions and
find good performance out-of-sample.
arXiv link: http://arxiv.org/abs/2202.07517v2
Long-term Causal Inference Under Persistent Confounding via Data Combination
when both experimental and observational data are available. Since the
long-term outcome is observed only after a long delay, it is not measured in
the experimental data, but only recorded in the observational data. However,
both types of data include observations of some short-term outcomes. In this
paper, we uniquely tackle the challenge of persistent unmeasured confounders,
i.e., some unmeasured confounders that can simultaneously affect the treatment,
short-term outcomes and the long-term outcome, noting that they invalidate
identification strategies in previous literature. To address this challenge, we
exploit the sequential structure of multiple short-term outcomes, and develop
three novel identification strategies for the average long-term treatment
effect. We further propose three corresponding estimators and prove their
asymptotic consistency and asymptotic normality. We finally apply our methods
to estimate the effect of a job training program on long-term employment using
semi-synthetic data. We numerically show that our proposals outperform existing
methods that fail to handle persistent confounders.
arXiv link: http://arxiv.org/abs/2202.07234v5
Asymptotics of Cointegration Tests for High-Dimensional VAR($k$)
order $k$, VAR($k$). Additional deterministic terms such as trend or
seasonality are allowed. The number of time periods, $T$, and the number of
coordinates, $N$, are assumed to be large and of the same order. Under this
regime the first-order asymptotics of the Johansen likelihood ratio (LR),
Pillai-Bartlett, and Hotelling-Lawley tests for cointegration are derived: the
test statistics converge to nonrandom integrals. For more refined analysis, the
paper proposes and analyzes a modification of the Johansen test. The new test
for the absence of cointegration converges to the partial sum of the Airy$_1$
point process. Supporting Monte Carlo simulations indicate that the same
behavior persists universally in many situations beyond those considered in our
theorems.
The paper presents empirical implementations of the approach for the analysis
of S&P 100 stocks and of cryptocurrencies. The latter example has a strong
presence of multiple cointegrating relationships, while the results for the
former are consistent with the null of no cointegration.
arXiv link: http://arxiv.org/abs/2202.07150v4
Sequential Monte Carlo With Model Tempering
time-consuming to evaluate the likelihood function. We demonstrate how Bayesian
computations for such models can be drastically accelerated by reweighting and
mutating posterior draws from an approximating model that allows for fast
likelihood evaluations, into posterior draws from the model of interest, using
a sequential Monte Carlo (SMC) algorithm. We apply the technique to the
estimation of a vector autoregression with stochastic volatility and a
nonlinear dynamic stochastic general equilibrium model. The runtime reductions
we obtain range from 27% to 88%.
arXiv link: http://arxiv.org/abs/2202.07070v1
scpi: Uncertainty Quantification for Synthetic Control Methods
intervention using weighted averages of untreated units to approximate the
counterfactual outcome that the treated unit(s) would have experienced in the
absence of the intervention. This method is useful for program evaluation and
causal inference in observational studies. We introduce the software package
scpi for prediction and inference using synthetic controls, implemented in
Python, R, and Stata. For point estimation or prediction of treatment effects,
the package offers an array of (possibly penalized) approaches leveraging the
latest optimization methods. For uncertainty quantification, the package offers
the prediction interval methods introduced by Cattaneo, Feng and Titiunik
(2021) and Cattaneo, Feng, Palomba and Titiunik (2022). The paper includes
numerical illustrations and a comparison with other synthetic control software.
arXiv link: http://arxiv.org/abs/2202.05984v3
Benign-Overfitting in Conditional Average Treatment Effect Prediction with Linear Regression
average treatment effect (CATE), with linear regression models. With the
development of machine learning for causal inference, a wide range of
large-scale models for causality are gaining attention. One problem is that
suspicions have been raised that the large-scale models are prone to
overfitting to observations under sample selection, and hence may
not be suitable for causal prediction. In this study, to address these
suspicions, we investigate the validity of causal inference methods for
overparameterized models, by applying the recent theory of benign overfitting
(Bartlett et al., 2020). Specifically, we consider samples whose distribution
switches depending on an assignment rule, and study the prediction of CATE with
linear models whose dimension diverges to infinity. We focus on two methods:
the T-learner, which is based on the difference between estimators constructed
separately for each treatment group, and the inverse probability weight
(IPW)-learner, which solves another regression problem approximated by a
propensity score. In both methods, the estimator consists of interpolators that
fit the samples perfectly. As a result, we show that the T-learner fails to
achieve consistency except under random assignment, while the risk of the
IPW-learner converges to zero if the propensity score is known. This difference
stems from the fact that the T-learner is unable to preserve eigenspaces of the
covariances, which is necessary for benign overfitting in the overparameterized
setting. Our result provides new insights into the usage of causal inference
methods in the overparameterized setting, in particular for doubly robust
estimators.
arXiv link: http://arxiv.org/abs/2202.05245v2
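The Python toy below contrasts the two learners discussed above using minimum-norm (ridgeless) interpolators in an overparameterised linear model. It illustrates the setup only; the data-generating process, dimensions, and constant CATE are invented, and nothing here reproduces the paper's theory.

# T-learner versus IPW-learner with minimum-norm interpolators (d > n).
import numpy as np

rng = np.random.default_rng(7)
n, d = 100, 400                                   # d > n: interpolation regime
X = rng.normal(size=(n, d))
e = 1 / (1 + np.exp(-X[:, 0]))                    # known propensity score
Z = rng.binomial(1, e)
tau = 2.0                                         # constant true CATE
Y = X @ rng.normal(scale=0.1, size=d) + tau * Z + rng.normal(size=n)

def min_norm_fit(X, y):
    return np.linalg.pinv(X) @ y                  # minimum-norm interpolator

# T-learner: fit each arm separately, take the difference of predictions
b1 = min_norm_fit(X[Z == 1], Y[Z == 1])
b0 = min_norm_fit(X[Z == 0], Y[Z == 0])
cate_T = X @ (b1 - b0)

# IPW-learner: single regression of the IPW-transformed outcome on X
Y_ipw = Y * (Z / e - (1 - Z) / (1 - e))
cate_IPW = X @ min_norm_fit(X, Y_ipw)

print(np.mean((cate_T - tau) ** 2), np.mean((cate_IPW - tau) ** 2))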
von Mises-Fisher distributions and their statistical divergence
surface of the unit ball, summarised by a concentration parameter and a mean
direction. As a quasi-Bayesian prior, the von Mises-Fisher distribution is a
convenient and parsimonious choice when parameter spaces are isomorphic to the
hypersphere (e.g., maximum score estimation in semi-parametric discrete choice,
estimation of single-index treatment assignment rules via empirical welfare
maximisation, under-identifying linear simultaneous equation models). Despite a
long history of application, measures of statistical divergence have not been
analytically characterised for von Mises-Fisher distributions. This paper
provides analytical expressions for the $f$-divergence of a von Mises-Fisher
distribution from another, distinct, von Mises-Fisher distribution in
$R^p$ and the uniform distribution over the hypersphere. This paper
also collects several other results pertaining to the von Mises-Fisher family of
distributions, and characterises the limiting behaviour of the measures of
divergence that we consider.
arXiv link: http://arxiv.org/abs/2202.05192v2
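For reference (standard background, not a result of the paper above), the von Mises-Fisher density on the unit sphere $S^{p-1}$ is
\[
f_p(x;\mu,\kappa) = C_p(\kappa)\exp(\kappa \mu^{\top} x), \qquad C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2} I_{p/2-1}(\kappa)},
\]
with mean direction $\mu$ ($\|\mu\|=1$), concentration $\kappa \ge 0$, and $I_{v}$ the modified Bessel function of the first kind; the divergences characterised in the paper are between densities of this form.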
The Transfer Performance of Economic Models
estimating risk preferences in a particular subject pool or for a specific
class of lotteries. Whether a model's predictions extrapolate well across
domains depends on whether the estimated model has captured generalizable
structure. We provide a tractable formulation for this "out-of-domain"
prediction problem and define the transfer error of a model based on how well
it performs on data from a new domain. We derive finite-sample forecast
intervals that are guaranteed to cover realized transfer errors with a
user-selected probability when domains are iid, and use these intervals to
compare the transferability of economic models and black box algorithms for
predicting certainty equivalents. We find that in this application, the black
box algorithms we consider outperform standard economic models when estimated
and tested on data from the same domain, but the economic models generalize
across domains better than the black-box algorithms do.
arXiv link: http://arxiv.org/abs/2202.04796v5
Semiparametric Bayesian Estimation of Dynamic Discrete Choice Models
dynamic discrete choice models. The distribution of additive utility shocks in
the proposed framework is modeled by location-scale mixtures of extreme value
distributions with varying numbers of mixture components. Our approach exploits
the analytical tractability of extreme value distributions in the multinomial
choice settings and the flexibility of the location-scale mixtures. We
implement the Bayesian approach to inference using Hamiltonian Monte Carlo and
an approximately optimal reversible jump algorithm. In our simulation
experiments, we show that the standard dynamic logit model can deliver
misleading results, especially about counterfactuals, when the shocks are not
extreme value distributed. Our semiparametric approach delivers reliable
inference in these settings. We develop theoretical results on approximations
by location-scale mixtures in an appropriate distance and posterior
concentration of the set identified utility parameters and the distribution of
shocks in the model.
arXiv link: http://arxiv.org/abs/2202.04339v3
Regulatory Instruments for Fair Personalized Pricing
individual consumers based on their characteristics and behaviors. It has
become common practice in many industries nowadays due to the availability of a
growing amount of highly granular consumer data. The discriminatory nature of
personalized pricing has triggered heated debates among policymakers and
academics on how to design regulation policies to balance market efficiency and
equity. In this paper, we propose two sound policy instruments, i.e., capping
the range of the personalized prices or their ratios. We investigate the
optimal pricing strategy of a profit-maximizing monopoly under both regulatory
constraints and the impact of imposing them on consumer surplus, producer
surplus, and social welfare. We theoretically prove that both proposed
constraints can help balance consumer surplus and producer surplus at the
expense of total surplus for common demand distributions, such as uniform,
logistic, and exponential distributions. Experiments on both simulation and
real-world datasets demonstrate the correctness of these theoretical results.
Our findings and insights shed light on regulatory policy design for the
increasingly monopolized business in the digital era.
arXiv link: http://arxiv.org/abs/2202.04245v2
Managers versus Machines: Do Algorithms Replicate Human Intuition in Credit Ratings?
replicate the behavior of bank managers who assess the risk of commercial loans
made by a large commercial US bank. Even though a typical bank already relies
on an algorithmic scorecard process to evaluate risk, bank managers are given
significant latitude in adjusting the risk score in order to account for other
holistic factors based on their intuition and experience. We show that it is
possible to find machine learning algorithms that can replicate the behavior of
the bank managers. The input to the algorithms consists of a combination of
standard financials and soft information available to bank managers as part of
the typical loan review process. We also document the presence of significant
heterogeneity in the adjustment process that can be traced to differences
across managers and industries. Our results highlight the effectiveness of
machine learning based analytic approaches to banking and the potential
challenges to high-skill jobs in the financial sector.
arXiv link: http://arxiv.org/abs/2202.04218v1
Validating Causal Inference Methods
outcomes are not fully observed for any unit. Furthermore, in observational
studies, treatment assignment is likely to be confounded. Many statistical
methods have emerged for causal inference under unconfoundedness conditions
given pre-treatment covariates, including propensity score-based methods,
prognostic score-based methods, and doubly robust methods. Unfortunately for
applied researchers, there is no `one-size-fits-all' causal method that can
perform optimally universally. In practice, causal methods are primarily
evaluated quantitatively on handcrafted simulated data. Such data-generative
procedures can be of limited value because they are typically stylized models
of reality. They are simplified for tractability and lack the complexities of
real-world data. For applied researchers, it is critical to understand how well
a method performs for the data at hand. Our work introduces a deep generative
model-based framework, Credence, to validate causal inference methods. The
framework's novelty stems from its ability to generate synthetic data anchored
at the empirical distribution for the observed sample, and therefore virtually
indistinguishable from the latter. The approach allows the user to specify
ground truth for the form and magnitude of causal effects and confounding bias
as functions of covariates. Thus simulated data sets are used to evaluate the
potential performance of various causal estimation methods when applied to data
similar to the observed sample. We demonstrate Credence's ability to accurately
assess the relative performance of causal estimation techniques in an extensive
simulation study and two real-world data applications from Lalonde and Project
STAR studies.
arXiv link: http://arxiv.org/abs/2202.04208v5
Dynamic Heterogeneous Distribution Regression Panel Models, with an Application to Labor Income Processes
heterogeneous coefficients across units. The objects of primary interest are
functionals of these coefficients, including predicted one-step-ahead and
stationary cross-sectional distributions of the outcome variable. Coefficients
and their functionals are estimated via fixed effect methods. We investigate
how these functionals vary in response to counterfactual changes in initial
conditions or covariate values. We also identify a uniformity problem related
to the robustness of inference to the unknown degree of coefficient
heterogeneity, and propose a cross-sectional bootstrap method for uniformly
valid inference on function-valued objects. We showcase the utility of our
approach through an empirical application to individual income dynamics.
Employing the annual Panel Study of Income Dynamics data, we establish the
presence of substantial coefficient heterogeneity. We then highlight some
important empirical questions that our methodology can address. First, we
quantify the impact of a negative labor income shock on the distribution of
future labor income.
arXiv link: http://arxiv.org/abs/2202.04154v4
A Neural Phillips Curve and a Deep Output Gap
hurdle that the two key components, inflation expectations and the output gap,
are both unobserved. Traditional remedies include proxying for the absentees or
extracting them via assumptions-heavy filtering procedures. I propose an
alternative route: a Hemisphere Neural Network (HNN) whose architecture yields
a final layer where components can be interpreted as latent states within a
Neural PC. There are benefits. First, HNN conducts the supervised estimation of
nonlinearities that arise when translating a high-dimensional set of observed
regressors into latent states. Second, forecasts are economically
interpretable. Among other findings, the contribution of real activity to
inflation appears understated in traditional PCs. In contrast, HNN captures the
2021 upswing in inflation and attributes it to a large positive output gap
starting from late 2020. The unique path of HNN's gap comes from dispensing
with unemployment and GDP in favor of an amalgam of nonlinearly processed
alternative tightness indicators.
arXiv link: http://arxiv.org/abs/2202.04146v2
Continuous permanent unobserved heterogeneity in dynamic discrete choice models
to control for unobserved heterogeneity. However, consistent estimation
typically requires both restrictions on the support of unobserved heterogeneity
and a high-level injectivity condition that is difficult to verify. This paper
provides primitive conditions for point identification of a broad class of DDC
models with multivariate continuous permanent unobserved heterogeneity. The
results apply to both finite- and infinite-horizon DDC models, do not require a
full support assumption, nor a long panel, and place no parametric restriction
on the distribution of unobserved heterogeneity. In addition, I propose a
seminonparametric estimator that is computationally attractive and can be
implemented using familiar parametric methods.
arXiv link: http://arxiv.org/abs/2202.03960v4
Threshold Asymmetric Conditional Autoregressive Range (TACARR) Model
(TACARR) formulation for modeling the daily price ranges of financial assets.
It is assumed that the process generating the conditional expected ranges at
each time point switches between two regimes, labeled as upward market and
downward market states. The disturbance term of the error process is also
allowed to switch between two distributions depending on the regime. It is
assumed that a self-adjusting threshold component that is driven by the past
values of the time series determines the current market regime. The proposed
model is able to capture aspects such as asymmetric and heteroscedastic
behavior of volatility in financial markets. The proposed model is an attempt
at addressing several potential deficits found in existing price range models
such as the Conditional Autoregressive Range (CARR), Asymmetric CARR (ACARR),
Feedback ACARR (FACARR) and Threshold Autoregressive Range (TARR) models.
Parameters of the model are estimated using the Maximum Likelihood (ML) method.
A simulation study shows that the ML method performs well in estimating the
TACARR model parameters. The empirical performance of the TACARR model was
investigated using IBM index data and results show that the proposed model is a
good alternative for in-sample prediction and out-of-sample forecasting of
volatility.
Key Words: Volatility Modeling, Asymmetric Volatility, CARR Models, Regime
Switching.
arXiv link: http://arxiv.org/abs/2202.03351v2
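Schematically, and with notation assumed here rather than taken from the paper above, a two-regime CARR-type recursion of the kind described can be written as
\[
R_t = \lambda_t \varepsilon_t, \qquad \lambda_t = \omega^{(k_t)} + \alpha^{(k_t)} R_{t-1} + \beta^{(k_t)} \lambda_{t-1},
\]
where $R_t$ is the daily price range, $\lambda_t$ its conditional expectation, $\varepsilon_t$ a positive error term whose distribution also depends on the regime, and the regime $k_t \in \{\text{upward}, \text{downward}\}$ is selected by a self-adjusting threshold driven by past values of the series; the exact threshold rule and error distributions are specified in the paper.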
Forecasting Environmental Data: An example to ground-level ozone concentration surfaces
and health studies. This in turn fosters advances in recording and data
collection of many related real-life processes. Available tools for data
processing are often found too restrictive as they do not account for the rich
nature of such data sets. In this paper, we propose a new statistical
perspective on forecasting spatial environmental data collected sequentially
over time. We treat this data set as a surface (functional) time series with a
possibly complicated geographical domain. By employing novel techniques from
functional data analysis we develop a new forecasting methodology. Our approach
consists of two steps. In the first step, time series of surfaces are
reconstructed from measurements sampled over some spatial domain using a finite
element spline smoother. In the second step, we adapt the dynamic functional
factor model to forecast a surface time series. The advantage of this approach
is that we can account for and explore simultaneously spatial as well as
temporal dependencies in the data. A forecasting study of ground-level ozone
concentration over the geographical domain of Germany demonstrates the
practical value of this new perspective, where we compare our approach with
standard functional benchmark models.
arXiv link: http://arxiv.org/abs/2202.03332v1
Predicting Default Probabilities for Stress Tests: A Comparison of Models
assessing the resilience of financial institutions to adverse financial and
economic developments has increased significantly. One key part in such
exercises is the translation of macroeconomic variables into default
probabilities for credit risk by using macrofinancial linkage models. A key
requirement for such models is that they should be able to properly detect
signals from a wide array of macroeconomic variables in combination with a
mostly short data sample. The aim of this paper is to compare a great number of
different regression models to find the best performing credit risk model. We
set up an estimation framework that allows us to systematically estimate and
evaluate a large set of models within the same environment. Our results
indicate that there are indeed better performing models than the current
state-of-the-art model. Moreover, our comparison sheds light on other potential
credit risk models, specifically highlighting the advantages of machine
learning models and forecast combinations.
arXiv link: http://arxiv.org/abs/2202.03110v1
Detecting Structural Breaks in Foreign Exchange Markets by using the group LASSO technique
time series models whose parameters jump only rarely. Its basic idea builds on
the group LASSO (group least absolute shrinkage and selection operator).
In practice, the method provides estimates of such time-varying parameters of
the models. An example shows that our method can detect each structural
breakpoint's date and magnitude.
arXiv link: http://arxiv.org/abs/2202.02988v1
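The Python sketch below illustrates the basic idea described above in a generic regression setting: write the time-varying coefficients in terms of their period-to-period increments and place a group-lasso penalty on those increments, so that only a few periods carry nonzero jumps (structural breaks). The penalty weight and the break-detection threshold are ad hoc, and the paper's exact formulation for exchange-rate models is not reproduced.

# Fused group-lasso detection of coefficient jumps in a time-varying regression.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(8)
T, d = 200, 2
X = rng.normal(size=(T, d))
beta_true = np.tile([1.0, -0.5], (T, 1))
beta_true[120:] += [0.8, 0.0]                   # one break at t = 120
y = np.sum(X * beta_true, axis=1) + 0.3 * rng.normal(size=T)

B = cp.Variable((T, d))                          # beta_t, one row per period
fit = cp.sum_squares(y - cp.sum(cp.multiply(X, B), axis=1))
jumps = cp.sum(cp.norm(B[1:] - B[:-1], 2, axis=1))   # group penalty on increments
prob = cp.Problem(cp.Minimize(fit + 25.0 * jumps))   # ad hoc penalty weight
prob.solve()

# heuristic threshold to read off estimated break dates
est_breaks = np.where(np.linalg.norm(B.value[1:] - B.value[:-1], axis=1) > 1e-3)[0] + 1
print(est_breaks)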
Difference in Differences with Time-Varying Covariates
parameters from participating in a binary treatment in a difference in
differences (DID) setup when the parallel trends assumption holds after
conditioning on observed covariates. Relative to existing work in the
econometrics literature, we consider the case where the value of covariates can
change over time and, potentially, where participating in the treatment can
affect the covariates themselves. We propose new empirical strategies in both
cases. We also consider two-way fixed effects (TWFE) regressions that include
time-varying regressors, which is the most common way that DID identification
strategies are implemented under conditional parallel trends. We show that,
even in the case with only two time periods, these TWFE regressions are not
generally robust to (i) time-varying covariates being affected by the
treatment, (ii) treatment effects and/or paths of untreated potential outcomes
depending on the level of time-varying covariates in addition to only the
change in the covariates over time, (iii) treatment effects and/or paths of
untreated potential outcomes depending on time-invariant covariates, (iv)
treatment effect heterogeneity with respect to observed covariates, and (v)
violations of strong functional form assumptions, both for outcomes over time
and the propensity score, that are unlikely to be plausible in most DID
applications. Thus, TWFE regressions can deliver misleading estimates of causal
effect parameters in a number of empirically relevant cases. We propose both
doubly robust estimands and regression adjustment/imputation strategies that
are robust to these issues while not being substantially more challenging to
implement.
arXiv link: http://arxiv.org/abs/2202.02903v3
Adaptive information-based methods for determining the co-integration rank in heteroskedastic VAR models
(pseudo-)likelihood ratio (PLR) test, for determining the co-integration rank
of a vector autoregressive (VAR) system of variables integrated of order one
can be significantly affected, even asymptotically, by unconditional
heteroskedasticity (non-stationary volatility) in the data. Known solutions to
this problem include wild bootstrap implementations of the PLR test or the use
of an information criterion, such as the BIC, to select the co-integration
rank. Although asymptotically valid in the presence of heteroskedasticity,
these methods can display very low finite sample power under some patterns of
non-stationary volatility. In particular, they do not exploit potential
efficiency gains that could be realised in the presence of non-stationary
volatility by using adaptive inference methods. Under the assumption of a known
autoregressive lag length, Boswijk and Zu (2022) develop adaptive PLR-test-based
methods using a non-parametric estimate of the covariance matrix process. It is
well known, however, that selecting an incorrect lag length can significantly
affect the ability of both information criteria and bootstrap PLR tests to
determine the co-integration rank in finite samples. We show that
adaptive information criteria-based approaches can be used to estimate the
autoregressive lag order to use in connection with bootstrap adaptive PLR
tests, or to jointly determine the co-integration rank and the VAR lag length
and that in both cases they are weakly consistent for these parameters in the
presence of non-stationary volatility provided standard conditions hold on the
penalty term. Monte Carlo simulations are used to demonstrate the potential
gains from using adaptive methods and an empirical application to the U.S. term
structure is provided.
arXiv link: http://arxiv.org/abs/2202.02532v1
First-order integer-valued autoregressive processes with Generalized Katz innovations
Lagrangian Katz (GLK) innovations is defined. This process family provides a
flexible modelling framework for count data, allowing for under- and
over-dispersion, asymmetry, and excess kurtosis, and includes standard INAR
models such as the Generalized Poisson and Negative Binomial as special cases.
We show that the GLK-INAR process is discrete semi-self-decomposable,
infinitely divisible, and stable under aggregation, and we provide stationarity
conditions. Some extensions are discussed, such as Markov-switching and
zero-inflated GLK-INARs. A Bayesian inference framework and an efficient
posterior approximation procedure are introduced. The proposed models are
applied to 130 time series from Google Trends, which proxy worldwide public
concern about climate change. New evidence is found of heterogeneity across
time, countries, and keywords in the persistence, uncertainty, and long-run
public awareness level.
arXiv link: http://arxiv.org/abs/2202.02029v2
Efficient Volatility Estimation for Lévy Processes with Jumps of Unbounded Variation
observations has been an active research area for more than a decade. One of
the most well-known and widely studied problems is that of estimation of the
quadratic variation of the continuous component of an It\^o semimartingale with
jumps. Several rate- and variance-efficient estimators have been proposed in
the literature when the jump component is of bounded variation. However, to
date, very few methods can deal with jumps of unbounded variation. By
developing new high-order expansions of the truncated moments of a L\'evy
process, we construct a new rate- and variance-efficient estimator for a class
of L\'evy processes of unbounded variation, whose small jumps behave like those
of a stable L\'evy process with Blumenthal-Getoor index less than $8/5$. The
proposed method is based on a two-step debiasing procedure for the truncated
realized quadratic variation of the process. Our Monte Carlo experiments
indicate that the method outperforms other efficient alternatives in the
literature in the setting covered by our theoretical framework.
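As a rough illustration of the object being debiased, the sketch below computes
a plain truncated realized variance from high-frequency log-price increments;
the paper's two-step debiasing for jumps of unbounded variation is more involved
and is not reproduced here, and the threshold constants are assumptions.

import numpy as np

def truncated_realized_variance(logprice, c=3.0, varpi=0.49):
    """Sum of squared increments, keeping only those below a threshold
    u_n = c * Delta_n**varpi (a standard truncation rule)."""
    dx = np.diff(logprice)
    n = dx.size
    delta_n = 1.0 / n                      # spacing of observations on [0, 1]
    u_n = c * delta_n**varpi
    return np.sum(dx[np.abs(dx) <= u_n] ** 2)

# Toy path: Brownian motion with volatility 0.2 plus a few heavy-tailed jumps.
rng = np.random.default_rng(1)
n = 23_400
dx = 0.2 * np.sqrt(1.0 / n) * rng.normal(size=n)
dx[rng.choice(n, 5, replace=False)] += 0.05 * rng.standard_cauchy(5)
x = np.concatenate([[0.0], np.cumsum(dx)])
print("truncated RV:", truncated_realized_variance(x), " target IV: 0.04")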
arXiv link: http://arxiv.org/abs/2202.00877v1
Long-Horizon Return Predictability from Realized Volatility in Pure-Jump Point Processes
return predictability based on realized variance. To accomplish this, we
propose a parametric transaction-level model for the continuous-time log price
process based on a pure jump point process. The model determines the returns
and realized variance at any level of aggregation with properties shown to be
consistent with the stylized facts in the empirical finance literature. Under
our model, the long-memory parameter propagates unchanged from the
transaction-level drift to the calendar-time returns and the realized variance,
leading endogenously to a balanced predictive regression equation. We propose
an asymptotic framework using power-law aggregation in the predictive
regression. Within this framework, we propose a hypothesis test for long
horizon return predictability which is asymptotically correctly sized and
consistent.
arXiv link: http://arxiv.org/abs/2202.00793v1
Black-box Bayesian inference for economic agent-based models
in economics. The considerable flexibility they offer, as well as their
capacity to reproduce a variety of empirically observed behaviours of complex
systems, give them broad appeal, and the increasing availability of cheap
computing power has made their use feasible. Yet a widespread adoption in
real-world modelling and decision-making scenarios has been hindered by the
difficulty of performing parameter estimation for such models. In general,
simulation models lack a tractable likelihood function, which precludes a
straightforward application of standard statistical inference techniques.
Several recent works have sought to address this problem through the
application of likelihood-free inference techniques, in which parameter
estimates are determined by performing some form of comparison between the
observed data and simulation output. However, these approaches are (a) founded
on restrictive assumptions, and/or (b) typically require many hundreds of
thousands of simulations. These qualities make them unsuitable for large-scale
simulations in economics and can cast doubt on the validity of these inference
methods in such scenarios. In this paper, we investigate the efficacy of two
classes of black-box approximate Bayesian inference methods that have recently
drawn significant attention within the probabilistic machine learning
community: neural posterior estimation and neural density ratio estimation. We
present benchmarking experiments in which we demonstrate that neural network
based black-box methods provide state of the art parameter inference for
economic simulation models, and crucially are compatible with generic
multivariate time-series data. In addition, we suggest appropriate assessment
criteria for future benchmarking of approximate Bayesian inference procedures
for economic simulation models.
arXiv link: http://arxiv.org/abs/2202.00625v1
Estimation of Impulse-Response Functions with Dynamic Factor Models: A New Parametrization
impulse-response functions (IRFs) of dynamic factor models (DFMs). The
theoretical contribution of this paper concerns the problem of observational
equivalence between different IRFs, which implies non-identification of the IRF
parameters without further restrictions. We show how the previously proposed
minimal identification conditions are nested in the new framework and can be
further augmented with overidentifying restrictions leading to efficiency
gains. The current standard practice for the IRF estimation of DFMs is based on
principal components, compared to which the new parametrization is less
restrictive and allows for modelling richer dynamics. As the empirical
contribution of the paper, we develop an estimation method based on the EM
algorithm, which incorporates the proposed identification restrictions. In the
empirical application, we use a standard high-dimensional macroeconomic dataset
to estimate the effects of a monetary policy shock. We estimate a strong
reaction of the macroeconomic variables, while the benchmark models appear to
give qualitatively counterintuitive results. The estimation methods are
implemented in the accompanying R package.
arXiv link: http://arxiv.org/abs/2202.00310v2
Protection or Peril of Following the Crowd in a Pandemic-Concurrent Flood Evacuation
influenced by a wide range of factors, including sociodemographics, emergency
messaging, and social influence. Further complexity is introduced when multiple
hazards occur simultaneously, such as a flood evacuation taking place amid a
viral pandemic that requires physical distancing. Such multi-hazard events can
necessitate a nuanced navigation of competing decision-making strategies
wherein a desire to follow peers is weighed against contagion risks. To better
understand these nuances, we distributed an online survey during a pandemic
surge in July 2020 to 600 individuals in three midwestern and three southern
states in the United States with high risk of flooding. In this paper, we
estimate a random parameter logit model in both preference space and
willingness-to-pay space. Our results show that the directionality and
magnitude of the influence of peers' choices of whether and how to evacuate
vary widely across respondents. Overall, the decision of whether to evacuate is
positively impacted by peer behavior, while the decision of how to evacuate is
negatively impacted by peers. Furthermore, an increase in flood threat level
lessens the magnitude of these impacts. These findings have important
implications for the design of tailored emergency messaging strategies.
Specifically, emphasizing or deemphasizing the severity of each threat in a
multi-hazard scenario may assist in: (1) encouraging a reprioritization of
competing risk perceptions and (2) magnifying or neutralizing the impacts of
social influence, thereby (3) nudging evacuation decision-making toward a
desired outcome.
arXiv link: http://arxiv.org/abs/2202.00229v1
Partial Sum Processes of Residual-Based and Wald-type Break-Point Statistics in Time Series Regression Models
linear regression models by obtaining the limit theory of residual-based and
Wald-type processes. First, we establish the Brownian bridge limiting
distribution of these test statistics. Second, we study the asymptotic
behaviour of the partial-sum processes in nonstationary (linear) time series
regression models. Although the comparison of these two modelling environments
is made from the perspective of the partial-sum processes, it emphasizes that
the presence of nuisance parameters can change the asymptotic behaviour of the
functionals under consideration. Simulation experiments reveal size distortions
when testing for a break in nonstationary time series regressions, which
indicates that the Brownian bridge limit cannot provide a suitable asymptotic
approximation in this case. Further research is
required to establish the cause of size distortions under the null hypothesis
of parameter stability.
arXiv link: http://arxiv.org/abs/2202.00141v2
Deep Learning Macroeconomics
that may emerge when applying econometrics to macroeconomic problems. This
research proposes deep learning as an approach to transfer learning in the
former case and to map relationships between variables in the latter case.
Although macroeconomists already apply transfer learning, for example when
assuming a given prior distribution in a Bayesian context, estimating a
structural VAR with sign restrictions, or calibrating parameters based on
results observed in other models, the innovation we introduce is a more
systematic transfer learning strategy for applied macroeconomics. We explore
the proposed strategy empirically, showing that data from different but related
domains, a form of transfer learning, help identify business cycle phases when
there is no business cycle dating committee and allow a quick estimate of an
economics-based output gap. Next, since deep learning methods learn
representations formed by the composition of multiple non-linear
transformations that yield more abstract representations, we apply deep
learning to map low-frequency variables from high-frequency variables. The
results obtained show the suitability of deep
learning models applied to macroeconomic problems. First, models learned to
classify United States business cycles correctly. Then, applying transfer
learning, they were able to identify the business cycles of out-of-sample
Brazilian and European data. Along the same lines, the models learned to
estimate the output gap based on the U.S. data and obtained good performance
when faced with Brazilian data. Additionally, deep learning proved adequate for
mapping low-frequency variables from high-frequency data to interpolate,
distribute, and extrapolate time series by related series.
arXiv link: http://arxiv.org/abs/2201.13380v1
Improving Estimation Efficiency via Regression-Adjustment in Covariate-Adaptive Randomizations with Imperfect Compliance
covariates in covariate-adaptive randomizations (CARs) with imperfect subject
compliance. Our regression-adjusted estimators, which are based on the doubly
robust moment for local average treatment effects, are consistent and
asymptotically normal even with heterogeneous probability of assignment and
misspecified regression adjustments. We propose an optimal but potentially
misspecified linear adjustment and its further improvement via a nonlinear
adjustment, both of which lead to more efficient estimators than the one
without adjustments. We also provide conditions for nonparametric and
regularized adjustments to achieve the semiparametric efficiency bound under
CARs.
arXiv link: http://arxiv.org/abs/2201.13004v5
A General Description of Growth Trends
expansion. In a similar vein, a recently developed formalism enables
description of growth patterns with the optimal number of parameters (Elitzur
et al, 2020). The method has been applied to the growth of national GDP,
population and the COVID-19 pandemic; in all cases the deviations of long-term
growth patterns from pure exponential required no more than two additional
parameters, mostly only one. Here I utilize the new framework to develop a
unified formulation for all functions that describe growth deceleration,
wherein the growth rate decreases with time. The result offers the prospect of
a new general tool for trend removal in time-series analysis.
arXiv link: http://arxiv.org/abs/2201.13000v1
Pigeonhole Design: Balancing Sequential Experiments from an Online Matching Perspective
balancing when they conduct randomized experiments. For web-facing firms
running online A/B tests, however, it remains challenging to balance covariate
information when experimental subjects arrive sequentially. In this
paper, we study an online experimental design problem, which we refer to as the
"Online Blocking Problem." In this problem, experimental subjects with
heterogeneous covariate information arrive sequentially and must be immediately
assigned into either the control or the treated group. The objective is to
minimize the total discrepancy, which is defined as the minimum weight perfect
matching between the two groups. To solve this problem, we propose a randomized
design of experiment, which we refer to as the "Pigeonhole Design." The
pigeonhole design first partitions the covariate space into smaller spaces,
which we refer to as pigeonholes, and then, when the experimental subjects
arrive at each pigeonhole, balances the number of control and treated subjects
for each pigeonhole. We analyze the theoretical performance of the pigeonhole
design and show its effectiveness by comparing against two well-known benchmark
designs: the match-pair design and the completely randomized design. We
identify scenarios in which the pigeonhole design offers benefits over the
benchmark designs. To conclude, we conduct extensive simulations using
Yahoo! data to show a 10.2% reduction in variance if we use the pigeonhole
design to estimate the average treatment effect.
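A minimal sketch of the pigeonhole idea described above: discretize the
covariate space into cells and, within each cell, assign each arriving subject
to whichever arm currently has fewer subjects, randomizing ties. The cell
boundaries and tie-breaking rule below are illustrative choices, not
necessarily those analyzed in the paper.

import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def pigeonhole_key(x, n_bins=4):
    """Map a covariate vector in [0, 1]^d to a discrete pigeonhole label."""
    return tuple(np.minimum((np.asarray(x) * n_bins).astype(int), n_bins - 1))

counts = defaultdict(lambda: [0, 0])   # pigeonhole -> [#control, #treated]

def assign(x):
    """Online assignment: balance control vs. treated within the pigeonhole."""
    key = pigeonhole_key(x)
    n_c, n_t = counts[key]
    if n_c == n_t:
        arm = int(rng.integers(2))     # tie: randomize
    else:
        arm = 0 if n_c < n_t else 1    # send the subject to the smaller group
    counts[key][arm] += 1
    return arm

assignments = [assign(rng.uniform(size=2)) for _ in range(1000)]
print("treated share:", np.mean(assignments))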
arXiv link: http://arxiv.org/abs/2201.12936v6
On the Use of Instrumental Variables in Mediation Analysis
affects an outcome of interest, but also how the treatment effect arises.
Causal mediation analysis provides a formal framework to identify causal
mechanisms through which a treatment affects an outcome. The most popular
identification strategy relies on the so-called sequential ignorability (SI)
assumption, which requires that no unobserved confounder lies on the causal
paths between the treatment and the outcome. Despite its popularity, this
assumption is deemed too strong in many settings as it rules out unobserved
confounders. This limitation has inspired the recent literature to consider an
alternative identification strategy based on an instrumental variable (IV).
This paper discusses the identification of causal mediation effects in a
setting with a binary treatment and a binary instrumental variable, both of
which are assumed to be random. We show that while IV methods allow for the
possible existence of unobserved confounders, additional monotonicity
assumptions are required unless a strong constant-effect assumption is imposed.
Furthermore, even when such monotonicity assumptions are satisfied, IV
estimands are not necessarily equivalent to target parameters.
arXiv link: http://arxiv.org/abs/2201.12752v1
Sharing Behavior in Ride-hailing Trips: A Machine Learning Inference Approach
sharing or pooling is important to mitigate negative externalities of
ride-hailing such as increased congestion and environmental impacts. However,
empirical evidence on what affects trip-level sharing behavior in ride-hailing
is lacking. Using a novel dataset covering all ride-hailing trips in Chicago in
2019, we show that the willingness of riders to request a shared ride has
monotonically decreased from 27.0% to 12.8% throughout the year, while the trip
volume and mileage have remained statistically unchanged. We find that the
decline in sharing preference is due to increased per-mile costs of shared
trips and a shift of shorter trips to solo rides. Using ensemble machine learning
models, we find that the travel impedance variables (trip cost, distance, and
duration) collectively contribute to 95% and 91% of the predictive power in
determining whether a trip is requested to share and whether it is successfully
shared, respectively. Spatial and temporal attributes, sociodemographic, built
environment, and transit supply variables do not add predictive power at the
trip level in the presence of these travel impedance variables. This implies that
pricing signals are most effective to encourage riders to share their rides.
Our findings shed light on sharing behavior in ride-hailing trips and can help
devise strategies that increase shared ride-hailing, especially as the demand
recovers from the pandemic.
arXiv link: http://arxiv.org/abs/2201.12696v1
Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance
active research field in econometrics. In this paper, we study the finite
sample performance of meta-learners for estimation of heterogeneous treatment
effects when using sample-splitting and cross-fitting to reduce overfitting
bias. In both synthetic and semi-synthetic simulations we find that
the performance of the meta-learners in finite samples greatly depends on the
estimation procedure. The results imply that sample-splitting and cross-fitting
are beneficial in large samples for bias reduction and efficiency of the
meta-learners, respectively, whereas full-sample estimation is preferable in
small samples. Furthermore, we derive practical recommendations for application
of specific meta-learners in empirical studies depending on particular data
characteristics such as treatment shares and sample size.
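To make the estimation procedures concrete, here is a hedged sketch of one
cross-fitted meta-learner (a T-learner whose outcome models for each fold are
trained only on the other folds). The learners, the number of folds, and the
simulated data are illustrative assumptions rather than the exact
configurations studied in the paper.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.normal(size=(n, p))
D = rng.binomial(1, 0.5, size=n)                 # randomized treatment
tau = 1.0 + X[:, 0]                              # heterogeneous treatment effect
Y = X[:, 1] + D * tau + rng.normal(size=n)

def crossfit_t_learner(X, D, Y, n_splits=5):
    """Cross-fitted T-learner: CATE(x) = mu1(x) - mu0(x), with out-of-fold fits."""
    cate = np.zeros(len(Y))
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        mu0 = RandomForestRegressor(n_estimators=200, random_state=0)
        mu1 = RandomForestRegressor(n_estimators=200, random_state=0)
        mu0.fit(X[train][D[train] == 0], Y[train][D[train] == 0])
        mu1.fit(X[train][D[train] == 1], Y[train][D[train] == 1])
        cate[test] = mu1.predict(X[test]) - mu0.predict(X[test])
    return cate

cate_hat = crossfit_t_learner(X, D, Y)
print("RMSE of CATE estimate:", np.sqrt(np.mean((cate_hat - tau) ** 2)))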
arXiv link: http://arxiv.org/abs/2201.12692v1
A projection based approach for interactive fixed effects panel data models
and conducting inference on regression parameters in panel data models with
interactive fixed effects. The method's key assumption is that factor loadings
can be decomposed into an unknown smooth function of individual characteristics
plus an idiosyncratic error term. Our estimator offers advantages over existing
approaches by taking a simple partial least squares form, eliminating the need
for iterative procedures or preliminary factor estimation. In deriving the
asymptotic properties, we discover that the limiting distribution exhibits a
discontinuity that depends on how well our basis functions explain the factor
loadings, as measured by the variance of the error factor loadings. This
finding reveals that conventional “plug-in” methods using the estimated
asymptotic covariance can produce excessively conservative coverage
probabilities. We demonstrate that uniformly valid non-conservative inference
can be achieved through the cross-sectional bootstrap method. Monte Carlo
simulations confirm the estimator's strong performance in terms of mean squared
error and good coverage results for the bootstrap procedure. We demonstrate the
practical relevance of our methodology by analyzing growth rate determinants
across OECD countries.
arXiv link: http://arxiv.org/abs/2201.11482v3
Towards Agnostic Feature-based Dynamic Pricing: Linear Policies vs Linear Valuation with Unknown Noise
sequence of products (described by feature vectors) on the fly by learning from
the binary outcomes of previous sales sessions ("Sold" if valuation $\geq$
price, and "Not Sold" otherwise). Existing works either assume noiseless linear
valuation or precisely-known noise distribution, which limits the applicability
of those algorithms in practice when these assumptions are hard to verify. In
this work, we study two more agnostic models: (a) a "linear policy" problem
where we aim at competing with the best linear pricing policy while making no
assumptions on the data, and (b) a "linear noisy valuation" problem where the
random valuation is linear plus an unknown and assumption-free noise. For the
former model, we show a $\Theta(d^{\frac13}T^{\frac23})$ minimax regret
up to logarithmic factors. For the latter model, we present an algorithm that
achieves an $O(T^{\frac34})$ regret, and improve the best-known lower
bound from $\Omega(T^{\frac35})$ to $\Omega(T^{\frac23})$. These
results demonstrate that no-regret learning is possible for feature-based
dynamic pricing under weak assumptions, but also reveal a disappointing fact
that the seemingly richer pricing feedback is not significantly more useful
than the bandit-feedback in regret reduction.
arXiv link: http://arxiv.org/abs/2201.11341v2
Standard errors for two-way clustering with serially correlated time effects
two-way clustered panels. Our proposed estimator and theory allow for arbitrary
serial dependence in the common time effects, which is excluded by existing
two-way methods, including the popular two-way cluster standard errors of
Cameron, Gelbach, and Miller (2011) and the cluster bootstrap of Menzel (2021).
Our asymptotic distribution theory is the first which allows for this level of
inter-dependence among the observations. Under weak regularity conditions, we
demonstrate that the least squares estimator is asymptotically normal, our
proposed variance estimator is consistent, and t-ratios are asymptotically
standard normal, permitting conventional inference. We present simulation
evidence that confidence intervals constructed with our proposed standard
errors obtain superior coverage performance relative to existing methods. We
illustrate the relevance of the proposed method in an empirical application to
a standard Fama-French three-factor regression.
arXiv link: http://arxiv.org/abs/2201.11304v4
Micro-level Reserving for General Insurance Claims using a Long Short-Term Memory Network
insurance claims data are aggregated and structured in development triangles
for loss reserving. In the hope of extracting predictive power from the
individual claims characteristics, researchers have recently proposed to move
away from these macro-level methods in favor of micro-level loss reserving
approaches. We introduce a discrete-time individual reserving framework
incorporating granular information in a deep learning approach named Long
Short-Term Memory (LSTM) neural network. At each time period, the network has
two tasks: first, classifying whether there is a payment or a recovery, and
second, predicting the corresponding non-zero amount, if any. We illustrate the
estimation procedure on a simulated and a real general insurance dataset. We
compare our approach with the chain-ladder aggregate method using the
predictive outstanding loss estimates and their actual values. Based on a
generalized Pareto model for excess payments over a threshold, we adjust the
LSTM reserve prediction to account for extreme payments.
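A hedged PyTorch sketch of the kind of two-task recurrent architecture
described above: at each development period an LSTM emits both a
payment/recovery indicator probability and a predicted amount. The layer sizes,
loss combination, and random input tensors are assumptions for illustration,
not the paper's configuration.

import torch
import torch.nn as nn

class ReservingLSTM(nn.Module):
    """LSTM over claim development periods with two heads per period:
    (i) probability that a payment/recovery occurs, (ii) its amount."""
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.indicator_head = nn.Linear(hidden, 1)   # does a payment occur?
        self.amount_head = nn.Linear(hidden, 1)      # size if it occurs

    def forward(self, x):                            # x: (batch, periods, features)
        h, _ = self.lstm(x)
        return self.indicator_head(h).squeeze(-1), self.amount_head(h).squeeze(-1)

# Toy batch: 64 claims, 10 development periods, 8 covariates per period.
x = torch.randn(64, 10, 8)
occurs = torch.randint(0, 2, (64, 10)).float()
amount = torch.rand(64, 10) * occurs

model = ReservingLSTM(n_features=8)
logit, amt_hat = model(x)
# Multiplying by the indicator zeroes the amount loss where no payment occurs.
loss = nn.functional.binary_cross_entropy_with_logits(logit, occurs) \
     + nn.functional.mse_loss(amt_hat * occurs, amount)
loss.backward()
print(float(loss))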
arXiv link: http://arxiv.org/abs/2201.13267v1
Bootstrap inference for fixed-effect models
effects is consistent but asymptotically-biased under rectangular-array
asymptotics. The literature has thus far concentrated its effort on devising
methods to correct the maximum-likelihood estimator for its bias as a means to
salvage standard inferential procedures. Instead, we show that the parametric
bootstrap replicates the distribution of the (uncorrected) maximum-likelihood
estimator in large samples. This justifies the use of confidence sets
constructed via standard bootstrap percentile methods. No adjustment for the
presence of bias needs to be made.
arXiv link: http://arxiv.org/abs/2201.11156v1
Instrumental variable estimation of dynamic treatment effects on a duration outcome
the time Z until a subject is treated on a survival outcome T. The treatment is
not randomly assigned, T is randomly right censored by a random variable C and
the time to treatment Z is right censored by min(T,C). The endogeneity issue is
treated using an instrumental variable explaining Z and independent of the
error term of the model. We study identification in a fully nonparametric
framework. We show that our specification generates an integral equation, of
which the regression function of interest is a solution. We provide
identification conditions that rely on this identification equation. For
estimation purposes, we assume that the regression function follows a
parametric model. We propose an estimation procedure and give conditions under
which the estimator is asymptotically normal. The estimators exhibit good
finite sample properties in simulations. Our methodology is applied to find
evidence supporting the efficacy of a therapy for burn-out.
arXiv link: http://arxiv.org/abs/2201.10826v6
Combining Experimental and Observational Data for Identification and Estimation of Long-Term Causal Effects
on a long-term outcome using data from an observational and an experimental
domain. The observational data are subject to unobserved confounding.
Furthermore, subjects in the experiment are only followed for a short period;
thus, long-term effects are unobserved, though short-term effects are
available. Consequently, neither data source alone suffices for causal
inference on the long-term outcome, necessitating a principled fusion of the
two. We propose three approaches for data fusion for the purpose of identifying
and estimating the causal effect. The first assumes equal confounding bias for
short-term and long-term outcomes. The second weakens this assumption by
leveraging an observed confounder for which the short-term and long-term
potential outcomes share the same partial additive association with this
confounder. The third approach employs proxy variables of the latent confounder
of the treatment-outcome relationship, extending the proximal causal inference
framework to the data fusion setting. For each approach, we develop influence
function-based estimators and analyze their robustness properties. We
illustrate our methods by estimating the effect of class size on 8th-grade SAT
scores using data from the Project STAR experiment combined with observational
data from the Early Childhood Longitudinal Study.
arXiv link: http://arxiv.org/abs/2201.10743v4
Modeling bid and ask price dynamics with an extended Hawkes process and its empirical applications for high-frequency stock market data
ask prices using an extended Hawkes process. The model incorporates the zero
intensities of the spread-narrowing processes at the minimum bid-ask spread,
spread-dependent intensities, possible negative excitement, and nonnegative
intensities. We apply the model to high-frequency best bid and ask price data
from US stock markets. The empirical findings demonstrate a spread-narrowing
tendency, excitations of the intensities caused by previous events, the impact
of flash crashes, characteristic trends in fast trading over time, and the
different features of market participants in the various exchanges.
arXiv link: http://arxiv.org/abs/2201.10173v1
Marginal Effects for Non-Linear Prediction Functions
interpretable feature effect. However, for non-linear models and especially
generalized linear models, the estimated coefficients cannot be interpreted as
a direct feature effect on the predicted outcome. Hence, marginal effects are
typically used as approximations for feature effects, either in the shape of
derivatives of the prediction function or forward differences in prediction due
to a change in a feature value. While marginal effects are commonly used in
many scientific fields, they have not yet been adopted as a model-agnostic
interpretation method for machine learning models. This may stem from their
inflexibility as a univariate feature effect and their inability to deal with
the non-linearities found in black box models. We introduce a new class of
marginal effects termed forward marginal effects. We argue for abandoning
derivatives in favor of more interpretable forward differences. Furthermore,
we generalize marginal effects based on forward differences to multivariate
changes in feature values. To account for the non-linearity of prediction
functions, we introduce a non-linearity measure for marginal effects. We argue
against summarizing feature effects of a non-linear prediction function in a
single metric such as the average marginal effect. Instead, we propose to
partition the feature space to compute conditional average marginal effects on
feature subspaces, which serve as conditional feature effect estimates.
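A minimal sketch of a forward marginal effect for a fitted black-box model: the
change in prediction from moving feature j by a step h, evaluated
observation-wise and then averaged within a feature subspace. The model, step
size, and subspace rule below are illustrative assumptions.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(1000, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=1000)
model = GradientBoostingRegressor().fit(X, y)

def forward_marginal_effect(model, X, j, h):
    """fME_j(x, h) = f(x + h * e_j) - f(x), computed for every row of X."""
    X_shift = X.copy()
    X_shift[:, j] += h
    return model.predict(X_shift) - model.predict(X)

fme = forward_marginal_effect(model, X, j=0, h=0.5)
# Conditional average forward marginal effects on two feature subspaces,
# instead of a single global average:
left, right = X[:, 0] < 0, X[:, 0] >= 0
print("cAFME x0<0:", fme[left].mean(), " cAFME x0>=0:", fme[right].mean())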
arXiv link: http://arxiv.org/abs/2201.08837v1
Minimax-Regret Climate Policy with Deep Uncertainty in Climate Modeling and Intergenerational Discounting
climate policies that seek to reduce greenhouse gas emissions. Policy
comparisons have often been performed by considering a planner who seeks to
make optimal trade-offs between the costs of carbon abatement and the economic
damages from climate change. The planning problem has been formalized as one of
optimal control, the objective being to minimize the total costs of abatement
and damages over a time horizon. Studying climate policy as a control problem
presumes that a planner knows enough to make optimization feasible, but
physical and economic uncertainties abound. Earlier, Manski, Sanstad, and
DeCanio proposed and studied use of the minimax-regret (MMR) decision criterion
to account for deep uncertainty in climate modeling. Here we study choice of
climate policy that minimizes maximum regret with deep uncertainty regarding
both the correct climate model and the appropriate time discount rate to use in
intergenerational assessment of policy consequences. The analysis specifies a
range of discount rates to express both empirical and normative uncertainty
about the appropriate rate. The findings regarding climate policy are novel and
informative. The MMR analysis points to use of a relatively low discount rate
of 0.02 for climate policy. The MMR decision rule keeps the maximum future
temperature increase below 2°C above the 1900-10 level for most of the parameter
values used to weight costs and damages.
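The decision criterion itself is easy to state in code. Below is a generic
minimax-regret calculation over a grid of candidate policies and states of
nature (a climate model paired with a discount rate); the welfare numbers are
made-up placeholders, not values from the paper.

import numpy as np

# welfare[p, s]: welfare (negative total cost) of policy p if state of
# nature s turns out to be correct.
welfare = np.array([
    [10.0,  4.0,  7.0],    # policy 0 (e.g., mild abatement)
    [ 8.0,  9.0,  5.0],    # policy 1 (e.g., moderate abatement)
    [ 3.0,  8.0,  9.0],    # policy 2 (e.g., aggressive abatement)
])

best_per_state = welfare.max(axis=0)   # best achievable welfare in each state
regret = best_per_state - welfare      # regret[p, s] = shortfall of policy p in state s
max_regret = regret.max(axis=1)        # worst-case regret of each policy
mmr_policy = int(np.argmin(max_regret))
print("max regret by policy:", max_regret, "-> minimax-regret choice:", mmr_policy)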
arXiv link: http://arxiv.org/abs/2201.08826v1
High-Dimensional Sparse Multivariate Stochastic Volatility Models
accurate forecasts compared to the MGARCH models, their estimation techniques
such as Bayesian MCMC typically suffer from the curse of dimensionality. We
propose a fast and efficient estimation approach for MSV based on a penalized
OLS framework. Specifying the MSV model as a multivariate state space model, we
carry out a two-step penalized procedure. We provide the asymptotic properties
of the two-step estimator and the oracle property of the first-step estimator
when the number of parameters diverges. The performance of our method is
illustrated through simulations and financial data.
arXiv link: http://arxiv.org/abs/2201.08584v2
Estimation of Conditional Random Coefficient Models using Machine Learning Techniques
considered in the marginal density case under strict independence of RCs and
covariates. This paper deals with the estimation of RC-densities conditional on
a (large-dimensional) set of control variables using machine learning
techniques. The conditional RC-density allows to disentangle observable from
unobservable heterogeneity in partial effects of continuous treatments adding
to a growing literature on heterogeneous effect estimation using machine
learning. It is also informative of the conditional potential outcome
distribution. This paper proposes a two-stage sieve estimation procedure.
First, a closed-form sieve approximation of the conditional RC density is
derived in which each sieve coefficient can be expressed as a conditional
expectation function varying with controls. Second, the sieve coefficients are
estimated with generic machine learning procedures under appropriate sample
splitting rules. The $L_2$-convergence rate of the conditional RC-density
estimator is derived. The rate is slower by a factor than the typical rates of
mean regression machine learning estimators, which is due to the ill-posedness
of the RC density estimation problem. The performance and applicability of the
estimator are
illustrated using random forest algorithms over a range of Monte Carlo
simulations and with real data from the SOEP-IS. Here behavioral heterogeneity
in an economic experiment on portfolio choice is studied. The method reveals
two types of behavior in the population, one type complying with economic
theory and one not. The assignment to types appears largely based on
unobservables not available in the data.
arXiv link: http://arxiv.org/abs/2201.08366v1
Learning with latent group sparsity via heat flow dynamics on networks
problems is a very general phenomenon, which has attracted broad interest from
practitioners and theoreticians alike. In this work we contribute an approach
to learning under such group structure, that does not require prior information
on the group identities. Our paradigm is motivated by the Laplacian geometry of
an underlying network with a related community structure, and proceeds by
directly incorporating this into a penalty that is effectively computed via a
heat flow-based local network dynamics. In fact, we demonstrate a procedure to
construct such a network based on the available data. Notably, we dispense with
computationally intensive pre-processing involving clustering of variables,
spectral or otherwise. Our technique is underpinned by rigorous theorems that
guarantee its effective performance and provide bounds on its sample
complexity. In particular, in a wide range of settings, it provably suffices to
run the heat flow dynamics for time that is only logarithmic in the problem
dimensions. We explore in detail the interfaces of our approach with key
statistical physics models in network science, such as the Gaussian Free Field
and the Stochastic Block Model. We validate our approach by successful
applications to real-world data from a wide array of application domains,
including computer science, genetics, climatology and economics. Our work
raises the possibility of applying similar diffusion-based techniques to
classical learning tasks, exploiting the interplay between geometric, dynamical
and stochastic structures underlying the data.
arXiv link: http://arxiv.org/abs/2201.08326v1
Identification of Direct Socio-Geographical Price Discrimination: An Empirical Study on iPhones
to prices among consumers to increase profits. The welfare effects of price
discrimination are not agreed on among economists, but identification of such
actions may contribute to our standing of firms' pricing behaviors. In this
letter, I use econometric tools to analyze whether Apple Inc, one of the
largest companies in the globe, is practicing price discrimination on the basis
of socio-economical and geographical factors. My results indicate that iPhones
are significantly (p $<$ 0.01) more expensive in markets where competitions are
weak or where Apple has a strong market presence. Furthermore, iPhone prices
are likely to increase (p $<$ 0.01) in developing countries/regions or markets
with high income inequality.
arXiv link: http://arxiv.org/abs/2201.07903v1
Asymptotic properties of Bayesian inference in linear regression with a structural break
inference about slope parameters $\gamma$ in linear regression models with a
structural break. In contrast to the conventional approach to inference about
$\gamma$ that does not take into account the uncertainty of the unknown break
location $\tau$, the Bayesian approach that we consider incorporates such
uncertainty. Our main theoretical contribution is a Bernstein-von Mises type
theorem (Bayesian asymptotic normality) for $\gamma$ under a wide class of
priors, which essentially indicates an asymptotic equivalence between the
conventional frequentist and Bayesian inference. Consequently, a frequentist
researcher could look at credible intervals of $\gamma$ to check robustness
with respect to the uncertainty of $\tau$. Simulation studies show that the
conventional confidence intervals of $\gamma$ tend to undercover in finite
samples whereas the credible intervals offer more reasonable coverages in
general. As the sample size increases, the two methods coincide, as predicted
from our theoretical conclusion. Using data from Paye and Timmermann (2006) on
stock return prediction, we illustrate that the traditional confidence
intervals on $\gamma$ might underrepresent the true sampling uncertainty.
arXiv link: http://arxiv.org/abs/2201.07319v1
Large Hybrid Time-Varying Parameter VARs
structural analysis and forecasting in settings involving a few endogenous
variables. Applying these models to high-dimensional datasets has proved to be
challenging due to intensive computations and over-parameterization concerns.
We develop an efficient Bayesian sparsification method for a class of models we
call hybrid TVP-VARs--VARs with time-varying parameters in some equations but
constant coefficients in others. Specifically, for each equation, the new
method automatically decides whether the VAR coefficients and contemporaneous
relations among variables are constant or time-varying. Using US datasets of
various dimensions, we find evidence that the parameters in some, but not all,
equations are time varying. The large hybrid TVP-VAR also forecasts better than
many standard benchmarks.
arXiv link: http://arxiv.org/abs/2201.07303v2
Bayesian inference of spatial and temporal relations in AI patents for EU countries
in European Union (EU) countries addressing spatial and temporal behaviour. In
particular, the models can quantitatively describe the interaction between
countries or explain the rapidly growing trends in AI patents. For spatial
analysis Poisson regression is used to explain collaboration between a pair of
countries measured by the number of common patents. Through Bayesian inference,
we estimated the strengths of interactions between countries in the EU and the
rest of the world. In particular, a significant lack of cooperation has been
identified for some pairs of countries.
For the temporal analysis, an inhomogeneous Poisson process combined with a
logistic growth curve models the trend accurately. Bayesian analysis in the
time domain revealed an upcoming slowdown in
patenting intensity.
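For the spatial part, the basic building block is a Poisson regression of the
number of common patents on characteristics of each country pair. The sketch
below uses statsmodels with a simulated pair-level dataset; the covariates
(a GDP product and a distance measure) are illustrative assumptions, not the
paper's specification, and the paper's fully Bayesian treatment would place
priors on the same coefficients instead of using maximum likelihood.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_pairs = 300
log_gdp_product = rng.normal(0, 1, n_pairs)     # illustrative pair covariates
log_distance = rng.normal(0, 1, n_pairs)
mu = np.exp(0.5 + 0.8 * log_gdp_product - 0.4 * log_distance)
common_patents = rng.poisson(mu)                # number of jointly filed AI patents

X = sm.add_constant(np.column_stack([log_gdp_product, log_distance]))
pois = sm.GLM(common_patents, X, family=sm.families.Poisson()).fit()
print(pois.summary())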
arXiv link: http://arxiv.org/abs/2201.07168v1
Who Increases Emergency Department Use? New Insights from the Oregon Health Insurance Experiment
emergency department (ED) use from the Oregon experiment. We find meaningful
heterogeneous impacts of Medicaid on ED use using causal machine learning
methods. The individualized treatment effect distribution includes a wide range
of negative and positive values, suggesting the average effect masks
substantial heterogeneity. A small group, about 14% of participants, in the right
tail of the distribution drives the overall effect. We identify priority groups
with economically significant increases in ED usage based on demographics and
previous utilization. Intensive margin effects are an important driver of
increases in ED utilization.
arXiv link: http://arxiv.org/abs/2201.07072v4
The Time-Varying Multivariate Autoregressive Index Model
volatility, and Time Varying Vector Autoregressive Models are often used to
handle such complexity in the data. Unfortunately, when the number of series
grows, they present increasing estimation and interpretation problems. This
paper addresses this issue by proposing a new Multivariate Autoregressive Index
model that features time-varying means and volatility. Technically, we develop
a new estimation methodology that mixes switching algorithms with the
forgetting-factor strategy of Koop and Korobilis (2012). This substantially
reduces the computational burden and allows us to select or weight, in real
time, the number of common components and other features of the data using
Dynamic Model Selection or Dynamic Model Averaging without further
computational cost.
Using USA macroeconomic data, we provide a structural analysis and a
forecasting exercise that demonstrates the feasibility and usefulness of this
new model.
Keywords: Large datasets, Multivariate Autoregressive Index models,
Stochastic volatility, Bayesian VARs.
arXiv link: http://arxiv.org/abs/2201.07069v1
Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement
available for the purposes of advertising measurement. Non-experimental data is
thus required. However, Facebook and other ad platforms use complex and
evolving processes to select ads for users. Therefore, successful
non-experimental approaches need to "undo" this selection. We analyze 663
large-scale experiments at Facebook to investigate whether this is possible
with the data typically logged at large ad platforms. With access to over 5,000
user-level features, these data are richer than what most advertisers or their
measurement partners can access. We investigate how accurately two
non-experimental methods -- double/debiased machine learning (DML) and
stratified propensity score matching (SPSM) -- can recover the experimental
effects. Although DML performs better than SPSM, neither method performs well,
even using flexible deep learning models to implement the propensity and
outcome models. The median RCT lifts are 29%, 18%, and 5% for the upper,
middle, and lower funnel outcomes, respectively. Using DML (SPSM), the median
lift by funnel is 83% (173%), 58% (176%), and 24% (64%), respectively,
indicating significant relative measurement errors. We further characterize the
circumstances under which each method performs comparatively better. Overall,
despite having access to large-scale experiments and rich user-level data, we
are unable to reliably estimate an ad campaign's causal effect.
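For readers unfamiliar with DML in this binary-exposure setting, here is a
generic cross-fitted AIPW/DML sketch of the kind of estimator being evaluated.
It is not Facebook's implementation; the gradient-boosting nuisance models and
simulated data are assumptions for illustration.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 4000, 10
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-X[:, 0]))                      # confounded ad exposure
D = rng.binomial(1, e)
Y = 0.5 * D + X[:, 0] + rng.normal(size=n)          # true lift = 0.5

psi = np.zeros(n)
for train, test in KFold(5, shuffle=True, random_state=0).split(X):
    m = GradientBoostingClassifier().fit(X[train], D[train])
    g0 = GradientBoostingRegressor().fit(X[train][D[train] == 0], Y[train][D[train] == 0])
    g1 = GradientBoostingRegressor().fit(X[train][D[train] == 1], Y[train][D[train] == 1])
    e_hat = np.clip(m.predict_proba(X[test])[:, 1], 0.01, 0.99)
    g0_hat, g1_hat = g0.predict(X[test]), g1.predict(X[test])
    # AIPW (doubly robust) score, averaged over folds -> ATE estimate
    psi[test] = (g1_hat - g0_hat
                 + D[test] * (Y[test] - g1_hat) / e_hat
                 - (1 - D[test]) * (Y[test] - g0_hat) / (1 - e_hat))

ate, se = psi.mean(), psi.std(ddof=1) / np.sqrt(n)
print(f"DML/AIPW ATE estimate: {ate:.3f} (se {se:.3f})")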
arXiv link: http://arxiv.org/abs/2201.07055v2
Socioeconomic disparities and COVID-19: the causal connections
various ways. With the increasing use of machine learning based models in
computational socioeconomics, explaining these models while taking causal
connections into account is a necessity. In this work, we advocate the use of
an explanatory framework from cooperative game theory augmented with $do$
calculus, namely causal Shapley values. Using causal Shapley values, we analyze
socioeconomic disparities that have a causal link to the spread of COVID-19 in
the USA. We study several phases of the disease spread to show how the causal
connections change over time. We perform a causal analysis using random effects
models and discuss the correspondence between the two methods to verify our
results. We show the distinct advantages non-linear machine learning models
have over linear models when performing a multivariate analysis, especially
since the machine learning models can map out non-linear correlations in the
data. In addition, the causal Shapley values allow for including the causal
structure in the variable importance computed for the machine learning model.
arXiv link: http://arxiv.org/abs/2201.07026v1
Difference-in-Differences Estimators for Treatments Continuously Distributed at Every Period
the treatment is often continuously distributed in every period. We propose
difference-in-differences (DID) estimators for such cases. We assume that
between consecutive periods, the treatment of some units, the switchers,
changes, while the treatment of other units, the stayers, remains constant. We
show that under a parallel-trends assumption, the slopes of switchers'
potential outcomes are nonparametrically identified by
difference-in-differences estimands comparing the outcome evolutions of
switchers and stayers with the same baseline treatment. Controlling for the
baseline treatment ensures that our estimands remain valid if the treatment's
effect changes over time. We consider two weighted averages of switchers'
slopes, and discuss their respective advantages. For each weighted average, we
propose a doubly-robust, nonparametric, and $\sqrt{n}$-consistent estimator. We
generalize our results to the instrumental-variable case. We apply our method
to estimate the price-elasticity of gasoline consumption.
arXiv link: http://arxiv.org/abs/2201.06898v6
Homophily in preferences or meetings? Identifying and estimating an iterative network formation model
homogeneity (preferences) or by a higher probability of meeting individuals
with similar attributes (opportunity)? This paper studies identification and
estimation of an iterative network game that distinguishes between these two
mechanisms. Our approach enables us to assess the counterfactual effects of
changing the meeting protocol between agents. As an application, we study the
role of preferences and meetings in shaping classroom friendship networks in
Brazil. In a network structure in which homophily due to preferences is
stronger than homophily due to meeting opportunities, tracking students may
improve welfare. Still, the relative benefit of this policy diminishes over the
school year.
arXiv link: http://arxiv.org/abs/2201.06694v5
An Entropy-Based Approach for Nonparametrically Testing Simple Probability Distribution Hypotheses
entropy-based testing procedure that can be used to assess the validity of
simple hypotheses about a specific parametric population distribution. The
testing methodology relies on the characteristic function of the population
probability distribution being tested and is attractive in that, regardless of
the null hypothesis being tested, it provides a unified framework for
conducting such tests. The testing procedure is also computationally tractable
and relatively straightforward to implement. In contrast to some alternative
test statistics, the proposed entropy test is free from user-specified kernel
and bandwidth choices, idiosyncratic and complex regularity conditions, and/or
choices of evaluation grids. Several simulation exercises were performed to
document the empirical performance of our proposed test, including a regression
example that is illustrative of how, in some contexts, the approach can be
applied to composite hypothesis-testing situations via data transformations.
Overall, the testing procedure shows notable promise, with power increasing
appreciably with sample size for a number of alternative distributions
contrasted with hypothesized null distributions. Possible
general extensions of the approach to composite hypothesis-testing contexts,
and directions for future work are also discussed.
arXiv link: http://arxiv.org/abs/2201.06647v1
Inferential Theory for Granular Instrumental Variables in High Dimensions
factor error structures to construct instruments to estimate structural time
series models with endogeneity even after controlling for latent factors. We
extend the GIV methodology in several dimensions. First, we extend the
identification procedure to a large $N$ and large $T$ framework, which depends
on the asymptotic Herfindahl index of the size distribution of $N$
cross-sectional units. Second, we treat both the factors and loadings as
unknown and show that the sampling error in the estimated instrument and
factors is negligible when considering the limiting distribution of the
structural parameters. Third, we show that the sampling error in the
high-dimensional precision matrix is negligible in our estimation algorithm.
Fourth, we overidentify the structural parameters with additional constructed
instruments, which leads to efficiency gains. Monte Carlo evidence is presented
to support our asymptotic theory and application to the global crude oil market
leads to new results.
arXiv link: http://arxiv.org/abs/2201.06605v2
On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation
Markov decision process with continuous states and actions. We recast the
$Q$-function estimation into a special form of the nonparametric instrumental
variables (NPIV) estimation problem. We first show that under one mild
condition the NPIV formulation of $Q$-function estimation is well-posed in the
sense of $L^2$-measure of ill-posedness with respect to the data generating
distribution, bypassing a strong assumption on the discount factor $\gamma$
imposed in the recent literature for obtaining the $L^2$ convergence rates of
various $Q$-function estimators. Thanks to this new well-posed property, we
derive the first minimax lower bounds for the convergence rates of
nonparametric estimation of $Q$-function and its derivatives in both sup-norm
and $L^2$-norm, which are shown to be the same as those for the classical
nonparametric regression (Stone, 1982). We then propose a sieve two-stage least
squares estimator and establish its rate-optimality in both norms under some
mild conditions. Our general results on the well-posedness and the minimax
lower bounds are of independent interest to study not only other nonparametric
estimators for $Q$-function but also efficient estimation on the value of any
target policy in off-policy settings.
arXiv link: http://arxiv.org/abs/2201.06169v3
Nonparametric Identification of Random Coefficients in Endogenous and Heterogeneous Aggregate Demand Models
for differentiated products with heterogeneous consumers. We consider a general
class of models that allows for the individual specific coefficients to vary
continuously across the population and give conditions under which the density
of these coefficients, and hence also functionals such as welfare measures, is
identified. A key finding is that two leading models, the BLP-model (Berry,
Levinsohn, and Pakes, 1995) and the pure characteristics model (Berry and
Pakes, 2007), require considerably different conditions on the support of the
product characteristics.
arXiv link: http://arxiv.org/abs/2201.06140v1
Treatment Effect Risk: Bounds and Inference
welfare, even if positive, there is a risk of negative effect on, say, some 10%
of the population. Assessing such risk is difficult, however, because any one
individual treatment effect (ITE) is never observed, so the 10% worst-affected
cannot be identified, while distributional treatment effects only compare the
first deciles within each treatment group, which does not correspond to any
10%-subpopulation. In this paper we consider how to nonetheless assess this
important risk measure, formalized as the conditional value at risk (CVaR) of
the ITE-distribution. We leverage the availability of pre-treatment covariates
and characterize the tightest-possible upper and lower bounds on ITE-CVaR given
by the covariate-conditional average treatment effect (CATE) function. We then
proceed to study how to estimate these bounds efficiently from data and
construct confidence intervals. This is challenging even in randomized
experiments as it requires understanding the distribution of the unknown CATE
function, which can be very complex if we use rich covariates so as to best
control for heterogeneity. We develop a debiasing method that overcomes this
and prove it enjoys favorable statistical properties even when CATE and other
nuisances are estimated by black-box machine learning or even inconsistently.
Studying a hypothetical change to French job-search counseling services, our
bounds and inference demonstrate that a small social benefit entails a negative
impact on a substantial subpopulation.
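The simplest of these bounds is easy to compute from CATE estimates: because
conditional averaging can only shrink the lower tail, the CVaR of the CATE
distribution is an upper bound on the ITE-CVaR. The plug-in sketch below
ignores the paper's debiasing and inference machinery, and the CATE values are
simulated for illustration.

import numpy as np

def cvar_lower_tail(values, alpha=0.10):
    """Mean of the worst (lowest) alpha-fraction of values."""
    v = np.sort(np.asarray(values))
    k = max(1, int(np.floor(alpha * len(v))))
    return v[:k].mean()

rng = np.random.default_rng(0)
cate_hat = 0.3 + 0.5 * rng.normal(size=10_000)   # estimated CATE(X_i), illustrative
print("ATE estimate:", cate_hat.mean())
print("upper bound on 10%-ITE-CVaR (CVaR of CATE):", cvar_lower_tail(cate_hat, 0.10))
# A positive average effect can coexist with a clearly negative tail bound,
# which is exactly the treatment effect risk being formalized.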
arXiv link: http://arxiv.org/abs/2201.05893v2
Measuring Changes in Disparity Gaps: An Application to Health Insurance
groups, such as the gender or Black-white gap. We first show that the reduction
in disparities between groups can be written as the difference in conditional
average treatment effects (CATE) for each group. Then, using a
Kitagawa-Oaxaca-Blinder-style decomposition, we highlight how these CATEs can be
decomposed into unexplained differences in CATEs given other observables versus
differences in composition across those observables (e.g., the "endowment").
Finally, we apply this approach to study the impact of Medicare on Americans'
access to health insurance.
arXiv link: http://arxiv.org/abs/2201.05672v1
Monitoring the Economy in Real Time: Trends and Gaps in Real Activity and Prices
time series model for evaluating the output potential, output gap, Phillips
curve, and Okun's law for the US. The baseline model uses minimal theory-based
multivariate identification restrictions to inform trend-cycle decomposition,
while the alternative model adds the CBO's output gap measure as an observed
variable. The latter model results in a smoother output potential and lower
cyclical correlation between inflation and real variables but performs worse in
forecasting beyond the short term. This methodology allows for the assessment
and real-time monitoring of official trend and gap estimates.
arXiv link: http://arxiv.org/abs/2201.05556v2
Detecting Multiple Structural Breaks in Systems of Linear Regression Equations with Integrated and Stationary Regressors
estimator in combination with a backward elimination algorithm to detect
multiple structural breaks in linear regressions with multivariate responses.
Applying the two-step estimator, we jointly detect the number and location of
structural breaks, and provide consistent estimates of the coefficients. Our
framework is flexible enough to allow for a mix of integrated and stationary
regressors, as well as deterministic terms. Using simulation experiments, we
show that the proposed two-step estimator performs competitively against the
likelihood-based approach (Qu and Perron, 2007; Li and Perron, 2017; Oka and
Perron, 2018) in finite samples. However, the two-step estimator is
computationally much more efficient. An economic application to the
identification of structural breaks in the term structure of interest rates
illustrates this methodology.
arXiv link: http://arxiv.org/abs/2201.05430v4
Kernel methods for long term dose response curves
of possibly continuous actions, from short term experimental data. It arises in
artificial intelligence: the long term consequences of continuous actions may
be of interest, yet only short term rewards may be collected in exploration.
For this estimand, called the long term dose response curve, we propose a
simple nonparametric estimator based on kernel ridge regression. By embedding
the distribution of the short term experimental data with kernels, we derive
interpretable weights for extrapolating long term effects. Our method allows
actions, short term rewards, and long term rewards to be continuous in general
spaces. It also allows for nonlinearity and heterogeneity in the link between
short term effects and long term effects. We prove uniform consistency, with
nonasymptotic error bounds reflecting the effective dimension of the data. As
an application, we estimate the long term dose response curve of Project STAR,
a social program which randomly assigned students to various class sizes. We
extend our results to long term counterfactual distributions, proving weak
convergence.
arXiv link: http://arxiv.org/abs/2201.05139v2
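A minimal Python sketch in the spirit of the entry above, assuming a simplified surrogate setup in which short-term rewards fully mediate the effect of the dose; it simply composes two kernel ridge fits and is only meant to fix ideas, not to reproduce the paper's kernel-embedding weights (all data and tuning constants below are made up).

import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)

# Experimental sample: dose D randomly assigned, short-term reward S observed.
n_exp = 500
D = rng.uniform(0, 2, n_exp)
S = np.sin(D) + 0.1 * rng.standard_normal(n_exp)

# Historical sample: short-term reward S and long-term reward Y observed.
n_hist = 500
S_hist = rng.uniform(-1, 1, n_hist)
Y_hist = 2.0 * S_hist**2 + 0.1 * rng.standard_normal(n_hist)

# Stage 1: E[S | D] from the experimental data.
stage1 = KernelRidge(kernel="rbf", alpha=1e-2, gamma=1.0).fit(D[:, None], S)
# Stage 2: E[Y | S] from the historical data.
stage2 = KernelRidge(kernel="rbf", alpha=1e-2, gamma=1.0).fit(S_hist[:, None], Y_hist)

# Composed long-term dose response on a grid of doses (naive plug-in composition).
grid = np.linspace(0, 2, 50)
long_term_curve = stage2.predict(stage1.predict(grid[:, None])[:, None])
print(long_term_curve[:5])

Note that this plug-in composition is only valid under strong simplifications; handling nonlinearity and heterogeneity in the short-to-long-term link is exactly what the paper's embedding-based weights are designed for.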
Binary response model with many weak instruments
instruments. We employ a control function approach and a regularization scheme
to obtain better estimation results for the endogenous binary response model in
the presence of many weak instruments. Two consistent and asymptotically
normally distributed estimators are provided: the regularized conditional
maximum likelihood estimator (RCMLE) and the regularized nonlinear least
squares estimator (RNLSE). Monte Carlo simulations show that
the proposed estimators outperform the existing ones when there are many weak
instruments. We use the proposed estimation method to examine the effect of
family income on college completion.
arXiv link: http://arxiv.org/abs/2201.04811v4
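A small illustrative sketch of a control-function estimator with a ridge-regularized first stage under many weak instruments, written in Python on simulated data; the specific regularization scheme and the RCMLE/RNLSE constructions of the paper are not reproduced here.

import numpy as np
from sklearn.linear_model import Ridge
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, k = 1000, 50                      # many (weak) instruments
Z = rng.standard_normal((n, k))
pi = np.full(k, 0.05)                # weak first-stage coefficients
v = rng.standard_normal(n)
x = Z @ pi + v                       # endogenous regressor
u = 0.5 * v + rng.standard_normal(n) # structural error correlated with x
y = (1.0 * x + u > 0).astype(float)  # binary outcome

# First stage: ridge regression of x on the instruments, keep the residuals.
first = Ridge(alpha=10.0).fit(Z, x)
vhat = x - first.predict(Z)

# Second stage: probit of y on x and the control function vhat.
X2 = sm.add_constant(np.column_stack([x, vhat]))
probit = sm.Probit(y, X2).fit(disp=0)
print(probit.params)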
Optimal Best Arm Identification in Two-Armed Bandits with a Fixed Budget under a Small Gap
problems. One of the longstanding open questions is the existence of an optimal
strategy under which the probability of misidentification matches a lower
bound. We show that a strategy following the Neyman allocation rule (Neyman,
1934) is asymptotically optimal when the gap between the expected rewards is
small. First, we review a lower bound derived by Kaufmann et al. (2016). Then,
we propose the "Neyman Allocation (NA)-Augmented Inverse Probability weighting
(AIPW)" strategy, which consists of the sampling rule using the Neyman
allocation with an estimated standard deviation and the recommendation rule
using an AIPW estimator. Our proposed strategy is optimal because the upper
bound matches the lower bound when the budget goes to infinity and the gap goes
to zero.
arXiv link: http://arxiv.org/abs/2201.04469v8
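A minimal simulation sketch of the Neyman Allocation + AIPW idea described above, written in Python; the initialization, the clipping of allocation probabilities, and the Gaussian rewards are illustrative choices, not the paper's prescriptions.

import numpy as np

rng = np.random.default_rng(2)
means, sds = np.array([0.0, 0.1]), np.array([1.0, 2.0])   # unknown to the agent
T = 2000

rewards = [[], []]
scores = np.zeros(T)           # per-round AIPW scores for the contrast arm1 - arm0
# Initialization: pull each arm a few times.
for a in (0, 1):
    for _ in range(10):
        rewards[a].append(rng.normal(means[a], sds[a]))

for t in range(T):
    mu = np.array([np.mean(r) for r in rewards])
    sd = np.array([np.std(r, ddof=1) for r in rewards])
    p1 = np.clip(sd[1] / (sd[0] + sd[1]), 0.05, 0.95)   # Neyman allocation, clipped
    probs = np.array([1 - p1, p1])
    a = int(rng.random() < p1)                          # sample an arm
    y = rng.normal(means[a], sds[a])
    # AIPW score for E[Y(1)] - E[Y(0)], using pre-round estimates.
    score1 = (a == 1) * (y - mu[1]) / probs[1] + mu[1]
    score0 = (a == 0) * (y - mu[0]) / probs[0] + mu[0]
    scores[t] = score1 - score0
    rewards[a].append(y)

recommended = int(scores.mean() > 0)   # recommendation rule
print("recommended arm:", recommended)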
A machine learning search for optimal GARCH parameters
efficiencies of fitting GARCH model parameters to empirical data. We employ an
Artificial Neural Network (ANN) to predict the parameters of these models. We
present a fitting algorithm for GARCH-normal(1,1) models that predicts one of the
model's parameters, $\alpha_1$, and then uses the analytical expressions for the
fourth-order standardised moment, $\Gamma_4$, and the unconditional second-order
moment, $\sigma^2$, to fit the other two parameters, $\beta_1$ and $\alpha_0$,
respectively. The speed of fitting and the quick implementation of this approach
allow for real-time tracking of GARCH parameters. We further show that different
inputs to the ANN, namely higher-order standardised moments and the
autocovariance of the time series, can be used for fitting model parameters
using the ANN, but not always with the same level of accuracy.
arXiv link: http://arxiv.org/abs/2201.03286v1
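A sketch of the moment-inversion step described above: given a predicted $\alpha_1$ together with the sample variance and kurtosis, recover $\beta_1$ and $\alpha_0$, assuming the textbook GARCH(1,1)-normal moment formulas $\sigma^2=\alpha_0/(1-\alpha_1-\beta_1)$ and $\Gamma_4=3(1-s^2)/(1-s^2-2\alpha_1^2)$ with $s=\alpha_1+\beta_1$; the numerical inputs below are made up.

import numpy as np

def invert_garch_moments(alpha1, sigma2, gamma4):
    # Solve the kurtosis equation for the persistence s = alpha_1 + beta_1.
    if gamma4 <= 3.0:
        raise ValueError("requires excess kurtosis (gamma4 > 3)")
    s2 = 1.0 - 2.0 * gamma4 * alpha1**2 / (gamma4 - 3.0)
    if s2 <= 0.0:
        raise ValueError("no admissible persistence for these moments")
    s = np.sqrt(s2)
    beta1 = s - alpha1
    alpha0 = sigma2 * (1.0 - s)      # from the unconditional variance formula
    return alpha0, beta1

# Example call, with alpha1 playing the role of the ANN prediction.
print(invert_garch_moments(alpha1=0.1, sigma2=1.5, gamma4=4.0))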
Approximate Factor Models for Functional Time Series
time-dependent curve data. Our model decomposes such data into two distinct
components: a low-dimensional predictable factor component and an unpredictable
error term. These components are identified through the autocovariance
structure of the underlying functional time series. The model parameters are
consistently estimated using the eigencomponents of a cumulative autocovariance
operator and an information criterion is proposed to determine the appropriate
number of factors. Applications to mortality and yield curve modeling
illustrate key advantages of our approach over the widely used functional
principal component analysis, as it offers parsimonious structural
representations of the underlying dynamics along with gains in out-of-sample
forecast performance.
arXiv link: http://arxiv.org/abs/2201.02532v4
Microeconomic Foundations of Decentralised Organisations
provide a fundamental change in the structure and dynamics of organisations.
The works of R. H. Coase and M. Olson, on the nature of the firm and the logic of
collective action, respectively, are revisited in the light of these emerging
new digital foundations. We also analyse how these technologies can
affect the fundamental assumptions on the role of organisations (either private
or public) as mechanisms for the coordination of labour. We propose that these
technologies can fundamentally affect: (i) the distribution of rewards within
an organisation and (ii) the structure of its transaction costs. These changes
bring the potential for addressing some of the trade-offs between the private
and public sectors.
arXiv link: http://arxiv.org/abs/2201.07666v2
Unconditional Effects of General Policy Interventions
intervention, which includes location-scale shifts and simultaneous shifts as
special cases. The location-scale shift is intended to study a counterfactual
policy aimed at changing not only the mean or location of a covariate but also
its dispersion or scale. The simultaneous shift refers to the situation where
shifts in two or more covariates take place simultaneously. For example, a
shift in one covariate is compensated at a certain rate by a shift in another
covariate. Not accounting for these possible scale or simultaneous shifts will
result in an incorrect assessment of the potential policy effects on an outcome
variable of interest. The unconditional policy parameters are estimated with
simple semiparametric estimators, for which asymptotic properties are studied.
Monte Carlo simulations are implemented to study their finite sample
performances. The proposed approach is applied to a Mincer equation to study
the effects of changing years of education on wages and to study the effect of
smoking during pregnancy on birth weight.
arXiv link: http://arxiv.org/abs/2201.02292v3
NumHTML: Numeric-Oriented Hierarchical Transformer Model for Multi-task Financial Forecasting
learning research because of the challenges it presents and the potential
rewards that even minor improvements in prediction accuracy or forecasting may
entail. Traditionally, financial forecasting has heavily relied on quantitative
indicators and metrics derived from structured financial statements. Earnings
conference call data, including text and audio, is an important source of
unstructured data that has been used for various prediction tasks using deep
learning and related approaches. However, current deep learning-based methods
are limited in the way that they deal with numeric data; numbers are typically
treated as plain-text tokens without taking advantage of their underlying
numeric structure. This paper describes a numeric-oriented hierarchical
transformer model to predict stock returns and financial risk using
multi-modal aligned earnings calls data by taking advantage of the different
categories of numbers (monetary, temporal, percentages etc.) and their
magnitude. We present the results of a comprehensive evaluation of NumHTML
against several state-of-the-art baselines using a real-world publicly
available dataset. The results indicate that NumHTML significantly outperforms
the current state-of-the-art across a variety of evaluation metrics and that it
has the potential to offer significant financial gains in a practical trading
context.
arXiv link: http://arxiv.org/abs/2201.01770v1
What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature
difference-in-differences (DiD) and provides concrete recommendations for
practitioners. We begin by articulating a simple set of “canonical”
assumptions under which the econometrics of DiD are well-understood. We then
argue that recent advances in DiD methods can be broadly classified as relaxing
some components of the canonical DiD setup, with a focus on $(i)$ multiple
periods and variation in treatment timing, $(ii)$ potential violations of
parallel trends, or $(iii)$ alternative frameworks for inference. Our
discussion highlights the different ways that the DiD literature has advanced
beyond the canonical model, and helps to clarify when each of the papers will
be relevant for empirical work. We conclude by discussing some promising areas
for future research.
arXiv link: http://arxiv.org/abs/2201.01194v3
A Multivariate Dependence Analysis for Electricity Prices, Demand and Renewable Energy Sources
renewable energy sources by means of a multivariate copula model, with Germany,
the most widely studied market in Europe, as the case study. The inter-dependencies
are investigated in depth and monitored over time, with particular emphasis on
the tail behavior. To this end, suitable tail dependence measures are
introduced to take into account a multivariate extreme scenario appropriately
identified through Kendall's distribution function. The empirical
evidence demonstrates a strong association between electricity prices,
renewable energy sources, and demand within a day and over the studied years.
Hence, this analysis provides guidance for further and different incentives for
promoting green energy generation while considering the time-varying
dependencies of the involved variables.
arXiv link: http://arxiv.org/abs/2201.01132v1
Efficient Likelihood-based Estimation via Annealing for Dynamic Structural Macrofinance Models
non-Gaussian state-space models with high-dimensional and complex structures.
We propose an annealed controlled sequential Monte Carlo method that delivers
numerically stable and low variance estimators of the likelihood function. The
method relies on an annealing procedure to gradually introduce information from
observations and constructs globally optimal proposal distributions by solving
associated optimal control problems that yield zero variance likelihood
estimators. To perform parameter inference, we develop a new adaptive SMC$^2$
algorithm that employs likelihood estimators from annealed controlled
sequential Monte Carlo. We provide a theoretical stability analysis that
elucidates the advantages of our methodology and asymptotic results concerning
the consistency and convergence rates of our SMC$^2$ estimators. We illustrate
the strengths of our proposed methodology by estimating two popular
macrofinance models: a non-linear new Keynesian dynamic stochastic general
equilibrium model and a non-linear non-Gaussian consumption-based long-run risk
model.
arXiv link: http://arxiv.org/abs/2201.01094v1
A Double Robust Approach for Non-Monotone Missingness in Multi-Stage Data
to deal with in empirical studies. The traditional Missing at Random (MAR)
assumption is difficult to justify in such cases. Previous studies have
strengthened the MAR assumption, suggesting that the missing mechanism of any
variable is random when conditioned on a uniform set of fully observed
variables. However, empirical evidence indicates that this assumption may be
violated for variables collected at different stages. This paper proposes a new
MAR-type assumption that fits non-monotone missing scenarios involving
multi-stage variables. Based on this assumption, we construct an Augmented
Inverse Probability Weighted GMM (AIPW-GMM) estimator. This estimator features
an asymmetric format for the augmentation term, guarantees double robustness,
and achieves the closed-form semiparametric efficiency bound. We apply this
method to cases of missingness in both the endogenous regressor and the outcome, using
the Oregon Health Insurance Experiment as an example. We check the correlation
between missing probabilities and partially observed variables to justify the
assumption. Moreover, we find that excluding incomplete data results in a loss
of efficiency and insignificant estimators. The proposed estimator reduces the
standard error by more than 50% for the estimated effects of the Oregon Health
Plan on the elderly.
arXiv link: http://arxiv.org/abs/2201.01010v2
Deep Learning and Linear Programming for Automated Ensemble Forecasting and Interpretation
on the M4 Competition dataset by decreasing feature and model selection
assumptions, termed DONUT (DO Not UTilize human beliefs). Our assumption
reductions, primarily consisting of auto-generated features and a more diverse
model pool for the ensemble, significantly outperform the statistical,
feature-based ensemble method FFORMA by Montero-Manso et al. (2020). We also
investigate feature extraction with a Long Short-term Memory Network (LSTM)
Autoencoder and find that such features contain crucial information not
captured by standard statistical feature approaches. The ensemble weighting
model uses LSTM and statistical features to combine the models accurately. The
analysis of feature importance and interaction shows a slight superiority for
LSTM features over the statistical ones alone. Clustering analysis shows that
essential LSTM features differ from most statistical features and each other.
We also find that increasing the solution space of the weighting model by
augmenting the ensemble with new models is something the weighting model learns
to use, thus explaining part of the accuracy gains. Moreover, we present a
formal ex-post-facto analysis of an optimal combination and selection for
ensembles, quantifying differences through linear optimization on the M4
dataset. Our findings indicate that classical statistical time series features,
such as trend and seasonality, alone do not capture all relevant information
for forecasting a time series. On the contrary, our novel LSTM features contain
significantly more predictive power than the statistical ones alone, but
combining the two feature sets proved the best in practice.
arXiv link: http://arxiv.org/abs/2201.00426v2
Modelling Cournot Games as Multi-agent Multi-armed Bandits
for modeling repeated Cournot oligopoly games, where the firms acting as agents
choose from the set of arms representing production quantity (a discrete
value). Agents interact with separate and independent bandit problems. In this
formulation, each agent makes sequential choices among arms to maximize its own
reward. Agents do not have any information about the environment; they can only
see their own rewards after taking an action. However, the market demand is a
stationary function of total industry output, and random entry or exit from the
market is not allowed. Given these assumptions, we found that an
$\epsilon$-greedy approach offers a more viable learning mechanism than other
traditional MAB approaches, as it does not require any additional knowledge of
the system to operate. We also propose two novel approaches that take advantage
of the ordered action space: $\epsilon$-greedy+HL and $\epsilon$-greedy+EL.
These new approaches help firms to focus on more profitable actions by
eliminating less profitable choices and hence are designed to optimize the
exploration. We use computer simulations to study the emergence of various
equilibria in the outcomes and do the empirical analysis of joint cumulative
regrets.
arXiv link: http://arxiv.org/abs/2201.01182v1
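A toy Python simulation of plain $\epsilon$-greedy learners in a repeated Cournot game with linear inverse demand, to make the setting above concrete; the demand and cost parameters are placeholders, and the HL/EL variants that prune less profitable quantities are not implemented.

import numpy as np

rng = np.random.default_rng(3)
quantities = np.arange(1, 11)                # discrete action set (production levels)
n_firms, a, b, cost, eps, T = 2, 40.0, 2.0, 4.0, 0.1, 5000

q_value = np.zeros((n_firms, len(quantities)))   # running mean reward per action
counts = np.zeros_like(q_value)

for _ in range(T):
    acts = np.empty(n_firms, dtype=int)
    for i in range(n_firms):
        if rng.random() < eps or counts[i].sum() == 0:
            acts[i] = rng.integers(len(quantities))     # explore
        else:
            acts[i] = int(np.argmax(q_value[i]))        # exploit
    total_q = quantities[acts].sum()
    price = max(a - b * total_q, 0.0)                   # stationary inverse demand
    for i in range(n_firms):
        reward = (price - cost) * quantities[acts[i]]   # own profit only
        counts[i, acts[i]] += 1
        q_value[i, acts[i]] += (reward - q_value[i, acts[i]]) / counts[i, acts[i]]

print("learned quantities:", quantities[np.argmax(q_value, axis=1)])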
Auction Throttling and Causal Inference of Online Advertising Effects
because experimentation is expensive, and observational data lacks random
variation. This paper identifies a pervasive source of naturally occurring,
quasi-experimental variation in user-level ad-exposure in digital advertising
campaigns. It shows how this variation can be utilized by ad-publishers to
identify the causal effect of advertising campaigns. The variation pertains to
auction throttling, a probabilistic method of budget pacing that is widely used
to spread an ad-campaign's budget over its deployed duration, so that the
campaign's budget is not exceeded or overly concentrated in any one period. The
throttling mechanism is implemented by computing a participation probability
based on the campaign's budget spending rate and then including the campaign in
a random subset of available ad-auctions each period according to this
probability. We show that access to logged-participation probabilities enables
identifying the local average treatment effect (LATE) in the ad-campaign. We
present a new estimator that leverages this identification strategy and outline
a bootstrap procedure for quantifying its variability. We apply our method to
real-world ad-campaign data from an e-commerce advertising platform, which uses
such throttling for budget pacing. We show our estimate is statistically
different from estimates derived using other standard observational methods
such as OLS and two-stage least squares estimators. Our estimated conversion
lift is 110%, a more plausible number than 600%, the conversion lifts estimated
using naive observational methods.
arXiv link: http://arxiv.org/abs/2112.15155v2
Estimating a Continuous Treatment Model with Spillovers: A Control Function Approach
spillovers through social networks. We assume that one's outcome is affected
not only by his/her own treatment but also by a (weighted) average of his/her
neighbors' treatments, both of which are treated as endogenous variables. Using
a control function approach with appropriate instrumental variables, we show
that the conditional mean potential outcome can be nonparametrically
identified. We also consider a more empirically tractable semiparametric model
and develop a three-step estimation procedure for this model. As an empirical
illustration, we investigate the causal effect of the regional unemployment
rate on the crime rate.
arXiv link: http://arxiv.org/abs/2112.15114v3
Modeling and Forecasting Intraday Market Returns: a Machine Learning Approach
measures through machine learning methods in a high-frequency environment. We
implement a minute-by-minute rolling window intraday estimation method using
two nonlinear models: Long-Short-Term Memory (LSTM) neural networks and Random
Forests (RF). Our estimations show that the CBOE Volatility Index (VIX) is the
strongest candidate predictor for intraday market returns in our analysis,
especially when implemented through the LSTM model. This model also
significantly improves the performance of the lagged market return as a
predictive variable. Finally, intraday RF estimation outputs indicate that there is no
performance improvement with this method, and it may even worsen the results in
some cases.
arXiv link: http://arxiv.org/abs/2112.15108v1
An Analysis of an Alternative Pythagorean Expected Win Percentage Model: Applications Using Major League Baseball Team Quality Simulations
information loss from misspecification and outperform the Pythagorean model.
This article aims to use simulated data to select the optimal expected win
percentage model among the choice of relevant alternatives. The choices include
the traditional Pythagorean model and the difference-form contest success
function (CSF). Method. We simulate 1,000 iterations of the 2014 MLB season for
the purpose of estimating and analyzing alternative models of expected win
percentage (team quality). We use the open-source Strategic Baseball Simulator
and develop an AutoHotKey script that programmatically executes the SBS
application, chooses the correct settings for the 2014 season, enters a unique
ID for the simulation data file, and iterates these steps 1,000 times. We
estimate expected win percentage using the traditional Pythagorean model, as
well as the difference-form CSF model that is used in game theory and public
choice economics. Each model is estimated while accounting for fixed (team)
effects. We find that the difference-form CSF model outperforms the traditional
Pythagorean model in terms of explanatory power and in terms of
misspecification-based information loss as estimated by the Akaike Information
Criterion. Through parametric estimation, we further confirm that the simulator
yields realistic statistical outcomes. The simulation methodology offers the
advantage of greatly improved sample size. As the season is held constant, our
simulation-based statistical inference also allows for estimation and model
comparison without the (time series) issue of non-stationarity. The results
suggest that improved win (productivity) estimation can be achieved through
alternative CSF specifications.
arXiv link: http://arxiv.org/abs/2112.14846v1
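For reference, the traditional Pythagorean expectation relates a team's expected win percentage to runs scored ($RS$) and runs allowed ($RA$), while a difference-form contest success function makes the winning probability depend on the run difference; one common way to write the two (the exponent $\gamma$ and the sensitivity parameter $k$ are generic, not the paper's estimates) is
\[
W^{\text{Pyth}}=\frac{RS^{\gamma}}{RS^{\gamma}+RA^{\gamma}},
\qquad
W^{\text{CSF}}=\frac{1}{1+\exp\{-k\,(RS-RA)\}}.
\]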
Volatility of volatility estimation: central limit theorems for the Fourier transform estimator and empirical study of the daily time series stylized facts
integrated volatility of volatility based on the Fourier methodology, which
does not require the pre-estimation of the spot volatility. We show that the
bias-corrected estimator reaches the optimal rate $n^{1/4}$, while the
estimator without bias-correction has a slower convergence rate and a smaller
asymptotic variance. Additionally, we provide simulation results that support
the theoretical asymptotic distribution of the rate-efficient estimator and
show the accuracy of the latter in comparison with a rate-optimal estimator
based on the pre-estimation of the spot volatility. Finally, using the
rate-optimal Fourier estimator, we reconstruct the time series of the daily
volatility of volatility of the S&P500 and EUROSTOXX50 indices over long
samples and provide novel insight into the existence of stylized facts about
the volatility of volatility dynamics.
arXiv link: http://arxiv.org/abs/2112.14529v3
Nested Nonparametric Instrumental Variable Regression
nested nonparametric instrumental variable regression (nested NPIV). Recent
examples include mediated, time varying, and long term treatment effects
identified using proxy variables. In econometrics, examples arise in triangular
simultaneous equations and hedonic price systems. However, it appears that
explicit mean square convergence rates for nested NPIV are unknown, preventing
inference on some of these parameters with generic machine learning. A major
challenge is compounding ill posedness due to the nested inverse problems. To
limit how ill posedness compounds, we introduce two techniques: relative well
posedness, and multiple robustness to ill posedness. With these techniques, we
provide explicit mean square rates for nested NPIV and efficient inference for
recently identified causal parameters. Our nonasymptotic analysis accommodates
neural networks, random forests, and reproducing kernel Hilbert spaces. It
extends to causal functions, e.g. heterogeneous long term treatment effects.
arXiv link: http://arxiv.org/abs/2112.14249v4
Random Rank-Dependent Expected Utility
for finite datasets and finite prizes. The test lends itself to statistical
testing using the tools in Kitamura and Stoye (2018).
arXiv link: http://arxiv.org/abs/2112.13649v1
Estimation based on nearest neighbor matching: from density ratio to average treatment effect
groups is both conceptually natural and practically well-used. In a landmark
paper, Abadie and Imbens (2006) provided the first large-sample analysis of NN
matching under, however, a crucial assumption that the number of NNs, $M$, is
fixed. This manuscript reveals something new out of their study and shows that,
once allowing $M$ to diverge with the sample size, an intrinsic statistic in
their analysis actually constitutes a consistent estimator of the density
ratio. Furthermore, through selecting a suitable $M$, this statistic can attain
the minimax lower bound of estimation over a Lipschitz density function class.
Consequently, with a diverging $M$, the NN matching provably yields a doubly
robust estimator of the average treatment effect and is semiparametrically
efficient if the density functions are sufficiently smooth and the outcome
model is appropriately specified. It can thus be viewed as a precursor of
double machine learning estimators.
arXiv link: http://arxiv.org/abs/2112.13506v1
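For context, a textbook $M$-nearest-neighbour matching estimator of the ATE (without bias correction), written in Python on simulated data; the entry above concerns the large-sample behaviour of the matching counts when $M$ diverges, which this sketch does not attempt to illustrate.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def nn_matching_ate(X, y, d, M=5):
    X1, y1 = X[d == 1], y[d == 1]
    X0, y0 = X[d == 0], y[d == 0]
    # Impute each unit's missing potential outcome by the average of its
    # M nearest matches from the opposite treatment group.
    nn0 = NearestNeighbors(n_neighbors=M).fit(X0)
    nn1 = NearestNeighbors(n_neighbors=M).fit(X1)
    y0_hat_for_treated = y0[nn0.kneighbors(X1, return_distance=False)].mean(axis=1)
    y1_hat_for_control = y1[nn1.kneighbors(X0, return_distance=False)].mean(axis=1)
    effects = np.concatenate([y1 - y0_hat_for_treated, y1_hat_for_control - y0])
    return effects.mean()

rng = np.random.default_rng(4)
n = 2000
X = rng.standard_normal((n, 2))
d = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
y = 1.0 * d + X.sum(axis=1) + rng.standard_normal(n)   # true ATE = 1
print(nn_matching_ate(X, y, d))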
Multiple Randomization Designs
classical randomized controlled trial (RCT), or A/B test, a randomly selected
subset of a population of units (e.g., individuals, plots of land, or
experiences) is assigned to a treatment (treatment A), and the remainder of the
population is assigned to the control treatment (treatment B). The difference
in average outcome by treatment group is an estimate of the average effect of
the treatment. However, motivating our study, the setting for modern
experiments is often different, with the outcomes and treatment assignments
indexed by multiple populations. For example, outcomes may be indexed by buyers
and sellers, by content creators and subscribers, by drivers and riders, or by
travelers and airlines and travel agents, with treatments potentially varying
across these indices. Spillovers or interference can arise from interactions
between units across populations. For example, sellers' behavior may depend on
buyers' treatment assignment, or vice versa. This can invalidate the simple
comparison of means as an estimator for the average effect of the treatment in
classical RCTs. We propose new experiment designs for settings in which
multiple populations interact. We show how these designs allow us to study
questions about interference that cannot be answered by classical randomized
experiments. Finally, we develop new statistical methods for analyzing these
Multiple Randomization Designs.
arXiv link: http://arxiv.org/abs/2112.13495v1
Long Story Short: Omitted Variable Bias in Causal Machine Learning
common causal parameters, including (but not limited to) averages of potential
outcomes, average treatment effects, average causal derivatives, and policy
effects from covariate shifts. Our theory applies to nonparametric models,
while naturally allowing for (semi-)parametric restrictions (such as partial
linearity) when such assumptions are made. We show how simple plausibility
judgments on the maximum explanatory power of omitted variables are sufficient
to bound the magnitude of the bias, thus facilitating sensitivity analysis in
otherwise complex, nonlinear models. Finally, we provide flexible and efficient
statistical inference methods for the bounds, which can leverage modern machine
learning algorithms for estimation. These results allow empirical researchers
to perform sensitivity analyses in a flexible class of machine-learned causal
models using very simple, and interpretable, tools. We demonstrate the utility
of our approach with two empirical examples.
arXiv link: http://arxiv.org/abs/2112.13398v5
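The mechanics behind such bounds can be seen from a short Cauchy-Schwarz argument (a paraphrase of the generic recipe rather than the paper's exact statement). Write the target as $\theta=\mathbb{E}[\alpha(W)g(W)]$, with $g$ the long outcome regression and $\alpha$ its Riesz representer, and let $g_s$, $\alpha_s$ be their projections onto the observed covariates, with $\theta_s$ the corresponding short parameter. Then
\[
\theta-\theta_s=\mathbb{E}\big[(g-g_s)(\alpha-\alpha_s)\big],
\qquad
|\theta-\theta_s|\le\sqrt{\mathbb{E}[(g-g_s)^2]}\;\sqrt{\mathbb{E}[(\alpha-\alpha_s)^2]},
\]
so bounds on the explanatory power of the omitted variables for $g$ and for $\alpha$ translate directly into a bound on the bias.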
Robust Estimation of Average Treatment Effects from Panel Data
over time, it is important to correctly estimate the average treatment effect
(ATE) measure. Due to the lack of robustness of existing procedures for
estimating the ATE from panel data, in this paper we introduce a robust estimator
of the ATE and the subsequent inference procedures using the popular approach
of minimum density power divergence inference. Asymptotic properties of the
proposed ATE estimator are derived and used to construct robust test statistics
for testing parametric hypotheses related to the ATE. Besides asymptotic
analyses of efficiency and powers, extensive simulation studies are conducted
to study the finite-sample performances of our proposed estimation and testing
procedures under both pure and contaminated data. The robustness of the ATE
estimator is further investigated theoretically through the influence functions
analyses. Finally our proposal is applied to study the long-term economic
effects of the 2004 Indian Ocean earthquake and tsunami on the (per-capita)
gross domestic products (GDP) of five mostly affected countries, namely
Indonesia, Sri Lanka, Thailand, India and Maldives.
arXiv link: http://arxiv.org/abs/2112.13228v2
Bayesian Approaches to Shrinkage and Sparse Estimation
complexity, creating the need for richer statistical models. This trend is also
true for economic data, where high-dimensional and nonlinear/nonparametric
inference is the norm in several fields of applied econometric work. The
purpose of this paper is to introduce the reader to the world of Bayesian model
determination, by surveying modern shrinkage and variable selection algorithms
and methodologies. Bayesian inference is a natural probabilistic framework for
quantifying uncertainty and learning about model parameters, and this feature
is particularly important for inference in modern models of high dimensions and
increased complexity.
We begin with a linear regression setting in order to introduce various
classes of priors that lead to shrinkage/sparse estimators of comparable value
to popular penalized likelihood estimators (e.g.\ ridge, lasso). We explore
various methods of exact and approximate inference, and discuss their pros and
cons. Finally, we explore how priors developed for the simple regression
setting can be extended in a straightforward way to various classes of
interesting econometric models. In particular, the following case-studies are
considered, that demonstrate application of Bayesian shrinkage and variable
selection strategies to popular econometric contexts: i) vector autoregressive
models; ii) factor models; iii) time-varying parameter regressions; iv)
confounder selection in treatment effects models; and v) quantile regression
models. A MATLAB package and an accompanying technical manual allow the reader
to replicate many of the algorithms described in this review.
arXiv link: http://arxiv.org/abs/2112.11751v1
Doubly-Valid/Doubly-Sharp Sensitivity Analysis for Causal Inference with Unmeasured Confounding
effect (ATE) when unmeasured confounders exist but have bounded influence.
Specifically, we assume that omitted confounders could not change the odds of
treatment for any unit by more than a fixed factor. We derive the sharp partial
identification bounds implied by this assumption by leveraging distributionally
robust optimization, and we propose estimators of these bounds with several
novel robustness properties. The first is double sharpness: our estimators
consistently estimate the sharp ATE bounds when one of two nuisance parameters
is misspecified and achieve semiparametric efficiency when all nuisance
parameters are suitably consistent. The second is double validity: even when
most nuisance parameters are misspecified, our estimators still provide valid
but possibly conservative bounds for the ATE and our Wald confidence intervals
remain valid even when our estimators are not asymptotically normal. As a
result, our estimators provide a highly credible method for sensitivity
analysis of causal inferences.
arXiv link: http://arxiv.org/abs/2112.11449v2
Efficient Estimation of State-Space Mixed-Frequency VARs: A Precision-Based Approach
nowcasting. Despite their popularity, estimating such models can be
computationally intensive, especially for large systems with stochastic
volatility. To tackle the computational challenges, we propose two novel
precision-based samplers to draw the missing observations of the low-frequency
variables in these models, building on recent advances in the band and sparse
matrix algorithms for state-space models. We show via a simulation study that
the proposed methods are more numerically accurate and computationally
efficient compared to standard Kalman-filter based methods. We demonstrate how
the proposed method can be applied in two empirical macroeconomic applications:
estimating the monthly output gap and studying the response of GDP to a
monetary policy shock at the monthly frequency. Results from these two
empirical applications highlight the importance of incorporating high-frequency
indicators in macroeconomic models.
arXiv link: http://arxiv.org/abs/2112.11315v1
Ranking and Selection from Pairwise Comparisons: Empirical Bayes Methods for Citation Analysis
pairwise comparison model of Bradley and Terry to do ranking and selection of
journal influence based on nonparametric empirical Bayes procedures.
Comparisons with several other rankings are made.
arXiv link: http://arxiv.org/abs/2112.11064v1
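As a point of reference for the entry above, the classic MM (Zermelo) iteration for Bradley-Terry abilities in Python; the nonparametric empirical Bayes shrinkage step of the paper is not reproduced, and the win matrix below is made up.

import numpy as np

def bradley_terry(wins, n_iter=200):
    """wins[i, j] = number of times item i 'beats' item j (e.g. citation flows)."""
    p = np.ones(wins.shape[0])
    total = wins + wins.T                     # comparisons between each pair
    w = wins.sum(axis=1)                      # total wins of each item
    for _ in range(n_iter):
        denom = total / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)
        p = w / denom.sum(axis=1)             # MM update
        p /= p.sum()                          # normalize for identifiability
    return p

wins = np.array([[0, 30, 10],
                 [20, 0, 25],
                 [15, 5, 0]], dtype=float)
print(bradley_terry(wins))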
Heckman-Selection or Two-Part models for alcohol studies? Depends
alcohol studies. Design: To estimate the determinants of problem drinking using
a Heckman and a two-part estimation model. Psychological and neuro-scientific
studies justify my underlying estimation assumptions and covariate exclusion
restrictions. Higher order tests checking for multicollinearity validate the
use of Heckman over the use of two-part estimation models. I discuss the
generalizability of the two models in applied research. Settings and
Participants: Two pooled national population surveys from 2016 and 2017 were
used: the Behavioral Risk Factor Surveillance Survey (BRFS), and the National
Survey of Drug Use and Health (NSDUH). Measurements: Participation in problem
drinking and meeting the criteria for problem drinking. Findings: Both U.S.
national surveys perform well with the Heckman model and pass all higher order
tests. The Heckman model corrects for selection bias and reveals the direction
of the bias, whereas the two-part model does not. For example, in the two-part
model the coefficients on age are biased upward and those on unemployment are
biased downward, while the Heckman model shows no selection bias. Covariate
exclusion restrictions are sensitive to survey conditions and are contextually
generalizable. Conclusions: The Heckman model can be used for alcohol studies
(and smoking studies as well) if the underlying estimation specification passes
higher order tests for multicollinearity and the exclusion restrictions are
justified with integrity for the data used. Its use is worthwhile because it
corrects for, and reveals the direction and magnitude of, selection bias where
the two-part model does not.
arXiv link: http://arxiv.org/abs/2112.10542v2
Robustness, Heterogeneous Treatment Effects and Covariate Shifts
the distribution of covariates. Robustness to covariate shifts is important,
for example, when evaluating the external validity of quasi-experimental
results, which are often used as a benchmark for evidence-based policy-making.
I propose a novel scalar robustness metric. This metric measures the magnitude
of the smallest covariate shift needed to invalidate a claim on the policy
effect (for example, $ATE \geq 0$) supported by the quasi-experimental
evidence. My metric links the heterogeneity of policy effects and robustness in
a flexible, nonparametric way and does not require functional form assumptions.
I cast the estimation of the robustness metric as a de-biased GMM problem. This
approach guarantees a parametric convergence rate for the robustness metric
while allowing for machine learning-based estimators of policy effect
heterogeneity (for example, lasso, random forest, boosting, neural nets). I
apply my procedure to the Oregon Health Insurance experiment. I study the
robustness of policy effects estimates of health-care utilization and financial
strain outcomes, relative to a shift in the distribution of context-specific
covariates. Such covariates are likely to differ across US states, making
quantification of robustness an important exercise for adoption of the
insurance policy in states other than Oregon. I find that the effect on
outpatient visits is the most robust among the metrics of health-care
utilization considered.
arXiv link: http://arxiv.org/abs/2112.09259v2
Reinforcing RCTs with Multiple Priors while Learning about External Validity
the design of sequential experiments. These sources may include past
experiments, expert opinions, or the experimenter's intuition. We model the
problem using a multi-prior Bayesian approach, mapping each source to a
Bayesian model and aggregating them based on posterior probabilities. Policies
are evaluated on three criteria: learning the parameters of payoff
distributions, the probability of choosing the wrong treatment, and average
rewards. Our framework demonstrates several desirable properties, including
robustness to sources lacking external validity, while maintaining strong
finite sample performance.
arXiv link: http://arxiv.org/abs/2112.09170v5
Lassoed Boosting and Linear Prediction in the Equities Market
uses the lasso in Tibshirani (1996) to screen variables and, second,
re-estimates the coefficients using the least-squares boosting method in
Friedman (2001) on every set of selected variables. Based on the large-scale
simulation experiment in Hastie et al. (2020), lassoed boosting performs as
well as the relaxed lasso in Meinshausen (2007) and, under certain scenarios,
can yield a sparser model. Applied to predicting equity returns, lassoed
boosting gives the smallest mean-squared prediction error compared to several
other methods.
arXiv link: http://arxiv.org/abs/2112.08934v4
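A two-step Python sketch along the lines described above: lasso screening followed by squared-error gradient boosting on the selected variables; the base learners, tuning constants, and simulated data are placeholder choices rather than the paper's.

import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(5)
n, p = 500, 50
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.standard_normal(n)

# Step 1: screen variables with the lasso.
lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_ != 0)

# Step 2: least-squares boosting (squared-error gradient boosting) on the
# selected variables only.
booster = GradientBoostingRegressor(loss="squared_error", n_estimators=300,
                                    learning_rate=0.05, max_depth=2)
booster.fit(X[:, selected], y)
print("selected columns:", selected)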
Uniform Convergence Results for the Local Linear Regression Estimation of the Conditional Distribution
conditional distribution function $F(y|x)$. We derive three uniform convergence
results: the uniform bias expansion, the uniform convergence rate, and the
uniform asymptotic linear representation. The uniformity in the above results
is with respect to both $x$ and $y$ and therefore has not previously been
addressed in the literature on local polynomial regression. Such uniform
convergence results are especially useful when the conditional distribution
estimator is the first stage of a semiparametric estimator. We demonstrate the
usefulness of these uniform results with two examples: the stochastic
equicontinuity condition in $y$, and the estimation of the integrated
conditional distribution function.
arXiv link: http://arxiv.org/abs/2112.08546v2
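For orientation, the generic local linear estimator of the conditional distribution function, i.e. a kernel-weighted linear regression of the indicator $1\{Y\le y\}$ on $X-x$, sketched in Python on simulated data; the entry above is about its uniform asymptotic properties, which the sketch does not address.

import numpy as np

def local_linear_cdf(x0, y0, X, Y, h):
    u = (X - x0) / h
    w = np.exp(-0.5 * u**2)                  # Gaussian kernel weights
    Z = np.column_stack([np.ones_like(X), X - x0])
    ind = (Y <= y0).astype(float)
    # Weighted least squares; the intercept estimates F(y0 | x0).
    WZ = Z * w[:, None]
    beta = np.linalg.solve(Z.T @ WZ, WZ.T @ ind)
    return float(np.clip(beta[0], 0.0, 1.0))

rng = np.random.default_rng(6)
X = rng.uniform(-1, 1, 2000)
Y = X + rng.standard_normal(2000)
print(local_linear_cdf(0.0, 0.5, X, Y, h=0.2))    # true value is about 0.69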
Testing Instrument Validity with Covariates
for heterogeneous treatment effect models with conditioning covariates. We
assume semiparametric dependence between potential outcomes and conditioning
covariates. This allows us to obtain testable equality and inequality
restrictions among the subdensities of estimable partial residuals. We propose
jointly testing these restrictions. To improve power, we introduce
distillation, where a trimmed sample is used to test the inequality
restrictions. In Monte Carlo exercises we find gains in finite sample power
from testing restrictions jointly and distillation. We apply our test procedure
to three instruments and reject the null for one.
arXiv link: http://arxiv.org/abs/2112.08092v2
Solving the Data Sparsity Problem in Predicting the Success of the Startups with Machine Learning Methods
startup companies and investors. It is difficult due to the lack of available
data and appropriate general methods. With data platforms like Crunchbase
aggregating information on startup companies, it becomes possible to make such
predictions with machine learning algorithms. Existing research suffers from the data
sparsity problem as most early-stage startup companies do not have much data
available to the public. We try to leverage the recent algorithms to solve this
problem. We investigate several machine learning algorithms with a large
dataset from Crunchbase. The results suggest that LightGBM and XGBoost perform
best and achieve 53.03% and 52.96% F1 scores. We interpret the predictions from
the perspective of feature contribution. We construct portfolios based on the
models and achieve high success rates. These findings have substantial
implications for how machine learning methods can help startup companies and
investors.
arXiv link: http://arxiv.org/abs/2112.07985v1
Behavioral Foundations of Nested Stochastic Choice and Nested Logit
foundational and widely applied discrete choice model, through the introduction
of a non-parametric version of nested logit that we call Nested Stochastic
Choice (NSC). NSC is characterized by a single axiom that weakens Independence
of Irrelevant Alternatives based on revealed similarity to allow for the
similarity effect. Nested logit is characterized by an additional
menu-independence axiom. Our axiomatic characterization leads to a practical,
data-driven algorithm that identifies the true nest structure from choice data.
We also discuss limitations of generalizing nested logit by studying the
testable implications of cross-nested logit.
arXiv link: http://arxiv.org/abs/2112.07155v2
Factor Models with Sparse VAR Idiosyncratic Components
positive aspects of both. We employ a factor model and assume that the dynamics
of the factors are non-pervasive, while the idiosyncratic term follows a sparse
vector autoregressive model (VAR) which allows for cross-sectional and time
dependence. The estimation is articulated in two steps: first, the factors and
their loadings are estimated via principal component analysis and second, the
sparse VAR is estimated by regularized regression on the estimated
idiosyncratic components. We prove the consistency of the proposed estimation
approach as the time and cross-sectional dimension diverge. In the second step,
the estimation error of the first step needs to be accounted for. Here, we do
not follow the naive approach of simply plugging in the standard rates derived
for the factor estimation. Instead, we derive a more refined expression of the
error. This enables us to derive tighter rates. We discuss the implications of
our model for forecasting, factor augmented regression, bootstrap of factor
models, and time series dependence networks via semi-parametric estimation of
the inverse of the spectral density matrix.
arXiv link: http://arxiv.org/abs/2112.07149v2
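A compact Python sketch of the two-step procedure described above on simulated data: principal components for the factor part, then one lasso regression per series for a sparse VAR(1) on the estimated idiosyncratic components; the number of factors and the penalty level are placeholders.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
T, N, r = 300, 40, 3
F = rng.standard_normal((T, r))
L = rng.standard_normal((N, r))
E = rng.standard_normal((T, N)) * 0.5
X = F @ L.T + E                              # panel of N series over T periods

# Step 1: factors and loadings via principal components.
pca = PCA(n_components=r)
factors = pca.fit_transform(X)
common = pca.inverse_transform(factors)
resid = X - common                           # estimated idiosyncratic components

# Step 2: sparse VAR(1) on the residuals, one lasso regression per series.
A_hat = np.zeros((N, N))
for i in range(N):
    lasso = Lasso(alpha=0.05).fit(resid[:-1], resid[1:, i])
    A_hat[i] = lasso.coef_
print("share of nonzero VAR coefficients:", (A_hat != 0).mean())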
Semiparametric Conditional Factor Models in Asset Pricing
conditional latent factor models. Our approach disentangles the roles of
characteristics in capturing factor betas of asset returns from “alpha.” We
construct factors by extracting principal components from Fama-MacBeth managed
portfolios. Applying this methodology to the cross-section of U.S. individual
stock returns, we find compelling evidence of substantial nonzero pricing
errors, even though our factors demonstrate superior performance in standard
asset pricing tests. Unexplained “arbitrage” portfolios earn high Sharpe
ratios, which decline over time. Combining factors with these orthogonal
portfolios produces out-of-sample Sharpe ratios exceeding 4.
arXiv link: http://arxiv.org/abs/2112.07121v5
Identifying Marginal Treatment Effects in the Presence of Sample Selection
effect (MTE) when there is sample selection. We show that the MTE is partially
identified for individuals who are always observed regardless of treatment, and
derive uniformly sharp bounds on this parameter under three increasingly
restrictive sets of assumptions. The first result imposes standard MTE
assumptions with an unrestricted sample selection mechanism. The second set of
conditions imposes monotonicity of the sample selection variable with respect
to treatment, considerably shrinking the identified set. Finally, we
incorporate a stochastic dominance assumption which tightens the lower bound
for the MTE. Our analysis extends to discrete instruments. The results rely on
a mixture reformulation of the problem where the mixture weights are
identified, extending Lee's (2009) trimming procedure to the MTE context. We
propose estimators for the derived bounds and use data made available by Deb,
Munkin, and Trivedi (2006) to empirically illustrate the usefulness of our
approach.
arXiv link: http://arxiv.org/abs/2112.07014v1
Quantile Regression under Limited Dependent Variable
models for the cases of censored (with lower and/or upper censoring) and binary
dependent variables. The estimators are implemented using a smoothed version of
the quantile regression objective function. Simulation exercises show that the
proposed estimators correctly recover the parameters and should be implemented
instead of the available quantile regression methods when censoring is present. An empirical
application to women's labor supply in Uruguay is considered.
arXiv link: http://arxiv.org/abs/2112.06822v1
Risk and optimal policies in bandit experiments
asymptotics. Working within the framework of diffusion processes, we define
suitable notions of asymptotic Bayes and minimax risk for these experiments.
For normally distributed rewards, the minimal Bayes risk can be characterized
as the solution to a second-order partial differential equation (PDE). Using a
limit of experiments approach, we show that this PDE characterization also
holds asymptotically under both parametric and non-parametric distributions of
the rewards. The approach further describes the state variables it is
asymptotically sufficient to restrict attention to, and thereby suggests a
practical strategy for dimension reduction. The PDEs characterizing minimal
Bayes risk can be solved efficiently using sparse matrix routines or
Monte-Carlo methods. We derive the optimal Bayes and minimax policies from
their numerical solutions. These optimal policies substantially dominate
existing methods such as Thompson sampling; the risk of the latter is often
twice as high.
arXiv link: http://arxiv.org/abs/2112.06363v16
Housing Price Prediction Model Selection Based on Lorenz and Concentration Curves: Empirical Evidence from Tehran Housing Market
City based on the area between Lorenz curve (LC) and concentration curve (CC)
of the predicted price, using 206,556 observed transactions over the
period from March 21, 2018, to February 19, 2021. Several different methods,
such as generalized linear models (GLM), recursive partitioning and
regression trees (RPART), random forest (RF) regression models, and neural
network (NN) models, were examined for house price prediction. We used a
randomly chosen 90% of the data sample to estimate the parameters of the
pricing models and the remaining 10% to test the accuracy of prediction.
Results showed that the area between the LC and CC (known as the ABC
criterion) of real and predicted prices in the test sample was smaller for
the random forest regression model than for the other models under study. The
comparison of the calculated ABC criteria leads us to conclude that
nonlinear regression models such as the RF regression model give accurate
predictions of house prices in Tehran City.
arXiv link: http://arxiv.org/abs/2112.06192v1
The Past as a Stochastic Process
have long attempted to identify patterns and categorize historical actors and
influences with some success. A stochastic process framework provides a
structured approach for the analysis of large historical datasets that allows
for detection of sometimes surprising patterns, identification of relevant
causal actors both endogenous and exogenous to the process, and comparison
between different historical cases. The combination of data, analytical tools
and the organizing theoretical framework of stochastic processes complements
traditional narrative approaches in history and archaeology.
arXiv link: http://arxiv.org/abs/2112.05876v1
On the Assumptions of Synthetic Control Methods
causal effect of large-scale interventions, e.g., the state-wide effect of a
change in policy. The idea of synthetic controls is to approximate one unit's
counterfactual outcomes using a weighted combination of some other units'
observed outcomes. The motivating question of this paper is: how does the SC
strategy lead to valid causal inferences? We address this question by
re-formulating the causal inference problem targeted by SC with a more
fine-grained model, where we change the unit of the analysis from "large units"
(e.g., states) to "small units" (e.g., individuals in states). Under this
re-formulation, we derive sufficient conditions for the non-parametric causal
identification of the causal effect. We highlight two implications of the
reformulation: (1) it clarifies where "linearity" comes from, and how it falls
naturally out of the more fine-grained and flexible model, and (2) it suggests
new ways of using available data with SC methods for valid causal inference, in
particular, new ways of selecting observations from which to estimate the
counterfactual.
arXiv link: http://arxiv.org/abs/2112.05671v2
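For context, the standard synthetic control weight construction in Python: nonnegative weights summing to one that best reproduce the treated unit's pre-treatment path. The entry above asks when such a construction supports causal claims, which this estimation sketch does not address; the data are simulated.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
T0, J = 30, 10                                      # pre-periods, donor units
Y0 = rng.standard_normal((T0, J)).cumsum(axis=0)    # donor outcomes
true_w = np.array([0.5, 0.3, 0.2] + [0.0] * (J - 3))
y1 = Y0 @ true_w + 0.1 * rng.standard_normal(T0)    # treated unit, pre-period

def loss(w):
    # Pre-treatment fit of the weighted donor combination.
    return np.sum((y1 - Y0 @ w) ** 2)

cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
res = minimize(loss, x0=np.full(J, 1.0 / J), bounds=[(0.0, 1.0)] * J,
               constraints=cons, method="SLSQP")
print(np.round(res.x, 2))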
Option Pricing with State-dependent Pricing Kernel
switching with the Realized GARCH framework. This leads to a novel pricing
kernel with a state-dependent variance risk premium and a pricing formula for
European options, which is derived with an analytical approximation method. We
apply the Markov switching Realized GARCH model to S&P 500 index options from
1990 to 2019 and find that investors' aversion to volatility-specific risk is
time-varying. The proposed framework outperforms competing models and reduces
(in-sample and out-of-sample) option pricing errors by 15% or more.
arXiv link: http://arxiv.org/abs/2112.05308v2
Realized GARCH, CBOE VIX, and the Volatility Risk Premium
the Volatility Index (VIX) and the volatility risk premium (VRP). The Realized
GARCH model is driven by two shocks, a return shock and a volatility shock, and
these are natural state variables in the stochastic discount factor (SDF). The
volatility shock endows the exponentially affine SDF with a compensation for
volatility risk. This leads to dissimilar dynamic properties under the physical
and risk-neutral measures that can explain time-variation in the VRP. In an
empirical application with the S&P 500 returns, the VIX, and the VRP, we find
that the Realized GARCH model significantly outperforms conventional GARCH
models.
arXiv link: http://arxiv.org/abs/2112.05302v1
Covariate Balancing Sensitivity Analysis for Extrapolating Randomized Trials across Locations
(RCTs) across locations is crucial for informing policy decisions in targeted
regions. Such generalization is often hindered by the lack of identifiability
due to unmeasured effect modifiers that compromise direct transport of
treatment effect estimates from one location to another. We build upon
sensitivity analysis in observational studies and propose an optimization
procedure that allows us to get bounds on the treatment effects in targeted
regions. Furthermore, we construct more informative bounds by balancing on the
moments of covariates. In simulation experiments, we show that the covariate
balancing approach is promising in getting sharper identification intervals.
arXiv link: http://arxiv.org/abs/2112.04723v1
Deep self-consistent learning of local volatility
option prices through deep self-consistent learning, by approximating both
market option prices and local volatility using deep neural networks. Our
method uses the initial-boundary value problem of the underlying Dupire's
partial differential equation solved by the parameterized option prices to
bring corrections to the parameterization in a self-consistent way. By
exploiting the differentiability of neural networks, we can evaluate Dupire's
equation locally at each strike-maturity pair; while by exploiting their
continuity, we sample strike-maturity pairs uniformly from a given domain,
going beyond the discrete points where the options are quoted. Moreover, the
absence of arbitrage opportunities is imposed by penalizing an associated loss
function as a soft constraint. For comparison with existing approaches, the
proposed method is tested on both synthetic and market option prices, showing
improved performance in terms of reduced interpolation and repricing errors, as
well as a smoother calibrated local volatility. An ablation study has been
performed, attesting to the robustness and significance of
the proposed method.
arXiv link: http://arxiv.org/abs/2201.07880v3
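For context, Dupire's equation referenced above can be written, in its simplest form with zero interest rates and dividends (the paper works with the full initial-boundary value problem), as
\[
\sigma_{\mathrm{loc}}^{2}(K,T)=\frac{2\,\partial_{T}C(K,T)}{K^{2}\,\partial_{KK}C(K,T)},
\]
which is the relation that the neural-network parameterization is penalized to satisfy at sampled strike-maturity pairs.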
Efficient counterfactual estimation in semiparametric discrete choice models: a note on Chiong, Hsieh, and Shum (2017)
for calculating bounds on counterfactual demand in semiparametric discrete
choice models. Their algorithm relies on a system of inequalities indexed by
cycles of a large number $M$ of observed markets and hence seems to require
computationally infeasible enumeration of all such cycles. I show that such
enumeration is unnecessary because solving the "fully efficient" inequality
system exploiting cycles of all possible lengths $K=1,\dots,M$ can be reduced
to finding the length of the shortest path between every pair of vertices in a
complete bidirected weighted graph on $M$ vertices. The latter problem can be
solved using the Floyd--Warshall algorithm with computational complexity
$O\left(M^3\right)$, which takes only seconds to run even for thousands of
markets. Monte Carlo simulations illustrate the efficiency gain from using
cycles of all lengths, which turns out to be positive, but small.
arXiv link: http://arxiv.org/abs/2112.04637v1
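A minimal Python implementation of the Floyd-Warshall step described in the note above; the edge weights, which in the application would be built from the observed market data, are random placeholders here.

import numpy as np

def floyd_warshall(W):
    """W[i, j] = weight of the directed edge i -> j (np.inf if absent)."""
    D = W.copy()
    M = D.shape[0]
    np.fill_diagonal(D, 0.0)
    for k in range(M):
        # Relax every pair (i, j) through the intermediate vertex k.
        D = np.minimum(D, D[:, k][:, None] + D[k, :][None, :])
    return D

rng = np.random.default_rng(9)
M = 200                                      # number of markets
W = rng.uniform(0.0, 1.0, size=(M, M))       # complete bidirected weighted graph
print(floyd_warshall(W)[:3, :3])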
Two-Way Fixed Effects and Differences-in-Differences with Heterogeneous Treatment Effects: A Survey
estimate policies' effects: 26 of the 100 most cited papers published by the
American Economic Review from 2015 to 2019 estimate such regressions. It has
recently been shown that those regressions may produce misleading estimates, if
the policy's effect is heterogeneous between groups or over time, as is often
the case. This survey reviews a fast-growing literature that documents this
issue, and that proposes alternative estimators robust to heterogeneous
effects. We use those alternative estimators to revisit Wolfers (2006).
arXiv link: http://arxiv.org/abs/2112.04565v6
Matching for causal effects via multimarginal unbalanced optimal transport
effects in observational studies. The principal challenge stems from the often
high-dimensional structure of the problem. Many methods have been introduced to
address this, with different advantages and drawbacks in computational and
statistical performance as well as interpretability. This article introduces a
natural optimal matching method based on multimarginal unbalanced optimal
transport that possesses many useful properties in this regard. It provides
interpretable weights based on the distance of matched individuals, can be
efficiently implemented via the iterative proportional fitting procedure, and
can match several treatment arms simultaneously. Importantly, the proposed
method only selects good matches from either group, hence is competitive with
the classical k-nearest neighbors approach in terms of bias and variance in
finite samples. Moreover, we prove a central limit theorem for the empirical
process of the potential functions of the optimal coupling in the unbalanced
optimal transport problem with a fixed penalty term. This implies a parametric
rate of convergence of the empirically obtained weights to the optimal weights
in the population for a fixed penalty term.
arXiv link: http://arxiv.org/abs/2112.04398v2
Nonparametric Treatment Effect Identification in School Choice
effects in centralized school assignment. In many centralized assignment
algorithms, students are subjected to both lottery-driven variation and
regression discontinuity (RD) driven variation. We characterize the full set of
identified atomic treatment effects (aTEs), defined as the conditional average
treatment effect between a pair of schools, given student characteristics.
Atomic treatment effects are the building blocks of more aggregated notions of
treatment contrasts, and common approaches to estimating aggregations of aTEs
can mask important heterogeneity. In particular, many aggregations of aTEs put
zero weight on aTEs driven by RD variation, and estimators of such aggregations
put asymptotically vanishing weight on the RD-driven aTEs. We provide a
diagnostic and recommend new aggregation schemes. Lastly, we provide estimators
and accompanying asymptotic results for inference for those aggregations.
arXiv link: http://arxiv.org/abs/2112.03872v4
A decomposition method to evaluate the `paradox of progress' with evidence for Argentina
education with larger income inequality. Two driving and competing factors
behind this phenomenon are the convexity of the `Mincer equation' (that links
wages and education) and the heterogeneity in its returns, as captured by
quantile regressions. We propose a joint least-squares and quantile regression
statistical framework to derive a decomposition in order to evaluate the
relative contribution of each explanation. The estimators are based on the
`functional derivative' approach. We apply the proposed decomposition strategy
to the case of Argentina 1992 to 2015.
arXiv link: http://arxiv.org/abs/2112.03836v1
A Bayesian take on option pricing with Gaussian processes
dependent diffusion coefficient. Calibration is, however, non-trivial as it
involves both proposing a hypothesis model of the latent function and a method
for fitting it to data. In this paper we present novel Bayesian inference with
Gaussian process priors. We obtain a rich representation of the local
volatility function with a probabilistic notion of uncertainty attached to the
calibration. We propose an inference algorithm and apply our approach to S&P 500
market data.
arXiv link: http://arxiv.org/abs/2112.03718v1
Phase transitions in nonparametric regressions
derivatives up to the $(\gamma+1)$th order bounded in absolute values by a
common constant everywhere or a.e. (i.e., $(\gamma+1)$th degree of smoothness),
the minimax optimal rate of the mean integrated squared error (MISE) is stated
as $\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ in the literature.
This paper shows that: (i) if $n\leq\left(\gamma+1\right)^{2\gamma+3}$, the
minimax optimal MISE rate is $\frac{\log n}{n\log(\log n)}$ and the optimal
degree of smoothness to exploit is roughly
$\max\left\{\left\lfloor \frac{\log n}{2\log(\log n)}\right\rfloor,\,1\right\}$; (ii) if
$n>\left(\gamma+1\right)^{2\gamma+3}$, the minimax optimal MISE rate is
$\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ and the optimal degree
of smoothness to exploit is $\gamma+1$. The fundamental contribution of this
paper is a set of metric entropy bounds we develop for smooth function classes.
Some of our bounds are original, and some of them improve and/or generalize the
ones in the literature (e.g., Kolmogorov and Tikhomirov, 1959). Our metric
entropy bounds allow us to show phase transitions in the minimax optimal MISE
rates associated with some commonly seen smoothness classes as well as
non-standard smoothness classes, and can also be of independent interest
outside the nonparametric regression problems.
arXiv link: http://arxiv.org/abs/2112.03626v7
Visual Inference and Graphical Representation in Regression Discontinuity Designs
about readers' ability to process the statistical information they are meant to
convey ("visual inference"). We study visual inference within the context of
regression discontinuity (RD) designs by measuring how accurately readers
identify discontinuities in graphs produced from data generating processes
calibrated on 11 published papers from leading economics journals. First, we
assess the effects of different graphical representation methods on visual
inference using randomized experiments. We find that bin widths and fit lines
have the largest impacts on whether participants correctly perceive the
presence or absence of a discontinuity. Our experimental results allow us to
make evidence-based recommendations to practitioners, and we suggest using
small bins with no fit lines as a starting point to construct RD graphs.
Second, we compare visual inference on graphs constructed using our preferred
method with widely used econometric inference procedures. We find that visual
inference achieves similar or lower type I error (false positive) rates and
complements econometric inference.
arXiv link: http://arxiv.org/abs/2112.03096v2
Deep Quantile and Deep Composite Model Regression
off-the-shelf distribution that simultaneously provides a good distributional
model for the main body and the tail of the data. In particular, covariates may
have different effects for small and for large claim sizes. To cope with this
problem, we introduce a deep composite regression model whose splicing point is
given in terms of a quantile of the conditional claim size distribution rather
than a constant. To facilitate M-estimation for such models, we introduce and
characterize the class of strictly consistent scoring functions for the triplet
consisting of a quantile, as well as the lower and upper expected shortfall beyond
that quantile. In a second step, this elicitability result is applied to fit
deep neural network regression models. We demonstrate the applicability of our
approach and its superiority over classical approaches on a real accident
insurance data set.
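As a point of reference for the scoring functions discussed above, the sketch below implements only the quantile (pinball) component, which is strictly consistent for a quantile on its own; the paper's composite score additionally elicits the lower and upper expected shortfall, and the simulated claim sizes and grid search are purely illustrative:

    import numpy as np

    def pinball_loss(y, q, tau):
        # Strictly consistent scoring function for the tau-quantile; the
        # paper's composite score adds terms that elicit the lower and upper
        # expected shortfall around the same quantile.
        u = y - q
        return np.mean(np.maximum(tau * u, (tau - 1) * u))

    y = np.random.default_rng(1).lognormal(size=1000)   # stand-in claim sizes
    tau, grid = 0.9, np.linspace(0.5, 6.0, 200)
    best = grid[np.argmin([pinball_loss(y, q, tau) for q in grid])]
    print(best, np.quantile(y, tau))   # minimiser sits near the 0.9 quantile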
arXiv link: http://arxiv.org/abs/2112.03075v1
Gaussian Process Vector Autoregressions and Macroeconomic Uncertainty
agnostic on the precise relationship between a (possibly) large set of
macroeconomic time series and their lagged values. The main building block of
our model is a Gaussian process prior on the functional relationship that
determines the conditional mean of the model, hence the name of Gaussian
process vector autoregression (GP-VAR). A flexible stochastic volatility
specification is used to provide additional flexibility and control for
heteroskedasticity. Markov chain Monte Carlo (MCMC) estimation is carried out
through an efficient and scalable algorithm which can handle large models. The
GP-VAR is illustrated by means of simulated data and in a forecasting exercise
with US data. Moreover, we use the GP-VAR to analyze the effects of
macroeconomic uncertainty, with a particular emphasis on time variation and
asymmetries in the transmission mechanisms.
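To illustrate the basic idea of placing a Gaussian process prior on the conditional mean, here is a deliberately simplified single-equation sketch using scikit-learn's marginal-likelihood fit on simulated nonlinear AR(1) data; the paper's GP-VAR is multivariate, adds stochastic volatility and is estimated by MCMC, none of which is attempted here:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    y = np.zeros(300)
    for t in range(1, 300):   # simulated nonlinear AR(1) series
        y[t] = 0.9 * np.tanh(2 * y[t - 1]) + 0.3 * rng.normal()

    X, target = y[:-1, None], y[1:]
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X, target)   # GP on the map from lagged value to conditional mean
    mean, sd = gp.predict(np.array([[0.5]]), return_std=True)
    print(mean, sd)     # point prediction with an uncertainty band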
arXiv link: http://arxiv.org/abs/2112.01995v3
Inference for ROC Curves Based on Estimated Predictive Indices
inference about receiver operating characteristic (ROC) curves that are based
on predicted values from a first stage model with estimated parameters (such as
a logit regression). The term "in-sample" refers to the practice of using the
same data for model estimation (training) and subsequent evaluation, i.e., the
construction of the ROC curve. We show that in this case the first stage
estimation error has a generally non-negligible impact on the asymptotic
distribution of the ROC curve and develop the appropriate pointwise and
functional limit theory. We propose methods for simulating the distribution of
the limit process and show how to use the results in practice in comparing ROC
curves.
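A minimal sketch of the in-sample construction being studied: the same simulated data are used to fit a first-stage logit and to trace the ROC curve of its predicted index, which is exactly the situation where the paper shows the first-stage estimation error matters; the naive curve below does not include that correction:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve, roc_auc_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    p = 1 / (1 + np.exp(-(X @ np.array([1.0, -0.5, 0.25]))))
    y = rng.binomial(1, p)

    index = LogisticRegression().fit(X, y).decision_function(X)
    fpr, tpr, _ = roc_curve(y, index)    # in-sample ROC from the fitted index
    print(roc_auc_score(y, index))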
arXiv link: http://arxiv.org/abs/2112.01772v1
Patient-Centered Appraisal of Race-Free Clinical Risk Assessment
patient risk assessments on all observed patient covariates with predictive
power. The broad idea is that knowing more about patients enables more accurate
predictions of their health risks and, hence, better clinical decisions. This
consensus has recently unraveled with respect to a specific covariate, namely
race. There have been increasing calls for race-free risk assessment, arguing
that using race to predict patient outcomes contributes to racial disparities
and inequities in health care. Writers calling for race-free risk assessment
have not studied how it would affect the quality of clinical decisions.
Considering the matter from the patient-centered perspective of medical
economics yields a disturbing conclusion: Race-free risk assessment would harm
patients of all races.
arXiv link: http://arxiv.org/abs/2112.01639v2
Simple Alternatives to the Common Correlated Effects Model
significantly relax the common correlated effects (CCE) assumptions pioneered
by Pesaran (2006) and used in dozens of papers since. In the simplest case, we
model the unobserved factors as functions of the cross-sectional averages of
the explanatory variables and show that this is implied by Pesaran's
assumptions when the number of factors does not exceed the number of
explanatory variables. Our approach allows discrete explanatory variables and
flexible functional forms in the covariates. Plus, it extends to a framework
that easily incorporates general functions of cross-sectional moments, in
addition to heterogeneous intercepts and time trends. Our proposed estimators
include Pesaran's pooled common correlated effects (CCEP) estimator as a
special case. We also show that in the presence of heterogeneous slopes our
estimator is consistent under assumptions much weaker than those previously
used. We derive the fixed-T asymptotic normality of a general estimator and
show how to adjust for estimation of the population moments in the factor
loading equation.
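The following toy sketch illustrates the basic device of proxying unobserved factors with cross-sectional averages of the regressors in a pooled regression (here augmented with a squared average to mimic a flexible functional form); the data-generating process, the choice of augmenting terms and the plain least-squares fit are simplifying assumptions, not the paper's estimator or inference procedure:

    import numpy as np

    rng = np.random.default_rng(0)
    N, T = 200, 10
    f = rng.normal(size=T)                    # unobserved common factor
    lam = rng.uniform(0.5, 1.5, size=N)       # heterogeneous loadings
    x = 0.8 * f[None, :] + rng.normal(size=(N, T))
    y = 1.0 + 2.0 * x + lam[:, None] * f[None, :] + rng.normal(size=(N, T))

    xbar = x.mean(axis=0)                     # cross-sectional averages per t
    Z = np.column_stack([
        np.ones(N * T),
        x.ravel(),
        np.repeat(xbar[None, :], N, axis=0).ravel(),         # factor proxy
        np.repeat((xbar ** 2)[None, :], N, axis=0).ravel(),  # flexible form
    ])
    beta = np.linalg.lstsq(Z, y.ravel(), rcond=None)[0]
    print(beta[1])                            # slope on x, should be near 2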
arXiv link: http://arxiv.org/abs/2112.01486v1
RIF Regression via Sensitivity Curves
function (RIF) regression of Firpo, Fortin and Lemieux (2009), a relevant
method to study the effect of covariates on many statistics beyond the mean. In
empirically relevant situations where the influence function is not available
or difficult to compute, we suggest using the sensitivity curve (Tukey,
1977) as a feasible alternative. This may be computationally cumbersome when
the sample size is large. The relevance of the proposed strategy derives from
the fact that, under general conditions, the sensitivity curve converges in
probability to the influence function. In order to save computational time, we
propose using a cubic spline nonparametric method on a random subsample and
then interpolating to the remaining cases. Monte
Carlo simulations show good finite sample properties. We illustrate the
proposed estimator with an application to the polarization index of Duclos,
Esteban and Ray (2004).
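A small sketch of the sensitivity-curve substitution on simulated data: the statistic is recomputed with each observation added back to the remaining sample, and the resulting recentered values are regressed on a covariate; the subsample-plus-spline shortcut proposed in the paper and its asymptotic justification are not reproduced here:

    import numpy as np

    def sensitivity_curve(stat, sample, x):
        # SC_n(x) = n * (stat(sample plus x) - stat(sample)); under general
        # conditions this converges to the influence function, so
        # stat(full sample) + SC can stand in for the recentered influence
        # function in a RIF regression.
        n = len(sample) + 1
        return n * (stat(np.append(sample, x)) - stat(sample))

    rng = np.random.default_rng(0)
    x_cov = rng.normal(size=2000)
    y = 1 + 0.5 * x_cov + rng.normal(size=2000)

    stat = lambda s: np.quantile(s, 0.9)     # statistic of interest
    rif = stat(y) + np.array([sensitivity_curve(stat, np.delete(y, i), y[i])
                              for i in range(len(y))])
    slope = np.polyfit(x_cov, rif, 1)[0]     # RIF regression on the covariate
    print(slope)   # approximates the effect of x on the 0.9 quantile (~0.5)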
arXiv link: http://arxiv.org/abs/2112.01435v1
Structural Sieves
estimation of economic models of maximizing behavior in production or discrete
choice. We argue that certain deep networks are particularly well suited as a
nonparametric sieve to approximate regression functions that result from
nonlinear latent variable models of continuous or discrete optimization.
Multi-stage models of this type will typically generate rich interaction
effects between regressors ("inputs") in the regression function so that there
may be no plausible separability restrictions on the "reduced-form" mapping
from inputs to outputs to alleviate the curse of dimensionality. Rather,
economic shape, sparsity, or separability restrictions either at a global level
or intermediate stages are usually stated in terms of the latent variable
model. We show that restrictions of this kind are imposed in a more
straightforward manner if a sufficiently flexible version of the latent
variable model is in fact used to approximate the unknown regression function.
arXiv link: http://arxiv.org/abs/2112.01377v2
Modelling heterogeneous treatment effects by quantile local polynomial decision tree and forest
treatment effects, this paper builds on Breiman's (2001) random forest tree
(RFT) and Wager et al.'s (2018) causal tree. It parameterizes the nonparametric
problem using the well-understood statistical properties of classical OLS and a
division into local linear intervals based on covariate quantile points, while
preserving the advantages of random forest trees, namely constructible
confidence intervals and asymptotic normality [Athey and Imbens (2016), Efron
(2014), Wager et al. (2014)]. We propose a decision tree that combines quantile
classification according to fixed rules with polynomial estimation on local
samples, which we call the quantile local linear causal tree (QLPRT) and forest
(QLPRF).
arXiv link: http://arxiv.org/abs/2111.15320v2
Distribution Shift in Airline Customer Behavior during COVID-19
applications assume that the data distribution at the time of online pricing is
similar to that observed during training. However, this assumption may be
violated in practice because of the dynamic nature of customer buying patterns,
particularly due to unanticipated system shocks such as COVID-19. We study the
changes in customer behavior for a major airline during the COVID-19 pandemic
by framing it as a covariate shift and concept drift detection problem. We
identify which customers changed their travel and purchase behavior and the
attributes affecting that change using (i) Fast Generalized Subset Scanning and
(ii) Causal Forests. In our experiments with simulated and real-world data, we
present how these two techniques can be used through qualitative analysis.
arXiv link: http://arxiv.org/abs/2111.14938v2
The Fixed-b Limiting Distribution and the ERP of HAR Tests Under Nonstationarity
under fixed-b asymptotics is not pivotal (even after studentization) when the
data are nonstationary. It takes the form of a complicated function of
Gaussian processes and depends on the integrated local long-run variance and
on the second moments of the relevant series (e.g., of the regressors and
errors for the case of the linear regression model). Hence, existing fixed-b
inference methods based on stationarity are not theoretically valid in general.
The nuisance parameters entering the fixed-b limiting distribution can be
consistently estimated under small-b asymptotics, but only at a nonparametric
rate of convergence. Hence, we show that the error in rejection probability
(ERP) is an order of magnitude larger than that under stationarity and is also
larger than that of HAR tests based on HAC estimators under conventional
asymptotics. These theoretical results reconcile with recent finite-sample
evidence in Casini (2021) and Casini, Deng and Perron (2021), who show that
fixed-b HAR tests can perform poorly when the data are nonstationary. They can
be conservative under the null hypothesis and have non-monotonic power under
the alternative hypothesis irrespective of how large the sample size is.
arXiv link: http://arxiv.org/abs/2111.14590v2
Factor-augmented tree ensembles
regression trees with latent stationary factors extracted via state-space
methods. In doing so, this approach generalises time-series regression trees on
two dimensions. First, it can handle predictors that exhibit measurement
error, non-stationary trends, seasonality and/or irregularities such as missing
observations. Second, it gives a transparent way to use domain-specific
theory to inform time-series regression trees. Empirically, ensembles of these
factor-augmented trees provide a reliable approach for macro-finance problems.
This article illustrates this by focusing on the lead-lag effect between equity
volatility and the business cycle in the United States.
arXiv link: http://arxiv.org/abs/2111.14000v6
Robust Permutation Tests in Linear Instrumental Variables Regression
linear instrumental variables (IV) regression. Unlike the existing
randomization and rank-based tests in which independence between the
instruments and the error terms is assumed, the permutation Anderson-Rubin
(AR), Lagrange Multiplier (LM) and Conditional Likelihood Ratio (CLR) tests are
asymptotically similar and robust to conditional heteroskedasticity under
the standard exclusion restriction, i.e., orthogonality between the instruments
and the error terms. Moreover, when the instruments are independent of the
structural error term, the permutation AR tests are exact, hence robust to
heavy tails. As such, these tests share the strengths of the rank-based tests
and the wild bootstrap AR tests. Numerical illustrations corroborate the
theoretical results.
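A stripped-down sketch of the permutation idea for the Anderson-Rubin statistic on simulated data with heavy-tailed errors: rows of the instrument matrix are re-randomized under the null, which is justified when instruments are independent of the structural error; the homoskedastic AR form is used for brevity, whereas the paper also covers heteroskedasticity-robust AR, LM and CLR versions:

    import numpy as np

    def ar_stat(e, Z):
        # Anderson-Rubin statistic for H0: beta = beta0, with e = y - X @ beta0
        Pe = Z @ np.linalg.lstsq(Z, e, rcond=None)[0]
        k, n = Z.shape[1], len(e)
        return (Pe @ Pe / k) / ((e - Pe) @ (e - Pe) / (n - k))

    def permutation_ar_pvalue(y, X, Z, beta0, n_perm=999, seed=0):
        # Re-randomise the rows of Z; valid when the instruments are
        # independent of the structural error under the null.
        rng = np.random.default_rng(seed)
        e = y - X @ beta0
        observed = ar_stat(e, Z)
        perm = [ar_stat(e, Z[rng.permutation(len(y))]) for _ in range(n_perm)]
        return (1 + sum(s >= observed for s in perm)) / (n_perm + 1)

    rng = np.random.default_rng(1)
    n = 200
    Z = rng.normal(size=(n, 2))
    x = Z @ np.array([0.5, 0.3]) + rng.normal(size=n)   # first stage
    y = 1.0 * x + rng.standard_t(3, size=n)             # heavy-tailed errors
    print(permutation_ar_pvalue(y, x[:, None], Z, np.array([1.0])))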
arXiv link: http://arxiv.org/abs/2111.13774v4
Yogurts Choose Consumers? Estimation of Random-Utility Models via Two-Sided Matching
utility discrete-choice models - is equivalent to the determination of stable
outcomes in two-sided matching models. This equivalence applies to random
utility models that are not necessarily additive, smooth, or even invertible.
Based on this equivalence, algorithms for the determination of stable matchings
provide effective computational methods for estimating these models. For
non-invertible models, the identified set of utility vectors is a lattice, and
the matching algorithms recover sharp upper and lower bounds on the utilities.
Our matching approach facilitates estimation of models that were previously
difficult to estimate, such as the pure characteristics model. An empirical
application to voting data from the 1999 European Parliament elections
illustrates the good performance of our matching-based demand inversion
algorithms in practice.
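For concreteness, here is a generic Gale-Shapley deferred acceptance routine of the kind that can be used to compute stable matchings; the toy preference lists are invented, and the paper's demand-inversion algorithms construct the two sides' "preferences" from utilities and handle non-invertible models, which this sketch does not:

    def deferred_acceptance(prop_prefs, recv_prefs):
        # Proposer-optimal Gale-Shapley deferred acceptance.
        # prop_prefs[p]: receivers in p's order of preference
        # recv_prefs[r]: proposers in r's order of preference
        rank = {r: {p: i for i, p in enumerate(prefs)}
                for r, prefs in recv_prefs.items()}
        next_choice = {p: 0 for p in prop_prefs}
        matched = {}          # receiver -> proposer
        free = list(prop_prefs)
        while free:
            p = free.pop()
            r = prop_prefs[p][next_choice[p]]
            next_choice[p] += 1
            current = matched.get(r)
            if current is None:
                matched[r] = p
            elif rank[r][p] < rank[r][current]:   # r prefers new proposer
                matched[r] = p
                free.append(current)
            else:
                free.append(p)
        return matched

    consumers = {"c1": ["yogurtA", "yogurtB"], "c2": ["yogurtA", "yogurtB"]}
    products = {"yogurtA": ["c2", "c1"], "yogurtB": ["c1", "c2"]}
    print(deferred_acceptance(consumers, products))   # a stable matching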
arXiv link: http://arxiv.org/abs/2111.13744v1
Expert Aggregation for Financial Forecasting
have gained a lot of interest. But choosing between several algorithms can be
challenging, as their estimation accuracy may be unstable over time. Online
aggregation of experts combines the forecasts of a finite set of models in a
single approach without making any assumption about the models. In this paper,
a Bernstein Online Aggregation (BOA) procedure is applied to the construction
of long-short strategies built from individual stock return forecasts coming
from different machine learning models. The online mixture of experts leads to
attractive portfolio performances even in environments characterised by
non-stationarity. The aggregation outperforms individual algorithms, offering a
higher portfolio Sharpe Ratio, lower shortfall, with a similar turnover.
Extensions to expert and aggregation specialisations are also proposed to
improve the overall mixture on a family of portfolio evaluation metrics.
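As a simplified stand-in for Bernstein Online Aggregation, the sketch below runs a plain exponentially weighted average forecaster over three simulated experts, just to show how online weights respond to cumulative losses; the learning rate, loss function and experts are arbitrary assumptions, and the actual BOA procedure uses a different weight update:

    import numpy as np

    rng = np.random.default_rng(0)
    T, K, eta = 500, 3, 0.5
    truth = np.sin(np.arange(T) / 20) + 0.1 * rng.normal(size=T)
    experts = np.column_stack([truth + rng.normal(0, s, T)
                               for s in (0.2, 0.5, 1.0)])   # three forecasters

    weights = np.full(K, 1 / K)
    agg_loss, cum_loss = 0.0, np.zeros(K)
    for t in range(T):
        forecast = weights @ experts[t]          # online mixture forecast
        agg_loss += (forecast - truth[t]) ** 2
        cum_loss += (experts[t] - truth[t]) ** 2
        weights = np.exp(-eta * cum_loss)        # exponential weighting
        weights /= weights.sum()
    print(agg_loss / T, cum_loss / T)   # mixture vs. each expert's mean loss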
arXiv link: http://arxiv.org/abs/2111.15365v4
Difference in Differences and Ratio in Ratios for Limited Dependent Variables
effects with observational data, but applying DD to limited dependent variables
(LDVs) Y has been problematic. This paper addresses how to apply DD and
related approaches (such as "ratio in ratios" or "ratio in odds ratios") to
binary, count, fractional, multinomial or zero-censored Y under the unifying
framework of `generalized linear models with link functions'. We evaluate DD
and the related approaches with simulation and empirical studies, and recommend
'Poisson Quasi-MLE' for non-negative (such as count or zero-censored) Y and
(multinomial) logit MLE for binary, fractional or multinomial Y.
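A minimal sketch of the recommended Poisson quasi-MLE difference in differences on simulated count data, where the treated-by-post interaction can be read as a log ratio-in-ratios effect and QMLE-robust standard errors are used; the design and parameter values are made up for illustration:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 4000
    treated = rng.binomial(1, 0.5, n)
    post = rng.binomial(1, 0.5, n)
    mu = np.exp(0.2 + 0.3 * treated + 0.4 * post + 0.5 * treated * post)
    y = rng.poisson(mu)                      # simulated count outcome

    X = sm.add_constant(np.column_stack([treated, post, treated * post]))
    fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC1")
    print(fit.params[-1])   # interaction coefficient, about 0.5 here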
arXiv link: http://arxiv.org/abs/2111.12948v2
Network regression and supervised centrality estimation
model network effects on a certain outcome. Empirical studies widely adopt a
two-stage procedure, which first estimates the centrality from the observed
noisy network and then infers the network effect from the estimated centrality,
even though it lacks theoretical understanding. We propose a unified modeling
framework to study the properties of centrality estimation and inference and
the subsequent network regression analysis with noisy network observations.
Furthermore, we propose a supervised centrality estimation methodology, which
aims to simultaneously estimate both centrality and network effect. We showcase
the advantages of our method compared with the two-stage method both
theoretically and numerically via extensive simulations and a case study in
predicting currency risk premiums from the global trade network.
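The sketch below reproduces only the two-stage baseline that the paper scrutinizes, on a simulated noisy network: eigenvector centrality is computed from the observed adjacency matrix and the outcome is then regressed on it; the supervised joint estimator proposed in the paper is not implemented here, and the noise model and coefficients are assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    A_true = rng.binomial(1, 0.1, (n, n)).astype(float)
    np.fill_diagonal(A_true, 0)
    flip = rng.binomial(1, 0.05, (n, n)).astype(float)
    A_obs = np.abs(A_true - flip)            # network observed with noise
    np.fill_diagonal(A_obs, 0)

    def eigen_centrality(A):
        vals, vecs = np.linalg.eig(A)
        v = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
        return v / np.linalg.norm(v)

    c_true, c_obs = eigen_centrality(A_true), eigen_centrality(A_obs)
    y = 1.0 + 10.0 * c_true + 0.1 * rng.normal(size=n)

    X = np.column_stack([np.ones(n), c_obs])     # stage 2: regress y on c_hat
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    print(beta[1])   # typically biased away from 10 because of network noise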
arXiv link: http://arxiv.org/abs/2111.12921v3
Maximum Likelihood Estimation of Differentiated Products Demand Systems
et al. (1995) (BLP) by maximum likelihood estimation (MLE). We derive the
maximum likelihood estimator in the case where prices are endogenously
generated by firms that set prices in Bertrand-Nash equilibrium. In Monte Carlo
simulations, the ML estimator outperforms the best-practice GMM estimator on
both bias and mean squared error when the model is correctly specified. This
remains true under some forms of misspecification. In our simulations, the
coverage of the ML estimator is close to its nominal level, whereas the GMM
estimator tends to under-cover. We conclude the paper by estimating BLP on the
car data used in the original Berry et al. (1995) paper, obtaining similar
estimates with considerably tighter standard errors.
arXiv link: http://arxiv.org/abs/2111.12397v1
On Recoding Ordered Treatments as Binary Indicators
often recode treatment into an indicator for any exposure. We investigate this
estimand under the assumption that the instruments shift compliers from no
treatment to some but not from some treatment to more. We show that when there
are extensive margin compliers only (EMCO) this estimand captures a weighted
average of treatment effects that can be partially unbundled into each complier
group's potential outcome means. We also establish an equivalence between EMCO
and a two-factor selection model and apply our results to study treatment
heterogeneity in the Oregon Health Insurance Experiment.
arXiv link: http://arxiv.org/abs/2111.12258v4
Interactive Effects Panel Data Models with General Factors and Regressors
factors. An estimator based on iterated principal components is proposed, which
is shown to be not only asymptotically normal and oracle efficient, but under
certain conditions also free of the otherwise so common asymptotic incidental
parameters bias. Interestingly, the conditions required to achieve unbiasedness
become weaker the stronger the trends in the factors, and if the trending is
strong enough unbiasedness comes at no cost at all. In particular, the approach
does not require any knowledge of how many factors there are, or whether they
are deterministic or stochastic. The order of integration of the factors is
also treated as unknown, as is the order of integration of the regressors,
which means that there is no need to pre-test for unit roots, or to decide on
which deterministic terms to include in the model.
arXiv link: http://arxiv.org/abs/2111.11506v1
Orthogonal Policy Learning Under Ambiguity
when treatment effects are partially identified, as it is often the case with
observational data. By drawing connections between the treatment assignment
problem and classical decision theory, we characterize several notions of
optimal treatment policies in the presence of partial identification. Our
unified framework allows us to incorporate user-defined constraints on the set of
allowable policies, such as restrictions for transparency or interpretability,
while also ensuring computational feasibility. We show how partial
identification leads to a new policy learning problem where the objective
function is directionally -- but not fully -- differentiable with respect to
the nuisance first-stage. We then propose an estimation procedure that ensures
Neyman-orthogonality with respect to the nuisance components and we provide
statistical guarantees that depend on the amount of concentration around the
points of non-differentiability in the data-generating-process. The proposed
methods are illustrated using data from the Job Training Partnership Act study.
arXiv link: http://arxiv.org/abs/2111.10904v3
Why Synthetic Control estimators are biased and what to do about it: Introducing Relaxed and Penalized Synthetic Controls
controls to the case of non-linear generative models, showing that the
synthetic control estimator is generally biased in such settings. I derive a
lower bound for the bias, showing that the only component of it that is
affected by the choice of synthetic control is the weighted sum of pairwise
differences between the treated unit and the untreated units in the synthetic
control. To address this bias, I propose a novel synthetic control estimator
that allows for a constant difference between the synthetic control and the treated
unit in the pre-treatment period, and that penalizes the pairwise
discrepancies. Allowing for a constant offset makes the model more flexible,
thus creating a larger set of potential synthetic controls, and the
penalization term allows for the selection of the potential solution that will
minimize bias. I study the properties of this estimator and propose a
data-driven process for parameterizing the penalization term.
arXiv link: http://arxiv.org/abs/2111.10784v1
Identifying Dynamic Discrete Choice Models with Hyperbolic Discounting
discounting. We show that the standard discount factor, present bias factor,
and instantaneous utility functions for the sophisticated agent are
point-identified from observed conditional choice probabilities and transition
probabilities in a finite horizon model. The main idea to achieve
identification is to exploit variation in the observed conditional choice
probabilities over time. We present the estimation method and demonstrate the
good performance of the estimator in simulations.
arXiv link: http://arxiv.org/abs/2111.10721v4
Optimized Inference in Regression Kink Designs
upon the efficiency of commonly employed procedures for the construction of
nonparametric confidence intervals in regression kink designs. The proposed
interval is centered at the half-length optimal, numerically obtained linear
minimax estimator over distributions with Lipschitz constrained conditional
mean function. Its construction ensures excellent finite sample coverage and
length properties which are demonstrated in a simulation study and an empirical
illustration. Given the Lipschitz constant that governs how much curvature one
plausibly allows for, the procedure is fully data driven, computationally
inexpensive, incorporates shape constraints and is valid irrespective of the
distribution of the assignment variable.
arXiv link: http://arxiv.org/abs/2111.10713v1
An Empirical Evaluation of the Impact of New York's Bail Reform on Crime Using Synthetic Controls
crime. New York State's Bail Elimination Act went into effect on January 1,
2020, eliminating money bail and pretrial detention for nearly all misdemeanor
and nonviolent felony defendants. Our analysis of effects on aggregate crime
rates after the reform informs the understanding of bail reform and general
deterrence. We conduct a synthetic control analysis for a comparative case
study of the impact of bail reform. We focus on synthetic control analysis of
post-intervention changes in crime for assault, theft, burglary, robbery, and
drug crimes, constructing a dataset from publicly reported crime data of 27
large municipalities. Our findings, including placebo checks and other
robustness checks, show that for assault, theft, and drug crimes, there is no
significant impact of bail reform on crime; for burglary and robbery, we
similarly have null findings but the synthetic control is also more variable so
these are deemed less conclusive.
arXiv link: http://arxiv.org/abs/2111.08664v2
Optimal Stratification of Survey Experiments
first samples representative units from an eligible pool, then assigns each
sampled unit to treatment or control. To implement balanced sampling and
assignment, we introduce a new family of finely stratified designs that
generalize matched pairs randomization to propensities p(x) not equal to 1/2.
We show that two-stage stratification nonparametrically dampens the variance of
treatment effect estimation. We formulate and solve the optimal stratification
problem with heterogeneous costs and fixed budget, providing simple heuristics
for the optimal design. In settings with pilot data, we show that implementing
a consistent estimate of this design is also efficient, minimizing asymptotic
variance subject to the budget constraint. We also provide new asymptotically
exact inference methods, allowing experimenters to fully exploit the efficiency
gains from both stratified sampling and assignment. An application to nine
papers recently published in top economics journals demonstrates the value of
our methods.
arXiv link: http://arxiv.org/abs/2111.08157v2
Abductive Inference and C. S. Peirce: 150 Years Later
iconoclastic philosopher and polymath who is among the greatest of American
minds. (ii) Abductive inference -- a term coined by C. S. Peirce, which he
defined as "the process of forming explanatory hypotheses. It is the only
logical operation which introduces any new idea."
Abductive inference and quantitative economics: Abductive inference plays a
fundamental role in empirical scientific research as a tool for discovery and
data analysis. Heckman and Singer (2017) strongly advocated "Economists should
abduct." Arnold Zellner (2007) stressed that "much greater emphasis on
reductive [abductive] inference in teaching econometrics, statistics, and
economics would be desirable." But currently, there are no established theory
or practical tools that can allow an empirical analyst to abduct. This paper
attempts to fill this gap by introducing new principles and concrete procedures
to the Economics and Statistics community. I term the proposed approach the
Abductive Inference Machine (AIM).
The historical Peirce's experiment: In 1872, Peirce conducted a series of
experiments to determine the distribution of response times to an auditory
stimulus, which is widely regarded as one of the most significant statistical
investigations in the history of nineteenth-century American mathematical
research (Stigler, 1978). On the 150th anniversary of this historical
experiment, we look back at the Peircean-style abductive inference through a
modern statistical lens. Using Peirce's data, it is shown how empirical
analysts can abduct in a systematic and automated manner using AIM.
arXiv link: http://arxiv.org/abs/2111.08054v3
An Outcome Test of Discrimination for Ranked Lists
where a (human or algorithmic) decision-maker produces a ranked list of
candidates. Ranked lists are particularly relevant in the context of online
platforms that produce search results or feeds, and also arise when human
decision-makers express ordinal preferences over a list of candidates. We show
that non-discrimination implies a system of moment inequalities, which
intuitively impose that one cannot permute the position of a lower-ranked
candidate from one group with a higher-ranked candidate from a second group and
systematically improve the objective. Moreover, we show that these moment
inequalities are the only testable implications of non-discrimination when the
auditor observes only outcomes and group membership by rank. We show how to
statistically test the implied inequalities, and validate our approach in an
application using data from LinkedIn.
arXiv link: http://arxiv.org/abs/2111.07889v1
Dynamic Network Quantile Regression Model
quantile connectedness using predetermined network information. We extend the
existing network quantile autoregression model of Zhu et al. (2019b) by
explicitly allowing for contemporaneous network effects and controlling for the
common factors across quantiles. To cope with the endogeneity issue due to
simultaneous network spillovers, we adopt the instrumental variable quantile
regression (IVQR) estimation and derive the consistency and asymptotic
normality of the IVQR estimator using the near epoch dependence property of the
network process. Via Monte Carlo simulations, we confirm the satisfactory
performance of the IVQR estimator across different quantiles under the
different network structures. Finally, we demonstrate the usefulness of our
proposed approach with an application to the dataset on the stocks traded in
NYSE and NASDAQ in 2016.
arXiv link: http://arxiv.org/abs/2111.07633v1
Decoding Causality by Fictitious VAR Modeling
it would be beneficial to have figured out the cause-effect relations within
the data. Regression analysis, however, generally addresses correlation, and
little research has focused on variance analysis for causality
discovery. We first set up an equilibrium for the cause-effect relations using
a fictitious vector autoregressive model. In the equilibrium, long-run
relations are identified from noise, and spurious ones are negligibly close to
zero. The solution, called the causality distribution, measures the relative
strength with which each variable causes the movement of all series or of specific affected ones. If a
group of exogenous data affects the others but not vice versa, then, in theory,
the causality distribution for other variables is necessarily zero. The
hypothesis test of zero causality is the rule for deciding whether a variable is
endogenous. Our new approach has high accuracy in identifying the true
cause-effect relations among the data in the simulation studies. We also apply
the approach to estimating the causal factors' contribution to climate change.
arXiv link: http://arxiv.org/abs/2111.07465v2
When Can We Ignore Measurement Error in the Running Variable?
variable used by the administrator to assign treatment is only observed with
error. We show that, provided the observed running variable (i) correctly
classifies the treatment assignment, and (ii) affects the conditional means of
the potential outcomes smoothly, ignoring the measurement error nonetheless
yields an estimate with a causal interpretation: the average treatment effect
for units whose observed running variable equals the cutoff. We show that,
possibly after doughnut trimming, these assumptions accommodate a variety of
settings where support of the measurement error is not too wide. We propose to
conduct inference using bias-aware methods, which remain valid even when
discreteness or irregular support in the observed running variable may lead to
partial identification. We illustrate the results for both sharp and fuzzy
designs in an empirical application.
arXiv link: http://arxiv.org/abs/2111.07388v4
Rational AI: A comparison of human and AI responses to triggers of economic irrationality in poker
environmental triggers, such as experiencing an economic loss or gain. In this
paper we investigate whether algorithms exhibit the same behavior by examining
the observed decisions and latent risk and rationality parameters estimated by
a random utility model with constant relative risk-aversion utility function.
We use a dataset consisting of 10,000 hands of poker played by Pluribus, the
first algorithm in the world to beat professional human players, and find that (1)
Pluribus does shift its playing style in response to economic losses and gains,
ceteris paribus; (2) Pluribus becomes more risk-averse and rational following a
trigger but the humans become more risk-seeking and irrational; (3) the
differences in playing styles between Pluribus and the humans on the dimensions
of risk-aversion and rationality are particularly distinguishable when both have
experienced a trigger. This provides support that decision-making patterns
could be used as "behavioral signatures" to identify human versus algorithmic
decision-makers in unlabeled contexts.
arXiv link: http://arxiv.org/abs/2111.07295v1
Large Order-Invariant Bayesian VARs with Stochastic Volatility
multivariate stochastic volatility are not invariant to the way the variables
are ordered due to the use of a Cholesky decomposition for the error covariance
matrix. We show that the order invariance problem in existing approaches is
likely to become more serious in large VARs. We propose the use of a
specification which avoids the use of this Cholesky decomposition. We show that
the presence of multivariate stochastic volatility allows for identification of
the proposed model and prove that it is invariant to ordering. We develop a
Markov Chain Monte Carlo algorithm which allows for Bayesian estimation and
prediction. In exercises involving artificial and real macroeconomic data, we
demonstrate that the choice of variable ordering can have non-negligible
effects on empirical results. In a macroeconomic forecasting exercise involving
VARs with 20 variables we find that our order-invariant approach leads to the
best forecasts and that some choices of variable ordering can lead to poor
forecasts using a conventional, non-order invariant, approach.
arXiv link: http://arxiv.org/abs/2111.07225v1
Asymmetric Conjugate Priors for Large Bayesian VARs
popular shrinkage prior in this setting is the natural conjugate prior as it
facilitates posterior simulation and leads to a range of useful analytical
results. This is, however, at the expense of modeling flexibility, as it rules
out cross-variable shrinkage -- i.e., shrinking coefficients on lags of other
variables more aggressively than those on own lags. We develop a prior that has
the best of both worlds: it can accommodate cross-variable shrinkage, while
maintaining many useful analytical results, such as a closed-form expression of
the marginal likelihood. This new prior also leads to fast posterior simulation
-- for a BVAR with 100 variables and 4 lags, obtaining 10,000 posterior draws
takes less than half a minute on a standard desktop. We demonstrate the
usefulness of the new prior via a structural analysis using a 15-variable VAR
with sign restrictions to identify 5 structural shocks.
arXiv link: http://arxiv.org/abs/2111.07170v1
Absolute and Relative Bias in Eight Common Observational Study Designs: Evidence from a Meta-analysis
study comparisons (WSC) compare observational and experimental estimates that
test the same hypothesis using the same treatment group, outcome, and estimand.
Meta-analyzing 39 of them, we compare mean bias and its variance for the eight
observational designs that result from combining whether there is a pretest
measure of the outcome or not, whether the comparison group is local to the
treatment group or not, and whether there is a relatively rich set of other
covariates or not. Of these eight designs, one combines all three design
elements, another has none, and the remainder include any one or two. We found
that both the mean and variance of bias decline as design elements are added,
with the lowest mean and smallest variance in a design with all three elements.
The probability of bias falling within 0.10 standard deviations of the
experimental estimate varied from 59 to 83 percent in Bayesian analyses and
from 86 to 100 percent in non-Bayesian ones -- the ranges depending on the
level of data aggregation. But confounding remains possible due to each of the
eight observational study design cells including a different set of WSC
studies.
arXiv link: http://arxiv.org/abs/2111.06941v2
Dynamic treatment effects: high-dimensional inference under model misspecification
providing insights into the time-dependent causal impact of interventions.
However, this estimation poses challenges due to time-varying confounding,
leading to potentially biased estimates. Furthermore, accurately specifying the
growing number of treatment assignments and outcome models with multiple
exposures appears increasingly challenging to accomplish. Double robustness,
which permits model misspecification, holds great value in addressing these
challenges. This paper introduces a novel "sequential model doubly robust"
estimator. We develop novel moment-targeting estimates to account for
confounding effects and establish that root-$N$ inference can be achieved as
long as at least one nuisance model is correctly specified at each exposure
time, despite the presence of high-dimensional covariates. Although the
nuisance estimates themselves do not achieve root-$N$ rates, the carefully
designed loss functions in our framework ensure final root-$N$ inference for
the causal parameter of interest. Unlike off-the-shelf high-dimensional
methods, which fail to deliver robust inference under model misspecification
even within the doubly robust framework, our newly developed loss functions
address this limitation effectively.
arXiv link: http://arxiv.org/abs/2111.06818v3
Bounds for Treatment Effects in the Presence of Anticipatory Behavior
new policy before it occurs. Such anticipatory behavior can lead to units'
outcomes becoming dependent on their future treatment assignments. In this
paper, I employ a potential-outcomes framework to analyze the treatment effect
with anticipation. I start with a classical difference-in-differences model
with two time periods and provide identified sets with easy-to-implement
estimation and inference strategies for causal parameters. Empirical
applications and generalizations are provided. I illustrate my results by
analyzing the effect of an early retirement incentive program for teachers,
which the target units were likely to anticipate, on student achievement. The
empirical results show that the effect can be overestimated by up to 30% in the
worst case and demonstrate the potential pitfalls of failing to consider
anticipation in policy evaluation.
arXiv link: http://arxiv.org/abs/2111.06573v2
Generalized Kernel Ridge Regression for Causal Inference with Missing-at-Random Sample Selection
curves and semiparametric treatment effects in the setting where an analyst has
access to a selected sample rather than a random sample; the outcome is
observed only for selected observations. I assume selection is as good as random
conditional on treatment and a sufficiently rich set of observed covariates,
where the covariates are allowed to cause treatment or be caused by treatment
-- an extension of missingness-at-random (MAR). I propose estimators of means,
increments, and distributions of counterfactual outcomes with closed form
solutions in terms of kernel matrix operations, allowing treatment and
covariates to be discrete or continuous, and low, high, or infinite
dimensional. For the continuous treatment case, I prove uniform consistency
with finite sample rates. For the discrete treatment case, I prove root-n
consistency, Gaussian approximation, and semiparametric efficiency.
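A naive plug-in sketch of the setting: under selection that is as good as random given treatment and covariates, an outcome regression is fitted by kernel ridge on the selected sample only and its predictions are averaged over the full sample at each treatment level; the paper's estimators have closed-form kernel-matrix expressions and formal guarantees that this simulated illustration does not attempt, and all tuning values are assumptions:

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    rng = np.random.default_rng(0)
    n = 2000
    x = rng.normal(size=n)
    d = rng.binomial(1, 1 / (1 + np.exp(-x)))              # treatment
    y = 1 + 2 * d + np.sin(x) + rng.normal(0, 0.5, n)      # outcome
    s = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x + d))))  # selection (MAR)

    Z = np.column_stack([d, x])
    model = KernelRidge(kernel="rbf", alpha=0.1, gamma=2.0)
    model.fit(Z[s == 1], y[s == 1])        # fit on the selected sample only
    mu1 = model.predict(np.column_stack([np.ones(n), x])).mean()
    mu0 = model.predict(np.column_stack([np.zeros(n), x])).mean()
    print(mu1 - mu0)   # close to 2, the simulated average treatment effect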
arXiv link: http://arxiv.org/abs/2111.05277v1
Bounding Treatment Effects by Pooling Limited Information across Observations
are valid under an unconfoundedness assumption. Our bounds are designed to be
robust in challenging situations, for example, when the conditioning variables
take on a large number of different values in the observed sample, or when the
overlap condition is violated. This robustness is achieved by only using
limited "pooling" of information across observations. Namely, the bounds are
constructed as sample averages over functions of the observed outcomes such
that the contribution of each outcome only depends on the treatment status of a
limited number of observations. No information pooling across observations
leads to so-called "Manski bounds", while unlimited information pooling leads
to standard inverse propensity score weighting. We explore the intermediate
range between these two extremes and provide corresponding inference methods.
We show in Monte Carlo experiments and through two empirical applications that
our bounds are indeed robust and informative in practice.
arXiv link: http://arxiv.org/abs/2111.05243v5
Optimal Decision Rules Under Partial Identification
must decide between two policies to maximize social welfare (e.g., the
population mean of an outcome) based on a finite sample. The framework
introduced in this paper allows for various types of restrictions on the
structural parameter (e.g., the smoothness of a conditional mean potential
outcome function) and accommodates settings with partial identification of
social welfare. As the main theoretical result, I derive a finite-sample
optimal decision rule under the minimax regret criterion. This rule has a
simple form, yet achieves optimality among all decision rules; no ad hoc
restrictions are imposed on the class of decision rules. I apply my results to
the problem of whether to change an eligibility cutoff in a regression
discontinuity setup, and illustrate them in an empirical application to a
school construction program in Burkina Faso.
arXiv link: http://arxiv.org/abs/2111.04926v4
Pair copula constructions of point-optimal sign-based tests for predictive linear and nonlinear regressions
linear and nonlinear predictive regressions with endogenous, persistent
regressors, and disturbances exhibiting serial (nonlinear) dependence. The
proposed approach entails considering the entire dependence structure of the
signs to capture the serial dependence, and building feasible test statistics
based on pair copula constructions of the sign process. The tests are exact and
valid in the presence of heavy tailed and nonstandard errors, as well as
heterogeneous and persistent volatility. Furthermore, they may be inverted to
build confidence regions for the parameters of the regression function.
Finally, we adopt an adaptive approach based on the split-sample technique to
maximize the power of the test by finding an appropriate alternative
hypothesis. In a Monte Carlo study, we compare the size and power of the proposed
"quasi"-point-optimal sign tests based on pair copula constructions with those
of certain existing tests that are intended to be robust against
heteroskedasticity. The simulation results confirm the superiority of our
procedures over existing popular tests.
arXiv link: http://arxiv.org/abs/2111.04919v1
Exponential GARCH-Ito Volatility Models
financial data, which can accommodate low-frequency volatility dynamics by
embedding the discrete-time non-linear exponential GARCH structure with
log-integrated volatility in a continuous instantaneous volatility process. The
key feature of the proposed model is that, unlike existing GARCH-Ito models,
the instantaneous volatility process has a non-linear structure, which ensures
that the log-integrated volatilities have the realized GARCH structure. We call
this the exponential realized GARCH-Ito (ERGI) model. Given the auto-regressive
structure of the log-integrated volatility, we propose a quasi-likelihood
estimation procedure for parameter estimation and establish its asymptotic
properties. We conduct a simulation study to check the finite sample
performance of the proposed model and an empirical study with 50 assets among
the S&P 500 compositions. The numerical studies show the advantages of the new
proposed model.
arXiv link: http://arxiv.org/abs/2111.04267v1
Rate-Optimal Cluster-Randomized Designs for Spatial Interference
between any two units but the extent of interference diminishes with spatial
distance. The causal estimand is the global average treatment effect, which
compares outcomes under the counterfactuals that all or no units are treated.
We study a class of designs in which space is partitioned into clusters that
are randomized into treatment and control. For each design, we estimate the
treatment effect using a Horvitz-Thompson estimator that compares the average
outcomes of units with all or no neighbors treated, where the neighborhood
radius is of the same order as the cluster size dictated by the design. We
derive the estimator's rate of convergence as a function of the design and
degree of interference and use this to obtain estimator-design pairs that
achieve near-optimal rates of convergence under relatively minimal assumptions
on interference. We prove that the estimators are asymptotically normal and
provide a variance estimator. For practical implementation of the designs, we
suggest partitioning space using clustering algorithms.
arXiv link: http://arxiv.org/abs/2111.04219v4
Sequential Kernel Embedding for Mediated and Time-Varying Dose Response Curves
response curves based on kernel ridge regression. By embedding Pearl's
mediation formula and Robins' g-formula with kernels, we allow treatments,
mediators, and covariates to be continuous in general spaces, and also allow
for nonlinear treatment-confounder feedback. Our key innovation is a
reproducing kernel Hilbert space technique called sequential kernel embedding,
which we use to construct simple estimators that account for complex feedback.
Our estimators preserve the generality of classic identification while also
achieving nonasymptotic uniform rates. In nonlinear simulations with many
covariates, we demonstrate strong performance. We estimate mediated and
time-varying dose response curves of the US Job Corps, and clean data that may
serve as a benchmark in future work. We extend our results to mediated and
time-varying treatment effects and counterfactual distributions, verifying
semiparametric efficiency and weak convergence.
arXiv link: http://arxiv.org/abs/2111.03950v5
Bootstrap inference for panel data quantile regression
panel data quantile regression models with fixed effects. We consider
random-weighted bootstrap resampling and formally establish its validity for
asymptotic inference. The bootstrap algorithm is simple to implement in
practice by using a weighted quantile regression estimation for fixed effects
panel data. We provide results under conditions that allow for temporal
dependence of observations within individuals, thus encompassing a large class
of possible empirical applications. Monte Carlo simulations provide numerical
evidence that the proposed bootstrap methods have correct finite sample properties.
Finally, we provide an empirical illustration using the environmental Kuznets
curve.
arXiv link: http://arxiv.org/abs/2111.03626v1
Structural Breaks in Interactive Effects Panels and the Stock Market Reaction to COVID-19
empirical economic research. This is particularly true in panel data comprised
of many cross-sectional units, such as individuals, firms or countries, which
are all affected by major events. The COVID-19 pandemic has affected most
sectors of the global economy, and there is by now plenty of evidence to
support this. The impact on stock markets is, however, still unclear. The fact
that most markets seem to have partly recovered while the pandemic is still
ongoing suggests that the relationship between stock returns and COVID-19 has
been subject to structural change. It is therefore important to know if a
structural break has occurred and, if it has, to infer the date of the break.
In the present paper we take this last observation as a source of motivation to
develop a new break detection toolbox that is applicable to different sized
panels, easy to implement and robust to general forms of unobserved
heterogeneity. The toolbox, which is the first of its kind, includes a test for
structural change, a break date estimator, and a break date confidence
interval. Application to a panel covering 61 countries from January 3 to
September 25, 2020, leads to the detection of a structural break that is dated
to the first week of April. The effect of COVID-19 is negative before the break
and zero thereafter, implying that while markets did react, the reaction was
short-lived. A possible explanation for this is the quantitative easing
programs announced by central banks all over the world in the second half of
March.
arXiv link: http://arxiv.org/abs/2111.03035v1
Monitoring COVID-19-induced gender differences in teleworking rates using Mobile Network Data
home-based telework as a means of sustaining production. Generally,
teleworking arrangements directly impact workers' efficiency and motivation.
The direction of this impact, however, depends on the balance between positive
effects of teleworking (e.g. increased flexibility and autonomy) and its
downsides (e.g. blurring boundaries between private and work life). Moreover,
these effects of teleworking can be amplified in case of vulnerable groups of
workers, such as women. The first step in understanding the implications of
teleworking on women is to have timely information on the extent of teleworking
by age and gender. In the absence of timely official statistics, in this paper
we propose a method for nowcasting the teleworking trends by age and gender for
20 Italian regions using mobile network operators (MNO) data. The method is
developed and validated using MNO data together with the Italian quarterly
Labour Force Survey. Our results confirm that the MNO data have the potential
to be used as a tool for monitoring gender and age differences in teleworking
patterns. This tool becomes even more important today as it could support the
adequate gender mainstreaming in the “Next Generation EU” recovery plan and
help to manage related social impacts of COVID-19 through policymaking.
arXiv link: http://arxiv.org/abs/2111.09442v2
occ2vec: A principal approach to representing occupations using natural language processing
occupations, which can be used in matching, predictive and causal modeling, and
other economic areas. In particular, we use it to score occupations on any
definable characteristic of interest, say the degree of greenness.
Using more than 17,000 occupation-specific text descriptors, we transform each
occupation into a high-dimensional vector using natural language processing.
Similarly, we assign a vector to the target characteristic and estimate the
occupational degree of this characteristic as the cosine similarity between the
vectors. The main advantages of this approach are its universal applicability
and verifiability, in contrast to existing ad-hoc approaches. We extensively
validate our approach on several exercises and then use it to estimate the
occupational degree of charisma and emotional intelligence (EQ). We find that
occupations that score high on these tend to have higher educational
requirements. Turning to wages, highly charismatic occupations are found in
either the lower or the upper tail of the wage distribution. This is not the case for EQ,
where higher levels of EQ are generally correlated with higher wages.
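A toy version of the scoring step with three invented occupation descriptions and a TF-IDF embedding: each occupation is scored by the cosine similarity between its text vector and the vector of a target characteristic; the paper uses far richer NLP representations of more than 17,000 descriptors, so this is only meant to convey the mechanics:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    occupations = {
        "solar panel installer": "installs and maintains solar energy panels",
        "accountant": "prepares financial statements and tax records",
        "wind turbine technician": "repairs wind turbines and renewable energy equipment",
    }
    characteristic = "green renewable energy environment sustainability"

    texts = list(occupations.values()) + [characteristic]
    vectors = TfidfVectorizer().fit_transform(texts)
    scores = cosine_similarity(vectors[:-1], vectors[-1]).ravel()
    for name, score in zip(occupations, scores):
        print(name, round(float(score), 2))   # cosine-similarity "greenness"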
arXiv link: http://arxiv.org/abs/2111.02528v2
Multiplicative Component GARCH Model of Intraday Volatility
intraday conditional volatility is expressed as the product of intraday
periodic component, intraday stochastic volatility component and daily
conditional volatility component. I extend the multiplicative component
intraday volatility model of Engle (2012) and Andersen and Bollerslev (1998) by
incorporating the durations between consecutive transactions. The model can be
applied to both regularly and irregularly spaced returns. I also provide a
nonparametric estimation technique of the intraday volatility periodicity. The
empirical results suggest the model can successfully capture the
interdependency of intraday returns.
arXiv link: http://arxiv.org/abs/2111.02376v1
Leveraging Causal Graphs for Blocking in Randomized Experiments
interest. Blocking is a technique to precisely estimate the causal effects when
the experimental material is not homogeneous. It involves stratifying the
available experimental material based on the covariates causing non-homogeneity
and then randomizing the treatment within those strata (known as blocks). This
eliminates the unwanted effect of the covariates on the causal effects of
interest. We investigate the problem of finding a stable set of covariates to
be used to form blocks, that minimizes the variance of the causal effect
estimates. Using the underlying causal graph, we provide an efficient algorithm
to obtain such a set for a general semi-Markovian causal model.
arXiv link: http://arxiv.org/abs/2111.02306v2
Autoregressive conditional duration modelling of high frequency data
Conditional Durations (ACD) framework (Engle and Russell 1998). I test
different distributional assumptions for the durations. The empirical results
suggest that the unconditional durations approach a Gamma distribution. Moreover,
compared with exponential and Weibull distributions, the ACD
model with Gamma-distributed innovations provides the best fit for SPY durations.
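To illustrate the distributional comparison, the sketch below fits exponential, Weibull and Gamma distributions to simulated durations and compares maximized log-likelihoods and AIC; the paper fits these as innovation distributions inside an ACD model rather than to raw durations, and the simulated data are not SPY durations:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    durations = rng.gamma(shape=0.8, scale=2.0, size=5000)   # simulated only

    candidates = {"exponential": stats.expon,
                  "weibull": stats.weibull_min,
                  "gamma": stats.gamma}
    for name, dist in candidates.items():
        params = dist.fit(durations, floc=0)          # location fixed at zero
        loglik = dist.logpdf(durations, *params).sum()
        aic = 2 * (len(params) - 1) - 2 * loglik      # -1: loc not estimated
        print(name, round(loglik, 1), round(aic, 1))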
arXiv link: http://arxiv.org/abs/2111.02300v1
What drives the accuracy of PV output forecasts?
high demand for forecasting PV output to better integrate PV generation into
power grids. Systematic knowledge regarding the factors influencing forecast
accuracy is crucially important, but still largely lacking. In this paper, we
review 180 papers on PV forecasts and extract a database of forecast errors for
statistical analysis. We show that among the forecast models, hybrid models
consistently outperform the others and will most likely be the future of PV
output forecasting. The use of data processing techniques is positively
correlated with the forecast quality, while the lengths of the forecast horizon
and out-of-sample test set have negative effects on the forecast accuracy. We
also found that the inclusion of numerical weather prediction variables, data
normalization, and data resampling are the most effective data processing
techniques. Furthermore, we found some evidence for cherry picking in reporting
errors and recommend that the test sets span at least one year to better assess
model performance. The paper also takes the first step towards establishing a
benchmark for assessing PV output forecasts.
arXiv link: http://arxiv.org/abs/2111.02092v1
Multiple-index Nonstationary Time Series Models: Robust Estimation Theory and Practice
that involve linear combinations of time trends, stationary variables and unit
root processes as regressors. The inclusion of the three different types of
time series, along with the use of a multiple-index structure for these
variables to circumvent the curse of dimensionality, is due to both theoretical
and practical considerations. The M-type estimators (including OLS, LAD,
Huber's estimator, quantile and expectile estimators, etc.) for the index
vectors are proposed, and their asymptotic properties are established, with the
aid of the generalized function approach to accommodate a wide class of loss
functions that may not be necessarily differentiable at every point. The
proposed multiple-index model is then applied to study stock return
predictability, revealing strong nonlinear predictability under various
loss measures. Monte Carlo simulations are also included to evaluate the
finite-sample performance of the proposed estimators.
arXiv link: http://arxiv.org/abs/2111.02023v1
Asymptotic in a class of network models with an increasing sub-Gamma degree sequence
asymptotic properties of a class of network models with binary values and a
general link function. In this paper, we release the degree sequences of the
binary networks under a general noisy mechanism with the discrete Laplace
mechanism as a special case. We establish asymptotic results, including both
consistency and asymptotic normality of the parameter estimator, as the
number of parameters goes to infinity in a class of network models. Simulations
and a real data example are provided to illustrate the asymptotic results.
arXiv link: http://arxiv.org/abs/2111.01301v4
Stock Price Prediction Using Time Series, Econometric, Machine Learning, and Deep Learning Models
predictive model for stock price prediction. According to the literature, if
predictive models are correctly designed and refined, they can painstakingly
and faithfully estimate future stock values. This paper demonstrates a set of
time series, econometric, and various learning-based models for stock price
prediction. The data of Infosys, ICICI, and SUN PHARMA from the period of
January 2004 to December 2019 was used here for training and testing the models
to know which model performs best in which sector. One time series model
(Holt-Winters Exponential Smoothing), one econometric model (ARIMA), two
machine learning models (Random Forest and MARS), and two deep learning-based
models (simple RNN and LSTM) have been included in this paper. MARS has been
proved to be the best performing machine learning model, while LSTM has proved
to be the best performing deep learning model. But overall, for all three
sectors - IT (on Infosys data), Banking (on ICICI data), and Health (on SUN
PHARMA data), MARS has proved to be the best performing model for stock price
forecasting.
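As an illustrative sketch only (not the authors' code or data), the exponential smoothing benchmark in such a comparison can be fit with statsmodels; the synthetic monthly series and the 12-month hold-out below are placeholder assumptions.
```python
# Illustrative sketch: Holt-Winters exponential smoothing benchmark with a
# held-out test window, on a synthetic monthly price series standing in for
# the stock data used in the paper.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(0)
idx = pd.date_range("2004-01-31", periods=192, freq="M")
prices = pd.Series(100 + np.cumsum(rng.normal(0.5, 2, size=192)), index=idx)

train, test = prices[:-12], prices[-12:]       # hold out the last 12 months
fit = ExponentialSmoothing(train, trend="add").fit()
forecast = fit.forecast(len(test))
rmse = float(np.sqrt(np.mean((forecast.values - test.values) ** 2)))
print(f"Holt-Winters test RMSE: {rmse:.2f}")
```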
arXiv link: http://arxiv.org/abs/2111.01137v1
Funding liquidity, credit risk and unconventional monetary policy in the Euro area: A GVAR approach
risk shocks and unconventional monetary policy within the Euro area. To this
aim, we estimate a financial GVAR model for Germany, France, Italy and Spain on
monthly data over the period 2006-2017. The interactions between repo markets,
sovereign bonds and banks' CDS spreads are analyzed, explicitly accounting for
the country-specific effects of the ECB's asset purchase programmes. Impulse
response analysis signals marginally significant core-periphery heterogeneity,
flight-to-quality effects and spillovers between liquidity conditions and
credit risk. Simulated reductions in ECB programmes tend to result in higher
government bond yields and bank CDS spreads, especially for Italy and Spain, as
well as in falling repo trade volumes and rising repo rates across the Euro
area. However, only a few responses to shocks achieve statistical significance.
arXiv link: http://arxiv.org/abs/2111.01078v2
Nonparametric Cointegrating Regression Functions with Endogeneity and Semi-Long Memory
endogeneity and semi-long memory. We assume that semi-long memory is produced
in the regressor process by tempering of random shock coefficients. The
fundamental properties of long memory processes are thus retained in the
regressor process. Nonparametric nonlinear cointegrating regressions with
serially dependent errors and endogenous regressors driven by long memory
innovations have been considered in Wang and Phillips (2016). That work also
implemented a statistical specification test for testing whether the regression
function follows a parametric form. The limit theory of the test statistic involves
the local time of fractional Brownian motion. The present paper modifies the
test statistic to be suitable for the semi-long memory case. With this
modification, the limit theory for the test involves the local time of the
standard Brownian motion and is free of the unknown parameter d. Through
simulation studies, we investigate the properties of the nonparametric
regression function estimation as well as the test statistic. We also
demonstrate the use of the test statistic on actual data sets.
arXiv link: http://arxiv.org/abs/2111.00972v3
Financial-cycle ratios and medium-term predictions of GDP: Evidence from the United States
document the ability of specific financial ratios from the housing market and
firms' aggregate balance sheets to predict GDP over medium-term horizons in the
United States. A cyclically adjusted house price-to-rent ratio and the
liabilities-to-income ratio of the non-financial non-corporate business sector
provide the best in-sample and out-of-sample predictions of GDP growth over
horizons of one to five years, based on a wide variety of rankings. Small
forecasting models that include these indicators outperform popular
high-dimensional models and forecast combinations. The predictive power of the
two ratios appears strong during both recessions and expansions, stable over
time, and consistent with well-established macro-finance theory.
arXiv link: http://arxiv.org/abs/2111.00822v3
On Time-Varying VAR Models: Estimation, Testing and Impulse Response Analysis
e.g., forecasting, modelling policy transmission mechanism, and measuring
connection of economic agents. To better capture the dynamics, this paper
introduces a new class of time-varying VAR models in which the coefficients and
covariance matrix of the error innovations are allowed to change smoothly over
time. Accordingly, we establish a set of theories, including impulse response
analyses subject to both short-run timing and long-run restrictions, an
information criterion to select the optimal lag, and a Wald-type test for
constant coefficients. Simulation studies are
conducted to evaluate the theoretical findings. Finally, we demonstrate the
empirical relevance and usefulness of the proposed methods through an
application to the transmission mechanism of U.S. monetary policy.
arXiv link: http://arxiv.org/abs/2111.00450v1
Productivity Convergence in Manufacturing: A Hierarchical Panel Data Approach
productivity convergence analysis has three problems that have yet to be
resolved: (1) little attempt has been made to explore the hierarchical
structure of industry-level datasets; (2) industry-level technology
heterogeneity has largely been ignored; and (3) cross-sectional dependence has
rarely been allowed for. This paper aims to address these three problems within
a hierarchical panel data framework. We propose an estimation procedure and
then derive the corresponding asymptotic theory. Finally, we apply the
framework to a dataset of 23 manufacturing industries from a wide range of
countries over the period 1963-2018. Our results show that both the
manufacturing industry as a whole and individual manufacturing industries at
the ISIC two-digit level exhibit strong conditional convergence in labour
productivity, but not unconditional convergence. In addition, our results show
that both global and industry-specific shocks are important in explaining the
convergence behaviours of the manufacturing industries.
arXiv link: http://arxiv.org/abs/2111.00449v1
CP Factor Model for Dynamic Tensors
series of multidimensional arrays, called tensor time series, preserving the
inherent multidimensional structure. In this paper, we present a factor model
approach, in a form similar to tensor CP decomposition, to the analysis of
high-dimensional dynamic tensor time series. As the loading vectors are
uniquely defined but not necessarily orthogonal, it is significantly different
from the existing tensor factor models based on Tucker-type tensor
decomposition. The model structure allows for a set of uncorrelated
one-dimensional latent dynamic factor processes, making it much more convenient
to study the underlying dynamics of the time series. A new high order
projection estimator is proposed for such a factor model, utilizing the special
structure and the idea of the higher order orthogonal iteration procedures
commonly used in Tucker-type tensor factor models and general tensor CP
decomposition procedures. The theoretical investigation provides statistical
error bounds for the proposed methods, which show the significant advantage of
utilizing the special model structure. A simulation study is conducted to
further demonstrate the finite-sample properties of the estimators. A real data
application is used to illustrate the model and its interpretations.
arXiv link: http://arxiv.org/abs/2110.15517v2
Coresets for Time Series Clustering
time series data. This problem has gained importance across many fields
including biology, medicine, and economics due to the proliferation of sensors
facilitating real-time measurement and the rapid drop in storage costs. In
particular, we consider the setting where the time series data on $N$ entities
is generated from a Gaussian mixture model with autocorrelations over $k$
clusters in $R^d$. Our main contribution is an algorithm to construct
coresets for the maximum likelihood objective for this mixture model. Our
algorithm is efficient, and under a mild boundedness assumption on the
covariance matrices of the underlying Gaussians, the size of the coreset is
independent of the number of entities $N$ and the number of observations for
each entity, and depends only polynomially on $k$, $d$ and $1/\varepsilon$,
where $\varepsilon$ is the error parameter. We empirically assess the
performance of our coreset with synthetic data.
arXiv link: http://arxiv.org/abs/2110.15263v1
Testing and Estimating Structural Breaks in Time Series and Panel Data in Stata
and panel data. The longer the time span, the higher the likelihood that the
model parameters have changed as a result of major disruptive events, such as
the 2007--2008 financial crisis and the 2020 COVID--19 outbreak. Detecting the
existence of breaks, and dating them is therefore necessary, not only for
estimation purposes but also for understanding drivers of change and their
effect on relationships. This article introduces a new community contributed
command called xtbreak, which provides researchers with a complete toolbox for
analysing multiple structural breaks in time series and panel data. xtbreak can
detect the existence of breaks, determine their number and location, and
provide break date confidence intervals. The new command is used to explore
changes in the relationship between COVID--19 cases and deaths in the US, using
both aggregate and state level data, and in the relationship between approval
ratings and consumer confidence, using a panel of eight countries.
arXiv link: http://arxiv.org/abs/2110.14550v3
A Scalable Inference Method For Large Dynamic Economic Systems
decade due to the economy's digitisation. With the prevalence of often black
box data-driven machine learning methods, there is a necessity to develop
interpretable machine learning methods that can conduct econometric inference,
helping policymakers leverage the new nature of economic data. We therefore
present a novel Variational Bayesian Inference approach to incorporate a
time-varying parameter auto-regressive model which is scalable for big data.
Our model is applied to a large blockchain dataset containing prices and
transactions of individual actors, analyzing transactional flows and price
movements at a very granular level. The model is extendable to any dataset
which can be modelled as a dynamical system. We further improve the simple
state-space modelling by introducing non-linearities in the forward model with
the help of machine learning architectures.
arXiv link: http://arxiv.org/abs/2110.14346v1
Forecasting with a Panel Tobit Model
forecasts for a large cross-section of short time series of censored
observations. Our fully Bayesian approach allows us to flexibly estimate the
cross-sectional distribution of heterogeneous coefficients and then implicitly
use this distribution as prior to construct Bayes forecasts for the individual
time series. In addition to density forecasts, we construct set forecasts that
explicitly target the average coverage probability for the cross-section. We
present a novel application in which we forecast bank-level loan charge-off
rates for small banks.
arXiv link: http://arxiv.org/abs/2110.14117v2
Coupling the Gini and Angles to Evaluate Economic Dispersion
dispersion. They are not sensitive to inequality at the left tail of the
distribution, where it would matter most. This paper presents a new inequality
measurement tool that gives more weight to inequality at the lower end of the
distribution. It is based on the comparison of all value pairs and synthesizes
the dispersion of the whole distribution. The differences that sum to the Gini
coefficient are scaled by angular differences between observations. The
resulting index possesses a set of desirable properties, including
normalization, scale invariance, population invariance, transfer sensitivity,
and weak decomposability.
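For reference, the Gini coefficient can be written as a normalized sum of pairwise absolute differences, $G = \frac{1}{2 n^2 \mu} \sum_i \sum_j |x_i - x_j|$; the index described above rescales each pairwise term by an angular difference between observations. The sketch below computes only the standard pairwise-difference form, not the authors' angular-weighted index.
```python
# Minimal sketch: Gini coefficient via pairwise mean absolute differences.
# The proposed index additionally scales each |x_i - x_j| term by an angular
# difference between observations; that weighting is not reproduced here.
import numpy as np

def gini_pairwise(x):
    x = np.asarray(x, dtype=float)
    n = x.size
    total = np.abs(x[:, None] - x[None, :]).sum()   # sum over all pairs (i, j)
    return total / (2.0 * n ** 2 * x.mean())

rng = np.random.default_rng(0)
print(gini_pairwise(rng.exponential(size=2000)))    # close to 0.5 for an exponential
```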
arXiv link: http://arxiv.org/abs/2110.13847v2
Regime-Switching Density Forecasts Using Economists' Scenarios
incorporate information on multiple scenarios defined by experts. We adopt a
regime-switching framework in which sets of scenarios ("views") are used as
Bayesian priors on economic regimes. Predictive densities coming from different
views are then combined by optimizing objective functions of density
forecasting. We illustrate the approach with an empirical application to
quarterly real-time forecasts of U.S. GDP growth, in which we exploit the Fed's
macroeconomic scenarios used for bank stress tests. We show that the approach
achieves good accuracy in terms of average predictive scores and good
calibration of forecast distributions. Moreover, it can be used to evaluate the
contribution of economists' scenarios to density forecast performance.
arXiv link: http://arxiv.org/abs/2110.13761v2
Inference in Regression Discontinuity Designs with High-Dimensional Covariates
covariates, possibly much more than the number of observations, can be used to
increase the precision of treatment effect estimates. We consider a two-step
estimator which first selects a small number of "important" covariates through
a localized Lasso-type procedure, and then, in a second step, estimates the
treatment effect by including the selected covariates linearly into the usual
local linear estimator. We provide an in-depth analysis of the algorithm's
theoretical properties, showing that, under an approximate sparsity condition,
the resulting estimator is asymptotically normal, with asymptotic bias and
variance that are conceptually similar to those obtained in low-dimensional
settings. Bandwidth selection and inference can be carried out using standard
methods. We also provide simulations and an empirical application.
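A minimal sketch of the two-step idea, under simplifying assumptions not taken from the paper (plain LassoCV for the selection step, a triangular kernel, and an arbitrary bandwidth); it is not the authors' implementation, which uses a localized Lasso and principled bandwidth selection and inference.
```python
# Sketch of the two-step idea: (1) Lasso-select covariates near the cutoff,
# (2) local linear RD regression that adds the selected covariates linearly.
# Kernel, bandwidth, and the exact "localized" Lasso are simplified here.
import numpy as np
from sklearn.linear_model import LassoCV

def rd_with_covariates(y, score, Z, cutoff=0.0, h=1.0):
    x = score - cutoff
    w = np.clip(1 - np.abs(x) / h, 0, None)        # triangular kernel weights
    keep = w > 0
    y, x, Z, w = y[keep], x[keep], Z[keep], w[keep]

    # Step 1: covariate selection among observations near the cutoff.
    sel = LassoCV(cv=5).fit(Z, y)
    Zsel = Z[:, sel.coef_ != 0]

    # Step 2: weighted local linear regression with a treatment indicator,
    # separate slopes on each side, and the selected covariates added linearly.
    d = (x >= 0).astype(float)
    X = np.column_stack([np.ones_like(x), d, x, d * x, Zsel])
    beta, *_ = np.linalg.lstsq(X * np.sqrt(w)[:, None], y * np.sqrt(w), rcond=None)
    return beta[1]                                  # RD treatment effect estimate

rng = np.random.default_rng(1)
n, p = 2000, 200
score, Z = rng.uniform(-1, 1, n), rng.normal(size=(n, p))
y = 0.5 * (score >= 0) + score + Z[:, 0] + rng.normal(size=n)
print(rd_with_covariates(y, score, Z))
```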
arXiv link: http://arxiv.org/abs/2110.13725v3
Bayesian Estimation and Comparison of Conditional Moment Models
of the outcomes is specified up to a set of conditional moment restrictions.
The nonparametric exponentially tilted empirical likelihood function is
constructed to satisfy a sequence of unconditional moments based on an
increasing (in sample size) vector of approximating functions (such as tensor
splines based on the splines of each conditioning variable). For any given
sample size, results are robust to the number of expanded moments. We derive
Bernstein-von Mises theorems for the behavior of the posterior distribution
under both correct and incorrect specification of the conditional moments,
subject to growth rate conditions (slower under misspecification) on the number
of approximating functions. A large-sample theory for comparing different
conditional moment models is also developed. The central result is that the
marginal likelihood criterion selects the model that is less misspecified. We
also introduce sparsity-based model search for high-dimensional conditioning
variables, and provide efficient MCMC computations for high-dimensional
parameters. Along with clarifying examples, the framework is illustrated with
real-data applications to risk-factor determination in finance, and causal
inference under conditional ignorability.
arXiv link: http://arxiv.org/abs/2110.13531v1
Negotiating Networks in Oligopoly Markets for Price-Sensitive Products
sellers and buyers simultaneously in an oligopoly market for a price-sensitive
product. In this setting, the aim of the seller network is to come up with a
price for a given context such that the expected revenue is maximized by
considering the buyer's satisfaction as well. On the other hand, the aim of the
buyer network is to assign probability of purchase to the offered price to
mimic the real world buyers' responses while also showing price sensitivity
through its action. In other words, rejecting the unnecessarily high priced
products. Similar to generative adversarial networks, this framework
corresponds to a minimax two-player game. In our experiments with simulated and
real-world transaction data, we compared our framework with the baseline model
and demonstrated its potential through proposed evaluation metrics.
arXiv link: http://arxiv.org/abs/2110.13303v1
Covariate Balancing Methods for Randomized Controlled Trials Are Not Adversarially Robust
randomized trial is to split the population into control and treatment groups
then compare the average response of the treatment group receiving the
treatment to the control group receiving the placebo.
In order to ensure that the difference between the two groups is caused only
by the treatment, it is crucial that the control and the treatment groups have
similar statistics. Indeed, the validity and reliability of a trial are
determined by the similarity of two groups' statistics. Covariate balancing
methods increase the similarity between the distributions of the two groups'
covariates. However, often in practice, there are not enough samples to
accurately estimate the groups' covariate distributions. In this paper, we
empirically show that covariate balancing with the Standardized Mean
Difference (SMD) covariate balance measure, as well as Pocock's sequential
treatment assignment method, are susceptible to worst-case treatment
assignments. Worst-case treatment assignments are those admitted by the
covariate balance measure but result in the highest possible ATE estimation
errors. We develop an adversarial attack to find adversarial treatment
assignments for any given trial. Then, we provide an index to measure how close
the given trial is to the worst-case. To this end, we provide an
optimization-based algorithm, namely Adversarial Treatment ASsignment in
TREatment Effect Trials (ATASTREET), to find the adversarial treatment
assignments.
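For reference, a minimal sketch of the Standardized Mean Difference balance measure that the paper stress-tests; the 0.1 acceptance threshold is a common rule of thumb, not a value taken from the paper.
```python
# Minimal sketch: Standardized Mean Difference (SMD) balance check for a
# candidate treatment assignment.
import numpy as np

def smd(X, treat):
    """Per-covariate standardized mean difference between groups."""
    t, c = X[treat == 1], X[treat == 0]
    pooled_sd = np.sqrt((t.var(axis=0, ddof=1) + c.var(axis=0, ddof=1)) / 2)
    return np.abs(t.mean(axis=0) - c.mean(axis=0)) / pooled_sd

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
treat = rng.integers(0, 2, size=100)
print("balanced on all covariates:", bool((smd(X, treat) < 0.1).all()))
```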
arXiv link: http://arxiv.org/abs/2110.13262v3
Functional instrumental variable regression with an application to estimating the impact of immigration on native wages
study the relationship between function-valued response and exogenous
explanatory variables. However, in practice, it is hard to expect that the
explanatory variables of interest are perfectly exogenous, due to, for example,
the presence of omitted variables and measurement error. Despite its empirical
relevance, it was not until recently that this issue of endogeneity was studied
in the literature on functional regression, and the development in this
direction does not seem to sufficiently meet practitioners' needs; for example,
this issue has been discussed with particular attention paid to consistent
estimation, and thus the distributional properties of the proposed estimators
still remain to be further explored. To fill this gap, this paper proposes new
consistent FPCA-based instrumental variable estimators and develops their
asymptotic properties in detail. Simulation experiments under a wide range of
settings show that the proposed estimators perform considerably well. We apply
our methodology to estimate the impact of immigration on native wages.
arXiv link: http://arxiv.org/abs/2110.12722v3
On Parameter Estimation in Unobserved Components Models subject to Linear Inequality Constraints
a nonstandard density using a multivariate Gaussian density. Such nonstandard
densities usually arise while developing posterior samplers for unobserved
components models involving inequality constraints on the parameters. For
instance, Chan et al. (2016) provided a new model of trend inflation with
linear inequality constraints on the stochastic trend. We implemented the
proposed quadratic programming-based method for this model and compared it to
the existing approximation. We observed that the proposed method works as well
as the existing approximation in terms of the final trend estimates while
achieving gains in terms of sample efficiency.
arXiv link: http://arxiv.org/abs/2110.12149v2
Slow Movers in Panel Data
movers (units with little within-variations). In the presence of many slow
movers, conventional econometric methods can fail to work. We propose a novel
method of inference for the average partial effects in correlated random
coefficient models that is robust across various distributions of
within-variations, handling the cases with many stayers and/or many slow movers
in a unified manner. In addition to this robustness property, our proposed method entails
smaller biases and hence improves accuracy in inference compared to existing
alternatives. Simulation studies demonstrate our theoretical claims about these
properties: the conventional 95% confidence interval covers the true parameter
value with 37-93% frequencies, whereas our proposed one achieves 93-96%
coverage frequencies.
arXiv link: http://arxiv.org/abs/2110.12041v1
DMS, AE, DAA: methods and applications of adaptive time series model selection, ensemble, and financial evaluation
Model Selection (DMS), Adaptive Ensemble (AE), and Dynamic Asset Allocation
(DAA). The methods respectively handle model selection, ensembling, and
contextual evaluation in financial time series. Empirically, we use the methods
to forecast the returns of four key indices in the US market, incorporating
information from the VIX and Yield curves. We present financial applications of
the learning results, including fully-automated portfolios and dynamic hedging
strategies. The strategies strongly outperform long-only benchmarks over our
testing period, spanning from Q4 2015 to the end of 2021. The key outputs of
the learning methods are interpreted during the 2020 market crash.
arXiv link: http://arxiv.org/abs/2110.11156v3
Attention Overload
alternatives compete for the decision maker's attention, and hence the
attention that each alternative receives decreases as the choice problem
becomes larger. Using this nonparametric restriction on the random attention
formation, we show that a fruitful revealed preference theory can be developed
and provide testable implications on the observed choice behavior that can be
used to (point or partially) identify the decision maker's preference and
attention frequency. We then enhance our attention overload model to
accommodate heterogeneous preferences. Due to the nonparametric nature of our
identifying assumption, we must discipline the amount of heterogeneity in the
choice model: we propose the idea of List-based Attention Overload, where
alternatives are presented to the decision makers as a list that correlates
with both heterogeneous preferences and random attention. We show that
preference and attention frequencies are (point or partially) identifiable
under nonparametric assumptions on the list and attention formation mechanisms,
even when the true underlying list is unknown to the researcher. Building on
our identification results, for both preference and attention frequencies, we
develop econometric methods for estimation and inference that are valid in
settings with a large number of alternatives and choice problems, a distinctive
feature of the economic environment we consider. We provide a software package
in R implementing our empirical methods, and illustrate them in a simulation
study.
arXiv link: http://arxiv.org/abs/2110.10650v4
One Instrument to Rule Them All: The Bias and Coverage of Just-ID IV
instrumental variables (just-ID IV) estimators, arguing that in most
microeconometric applications, the usual inference strategies are likely
reliable. Three widely-cited applications are used to explain why this is so.
We then consider pretesting strategies of the form $t_{1}>c$, where $t_{1}$ is
the first-stage $t$-statistic, and the first-stage sign is given. Although
pervasive in empirical practice, pretesting on the first-stage $F$-statistic
exacerbates bias and distorts inference. We show, however, that median bias is
both minimized and roughly halved by setting $c=0$, that is by screening on the
sign of the estimated first stage. This bias reduction is a free
lunch: conventional confidence interval coverage is unchanged by screening on
the estimated first-stage sign. To the extent that IV analysts sign-screen
already, these results strengthen the case for a sanguine view of the
finite-sample behavior of just-ID IV.
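A small Monte Carlo sketch of the screening rule discussed above: the just-identified IV estimate is kept only when the estimated first-stage coefficient has the known (here, positive) sign, i.e. $c=0$. The design parameters are arbitrary illustrations, not taken from the paper.
```python
# Illustrative Monte Carlo: just-identified IV, with and without screening on
# the sign of the estimated first stage (c = 0). DGP parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
beta, pi, n, reps = 1.0, 0.15, 100, 5000
all_est, kept = [], []
for _ in range(reps):
    z, u = rng.normal(size=n), rng.normal(size=n)
    x = pi * z + u + rng.normal(size=n)            # endogenous regressor
    y = beta * x + u + rng.normal(size=n)
    first_stage = np.cov(z, x)[0, 1] / np.var(z)
    iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]   # just-identified IV estimate
    all_est.append(iv)
    if first_stage > 0:                            # screen on first-stage sign
        kept.append(iv)

print("median bias, no screening:  ", np.median(all_est) - beta)
print("median bias, sign screening:", np.median(kept) - beta)
```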
arXiv link: http://arxiv.org/abs/2110.10556v7
Bi-integrative analysis of two-dimensional heterogeneous panel data model
individuals and/or change over time have received increasingly more attention
in statistics and econometrics. This paper proposes a two-dimensional
heterogeneous panel regression model that incorporates a group structure of
individual heterogeneous effects with cohort formation for their
time-variations, which allows common coefficients between nonadjacent time
points. A bi-integrative procedure that detects the information regarding group
and cohort patterns simultaneously via a doubly penalized least square with
concave fused penalties is introduced. We use an alternating direction method
of multipliers (ADMM) algorithm that automatically bi-integrates the
two-dimensional heterogeneous panel data model into a common one.
Consistency and asymptotic normality for the proposed estimators are developed.
We show that the resulting estimators exhibit oracle properties, i.e., the
proposed estimator is asymptotically equivalent to the oracle estimator
obtained using the known group and cohort structures. Furthermore, the
simulation studies provide supportive evidence that the proposed method has
good finite-sample performance. A real-data empirical application is provided
to highlight the proposed method.
arXiv link: http://arxiv.org/abs/2110.10480v1
Difference-in-Differences with Geocoded Microdata
at a specific location using geocoded microdata. This estimator compares units
immediately next to treatment (an inner-ring) to units just slightly further
away (an outer-ring). I introduce intuitive assumptions needed to identify the
average treatment effect among the affected units and illustrate pitfalls that
occur when these assumptions fail. Since one of these assumptions requires
knowledge of exactly how far treatment effects are experienced, I propose a new
method that relaxes this assumption and allows for nonparametric estimation
using partitioning-based least squares developed in Cattaneo et al. (2019).
Since treatment effects typically decay/change over distance, this estimator
improves analysis by estimating a treatment effect curve as a function of
distance from treatment. This is in contrast to the traditional method which, at
best, identifies the average effect of treatment. To illustrate the advantages
of this method, I show that Linden and Rockoff (2008) underestimate the
effects of increased crime risk on home values closest to the treatment and
overestimate how far the effects extend by selecting a treatment ring that is
too wide.
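A minimal sketch of the inner-ring versus outer-ring comparison described above, with illustrative ring radii; the nonparametric treatment-effect-curve version based on partitioning-based least squares is not shown.
```python
# Sketch: difference-in-differences of outcome changes for units close to the
# treatment site (inner ring) versus units slightly further away (outer ring).
# Ring radii are illustrative assumptions.
import numpy as np

def ring_did(dist, dy, inner=0.5, outer=1.0):
    """dist: distance to treatment site; dy: change in outcome (post - pre)."""
    treated = dist <= inner                        # inner ring: affected units
    control = (dist > inner) & (dist <= outer)     # outer ring: comparison units
    return dy[treated].mean() - dy[control].mean()

rng = np.random.default_rng(0)
dist = rng.uniform(0, 2, 1000)
dy = -0.8 * np.clip(1 - dist, 0, None) + rng.normal(scale=0.5, size=1000)
print(ring_did(dist, dy))                          # effect decaying with distance
```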
arXiv link: http://arxiv.org/abs/2110.10192v1
Revisiting identification concepts in Bayesian analysis
of statistical and econometric models. First, for unidentified models we
demonstrate that there are situations where the introduction of a
non-degenerate prior distribution can make a parameter that is nonidentified in
frequentist theory identified in Bayesian theory. In other situations, it is
preferable to work with the unidentified model and construct a Markov Chain
Monte Carlo (MCMC) algorithm for it instead of introducing identifying
assumptions. Second, for partially identified models we demonstrate how to
construct the prior and posterior distributions for the identified set
parameter and how to conduct Bayesian analysis. Finally, for models that
contain some parameters that are identified and others that are not we show
that marginalizing out the identified parameter from the likelihood with
respect to its conditional prior, given the nonidentified parameter, allows the
data to be informative about the nonidentified and partially identified
parameter. The paper provides examples and simulations that illustrate how to
implement our techniques.
arXiv link: http://arxiv.org/abs/2110.09954v1
Exact Bias Correction for Linear Adjustment of Randomized Controlled Trials
the linear regression estimator was biased for the analysis of randomized
controlled trials under the randomization model. Under Freedman's assumptions,
we derive exact closed-form bias corrections for the linear regression
estimator with and without treatment-by-covariate interactions. We show that
the limiting distribution of the bias corrected estimator is identical to the
uncorrected estimator, implying that the asymptotic gains from adjustment can
be attained without introducing any risk of bias. Taken together with results
from Lin (2013), our results show that Freedman's theoretical arguments against
the use of regression adjustment can be completely resolved with minor
modifications to practice.
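For context, a sketch of the interacted (Lin 2013-style) regression adjustment that such corrections apply to; the exact closed-form bias corrections derived in the paper are not reproduced here, and the simulated data are illustrative.
```python
# Sketch: regression adjustment with treatment-by-covariate interactions.
# With centered covariates, the coefficient on the treatment indicator is the
# adjusted ATE estimate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))
d = rng.integers(0, 2, size=n)
y = 1.0 * d + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)

Xc = X - X.mean(axis=0)                             # center covariates
design = sm.add_constant(np.column_stack([d, Xc, d[:, None] * Xc]))
fit = sm.OLS(y, design).fit(cov_type="HC2")
print("adjusted ATE estimate:", fit.params[1])      # coefficient on d
```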
arXiv link: http://arxiv.org/abs/2110.08425v2
Covariate Adjustment in Regression Discontinuity Designs
method for causal inference and program evaluation. While its canonical
formulation only requires a score and an outcome variable, it is common in
empirical work to encounter RD analyses where additional variables are used for
adjustment. This practice has led to misconceptions about the role of covariate
adjustment in RD analysis, from both methodological and empirical perspectives.
In this chapter, we review the different roles of covariate adjustment in RD
designs, and offer methodological guidance for its correct use.
arXiv link: http://arxiv.org/abs/2110.08410v2
Detecting long-range dependence for time-varying linear models
coefficient regression models, where the covariates and errors are locally
stationary, allowing complex temporal dynamics and heteroscedasticity. We
develop KPSS, R/S, V/S, and K/S-type statistics based on the nonparametric
residuals. Under the null hypothesis, under local alternatives, and under fixed
alternatives, we derive the limiting distributions of the test
statistics. As the four types of test statistics could degenerate when the
time-varying mean, variance, long-run variance of errors, covariates, and the
intercept lie in certain hyperplanes, we show the bootstrap-assisted tests are
consistent under both degenerate and non-degenerate scenarios. In particular,
in the presence of covariates the exact local asymptotic power of the
bootstrap-assisted tests can enjoy the same order as that of the classical KPSS
test of long memory for strictly stationary series. The asymptotic theory is
built on a new Gaussian approximation technique for locally stationary
long-memory processes with short-memory covariates, which is of independent
interest. The effectiveness of our tests is demonstrated by extensive
simulation studies and real data analysis.
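For reference, a minimal sketch of the classical KPSS statistic applied to a residual series (partial sums scaled by a Bartlett-kernel long-run variance); the paper's tests instead use nonparametric residuals from the time-varying model and bootstrap critical values, which are not shown.
```python
# Minimal sketch: KPSS-type statistic for a residual series. Bandwidth choice
# is an ad-hoc rule; critical values / bootstrap are not implemented here.
import numpy as np

def kpss_stat(e, bandwidth=None):
    e = np.asarray(e, dtype=float)
    n = e.size
    if bandwidth is None:
        bandwidth = int(np.floor(4 * (n / 100) ** 0.25))
    s = np.cumsum(e)                               # partial sums
    lrv = e @ e / n                                # long-run variance estimate
    for k in range(1, bandwidth + 1):
        w = 1 - k / (bandwidth + 1)                # Bartlett weights
        lrv += 2 * w * (e[k:] @ e[:-k]) / n
    return (s @ s) / (n ** 2 * lrv)

rng = np.random.default_rng(0)
print(kpss_stat(rng.normal(size=500)))             # small for short-memory residuals
```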
arXiv link: http://arxiv.org/abs/2110.08089v5
Choice probabilities and correlations in closed-form route choice models: specifications and drawbacks
correlations, of existing and new specifications of closed-form route choice
models with flexible correlation patterns, namely the Link Nested Logit (LNL),
the Paired Combinatorial Logit (PCL) and the more recent Combination of Nested
Logit (CoNL) models. Following a consolidated track in the literature, choice
probabilities and correlations of the Multinomial Probit (MNP) model by
(Daganzo and Sheffi, 1977) are taken as target. Laboratory experiments on
small/medium-size networks are illustrated, also leveraging a procedure for
practical calculation of correlations of any GEV models, proposed by (Marzano
2014). Results show that models with inherent limitations in the coverage of
the domain of feasible correlations yield unsatisfactory performance, whilst
the specifications of the CoNL proposed in the paper appear the best in fitting
both MNP correlations and probabilities. The performance of the models is
appreciably improved by introducing lower bounds on the nesting parameters.
Overall, the paper provides guidance for the practical application of tested
models.
arXiv link: http://arxiv.org/abs/2110.07224v1
Machine Learning, Deep Learning, and Hedonic Methods for Real Estate Price Prediction
home values have been accumulating. For several decades, to estimate the sale
price of the residential properties, appraisers have been walking through the
properties, observing the property, collecting data, and making use of the
hedonic pricing models. However, this method bears some costs and by nature is
subjective and biased. To minimize human involvement and the biases in the real
estate appraisals and boost the accuracy of the real estate market price
prediction models, in this research we design data-efficient learning machines
capable of learning and extracting the relation or patterns between the inputs
(features for the house) and output (value of the houses). We compare the
performance of some machine learning and deep learning algorithms, specifically
artificial neural networks, random forest, and k nearest neighbor approaches to
that of hedonic method on house price prediction in the city of Boulder,
Colorado. Even though this study has been done on houses in the city of
Boulder, it can be generalized to the housing market in any city. The results
indicate non-linear association between the dwelling features and dwelling
prices. In light of these findings, this study demonstrates that random forest
and artificial neural network algorithms can be better alternatives to hedonic
regression analysis for predicting house prices in the city of Boulder,
Colorado.
arXiv link: http://arxiv.org/abs/2110.07151v1
Efficient Estimation in NPIV Models: A Comparison of Various Neural Networks-Based Estimators
approximate complex functions of high dimensional variables more effectively
than linear sieves. We investigate the performance of various ANNs in
nonparametric instrumental variables (NPIV) models of moderately high
dimensional covariates that are relevant to empirical economics. We present two
efficient procedures for estimation and inference on a weighted average
derivative (WAD): an orthogonalized plug-in with optimally-weighted sieve
minimum distance (OP-OSMD) procedure and a sieve efficient score (ES)
procedure. Both estimators for WAD use ANN sieves to approximate the unknown
NPIV function and are root-n asymptotically normal and first-order equivalent.
We provide a detailed practitioner's recipe for implementing both efficient
procedures. We compare their finite-sample performances in various simulation
designs that involve smooth NPIV function of up to 13 continuous covariates,
different nonlinearities and covariate correlations. Some Monte Carlo findings
include: 1) tuning and optimization are more delicate in ANN estimation; 2)
given proper tuning, both ANN estimators with various architectures can perform
well; 3) ANN OP-OSMD estimators are easier to tune than ANN ES estimators; 4)
stable inferences are more difficult to achieve with ANN (than spline)
estimators; 5) there are gaps between current implementations and approximation
theories. Finally, we apply ANN NPIV to estimate average partial derivatives in
two empirical demand examples with multivariate covariates.
arXiv link: http://arxiv.org/abs/2110.06763v4
Partial Identification of Marginal Treatment Effects with discrete instruments and misreported treatment
effect ($MTE$) when the binary treatment variable is potentially misreported
and the instrumental variable is discrete. Identification results are derived
under different sets of nonparametric assumptions. The identification results
are illustrated in identifying the marginal treatment effects of food stamps on
health.
arXiv link: http://arxiv.org/abs/2110.06285v3
Fixed $T$ Estimation of Linear Panel Data Models with Interactive Fixed Effects
interactive fixed effects, where one dimension of the panel, typically time,
may be fixed. To this end, a novel transformation is introduced that reduces
the model to a lower dimension, and, in doing so, relieves the model of
incidental parameters in the cross-section. The central result of this paper
demonstrates that transforming the model and then applying the principal
component (PC) estimator of Bai (2009) delivers $\sqrt{n}$-consistent
estimates of regression slope coefficients with $T$ fixed. Moreover,
these estimates are shown to be asymptotically unbiased in the presence of
cross-sectional dependence, serial dependence, and with the inclusion of
dynamic regressors, in stark contrast to the usual case. The large $n$, large
$T$ properties of this approach are also studied, where many of these results
carry over to the case in which $n$ is growing sufficiently fast relative to
$T$. Transforming the model also proves to be useful beyond estimation, a point
illustrated by showing that with $T$ fixed, the eigenvalue ratio test of
Ahn and Horenstein (2013) provides a consistent test for the number of factors when
applied to the transformed model.
arXiv link: http://arxiv.org/abs/2110.05579v1
$β$-Intact-VAE: Identifying and Estimating Causal Effects under Limited Overlap
and estimation of treatment effects (TEs) under limited overlap; that is, when
subjects with certain features belong to a single treatment group. We use a
latent variable to model a prognostic score which is widely used in
biostatistics and sufficient for TEs; i.e., we build a generative prognostic
model. We prove that the latent variable recovers a prognostic score, and the
model identifies individualized treatment effects. The model is then learned as
$\beta$-Intact-VAE -- a new type of variational autoencoder (VAE). We derive the TE
error bounds that enable representations balanced for treatment groups
conditioned on individualized features. The proposed method is compared with
recent methods using (semi-)synthetic datasets.
arXiv link: http://arxiv.org/abs/2110.05225v1
Two-stage least squares with a randomly right censored outcome
estimate the causal effect of some endogenous regressors on a randomly right
censored outcome in the linear model. The proposal replaces the usual ordinary
least squares regressions of the standard 2SLS by weighted least squares
regressions. The weights correspond to the inverse probability of censoring. We
show consistency and asymptotic normality of the estimator. The estimator
exhibits good finite sample performances in simulations.
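A minimal sketch of the weighting idea, under assumptions not taken from the paper: the censoring survival function is estimated by Kaplan-Meier (via the lifelines package), and the just-identified case is solved in closed form. The paper's exact construction and asymptotic results are not reproduced.
```python
# Sketch: inverse-probability-of-censoring weighted 2SLS for a right-censored
# outcome (just-identified case). Kaplan-Meier estimates the censoring
# distribution; this is an illustration, not the paper's estimator verbatim.
import numpy as np
from lifelines import KaplanMeierFitter

def ipcw_2sls(t_obs, delta, x, z):
    """t_obs: censored outcome, delta: 1 if uncensored, x: endogenous regressor, z: instrument."""
    kmf = KaplanMeierFitter().fit(t_obs, event_observed=1 - delta)  # censoring survival G
    G = kmf.survival_function_at_times(t_obs).values
    w = delta / np.clip(G, 1e-6, None)             # IPC weights (zero if censored)
    Z = np.column_stack([np.ones_like(z), z])
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(w)
    return np.linalg.solve(Z.T @ W @ X, Z.T @ W @ t_obs)[1]

rng = np.random.default_rng(0)
n = 1000
z, u = rng.normal(size=n), rng.normal(size=n)
x = z + u + rng.normal(size=n)                     # endogenous regressor
y = 1.0 * x + 0.5 * u + 0.5 * rng.normal(size=n) + 10   # latent outcome
c = rng.uniform(8, 14, size=n)                     # censoring times
t_obs, delta = np.minimum(y, c), (y <= c).astype(int)
print(ipcw_2sls(t_obs, delta, x, z))
```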
arXiv link: http://arxiv.org/abs/2110.05107v1
High-dimensional Inference for Dynamic Treatment Effects
inference, particularly when confronted with high-dimensional confounders.
Doubly robust (DR) approaches have emerged as promising tools for estimating
treatment effects due to their flexibility. However, we showcase that the
traditional DR approaches that only focus on the DR representation of the
expected outcomes may fall short of delivering optimal results. In this paper,
we propose a novel DR representation for intermediate conditional outcome
models that leads to superior robustness guarantees. The proposed method
achieves consistency even with high-dimensional confounders, as long as at
least one nuisance function is appropriately parametrized for each exposure
time and treatment path. Our results represent a significant step forward as
they provide new robustness guarantees. The key to achieving these results is
our new DR representation, which offers superior inferential performance while
requiring weaker assumptions. Lastly, we confirm our findings in practice
through simulations and a real data application.
arXiv link: http://arxiv.org/abs/2110.04924v4
Smooth Tests for Normality in ANOVA
variance (ANOVA) models, yet it is seldom subjected to formal testing in
practice. In this paper, we develop Neyman's smooth tests for assessing
normality in a broad class of ANOVA models. The proposed test statistics are
constructed via the Gaussian probability integral transformation of ANOVA
residuals and are shown to follow an asymptotic Chi-square distribution under
the null hypothesis, with degrees of freedom determined by the dimension of the
smooth model. We further propose a data-driven selection of the model dimension
based on a modified Schwarz's criterion. Monte Carlo simulations demonstrate
that the tests maintain the nominal size and achieve high power against a wide
range of alternatives. Our framework thus provides a systematic and effective
tool for formally validating the normality assumption in ANOVA models.
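For reference, a sketch of a Neyman smooth statistic of the kind described: standardized residuals are mapped through the Gaussian probability integral transform and scored against normalized shifted Legendre polynomials. The simplifications here (fixed dimension K, no adjustment for estimated mean and variance, no data-driven dimension selection) are assumptions of the sketch, not the paper's procedure.
```python
# Sketch of a Neyman smooth statistic on standardized residuals. Under
# normality it is approximately chi-square with K degrees of freedom; the
# paper's data-driven choice of K and residual-specific adjustments are omitted.
import numpy as np
from numpy.polynomial import legendre
from scipy.stats import norm, chi2

def smooth_normality_stat(resid, K=4):
    n = resid.size
    u = norm.cdf((resid - resid.mean()) / resid.std(ddof=1))  # Gaussian PIT
    stat = 0.0
    for j in range(1, K + 1):
        # Normalized shifted Legendre polynomial on [0, 1]: sqrt(2j+1) * P_j(2u - 1).
        phi = np.sqrt(2 * j + 1) * legendre.legval(2 * u - 1, [0] * j + [1])
        stat += (phi.sum() / np.sqrt(n)) ** 2
    return stat, chi2.sf(stat, df=K)

rng = np.random.default_rng(0)
print(smooth_normality_stat(rng.normal(size=300)))            # Gaussian sample
print(smooth_normality_stat(rng.exponential(size=300) - 1))   # skewed alternative
```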
arXiv link: http://arxiv.org/abs/2110.04849v3
Nonparametric Tests of Conditional Independence for Time Series
time series data. Our methods are motivated from the difference between joint
conditional cumulative distribution function (CDF) and the product of
conditional CDFs. The difference is transformed into a proper conditional
moment restriction (CMR), which forms the basis for our testing procedure. Our
test statistics are then constructed using the integrated moment restrictions
that are equivalent to the CMR. We establish the asymptotic behavior of the
test statistics under the null, the alternative, and the sequence of local
alternatives converging to conditional independence at the parametric rate. Our
tests are implemented with the assistance of a multiplier bootstrap. Monte
Carlo simulations are conducted to evaluate the finite sample performance of
the proposed tests. We apply our tests to examine the predictability of equity
risk premium using variance risk premium for different horizons and find that
there exist various degrees of nonlinear predictability at mid-run and long-run
horizons.
arXiv link: http://arxiv.org/abs/2110.04847v1
Various issues around the L1-norm distance
familiarizing researchers working in applied fields -- such as physics or
economics -- with notions or formulas that they use daily without always
identifying all their theoretical features or potentialities. Various
situations where the L1-norm distance E|X-Y| between real-valued random
variables intervenes are closely examined. The axiomatics surrounding this
distance is also explored. We constantly try to build bridges between the
concrete uses of E|X-Y| and the underlying probabilistic model. An alternative
interpretation of this distance is also examined, as well as its relation to
the Gini index (economics) and the Lukaszyk-Karmovsky distance (physics). The
main contributions are the following: (a) We show that under independence,
the triangle inequality holds for the normalized form E|X-Y|/(E|X| + E|Y|). (b) In
order to present a concrete advance, we determine the analytic form of E|X-Y|
and of its normalized expression when X and Y are independent with Gaussian or
uniform distribution. The resulting formulas generalize relevant tools already
in use in areas such as physics and economics. (c) We propose with all the
required rigor a brief one-dimensional introduction to the optimal transport
problem, essentially for a L1 cost function. The chosen illustrations and
examples should be of great help for newcomers to the field. New proofs and new
results are proposed.
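For the Gaussian case in (b), a standard folded-normal calculation gives, for independent $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ with $\delta = \mu_1 - \mu_2$ and $\sigma^2 = \sigma_1^2 + \sigma_2^2$, $E|X-Y| = \sigma\sqrt{2/\pi}\,e^{-\delta^2/(2\sigma^2)} + \delta\,(1 - 2\Phi(-\delta/\sigma))$. The sketch below checks this textbook identity by Monte Carlo; it is not necessarily the paper's exact formula or normalization.
```python
# Monte Carlo check of the folded-normal expression for E|X - Y| with
# independent Gaussians (the normalized index E|X-Y|/(E|X|+E|Y|) builds on
# expressions of this kind).
import numpy as np
from scipy.stats import norm

def e_abs_diff_gaussian(mu1, s1, mu2, s2):
    delta, sigma = mu1 - mu2, np.hypot(s1, s2)     # sigma = sqrt(s1^2 + s2^2)
    return (sigma * np.sqrt(2 / np.pi) * np.exp(-delta**2 / (2 * sigma**2))
            + delta * (1 - 2 * norm.cdf(-delta / sigma)))

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, 10**6)
y = rng.normal(0.5, 1.0, 10**6)
print("closed form:", e_abs_diff_gaussian(1.0, 2.0, 0.5, 1.0))
print("Monte Carlo:", np.abs(x - y).mean())
```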
arXiv link: http://arxiv.org/abs/2110.04787v3
On the asymptotic behavior of bubble date estimators
to allow the fourth regime followed by the unit root process after recovery. We
provide the asymptotic and finite sample justification of the consistency of
the collapse date estimator in the two-regime AR(1) model. The consistency
allows us to split the sample before and after the date of collapse and to
consider the estimation of the date of exuberation and date of recovery
separately. We have also found that the limiting behavior of the recovery date
varies depending on the extent of explosiveness and recovery.
arXiv link: http://arxiv.org/abs/2110.04500v3
A Primer on Deep Learning for Causal Inference
deep neural networks under the potential outcomes framework. It provides an
intuitive introduction on how deep learning can be used to estimate/predict
heterogeneous treatment effects and extend causal inference to settings where
confounding is non-linear, time varying, or encoded in text, networks, and
images. To maximize accessibility, we also introduce prerequisite concepts from
causal inference and deep learning. The survey differs from other treatments of
deep learning and causal inference in its sharp focus on observational causal
estimation, its extended exposition of key algorithms, and its detailed
tutorials for implementing, training, and selecting among deep estimators in
Tensorflow 2 available at github.com/kochbj/Deep-Learning-for-Causal-Inference.
arXiv link: http://arxiv.org/abs/2110.04442v2
Estimating High Dimensional Monotone Index Models by Iterative Convex Optimization
monotone index models. This class of models has been popular in the applied and
theoretical econometrics literatures as it includes discrete choice,
nonparametric transformation, and duration models. A main advantage of our
approach is computational. For instance, rank estimation procedures such as
those proposed in Han (1987) and Cavanagh and Sherman (1998) that optimize a
nonsmooth, non-convex objective function are difficult to use with more than a
few regressors, which limits their use with economic data sets. For such
monotone index models with increasing dimension, we propose to use a new class
of estimators based on batched gradient descent (BGD) involving nonparametric
methods such as kernel estimation or sieve estimation, and study their
asymptotic properties. The BGD algorithm uses an iterative procedure where the
key step exploits a strictly convex objective function, resulting in
computational advantages. A contribution of our approach is that our model is
large dimensional and semiparametric and so does not require the use of
parametric distributional assumptions.
arXiv link: http://arxiv.org/abs/2110.04388v2
Dyadic double/debiased machine learning for analyzing determinants of free trade agreements
about parameters in econometric models using machine learning for nuisance
parameters estimation when data are dyadic. We propose a dyadic cross fitting
method to remove over-fitting biases under arbitrary dyadic dependence.
Together with the use of Neyman orthogonal scores, this novel cross fitting
method enables root-$n$ consistent estimation and inference robustly against
dyadic dependence. We illustrate an application of our general framework to
high-dimensional network link formation models. With this method applied to
empirical data of international economic networks, we reexamine determinants of
free trade agreements (FTA) viewed as links formed in the dyad composed of
world economies. We document that standard methods may lead to misleading
conclusions for numerous classic determinants of FTA formation due to biased
point estimates or standard errors which are too small.
arXiv link: http://arxiv.org/abs/2110.04365v3
Many Proxy Controls
unobserved confounding factors. The proxies are divided into two sets that are
independent conditional on the confounders. One set of proxies are `negative
control treatments' and the other are `negative control outcomes'. Existing
work applies to low-dimensional settings with a fixed number of proxies and
confounders. In this work we consider linear models with many proxy controls
and possibly many confounders. A key insight is that if each group of proxies
is strictly larger than the number of confounding factors, then a matrix of
nuisance parameters has a low-rank structure and a vector of nuisance
parameters has a sparse structure. We can exploit the rank-restriction and
sparsity to reduce the number of free parameters to be estimated. The number of
unobserved confounders is not known a priori but we show that it is identified,
and we apply penalization methods to adapt to this quantity. We provide an
estimator with a closed-form as well as a doubly-robust estimator that must be
evaluated using numerical methods. We provide conditions under which our
doubly-robust estimator is uniformly root-$n$ consistent, asymptotically
centered normal, and our suggested confidence intervals have asymptotically
correct coverage. We provide simulation evidence that our methods achieve
better performance than existing approaches in high dimensions, particularly
when the number of proxies is substantially larger than the number of
confounders.
arXiv link: http://arxiv.org/abs/2110.03973v1
Heterogeneous Overdispersed Count Data Regressions via Double Penalized Estimations
$\ell_1$-regularized for heterogeneous overdispersed count data via negative
binomial regressions. Under the restricted eigenvalue conditions, we prove the
oracle inequalities for Lasso estimators of two partial regression coefficients
for the first time, using concentration inequalities of empirical processes.
Furthermore, the consistency and convergence rates of the estimators, derived
from the oracle inequalities, provide theoretical guarantees for further
statistical inference. Finally, both simulations and a real data analysis
demonstrate that the new methods are effective.
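As a point of comparison only, a generic L1-penalized negative binomial regression can be fit in statsmodels; this is a single-penalty fit on simulated data, not the authors' double-penalized estimator, and the penalty level is arbitrary.
```python
# Sketch: L1-penalized negative binomial regression for overdispersed counts
# (generic statsmodels fit; not the paper's double penalization).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:3] = [0.8, -0.5, 0.3]    # sparse truth
mu = np.exp(1.0 + X @ beta)
y = rng.negative_binomial(2, 2 / (2 + mu))         # overdispersed counts with mean mu

model = sm.NegativeBinomial(y, sm.add_constant(X))
fit = model.fit_regularized(method="l1", alpha=1.0, disp=0)
print(np.round(fit.params[:6], 3))                 # intercept and first coefficients
```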
arXiv link: http://arxiv.org/abs/2110.03552v2
Investigating Growth at Risk Using a Multi-country Non-parametric Quantile Factor Model
each quantile, the response function is a convex combination of a linear model
and a non-linear function, which we approximate using Bayesian Additive
Regression Trees (BART). Cross-sectional information at the pth quantile is
captured through a conditionally heteroscedastic latent factor. The
non-parametric feature of our model enhances flexibility, while the panel
feature, by exploiting cross-country information, increases the number of
observations in the tails. We develop Bayesian Markov chain Monte Carlo (MCMC)
methods for estimation and forecasting with our quantile factor BART model
(QF-BART), and apply them to study growth at risk dynamics in a panel of 11
advanced economies.
arXiv link: http://arxiv.org/abs/2110.03411v1
Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning
challenge for many application areas. Long-term hydrothermal dispatch planning
(LHDP) materializes this challenge in a real-world problem that affects
electricity markets, economies, and natural resources worldwide. No closed-form
solutions are available for MSLP and the definition of non-anticipative
policies with high-quality out-of-sample performance is crucial. Linear
decision rules (LDR) provide an interesting simulation-based framework for
finding high-quality policies for MSLP through two-stage stochastic models. In
practical applications, however, the number of parameters to be estimated when
using an LDR may be close to or higher than the number of scenarios of the
sample average approximation problem, thereby generating an in-sample overfit
and poor performances in out-of-sample simulations. In this paper, we propose a
novel regularized LDR to solve MSLP based on the AdaLASSO (adaptive least
absolute shrinkage and selection operator). The goal is to use the parsimony
principle, as largely studied in high-dimensional linear regression models, to
obtain better out-of-sample performance for LDR applied to MSLP. Computational
experiments show that the overfit threat is non-negligible when using classical
non-regularized LDR to solve the LHDP, one of the most studied MSLP with
relevant applications. Our analysis highlights the following benefits of the
proposed framework in comparison to the non-regularized benchmark: 1)
significant reductions in the number of non-zero coefficients (model
parsimony), 2) substantial cost reductions in out-of-sample evaluations, and 3)
improved spot-price profiles.
arXiv link: http://arxiv.org/abs/2110.03146v3
Robust Generalized Method of Moments: A Finite Sample Viewpoint
parameter is identified by a set of moment conditions. A generic method of
solving moment conditions is the Generalized Method of Moments (GMM). However,
classical GMM estimation is potentially very sensitive to outliers. Robustified
GMM estimators have been developed in the past, but suffer from several
drawbacks: computational intractability, poor dimension-dependence, and no
quantitative recovery guarantees in the presence of a constant fraction of
outliers. In this work, we develop the first computationally efficient GMM
estimator (under intuitive assumptions) that can tolerate a constant $\epsilon$
fraction of adversarially corrupted samples, and that has an $\ell_2$ recovery
guarantee of $O(\epsilon)$. To achieve this, we draw upon and extend a
recent line of work on algorithmic robust statistics for related but simpler
problems such as mean estimation, linear regression and stochastic
optimization. As two examples of the generality of our algorithm, we show how
our estimation algorithm and assumptions apply to instrumental variables linear
and logistic regression. Moreover, we experimentally validate that our
estimator outperforms classical IV regression and two-stage Huber regression on
synthetic and semi-synthetic datasets with corruption.
arXiv link: http://arxiv.org/abs/2110.03070v2
RieszNet and ForestRiesz: Automatic Debiased Machine Learning with Neural Nets and Random Forests
of high-dimensional or non-parametric regression functions.
Root-$n$ consistent and asymptotically normal estimation of the object of
interest requires debiasing to reduce the effects of regularization and/or
model selection on the object of interest. Debiasing is typically achieved by
adding a correction term to the plug-in estimator of the functional, which
leads to properties such as semi-parametric efficiency, double robustness, and
Neyman orthogonality. We implement an automatic debiasing procedure based on
automatically learning the Riesz representation of the linear functional using
Neural Nets and Random Forests. Our method only relies on black-box evaluation
oracle access to the linear functional and does not require knowledge of its
analytic form. We propose a multitasking Neural Net debiasing method with
stochastic gradient descent minimization of a combined Riesz representer and
regression loss, while sharing representation layers for the two functions. We
also propose a Random Forest method which learns a locally linear
representation of the Riesz function. Even though our method applies to
arbitrary functionals, we experimentally find that it performs well compared to
the state-of-the-art neural net based algorithm of Shi et al. (2019) for the case
of the average treatment effect functional. We also evaluate our method on the
problem of estimating average marginal effects with continuous treatments,
using semi-synthetic data of gasoline price changes on gasoline demand.
arXiv link: http://arxiv.org/abs/2110.03031v3
New insights into price drivers of crude oil futures markets: Evidence from quantile ARDL approach
crude oil futures prices during the COVID-19 pandemic period. We perform
comparative analysis of WTI and newly-launched Shanghai crude oil futures (SC)
via the Autoregressive Distributed Lag (ARDL) model and Quantile Autoregressive
Distributed Lag (QARDL) model. The empirical results confirm that economic
policy uncertainty, stock markets, interest rates and coronavirus panic are
important drivers of WTI futures prices. Our findings also suggest that the US
and China's stock markets play vital roles in movements of SC futures prices.
Meanwhile, CSI300 stock index has a significant positive short-run impact on SC
futures prices while S&P500 prices possess a positive nexus with SC futures
prices both in long-run and short-run. Overall, these empirical evidences
provide practical implications for investors and policymakers.
arXiv link: http://arxiv.org/abs/2110.02693v1
Distcomp: Comparing distributions
whether or not two distributions differ at each possible value while
controlling the probability of any false positive, even in finite samples.
Syntax and the underlying methodology (from Goldman and Kaplan, 2018) are
discussed. Multiple examples illustrate the distcomp command, including
revisiting the experimental data of Gneezy and List (2006) and the regression
discontinuity design of Cattaneo, Frandsen, and Titiunik (2015).
arXiv link: http://arxiv.org/abs/2110.02327v1
Gambits: Theory and Evidence
of Gambits. A Gambit is a combination of psychological and technical factors
designed to disrupt predictable play. Chess provides an environment to study
gambits and behavioral game theory. Our theory is based on the Bellman
optimality path for sequential decision-making. This allows us to calculate the
$Q$-values of a Gambit where material (usually a pawn) is sacrificed for
dynamic play. On the empirical side, we study the effectiveness of a number of
popular chess Gambits. This is a natural setting as chess Gambits require a
sequential assessment of a set of moves (a.k.a. policy) after the Gambit has
been accepted. Our analysis uses Stockfish 14.1 to calculate the optimal
Bellman $Q$ values, which fundamentally measures if a position is winning or
losing. To test whether Bellman's equation holds in play, we estimate the
transition probabilities to the next board state via a database of expert human
play. This then allows us to test whether the Gambiteer is following the
optimal path in his decision-making. Our methodology is applied to the popular
Stafford and reverse Stafford (a.k.a. Boden-Kieretsky-Morphy) Gambit and other
common ones including the Smith-Morra, Goring, Danish and Halloween Gambits. We
build on research in human decision-making by proving an irrational skewness
preference within agents in chess. We conclude with directions for future
research.
arXiv link: http://arxiv.org/abs/2110.02755v5
A New Multivariate Predictive Model for Stock Returns
returns could be predicted. This research aims to create a new multivariate
model, which includes dividend yield, earnings-to-price ratio, book-to-market
ratio as well as consumption-wealth ratio as explanatory variables, for future
stock returns predictions. The new multivariate model will be assessed for its
forecasting performance using empirical analysis. The empirical analysis is
performed on S&P500 quarterly data from Quarter 1, 1952 to Quarter 4, 2019 as
well as S&P500 monthly data from December 1920 to December 2019. Results show
that this new multivariate model has predictive power for future stock returns.
When compared to other benchmark models, the new multivariate model performs
the best in terms of the Root Mean Squared Error (RMSE) most of the time.
arXiv link: http://arxiv.org/abs/2110.01873v1
Beware the Gini Index! A New Inequality Measure
example, a Pareto distribution with exponent 1.5 (which has infinite variance)
has the same Gini index as any exponential distribution (a mere 0.5). This is
because the Gini index is relatively robust to extreme observations: while a
statistic's robustness to extremes is desirable for data potentially distorted
by outliers, it is misleading for heavy-tailed distributions, which inherently
exhibit extremes. We propose an alternative inequality index: the variance
normalized by the second moment. Paradoxically, this ratio is more stable
(hence more reliable) than the Gini index for large samples from an
infinite-variance distribution. Moreover, the new index satisfies the normative
axioms of inequality measurement; notably, it is decomposable into inequality
within and between subgroups, unlike the Gini index.
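The numerical claim is easy to check by simulation: draw large samples from an exponential distribution and from a Pareto distribution with tail exponent 1.5 and compare the Gini index with the proposed ratio of the variance to the second moment. The snippet below is only an illustration of the abstract's argument, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def gini(x):
    """Gini index via the sorted-values formula."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    return (2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum())

def var_over_second_moment(x):
    """Proposed index: variance normalized by the (non-central) second moment."""
    x = np.asarray(x, dtype=float)
    return x.var() / np.mean(x ** 2)

n = 200_000
exp_sample = rng.exponential(scale=2.0, size=n)
pareto_sample = 1 + rng.pareto(a=1.5, size=n)   # Pareto with tail exponent 1.5, scale 1

for name, s in [("exponential", exp_sample), ("Pareto(1.5)", pareto_sample)]:
    print(f"{name:12s}  Gini = {gini(s):.3f}   Var/E[X^2] = {var_over_second_moment(s):.3f}")
# Both Gini indices come out near 0.5, while the variance-based index is
# markedly larger for the heavy-tailed Pareto sample.
```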
arXiv link: http://arxiv.org/abs/2110.01741v1
Effect or Treatment Heterogeneity? Policy Evaluation with Aggregated and Disaggregated Treatments
be disaggregated into multiple treatment versions. Thus, effects can be
heterogeneous due to either effect or treatment heterogeneity. We propose a
decomposition method that uncovers masked heterogeneity, avoids spurious
discoveries, and evaluates treatment assignment quality. The estimation and
inference procedure based on double/debiased machine learning allows for
high-dimensional confounding, many treatments and extreme propensity scores.
Our applications suggest that heterogeneous effects of smoking on birthweight
are partially due to different smoking intensities and that gender gaps in Job
Corps effectiveness are largely explained by differential selection into
vocational training.
arXiv link: http://arxiv.org/abs/2110.01427v4
Identification and Estimation in a Time-Varying Endogenous Random Coefficient Panel Data Model
where regressors can be correlated with time-varying and individual-specific
random coefficients through both a fixed effect and a time-varying random
shock. I develop a new panel data-based identification method to identify the
average partial effect and the local average response function. The
identification strategy employs a sufficient statistic to control for the fixed
effect and a conditional control variable for the random shock. Conditional on
these two controls, the residual variation in the regressors is driven solely
by the exogenous instrumental variables, and thus can be exploited to identify
the parameters of interest. The constructive identification analysis leads to
three-step series estimators, for which I establish rates of convergence and
asymptotic normality. To illustrate the method, I estimate a heterogeneous
Cobb-Douglas production function for manufacturing firms in China, finding
substantial variations in output elasticities across firms.
arXiv link: http://arxiv.org/abs/2110.00982v2
Hierarchical Gaussian Process Models for Regression Discontinuity/Kink under Sharp and Fuzzy Designs
Regression Discontinuity/Kink (RD/RK) under sharp and fuzzy designs. Our
estimators are based on Gaussian Process (GP) regression and classification.
The GP methods are powerful probabilistic machine learning approaches that are
advantageous in terms of derivative estimation and uncertainty quantification,
facilitating RK estimation and inference of RD/RK models. These estimators are
extended to hierarchical GP models with an intermediate Bayesian neural network
layer and can be characterized as hybrid deep learning models. Monte Carlo
simulations show that our estimators perform comparably to and sometimes better
than competing estimators in terms of precision, coverage and interval length.
The hierarchical GP models considerably improve upon one-layer GP models. We
apply the proposed methods to estimate the incumbency advantage of US house
elections. Our estimations suggest a significant incumbency advantage in terms
of both vote share and probability of winning in the next elections. Lastly, we
present an extension to accommodate covariate adjustment.
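A minimal, one-layer GP version of the sharp RD step (without the hierarchical Bayesian-neural-network layer or the fuzzy/RK extensions described above) can be written with scikit-learn: fit a GP on each side of the cutoff and take the difference of the posterior means at the cutoff. The kernel choices and the simulated data are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)

# Simulated sharp RD data: running variable x, cutoff at 0, true jump = 2.0.
n = 600
x = rng.uniform(-1, 1, size=n)
y = 1.0 + 0.8 * x + 2.0 * (x >= 0) + rng.normal(scale=0.3, size=n)

kernel = 1.0 * RBF(length_scale=0.3) + WhiteKernel(noise_level=0.1)

def gp_fit(mask):
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(x[mask].reshape(-1, 1), y[mask])
    return gp

gp_left, gp_right = gp_fit(x < 0), gp_fit(x >= 0)

# Sharp RD effect: difference of the two GP predictions at the cutoff, with a
# naive uncertainty measure combining the two posterior standard deviations.
c = np.array([[0.0]])
mu_l, sd_l = gp_left.predict(c, return_std=True)
mu_r, sd_r = gp_right.predict(c, return_std=True)
tau = mu_r[0] - mu_l[0]
se = np.sqrt(sd_l[0] ** 2 + sd_r[0] ** 2)
print(f"RD estimate at cutoff: {tau:.2f} (posterior sd approx {se:.2f})")
```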
arXiv link: http://arxiv.org/abs/2110.00921v2
Probabilistic Prediction for Binary Treatment Choice: with focus on personalized medicine
treatment choice with sample data, using maximum regret to evaluate the
performance of treatment rules. The specific new contribution is to study as-if
optimization using estimates of illness probabilities in clinical choice
between surveillance and aggressive treatment. Beyond its specifics, the paper
sends a broad message. Statisticians and computer scientists have addressed
conditional prediction for decision making in indirect ways, the former
applying classical statistical theory and the latter measuring prediction
accuracy in test samples. Neither approach is satisfactory. Statistical
decision theory provides a coherent, generally applicable methodology.
arXiv link: http://arxiv.org/abs/2110.00864v1
Relative Contagiousness of Emerging Virus Variants: An Analysis of the Alpha, Delta, and Omicron SARS-CoV-2 Variants
of two virus variants. Maximum likelihood estimation and inference are
conveniently invariant to variation in the total number of cases over the
sample period and can be expressed as a logistic regression. We apply the model
to Danish SARS-CoV-2 variant data. We estimate the reproduction numbers of
Alpha and Delta to be larger than that of the ancestral variant by a factor of
1.51 [CI 95%: 1.50, 1.53] and 3.28 [CI 95%: 3.01, 3.58], respectively. In a
predominately vaccinated population, we estimate Omicron to be 3.15 [CI 95%:
2.83, 3.50] times more infectious than Delta. Forecasting the proportion of an
emerging virus variant is straightforward, and we proceed to show how the
effective reproduction number for a new variant can be estimated without
contemporary sequencing results. This is useful for assessing the state of the
pandemic in real time as we illustrate empirically with the inferred effective
reproduction number for the Alpha variant.
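The growth-of-a-variant logic can be sketched with a binomial GLM: model the logit of the new variant's share among sequenced cases as linear in time, then translate the slope into a relative reproduction number under an assumed common generation time. The counts and the generation time below are illustrative placeholders, not the Danish data or the paper's exact specification.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative weekly sequencing data: n sequenced, k of the new variant.
week = np.arange(10)
n_seq = np.array([500, 520, 480, 510, 530, 490, 505, 515, 495, 500])
k_new = np.array([  5,  12,  20,  45,  80, 140, 220, 310, 380, 430])

# Logistic growth of the variant share: logit(p_t) = a + b * t.
X = sm.add_constant(week)
fit = sm.GLM(np.column_stack([k_new, n_seq - k_new]), X,
             family=sm.families.Binomial()).fit()
b_per_week = fit.params[1]

# Under exponential growth and a shared generation time T_g (assumed 4.7 days
# here), the relative reproduction number is roughly exp(b * T_g), b per day.
T_g_days = 4.7
rel_R = np.exp(b_per_week / 7.0 * T_g_days)
print(f"weekly logistic slope: {b_per_week:.2f}, implied relative R: {rel_R:.2f}")
```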
arXiv link: http://arxiv.org/abs/2110.00533v3
Stochastic volatility model with range-based correction and leverage
within the framework of stochastic volatility with leverage. A new
representation of the probability density function for the price range is
provided, and its accurate sampling algorithm is developed. A Bayesian
estimation using Markov chain Monte Carlo (MCMC) method is provided for the
model parameters and unobserved variables. MCMC samples can be generated
rigorously, despite the estimation procedure requiring sampling from a density
function with the sum of an infinite series. The empirical results obtained
using data from the U.S. market indices are consistent with the stylized facts
in the financial market, such as the existence of the leverage effect. In
addition, to explore the model's predictive ability, a model comparison based
on the volatility forecast performance is conducted.
arXiv link: http://arxiv.org/abs/2110.00039v2
Causal Matrix Completion
sparse subset of noisy observations. Traditionally, it is assumed that the
entries of the matrix are "missing completely at random" (MCAR), i.e., each
entry is revealed at random, independent of everything else, with uniform
probability. This is likely unrealistic due to the presence of "latent
confounders", i.e., unobserved factors that determine both the entries of the
underlying matrix and the missingness pattern in the observed matrix. For
example, in the context of movie recommender systems -- a canonical application
for matrix completion -- a user who vehemently dislikes horror films is
unlikely to ever watch horror films. In general, these confounders yield
"missing not at random" (MNAR) data, which can severely impact any inference
procedure that does not correct for this bias. We develop a formal causal model
for matrix completion through the language of potential outcomes, and provide
novel identification arguments for a variety of causal estimands of interest.
We design a procedure, which we call "synthetic nearest neighbors" (SNN), to
estimate these causal estimands. We prove finite-sample consistency and
asymptotic normality of our estimator. Our analysis also leads to new
theoretical results for the matrix completion literature. In particular, we
establish entry-wise, i.e., max-norm, finite-sample consistency and asymptotic
normality results for matrix completion with MNAR data. As a special case, this
also provides entry-wise bounds for matrix completion with MCAR data. Across
simulated and real data, we demonstrate the efficacy of our proposed estimator.
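A stripped-down, nearest-neighbor flavor of the imputation idea (not the full synthetic-nearest-neighbors procedure or its MNAR theory) fills entry (i, j) with the average of that column across the rows most similar to row i on commonly observed columns. The similarity rule and the toy missingness mechanism below are illustrative assumptions.

```python
import numpy as np

def nn_impute(A, mask, i, j, k=5, min_overlap=3):
    """Impute A[i, j] by averaging the j-entries of the k rows closest to row i
    on commonly observed columns. `mask` is True where A is observed."""
    candidates = []
    for r in range(A.shape[0]):
        if r == i or not mask[r, j]:
            continue
        overlap = mask[i] & mask[r]
        overlap[j] = False  # do not use the target column itself
        if overlap.sum() < min_overlap:
            continue
        dist = np.mean((A[i, overlap] - A[r, overlap]) ** 2)
        candidates.append((dist, A[r, j]))
    if not candidates:
        return np.nan
    candidates.sort(key=lambda t: t[0])
    return float(np.mean([v for _, v in candidates[:k]]))

# Toy example: rank-one matrix with confounded (MNAR-like) missingness.
rng = np.random.default_rng(3)
u, v = rng.normal(size=50), rng.normal(size=40)
A_full = np.outer(u, v) + 0.1 * rng.normal(size=(50, 40))
mask = rng.random((50, 40)) < (0.3 + 0.5 * (A_full > 0))  # large entries observed more often
A = np.where(mask, A_full, np.nan)

print("true:", round(A_full[0, 0], 3), "imputed:", round(nn_impute(A, mask, 0, 0), 3))
```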
arXiv link: http://arxiv.org/abs/2109.15154v1
Towards Principled Causal Effect Estimation by Deep Identifiable Models
treatment effects (TEs). Representing the confounder as a latent variable, we
propose Intact-VAE, a new variant of variational autoencoder (VAE), motivated
by the prognostic score that is sufficient for identifying TEs. Our VAE also
naturally gives representations balanced for treatment groups, using its prior.
Experiments on (semi-)synthetic datasets show state-of-the-art performance
under diverse settings, including unobserved confounding. Based on the
identifiability of our model, we prove identification of TEs under
unconfoundedness, and also discuss (possible) extensions to harder settings.
arXiv link: http://arxiv.org/abs/2109.15062v2
Nonparametric Bounds on Treatment Effects with Imperfect Instruments
nonparametric models. We derive nonparametric bounds on the average treatment
effect when an imperfect instrument is available. As in Nevo and Rosen (2012),
we assume that the correlation between the imperfect instrument and the
unobserved latent variables has the same sign as the correlation between the
endogenous variable and the latent variables. We show that the monotone
treatment selection and monotone instrumental variable restrictions, introduced
by Manski and Pepper (2000, 2009), jointly imply this assumption. Moreover, we
show how the monotone treatment response assumption can help tighten the
bounds. The identified set can be written in the form of intersection bounds,
which is more conducive to inference. We illustrate our methodology using the
National Longitudinal Survey of Young Men data to estimate returns to
schooling.
arXiv link: http://arxiv.org/abs/2109.14785v1
Testing the Presence of Implicit Hiring Quotas with Application to German Universities
and economics in particular. This paper introduces a test to detect an
under-researched form of hiring bias: implicit quotas. I derive a test under
the null of random hiring that, under some assumptions, requires no information
about individual hires. I derive the asymptotic distribution of this test
statistic and, as an alternative, propose a parametric bootstrap procedure that
samples from the exact distribution. This test can be used to analyze a variety
of other hiring settings. I analyze the distribution of female professors at
German universities across 50 different disciplines. I show that the
distribution of women, given the average number of women in the respective
field, is highly unlikely to result from a random allocation of women across
departments and more likely to stem from an implicit quota of one or two women
on the department level. I also show that a large part of the variation in the
share of women across STEM and non-STEM disciplines could be explained by a
two-women quota at the department level. These findings have important
implications for the potential effectiveness of policies aimed at reducing
underrepresentation and provide evidence of how stakeholders perceive and
evaluate diversity.
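The spirit of such a test can be conveyed with a parametric bootstrap: under the null of random hiring, reallocate women across departments at random given department sizes and the overall share of women, and compare a dispersion statistic to its simulated null distribution. The statistic, the data, and the binomial null below are illustrative simplifications, not the paper's test.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative data: number of professors and number of women per department.
dept_size = np.array([12, 15, 9, 20, 14, 11, 18, 16, 10, 13])
women_obs = np.array([ 2,  2, 1,  2,  2,  1,  2,  2,  1,  2])  # suspiciously flat
p_hat = women_obs.sum() / dept_size.sum()

def dispersion(women, size):
    """Variance of the within-department share of women (one of many possible statistics)."""
    return np.var(women / size)

stat_obs = dispersion(women_obs, dept_size)

# Parametric bootstrap under the null of random (binomial) hiring.
B = 20_000
stat_null = np.array([
    dispersion(rng.binomial(dept_size, p_hat), dept_size) for _ in range(B)
])
# Flat observed counts, as under an implicit quota, produce unusually *low* dispersion.
p_low = np.mean(stat_null <= stat_obs)
print(f"observed dispersion: {stat_obs:.4f}, P(null dispersion <= observed) = {p_low:.3f}")
```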
arXiv link: http://arxiv.org/abs/2109.14343v2
Forecasting the COVID-19 vaccine uptake rate: An infodemiological study in the US
approved emergency vaccines. Public-health practitioners and policymakers must
understand the predicted population-level willingness for vaccines and implement
relevant stimulation measures. This study developed a framework for predicting
the vaccination uptake rate based on traditional clinical data, modeled with an
autoregressive integrated moving average (ARIMA) model, and on innovative web
search queries, modeled with linear regression (ordinary least squares and the
least absolute shrinkage and selection operator) and machine learning (boosting
and random forests). To improve accuracy, we implemented a stacking regression
combining the clinical data and web search queries. The stacked regression of
ARIMA(1,0,8) for the clinical data and boosting with a support vector machine
for the web data formed the best model for forecasting vaccination speed in the
US. The
stacked regression provided a more accurate forecast. These results can help
governments and policymakers predict vaccine demand and finance relevant
programs.
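A generic version of the stacking step can be written with scikit-learn's StackingRegressor; the base learners and the placeholder features below only gesture at the ARIMA-plus-web-query pipeline described in the abstract and are not the study's actual models or data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.svm import SVR

rng = np.random.default_rng(5)

# Placeholder design: columns could be lagged uptake rates and web-search indices.
n, p = 300, 8
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n)  # illustrative uptake target

base_learners = [
    ("lasso", LassoCV(cv=5)),
    ("gbm", GradientBoostingRegressor(random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("svr", SVR(kernel="rbf", C=1.0)),
]

# The meta-learner (RidgeCV) combines out-of-fold predictions of the base models.
stack = StackingRegressor(estimators=base_learners, final_estimator=RidgeCV(), cv=5)
stack.fit(X[:250], y[:250])
print("held-out R^2 of the stacked model:", round(stack.score(X[250:], y[250:]), 3))
```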
arXiv link: http://arxiv.org/abs/2109.13971v2
No-Regret Forecasting with Egalitarian Committees
equal-weight scheme tends to outperform sophisticated methods of combining
individual forecasts. Exploiting this finding, we propose a hedge egalitarian
committees algorithm (HECA), which can be implemented via mixed integer
quadratic programming. Specifically, egalitarian committees are formed by the
ridge regression with shrinkage toward equal weights; subsequently, the
forecasts provided by these committees are averaged by the hedge algorithm. We
establish the no-regret property of HECA. Using data collected from the ECB
Survey of Professional Forecasters, we find that HECA outperforms the
equal-weight scheme during the COVID-19 recession.
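The hedge step on its own is short: keep exponential weights over the committee forecasts and update them with each period's squared error. The committee construction (ridge shrinkage toward equal weights via mixed integer quadratic programming) is omitted, so the code below is an illustrative simplification with toy data.

```python
import numpy as np

def hedge_combine(committee_forecasts, outcomes, eta=0.5):
    """Online hedge (exponential weights) over K committee forecast streams.

    committee_forecasts: array (T, K) of forecasts; outcomes: array (T,).
    Returns the sequence of combined forecasts.
    """
    T, K = committee_forecasts.shape
    log_w = np.zeros(K)
    combined = np.empty(T)
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        combined[t] = w @ committee_forecasts[t]
        losses = (committee_forecasts[t] - outcomes[t]) ** 2
        log_w -= eta * losses  # multiplicative-weights update
    return combined

# Toy check: three committees with different noise levels tracking a random walk.
rng = np.random.default_rng(6)
y = np.cumsum(rng.normal(size=200))
F = np.column_stack([y + rng.normal(0, s, size=200) for s in (0.2, 1.0, 3.0)])
yhat = hedge_combine(F, y)
print("combined MSE:", round(np.mean((yhat - y) ** 2), 3),
      "best single MSE:", round(min(np.mean((F[:, k] - y) ** 2) for k in range(3)), 3))
```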
arXiv link: http://arxiv.org/abs/2109.13801v1
Macroeconomic forecasting with LSTM and mixed frequency time series data
when applied to macroeconomic time series data sampled at different
frequencies. We first present how the conventional LSTM model can be adapted to
time series observed at mixed frequencies when the same mismatch ratio is
applied for all pairs of low-frequency output and higher-frequency variable. To
generalize the LSTM to the case of multiple mismatch ratios, we adopt the
unrestricted Mixed DAta Sampling (U-MIDAS) scheme (Foroni et al., 2015) into the
LSTM architecture. We assess the out-of-sample predictive performance via both
Monte Carlo simulations and an empirical application. Our proposed models
outperform the restricted MIDAS model even in a setup favorable to the MIDAS
estimator. For the real-world application, we study forecasting the quarterly
growth rate of Thai real GDP using a vast array of macroeconomic indicators,
both quarterly and monthly. Our LSTM with the U-MIDAS scheme easily beats the
simple benchmark AR(1) model at all horizons, but outperforms the strong
benchmark univariate LSTM only at one and six months ahead. Nonetheless, we find
that our proposed model could be very helpful in periods of large economic
downturns for short-term forecasting. Simulation and empirical results seem to
support the use of our proposed LSTM with the U-MIDAS scheme for nowcasting
applications.
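The U-MIDAS idea of treating each high-frequency lag as a separate, unweighted regressor can be illustrated with a small data-preparation helper that aligns a monthly indicator with a quarterly target before it is fed to an LSTM or any other learner. The shapes and the lag count are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def umidas_stack(monthly, quarterly, n_monthly_lags=6):
    """Align a monthly series with a quarterly target in U-MIDAS fashion.

    monthly:   array of length 3 * len(quarterly), months in chronological order.
    quarterly: array of quarterly targets.
    Returns (X, y): for each quarter t with enough history, X holds the last
    `n_monthly_lags` monthly observations up to the end of quarter t as separate
    regressors (no MIDAS weighting restriction); y holds the quarterly target.
    """
    X, y = [], []
    for t in range(len(quarterly)):
        end = 3 * (t + 1)            # index of the first month after quarter t
        if end - n_monthly_lags < 0:
            continue
        X.append(monthly[end - n_monthly_lags:end][::-1])  # most recent month first
        y.append(quarterly[t])
    return np.array(X), np.array(y)

# Toy example: 8 quarters of targets and the matching 24 monthly observations.
rng = np.random.default_rng(7)
monthly = rng.normal(size=24)
quarterly = monthly.reshape(8, 3).mean(axis=1) + 0.1 * rng.normal(size=8)
X, y = umidas_stack(monthly, quarterly, n_monthly_lags=6)
print(X.shape, y.shape)  # (7, 6) (7,) -- six unrestricted monthly lags per quarter
```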
arXiv link: http://arxiv.org/abs/2109.13777v1
Gaussian and Student's $t$ mixture vector autoregressive model with application to the effects of the Euro area monetary policy shock
distributions is introduced. As its mixture components, our model incorporates
conditionally homoskedastic linear Gaussian vector autoregressions and
conditionally heteroskedastic linear Student's $t$ vector autoregressions. For
a $p$th order model, the mixing weights depend on the full distribution of the
preceding $p$ observations, which leads to attractive practical and theoretical
properties such as ergodicity and full knowledge of the stationary distribution
of $p+1$ consecutive observations. A structural version of the model with
statistically identified shocks is also proposed. The empirical application
studies the effects of the Euro area monetary policy shock. We fit a two-regime
model to the data and find that the effects, particularly on inflation, are
stronger in the regime that mainly prevails before the financial crisis than in
the regime that mainly dominates after it. The introduced methods are implemented in the
accompanying R package gmvarkit.
arXiv link: http://arxiv.org/abs/2109.13648v4
bqror: An R package for Bayesian Quantile Regression in Ordinal Models
regression for ordinal models introduced in Rahman (2016). The paper classifies
ordinal models into two types and offers computationally efficient, yet simple,
Markov chain Monte Carlo (MCMC) algorithms for estimating ordinal quantile
regression. The generic ordinal model with 3 or more outcomes (labeled ORI
model) is estimated by a combination of Gibbs sampling and the
Metropolis-Hastings algorithm, whereas an ordinal model with exactly 3 outcomes
(labeled ORII model) is estimated using Gibbs sampling only. In line with the Bayesian
literature, we suggest using the marginal likelihood for comparing alternative
quantile regression models and explain how to compute it. The models and
their estimation procedures are illustrated via multiple simulation studies and
implemented in two applications. The article also describes several other
functions contained within the bqror package, which are necessary for
estimation, inference, and assessing model fit.
arXiv link: http://arxiv.org/abs/2109.13606v3
Assessing Outcome-to-Outcome Interference in Sibling Fixed Effects Models
effects while offsetting unobserved sibling-invariant confounding. However,
treatment estimates are biased if an individual's outcome affects their
sibling's outcome. We propose a robustness test for assessing the presence of
outcome-to-outcome interference in linear two-sibling FE models. We regress a
gain-score--the difference between siblings' continuous outcomes--on both
siblings' treatments and on a pre-treatment observed FE. Under certain
restrictions, the observed FE's partial regression coefficient signals the
presence of outcome-to-outcome interference. Monte Carlo simulations
demonstrated the robustness test under several models. We found that an
observed FE signaled outcome-to-outcome spillover if it was directly associated
with a sibling-invariant confounder of treatments and outcomes, directly
associated with a sibling's treatment, or directly and equally associated with
both siblings' outcomes. However, the robustness test collapsed if the observed
FE was directly but differentially associated with siblings' outcomes or if
outcomes affected siblings' treatments.
arXiv link: http://arxiv.org/abs/2109.13399v2
Design and validation of an index to measure development in rural areas through stakeholder participation
based on a set of 25 demographic, economic, environmental, and social welfare
indicators previously selected through a Delphi approach. Three widely accepted
aggregation methods were then tested: a mixed arithmetic/geometric mean without
weightings for each indicator; a weighted arithmetic mean using the weights
previously generated by the Delphi panel; and an aggregation through Principal
Component Analysis. These three methodologies were later applied to 9
Portuguese NUTS III regions, and the results were presented to a group of
experts in rural development who indicated which of the three forms of
aggregation best measured the levels of rural development of the different
territories. Finally, it was concluded that the unweighted arithmetic/geometric
mean was the most accurate methodology for aggregating indicators to create a
Rural Development Index.
arXiv link: http://arxiv.org/abs/2109.12568v1
Periodicity in Cryptocurrency Volatility and Liquidity
cryptocurrencies, Bitcoin and Ether, using data from two centralized exchanges
(Coinbase Pro and Binance) and a decentralized exchange (Uniswap V2). We find
systematic patterns in both volatility and volume across day-of-the-week,
hour-of-the-day, and within the hour. These patterns have grown stronger over
the years and can be related to algorithmic trading and funding times in
futures markets. We also document that price formation mainly takes place on
the centralized exchanges while price adjustments on the decentralized
exchanges can be sluggish.
arXiv link: http://arxiv.org/abs/2109.12142v2
Combining Discrete Choice Models and Neural Networks through Embeddings: Formulation, Interpretability and Performance
choice models using Artificial Neural Networks (ANNs). In particular, we use
continuous vector representations, called embeddings, for encoding categorical
or discrete explanatory variables with a special focus on interpretability and
model transparency. Although embedding representations within the logit
framework have been conceptualized by Pereira (2019), their dimensions do not
have an absolute definitive meaning, hence offering limited behavioral insights
in this earlier work. The novelty of our work lies in enforcing
interpretability on the embedding vectors by formally associating each of their
dimensions to a choice alternative. Thus, our approach brings benefits much
beyond a simple parsimonious representation improvement over dummy encoding, as
it provides behaviorally meaningful outputs that can be used in travel demand
analysis and policy decisions. Additionally, in contrast to previously
suggested ANN-based Discrete Choice Models (DCMs) that either sacrifice
interpretability for performance or are only partially interpretable, our
models preserve interpretability of the utility coefficients for all the input
variables despite being based on ANN principles. The proposed models were
tested on two real world datasets and evaluated against benchmark and baseline
models that use dummy-encoding. The results of the experiments indicate that
our models deliver state-of-the-art predictive performance, outperforming
existing ANN-based models while drastically reducing the number of required
network parameters.
arXiv link: http://arxiv.org/abs/2109.12042v2
Linear Panel Regressions with Two-Way Unobserved Heterogeneity
an unknown smooth function of two-way unobserved fixed effects. In standard
additive or interactive fixed effect models the individual specific and time
specific effects are assumed to enter with a known functional form (additive or
multiplicative). In this paper, we allow for this functional form to be more
general and unknown. We discuss two different estimation approaches that allow
consistent estimation of the regression parameters in this setting as the
number of individuals and the number of time periods grow to infinity. The
first approach uses the interactive fixed effect estimator in Bai (2009), which
is still applicable here, as long as the number of factors in the estimation
grows asymptotically. The second approach first discretizes the two-way
unobserved heterogeneity (similar to what Bonhomme, Lamadon and Manresa (2021)
do for one-way heterogeneity) and then estimates a simple linear fixed
effect model with additive two-way grouped fixed effects. For both estimation
methods we obtain asymptotic convergence results, perform Monte Carlo
simulations, and employ the estimators in an empirical application to UK house
price data.
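A caricature of the second (discretization) approach: cluster units and time periods by their outcome profiles with k-means, then run OLS of the outcome on the regressors plus a dummy for each (unit-group, time-group) cell. This is only loosely inspired by the grouped-fixed-effects step described above, not the paper's estimator, and the group counts and simulated design are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)

# Panel with y_it = beta * x_it + f(alpha_i, gamma_t) + noise, f unknown and nonadditive.
N, T, beta = 100, 40, 1.5
alpha, gamma = rng.normal(size=N), rng.normal(size=T)
f = np.tanh(np.outer(alpha, gamma))                      # nonseparable two-way heterogeneity
x = 0.5 * f + rng.normal(size=(N, T))
y = beta * x + f + 0.5 * rng.normal(size=(N, T))

# Step 1: discretize units and time periods by k-means on their outcome profiles.
G, H = 5, 5
unit_grp = KMeans(n_clusters=G, n_init=10, random_state=0).fit_predict(y)      # rows
time_grp = KMeans(n_clusters=H, n_init=10, random_state=0).fit_predict(y.T)    # columns

# Step 2: OLS of y on x plus a dummy per (unit-group, time-group) cell.
ii, tt = np.meshgrid(np.arange(N), np.arange(T), indexing="ij")
cell = unit_grp[ii] * H + time_grp[tt]
D = np.eye(G * H)[cell.ravel()]
Z = np.column_stack([x.ravel(), D])
coef, *_ = np.linalg.lstsq(Z, y.ravel(), rcond=None)
print("estimated beta:", round(coef[0], 3), "(true value 1.5)")
```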
arXiv link: http://arxiv.org/abs/2109.11911v3
Treatment Effects in Market Equilibrium
taking into account both the direct benefit of the treatment and any spillovers
induced by changes to the market equilibrium. The standard way to address these
challenges is to evaluate interventions via cluster-randomized experiments,
where each cluster corresponds to an isolated market. This approach, however,
cannot be used when we only have access to a single market (or a small number
of markets). Here, we show how to identify and estimate policy-relevant
treatment effects using a unit-level randomized trial run within a single large
market. A standard Bernoulli-randomized trial allows consistent estimation of
direct effects, and of treatment heterogeneity measures that can be used for
welfare-improving targeting. Estimating spillovers - as well as providing
confidence intervals for the direct effect - requires estimates of price
elasticities, which we provide using an augmented experimental design. Our
results rely on all spillovers being mediated via the (observed) prices of a
finite number of traded goods, and the market power of any single unit decaying
as the market gets large. We illustrate our results using a simulation
calibrated to a conditional cash transfer experiment in the Philippines.
arXiv link: http://arxiv.org/abs/2109.11647v4
A Wavelet Method for Panel Models with Jump Discontinuities in the Parameters
exists for univariate time series, research on large panel data models has not
been as extensive. In this paper, a novel method for estimating panel models
with multiple structural changes is proposed. The breaks are allowed to occur
at unknown points in time and may affect the multivariate slope parameters
individually. Our method adapts Haar wavelets to the structure of the observed
variables in order to detect the change points of the parameters consistently.
We also develop methods to address endogenous regressors within our modeling
framework. The asymptotic properties of our estimator are established. In our
application, we examine the impact of algorithmic trading on standard measures
of market quality such as liquidity and volatility over a time period that
covers the financial meltdown that began in 2007. We are able to detect jumps
in regression slope parameters automatically without using ad-hoc subsample
selection criteria.
arXiv link: http://arxiv.org/abs/2109.10950v1
Algorithms for Inference in SVARs Identified with Sign and Zero Restrictions
autoregressions that are set-identified with sign and zero restrictions by
showing that the system of restrictions is equivalent to a system of sign
restrictions in a lower-dimensional space. Consequently, algorithms applicable
under sign restrictions can be extended to allow for zero restrictions.
Specifically, I extend algorithms proposed in Amir-Ahmadi and Drautzburg (2021)
to check whether the identified set is nonempty and to sample from the
identified set without rejection sampling. I compare the new algorithms to
alternatives by applying them to variations of the model considered by Arias et
al. (2019), who estimate the effects of US monetary policy using sign and zero
restrictions on the monetary policy reaction function. The new algorithms are
particularly useful when a rich set of sign restrictions substantially
truncates the identified set given the zero restrictions.
arXiv link: http://arxiv.org/abs/2109.10676v2
A new look at the anthropogenic global warming consensus: an econometric forecast based on the ARIMA model of paleoclimate series
paleotemperature time series model and compare it with the prevailing
consensus. The ARIMA - Autoregressive Integrated Moving Average Process model
was used for this purpose. The results show that the parameter estimates of the
model were below what is established by the anthropogenic-warming current of
thought and governmental bodies, such as the IPCC (UN), considering a 100-year scenario,
which suggests a period of temperature reduction and a probable cooling. Thus,
we hope with this study to contribute to the discussion by adding a statistical
element of paleoclimate in counterpoint to the current scientific consensus and
place the debate in a long-term historical dimension, in line with other
existing research on the topic.
arXiv link: http://arxiv.org/abs/2109.10419v2
Modeling and Analysis of Discrete Response Data: Applications to Public Opinion on Marijuana Legalization in the United States
variable models, namely discrete choice models, where the dependent (response
or outcome) variable takes values which are discrete, inherently ordered, and
characterized by an underlying continuous latent variable. Within this setting,
the dependent variable may take only two discrete values (such as 0 and 1)
giving rise to binary models (e.g., probit and logit models) or more than two
values (say $j=1,2, \ldots, J$, where $J$ is some integer, typically small)
giving rise to ordinal models (e.g., ordinal probit and ordinal logit models).
In these models, the primary goal is to model the probability of
responses/outcomes conditional on the covariates. We connect the outcomes of a
discrete choice model to the random utility framework in economics, discuss
estimation techniques, present the calculation of covariate effects and
measures to assess model fit. Some recent advances in discrete data
modeling are also discussed. Following the theoretical review, we utilize the
binary and ordinal models to analyze public opinion on marijuana legalization
and the extent of legalization -- a socially relevant but controversial topic
in the United States. We obtain several interesting results, including that past
use of marijuana, beliefs about legalization, and political partisanship are
important factors that shape public opinion.
arXiv link: http://arxiv.org/abs/2109.10122v2
Unifying Design-based Inference: On Bounding and Estimating the Variance of any Linear Estimator in any Experimental Design
in experimental analysis. Results are applicable to virtually any combination
of experimental design, linear estimator (e.g., difference-in-means, OLS, WLS)
and variance bound, allowing for unified treatment and a basis for systematic
study and comparison of designs using matrix spectral analysis. A proposed
variance estimator reproduces Eicker-Huber-White (aka. "robust",
"heteroskedastic consistent", "sandwich", "White", "Huber-White", "HC", etc.)
standard errors and "cluster-robust" standard errors as special cases. While
past work has shown algebraic equivalences between design-based and the
so-called "robust" standard errors under some designs, this paper motivates
them for a wide array of design-estimator-bound triplets. In so doing, it
provides a clearer and more general motivation for variance estimators.
arXiv link: http://arxiv.org/abs/2109.09220v1
Tests for Group-Specific Heterogeneity in High-Dimensional Factor Models
large set of variables could be modeled using a small number of latent factors
that affect all variables. In many relevant applications in economics and
finance, heterogenous comovements specific to some known groups of variables
naturally arise, and reflect distinct cyclical movements within those groups.
This paper develops two new statistical tests that can be used to investigate
whether there is evidence supporting group-specific heterogeneity in the data.
The first test statistic is designed for the alternative hypothesis of
group-specific heterogeneity appearing in at least one pair of groups; the
second is for the alternative of group-specific heterogeneity appearing in all
pairs of groups. We show that the second moment of factor loadings changes
across groups when heterogeneity is present, and use this feature to establish
the theoretical validity of the tests. We also propose and prove the validity
of a permutation approach for approximating the asymptotic distributions of the
two test statistics. The simulations and the empirical financial application
indicate that the proposed tests are useful for detecting group-specific
heterogeneity.
arXiv link: http://arxiv.org/abs/2109.09049v2
Composite Likelihood for Stochastic Migration Model with Unobserved Factor
method for the stochastic factor ordered Probit model of credit rating
transitions of firms. This model is recommended for internal credit risk
assessment procedures in banks and financial institutions under the Basel III
regulations. Its exact likelihood function involves a high-dimensional
integral, which can be approximated numerically before maximization. However,
the estimated migration risk and required capital tend to be sensitive to the
quality of this approximation, potentially leading to statistical regulatory
arbitrage. The proposed conditional MCL estimator circumvents this problem and
maximizes the composite log-likelihood of the factor ordered Probit model. We
present three conditional MCL estimators of different complexity and examine
their consistency and asymptotic normality when n and T tend to infinity. The
performance of these estimators at finite T is examined and compared with a
granularity-based approach in a simulation study. The use of the MCL estimator
is also illustrated in an empirical application.
arXiv link: http://arxiv.org/abs/2109.09043v2
Estimations of the Local Conditional Tail Average Treatment Effect
difference between the conditional tail expectations of potential outcomes,
which can capture heterogeneity and deliver aggregated local information on
treatment effects over different quantile levels and is closely related to the
notion of second-order stochastic dominance and the Lorenz curve. These
properties render it a valuable tool for policy evaluation. In this paper, we
study estimation of the CTATE locally for a group of compliers (local CTATE or
LCTATE) under the two-sided noncompliance framework. We consider a
semiparametric treatment effect framework under endogeneity for the LCTATE
estimation using a newly introduced class of consistent loss functions jointly
for the conditional tail expectation and quantile. We establish the asymptotic
theory of our proposed LCTATE estimator and provide an efficient algorithm for
its implementation. We then apply the method to evaluate the effects of
participating in programs under the Job Training Partnership Act in the US.
arXiv link: http://arxiv.org/abs/2109.08793v3
Regression Discontinuity Design with Potentially Many Covariates
regression discontinuity design (RDD) analysis. In particular, we propose
estimation and inference methods for the RDD models with covariate selection
which perform stably regardless of the number of covariates. The proposed
methods combine the local approach using kernel weights with
$\ell_{1}$-penalization to handle high-dimensional covariates. We provide
theoretical and numerical results which illustrate the usefulness of the
proposed methods. Theoretically, we present risk and coverage properties for
our point estimation and inference methods, respectively. In a certain special
case, the proposed estimator becomes more efficient than the conventional
covariate adjusted estimator at the cost of an additional sparsity condition.
Numerically, our simulation experiments and empirical example show the robust
behaviors of the proposed methods to the number of covariates in terms of bias
and variance for point estimation and coverage probability and interval length
for inference.
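A stylized two-step version of combining kernel localization with $\ell_{1}$-penalization (not the paper's estimator or its theory) is sketched below: select covariates with a kernel-weighted Lasso near the cutoff, then run a kernel-weighted local linear RD regression including only the selected covariates. The bandwidth, penalty level, and simulated design are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(9)

# Sharp RDD with many covariates, only the first two of which matter.
n, p, tau = 1000, 50, 1.0
x = rng.uniform(-1, 1, size=n)            # running variable, cutoff at 0
Z = rng.normal(size=(n, p))               # potential control covariates
d = (x >= 0).astype(float)
y = tau * d + 0.8 * x + Z[:, 0] - 0.5 * Z[:, 1] + rng.normal(scale=0.5, size=n)

h = 0.4                                                   # bandwidth (illustrative)
w = np.clip(1 - np.abs(x) / h, 0, None)                   # triangular kernel weights
in_bw = w > 0

# Step 1: kernel-weighted Lasso of the outcome on the covariates to select controls.
lasso = Lasso(alpha=0.05)
lasso.fit(Z[in_bw], y[in_bw], sample_weight=w[in_bw])
selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-8)

# Step 2: kernel-weighted local linear RD regression with the selected covariates.
X = np.column_stack([np.ones(n), d, x, d * x, Z[:, selected]])
sw = np.sqrt(w[in_bw])
coef, *_ = np.linalg.lstsq(X[in_bw] * sw[:, None], y[in_bw] * sw, rcond=None)
print("selected covariates:", selected, " RD estimate:", round(coef[1], 3))
```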
arXiv link: http://arxiv.org/abs/2109.08351v4
Policy Choice and Best Arm Identification: Asymptotic Analysis of Exploration Sampling
identification in the bandit literature -- proposed by Kasy and Sautmann (2021)
for adaptive experimental design. Theorem 1 of Kasy and Sautmann (2021)
provides three asymptotic results that give theoretical guarantees for
exploration sampling developed for this setting. We first show that the proof
of Theorem 1 (1) has technical issues, and the proof and statement of Theorem 1
(2) are incorrect. We then show, through a counterexample, that Theorem 1 (3)
is false. For the former two, we correct the statements and provide rigorous
proofs. For Theorem 1 (3), we propose an alternative objective function, which
we call posterior weighted policy regret, and derive the asymptotic optimality
of exploration sampling.
arXiv link: http://arxiv.org/abs/2109.08229v5
Short and Simple Confidence Intervals when the Directions of Some Effects are Known
presence of nuisance parameters when some of the nuisance parameters have known
signs. The confidence intervals are adaptive in the sense that they tend to be
short at and near the points where the nuisance parameters are equal to zero.
We focus our results primarily on the practical problem of inference on a
coefficient of interest in the linear regression model when it is unclear
whether or not it is necessary to include a subset of control variables whose
partial effects on the dependent variable have known directions (signs). Our
confidence intervals are trivial to compute and can provide significant length
reductions relative to standard confidence intervals in cases for which the
control variables do not have large effects. At the same time, they entail
minimal length increases at any parameter values. We prove that our confidence
intervals are asymptotically valid uniformly over the parameter space and
illustrate their length properties in an empirical application to a factorial
design field experiment and a Monte Carlo study calibrated to the empirical
application.
arXiv link: http://arxiv.org/abs/2109.08222v1
Standard Errors for Calibrated Parameters
match certain empirical moments, can be viewed as minimum distance estimation.
Existing standard error formulas for such estimators require a consistent
estimate of the correlation structure of the empirical moments, which is often
unavailable in practice. Instead, the variances of the individual empirical
moments are usually readily estimable. Using only these variances, we derive
conservative standard errors and confidence intervals for the structural
parameters that are valid even under the worst-case correlation structure. In
the over-identified case, we show that the moment weighting scheme that
minimizes the worst-case estimator variance amounts to a moment selection
problem with a simple solution. Finally, we develop tests of over-identifying
or parameter restrictions. We apply our methods empirically to a model of menu
cost pricing for multi-product firms and to a heterogeneous agent New Keynesian
model.
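In the just-identified case, a conservative bound in this spirit is simple to compute from the individual moment variances alone: the delta method gives $\hat\theta - \theta \approx J^{-1}(\hat m - m)$, so under the worst-case correlation of the empirical moments, $sd(\hat\theta_k) \le \sum_j |(J^{-1})_{kj}|\,sd(\hat m_j)$. The sketch below implements only this special case with a toy moment function and is not the paper's general weighting or testing machinery.

```python
import numpy as np

def worst_case_se(model_moments, theta_hat, moment_sds, eps=1e-6):
    """Conservative SEs for calibrated parameters in the just-identified case.

    theta_hat solves model_moments(theta) = empirical moments. With
    J = d model_moments / d theta, the delta method gives
    theta_hat - theta0 ~ J^{-1} (m_hat - m0), and under the worst-case
    correlation of the empirical moments,
    sd(theta_k) <= sum_j |(J^{-1})_{kj}| * sd(m_hat_j).
    """
    k = len(theta_hat)
    m0 = model_moments(theta_hat)
    J = np.empty((len(m0), k))
    for j in range(k):                       # forward-difference numerical Jacobian
        tp = np.array(theta_hat, dtype=float)
        tp[j] += eps
        J[:, j] = (model_moments(tp) - m0) / eps
    A = np.linalg.inv(J)
    return np.abs(A) @ np.asarray(moment_sds)

# Toy example: two parameters matched to two moments (purely illustrative).
model = lambda th: np.array([th[0] + 0.5 * th[1], th[1] ** 2])
theta_hat = np.array([1.0, 2.0])
moment_sds = np.array([0.1, 0.2])            # only the individual moment SDs are known
print("worst-case SEs:", worst_case_se(model, theta_hat, moment_sds).round(3))
```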
arXiv link: http://arxiv.org/abs/2109.08109v3
Structural Estimation of Matching Markets with Transferable Utility
matching markets with transferable utility.
arXiv link: http://arxiv.org/abs/2109.07932v1
Semi-parametric estimation of the EASI model: Welfare implications of taxes identifying clusters due to unobserved preference heterogeneity
index (EASI) model, and analyze welfare implications due to price changes
caused by taxes. Our inferential framework is based on a non-parametric
specification of the stochastic errors in the EASI incomplete demand system
using Dirichlet processes. Our proposal makes it possible to identify consumer
clusters due to unobserved preference heterogeneity while taking into account
censoring, simultaneous endogeneity and non-linearities. We perform an application based
on a tax on electricity consumption in the Colombian economy. Our results
suggest that there are four clusters due to unobserved preference
heterogeneity, although 95% of our sample belongs to one cluster. This suggests
that observable variables describe preferences well under the EASI
model in our application. We find that utilities seem to be inelastic normal
goods with non-linear Engel curves. Joint predictive distributions indicate
that electricity tax generates substitution effects between electricity and
other non-utility goods. These distributions as well as Slutsky matrices
suggest good model assessment. We find that there is a 95% probability that the
equivalent variation as percentage of income of the representative household is
between 0.60% to 1.49% given an approximately 1% electricity tariff increase.
However, there are heterogeneous effects with higher socioeconomic strata
facing more welfare losses on average. This highlights the potentially
remarkable welfare implications of taxation on inelastic services.
arXiv link: http://arxiv.org/abs/2109.07646v1
Geographic Difference-in-Discontinuities
discontinuities where administrative borders serve as the 'cutoff'.
Identification in this context is difficult since multiple treatments can
change at the cutoff and individuals can easily sort on either side of the
border. This note extends the difference-in-discontinuities framework discussed
in Grembi et al. (2016) to a geographic setting. The paper formalizes the
identifying assumptions in this context, which allow for the removal of
time-invariant sorting and compound treatments, similar to the
difference-in-differences methodology.
arXiv link: http://arxiv.org/abs/2109.07406v2
Bayesian hierarchical analysis of a multifaceted program against extreme poverty
developing countries gave encouraging results, but with important heterogeneity
between countries. This master thesis proposes to study this heterogeneity with
a Bayesian hierarchical analysis. The analysis we carry out with two different
hierarchical models leads to a very low amount of pooling of information
between countries, indicating that this observed heterogeneity should be
interpreted mostly as true heterogeneity, and not as sampling error. We analyze
the first order behavior of our hierarchical models, in order to understand
what leads to this very low amount of pooling. We take a didactic approach,
with an introduction to Bayesian analysis and an explanation
of the different modeling and computational choices of our analysis.
arXiv link: http://arxiv.org/abs/2109.06759v1
Policy Optimization Using Semi-parametric Models for Dynamic Pricing
market value of a product is linear in its observed features plus some market
noise. Products are sold one at a time, and only a binary response indicating
success or failure of a sale is observed. Our model setting is similar to
Javanmard and Nazerzadeh [2019] except that we expand the demand curve to a
semiparametric model and need to learn dynamically both parametric and
nonparametric components. We propose a dynamic statistical learning and
decision-making policy that combines semiparametric estimation from a
generalized linear model with an unknown link and online decision-making to
minimize regret (maximize revenue). Under mild conditions, we show that for a
market noise c.d.f. $F(\cdot)$ with an $m$-th order derivative ($m\geq 2$), our
policy achieves a regret upper bound of $O_{d}(T^{(2m+1)/(4m-1)})$,
where $T$ is the time horizon and $O_{d}$ is the order that hides
logarithmic terms and the dimensionality of the feature vector $d$. The upper
bound is further reduced to $O_{d}(\sqrt{T})$ if $F$ is super smooth, i.e., its
Fourier transform decays exponentially. In terms of dependence on the horizon
$T$, these upper bounds are close to $\Omega(\sqrt{T})$, the lower bound where
$F$ belongs to a parametric class. We further generalize these results to the
case with dynamically dependent product features under the strong mixing
condition.
arXiv link: http://arxiv.org/abs/2109.06368v2
Nonparametric Estimation of Truncated Conditional Expectation Functions
range of economic applications, including income inequality measurement,
financial risk management, and impact evaluation. They typically involve
truncating the outcome variable above or below certain quantiles of its
conditional distribution. In this paper, based on local linear methods, a
novel, two-stage, nonparametric estimator of such functions is proposed. In
this estimation problem, the conditional quantile function is a nuisance
parameter that has to be estimated in the first stage. The proposed estimator
is insensitive to the first-stage estimation error owing to the use of a
Neyman-orthogonal moment in the second stage. This construction ensures that
inference methods developed for the standard nonparametric regression can be
readily adapted to conduct inference on truncated conditional expectations. As
an extension, estimation with an estimated truncation quantile level is
considered. The proposed estimator is applied in two empirical settings: sharp
regression discontinuity designs with a manipulated running variable and
randomized experiments with sample selection.
arXiv link: http://arxiv.org/abs/2109.06150v1
Estimating a new panel MSK dataset for comparative analyses of national absorptive capacity systems, economic growth, and development in low and middle income economies
severely lacking for developing economies. Particularly, the low- and
middle-income countries (LMICs) eligible for the World Bank's International
Development Association (IDA) support, are rarely part of any empirical
discourse on growth, development, and innovation. One major issue hindering
panel analyses in LMICs, and thus their inclusion in any empirical
discussion, is the lack of complete data availability. This work offers a new
complete panel dataset with no missing values for LMICs eligible for IDA's
support. I use a standard, widely respected multiple imputation technique
(specifically, Predictive Mean Matching) developed by Rubin (1987). This
technique respects the structure of multivariate continuous panel data at the
country level. I employ this technique to create a large dataset consisting of
many variables drawn from publicly available established sources. These
variables, in turn, capture six crucial country-level capacities: technological
capacity, financial capacity, human capital capacity, infrastructural capacity,
public policy capacity, and social capacity. Such capacities are part and
parcel of the National Absorptive Capacity Systems (NACS). The dataset (MSK
dataset) thus produced contains data on 47 variables for 82 LMICs between 2005
and 2019. The dataset has passed a quality and reliability check and can thus
be used for comparative analyses of national absorptive capacities and
development, transition, and convergence analyses among LMICs.
arXiv link: http://arxiv.org/abs/2109.05529v1
{did2s}: Two-Stage Difference-in-Differences
difference-in-differences models when treatment timing occurs at different
times for different units. This article introduces the R package did2s which
implements the estimator introduced in Gardner (2021). The article provides an
approachable review of the underlying econometric theory and introduces the
syntax for the function did2s. Further, the package introduces a function,
event_study, that provides a common syntax for all the modern event-study
estimators and plot_event_study to plot the results of each estimator.
arXiv link: http://arxiv.org/abs/2109.05913v2
Implicit Copulas: An Overview
high dimensions. This broad class of copulas is introduced and surveyed,
including elliptical copulas, skew $t$ copulas, factor copulas, time series
copulas and regression copulas. The common auxiliary representation of implicit
copulas is outlined, and how this makes them both scalable and tractable for
statistical modeling. Issues such as parameter identification, extended
likelihoods for discrete or mixed data, parsimony in high dimensions, and
simulation from the copula model are considered. Bayesian approaches to
estimate the copula parameters, and predict from an implicit copula model, are
outlined. Particular attention is given to implicit copula processes
constructed from time series and regression models, which is at the forefront
of current research. Two econometric applications -- one from macroeconomic
time series and the other from financial asset pricing -- illustrate the
advantages of implicit copula models.
arXiv link: http://arxiv.org/abs/2109.04718v1
Variable Selection for Causal Inference via Outcome-Adaptive Random Forest
control for self-selection. This selection is based on confounding variables
that affect the treatment assignment and the outcome. Propensity score methods
aim to correct for confounding. However, not all covariates are confounders. We
propose the outcome-adaptive random forest (OARF) that only includes desirable
variables for estimating the propensity score to decrease bias and variance.
Our approach works in high-dimensional datasets and when the outcome and
propensity score models are non-linear and potentially complicated. The OARF
excludes covariates that are not associated with the outcome, even in the
presence of a large number of spurious variables. Simulation results suggest
that the OARF produces unbiased estimates, has a smaller variance and is
superior in variable selection compared to other approaches. The results from
two empirical examples, the effect of right heart catheterization on mortality
and the effect of maternal smoking during pregnancy on birth weight, show
comparable treatment effects to previous findings but tighter confidence
intervals and more plausible selected variables.
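The core idea, screening covariates by their relevance for the outcome before fitting the propensity score, can be sketched with random-forest importances followed by inverse-probability weighting. This is a simplified illustration rather than the OARF algorithm itself; the data, the importance threshold, and the weighting estimator are all illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

rng = np.random.default_rng(10)

# Covariates: X0 is a confounder, X1 affects only the outcome, X2 only the treatment,
# and X3..X19 are noise.
n, p = 4000, 20
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-(X[:, 0] + X[:, 2])))          # true propensity
D = rng.binomial(1, e)
Y = 1.0 * D + 2 * X[:, 0] + X[:, 1] + rng.normal(size=n)   # true ATE = 1.0

# Step 1: rank covariates by their importance for the outcome (within the control
# arm here, to avoid picking up the treatment effect itself).
rf_y = RandomForestRegressor(n_estimators=300, random_state=0).fit(X[D == 0], Y[D == 0])
keep = np.flatnonzero(rf_y.feature_importances_ > 1 / p)    # arbitrary threshold

# Step 2: propensity score from the outcome-relevant covariates only, then IPW.
rf_e = RandomForestClassifier(n_estimators=300, min_samples_leaf=50, random_state=0)
ps = rf_e.fit(X[:, keep], D).predict_proba(X[:, keep])[:, 1]
ps = np.clip(ps, 0.05, 0.95)
ate = np.mean(D * Y / ps) - np.mean((1 - D) * Y / (1 - ps))
print("kept covariates:", keep, " IPW ATE estimate:", round(ate, 3))
```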
arXiv link: http://arxiv.org/abs/2109.04154v1
Some Impossibility Results for Inference With Cluster Dependence with Large Clusters
structure and presents two main impossibility results. First, we show that when
there is only one large cluster, i.e., the researcher does not have any
knowledge on the dependence structure of the observations, it is not possible
to consistently discriminate the mean. When within-cluster observations satisfy
the uniform central limit theorem, we also show that a sufficient condition for
consistent $\sqrt{n}$-discrimination of the mean is that we have at least two
large clusters. This result shows some limitations for inference when we lack
information on the dependence structure of observations. Our second result
provides a necessary and sufficient condition for the cluster structure that
the long run variance is consistently estimable. Our result implies that when
there is at least one large cluster, the long run variance is not consistently
estimable.
arXiv link: http://arxiv.org/abs/2109.03971v4
On the estimation of discrete choice models to capture irrational customer behaviors
estimate consumer choice behavior. However, behavioral economics has provided
strong empirical evidence of irrational choice behavior, such as halo effects,
that are incompatible with this framework. Models belonging to the Random
Utility Maximization family may therefore not accurately capture such
irrational behavior. Hence, more general choice models, overcoming such
limitations, have been proposed. However, the flexibility of such models comes
at the price of increased risk of overfitting. As such, estimating such models
remains a challenge. In this work, we propose an estimation method for the
recently proposed Generalized Stochastic Preference choice model, which
subsumes the family of Random Utility Maximization models and is capable of
capturing halo effects. Specifically, we show how to use partially-ranked
preferences to efficiently model rational and irrational customer types from
transaction data. Our estimation procedure is based on column generation, where
relevant customer types are efficiently extracted by expanding a tree-like data
structure containing the customer behaviors. Further, we propose a new
dominance rule among customer types whose effect is to prioritize low orders of
interactions among products. An extensive set of experiments assesses the
predictive accuracy of the proposed approach. Our results show that accounting
for irrational preferences can boost predictive accuracy by 12.5% on average,
when tested on a real-world dataset from a large chain of grocery and drug
stores.
arXiv link: http://arxiv.org/abs/2109.03882v1
On a quantile autoregressive conditional duration model applied to high-frequency financial data
with data arising from times between two successive events. These models are
usually specified in terms of a time-varying conditional mean or median
duration. In this paper, we relax this assumption and consider a conditional
quantile approach to facilitate the modeling of different percentiles. The
proposed ACD quantile model is based on a skewed version of Birnbaum-Saunders
distribution, which provides better fitting of the tails than the traditional
Birnbaum-Saunders distribution, in addition to advancing the implementation of
an expectation conditional maximization (ECM) algorithm. A Monte Carlo
simulation study is performed to assess the behavior of the model as well as
the parameter estimation method and to evaluate a form of residual. A real
financial transaction data set is finally analyzed to illustrate the proposed
approach.
arXiv link: http://arxiv.org/abs/2109.03844v1
Approximate Factor Models with Weaker Loadings
characteristic of economic data and the approximate factor model provides a
useful framework for analysis. Assuming a strong factor structure where
$\Lambda'\Lambda/N^{\alpha}$ is positive definite in the limit when $\alpha=1$, early
work established convergence of the principal component estimates of the
factors and loadings up to a rotation matrix. This paper shows that the
estimates are still consistent and asymptotically normal when $\alpha\in(0,1]$
albeit at slower rates and under additional assumptions on the sample size. The
results hold whether $\alpha$ is constant or varies across factor loadings. The
framework developed for heterogeneous loadings and the simplified proofs, which
can also be used in strong factor analysis, are of independent interest.
arXiv link: http://arxiv.org/abs/2109.03773v4
The multilayer architecture of the global input-output network and its properties
using sectoral trade data (WIOD, 2016 release). With a focus on the mesoscale
structure and related properties, our multilayer analysis takes into
consideration the splitting into industry-based layers in order to capture more
specific relationships between countries that cannot be detected from the
analysis of the single-layer aggregated network. We can identify several large
international communities in which some countries trade more intensively in
some specific layers. However, interestingly, our results show that these
clusters can restructure and evolve over time. In general, not only their
internal composition changes, but the centrality rankings of the members are
also reordered, with industries from some countries diminishing in importance
and industries from other countries gaining it. These changes in the large
international clusters may reflect the outcomes and the dynamics of
cooperation, partner selection and competition among industries and among
countries in the global input-output network.
arXiv link: http://arxiv.org/abs/2109.02946v2
Semiparametric Estimation of Treatment Effects in Randomized Experiments
focus on settings where the outcome distributions may be thick tailed, where
treatment effects may be small, where sample sizes are large and where
assignment is completely random. This setting is of particular interest in
recent online experimentation. We propose using parametric models for the
treatment effects, leading to semiparametric models for the outcome
distributions. We derive the semiparametric efficiency bound for the treatment
effects for this setting, and propose efficient estimators. In the leading case
with constant quantile treatment effects one of the proposed efficient
estimators has an interesting interpretation as a weighted average of quantile
treatment effects, with the weights proportional to minus the second derivative
of the log of the density of the potential outcomes. Our analysis also suggests
an extension of Huber's model and trimmed mean to include asymmetry.
arXiv link: http://arxiv.org/abs/2109.02603v2
Optimal transport weights for causal inference
effects. Weighting methods attempt to correct this imbalance but rely on
specifying models for the treatment assignment mechanism, which is unknown in
observational studies. This leaves researchers to choose the proper weighting
method and the appropriate covariate functions for these models without knowing
the correct combination to achieve distributional balance. In response to these
difficulties, we propose a nonparametric generalization of several other
weighting schemes found in the literature: Causal Optimal Transport. This new
method directly targets distributional balance by minimizing optimal transport
distances between treatment and control groups or, more generally, between any
source and target population. Our approach is semiparametrically efficient and
model-free but can also incorporate moments or any other important functions of
covariates that a researcher desires to balance. Moreover, our method can
provide a nonparametric estimate of the conditional mean outcome function, and
we give rates of convergence for this estimator. We also show how this method
can provide nonparametric imputations of the missing potential outcomes and
give rates of convergence for that estimator. We find that Causal Optimal
Transport outperforms competitor methods when both the propensity score and
outcome models are misspecified, indicating it is a robust alternative to
common weighting methods. Finally, we demonstrate the utility of our method in
an external control trial examining the effect of misoprostol versus oxytocin
for the treatment of post-partum hemorrhage.
arXiv link: http://arxiv.org/abs/2109.01991v4
A Framework for Using Value-Added in Regressions
estimated value-added (VA) measures are now widely used as dependent or
explanatory variables in regressions. For example, VA is used as an explanatory
variable when examining the relationship between teacher VA and students'
long-run outcomes. Due to the multi-step nature of VA estimation, the standard
errors (SEs) researchers routinely use when including VA measures in OLS
regressions are incorrect. In this paper, I show that the assumptions
underpinning VA models naturally lead to a generalized method of moments (GMM)
framework. Using this insight, I construct correct SEs for regressions that
use VA as an explanatory variable and for regressions where VA is the outcome.
In addition, I identify the causes of incorrect SEs when using OLS, discuss the
need to adjust SEs under different sets of assumptions, and propose a more
efficient estimator for using VA as an explanatory variable. Finally, I
illustrate my results using data from North Carolina, and show that correcting
the SEs results in an increase that is larger than the impact of clustering them.
arXiv link: http://arxiv.org/abs/2109.01741v3
Dynamic Games in Empirical Industrial Organization
empirical applications. Section 2 presents the theoretical framework,
introduces the concept of Markov Perfect Nash Equilibrium, discusses existence
and multiplicity, and describes the representation of this equilibrium in terms
of conditional choice probabilities. We also discuss extensions of the basic
framework, including models in continuous time, the concepts of oblivious
equilibrium and experience-based equilibrium, and dynamic games where firms
have non-equilibrium beliefs. In section 3, we first provide an overview of the
types of data used in this literature, before turning to a discussion of
identification issues and results, and estimation methods. We review different
methods to deal with multiple equilibria and large state spaces. We also
describe recent developments for estimating games in continuous time and
incorporating serially correlated unobservables, and discuss the use of machine
learning methods for solving and estimating dynamic games. Section 4 discusses
empirical applications of dynamic games in IO. We start by describing the first
empirical applications in this literature during the early 2000s. Then, we
review recent applications dealing with innovation, antitrust and mergers,
dynamic pricing, regulation, product repositioning, advertising, uncertainty
and investment, airline network competition, dynamic matching, and natural
resources. We conclude with our view of the progress made in this literature
and the remaining challenges.
arXiv link: http://arxiv.org/abs/2109.01725v2
How to Detect Network Dependence in Latent Factor Models? A Bias-Corrected CD Test
the CD test proposed by Pesaran (2004) to residuals from panels with latent
factors results in over-rejection. They propose a randomized test statistic to
correct for over-rejection, and add a screening component to achieve power.
This paper considers the same problem but from a different perspective, and
shows that the standard CD test remains valid if the latent factors are weak,
in the sense that their strength is less than one half. In the case where latent factors are
strong, we propose a bias-corrected version, CD*, which is shown to be
asymptotically standard normal under the null of error cross-sectional
independence and have power against network type alternatives. This result is
shown to hold for pure latent factor models as well as for panel regression
models with latent factors. The case where the errors are serially correlated
is also considered. Small sample properties of the CD* test are investigated by
Monte Carlo experiments, which show that it has the correct size for strong and
weak factors as well as for Gaussian and non-Gaussian errors. In contrast, it
is found that JR's test tends to over-reject in the case of panels with
non-Gaussian errors, and has low power against spatial network alternatives. In
an empirical application, using the CD* test, it is shown that there remains
spatial error dependence in a panel data model for real house price changes
across 377 Metropolitan Statistical Areas in the U.S., even after the effects
of latent factors are filtered out.
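For reference, a minimal Python sketch of the uncorrected CD statistic of
Pesaran (2004), computed from an N x T matrix of panel residuals, is given
below; the bias correction that defines CD* is not reproduced here, and the
function name and toy data are our own choices.

    import numpy as np

    def cd_statistic(resid):
        """Pesaran (2004) CD statistic from an N x T array of panel residuals."""
        N, T = resid.shape
        R = np.corrcoef(resid)            # N x N matrix of pairwise correlations
        iu = np.triu_indices(N, k=1)      # upper-triangular pairs i < j
        return np.sqrt(2.0 * T / (N * (N - 1))) * R[iu].sum()

    # toy check: cross-sectionally independent residuals give CD approx N(0, 1)
    rng = np.random.default_rng(0)
    print(cd_statistic(rng.standard_normal((50, 200))))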
arXiv link: http://arxiv.org/abs/2109.00408v7
Matching Theory and Evidence on Covid-19 using a Stochastic Network SIR Model
empirical analysis of the Covid-19 pandemic. It derives moment conditions for
the number of infected and active cases for single as well as multigroup
epidemic models. These moment conditions are used to investigate the
identification and estimation of the transmission rates. The paper then
proposes a method that jointly estimates the transmission rate and the
magnitude of under-reporting of infected cases. Empirical evidence on six
European countries matches the simulated outcomes once the under-reporting of
infected cases is addressed. It is estimated that the number of actual cases
could be between 4 and 10 times higher than the reported numbers in October 2020
and declined to 2 to 3 times in April 2021. The calibrated models are used in
the counterfactual analyses of the impact of social distancing and vaccination
on the epidemic evolution, and the timing of early interventions in the UK and
Germany.
arXiv link: http://arxiv.org/abs/2109.00321v2
A generalized bootstrap procedure of the standard error and confidence interval estimation for inverse probability of treatment weighting
used in propensity score analysis to infer causal effects in regression models.
Due to oversized IPTW weights and errors associated with propensity score
estimation, the IPTW approach can underestimate the standard error of the
causal effect. To remedy this, bootstrap standard errors have been recommended to
replace the IPTW standard error, but the ordinary bootstrap (OB) procedure
might still result in underestimation of the standard error because of its
inefficient sampling algorithm and un-stabilized weights. In this paper, we
develop a generalized bootstrap (GB) procedure for estimating the standard
error of the IPTW approach. Compared with the OB procedure, the GB procedure
has much lower risk of underestimating the standard error and is more efficient
for both point and standard error estimates. The GB procedure also has smaller
risk of standard error underestimation than the ordinary bootstrap procedure
with trimmed weights, with comparable efficiencies. We demonstrate the
effectiveness of the GB procedure via a simulation study and a dataset from the
National Educational Longitudinal Study-1988 (NELS-88).
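The generalized bootstrap itself is not reproduced here; as a point of
reference, the following Python sketch shows the baseline it is compared
against, namely an IPTW estimate of the average treatment effect with
stabilized weights and an ordinary-bootstrap standard error (the logistic
propensity model, function names, and toy data are our own choices).

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def iptw_ate(X, D, Y):
        """IPTW estimate of the ATE with stabilized weights."""
        ps = LogisticRegression(max_iter=1000).fit(X, D).predict_proba(X)[:, 1]
        w = np.where(D == 1, D.mean() / ps, (1 - D.mean()) / (1 - ps))
        treated = D == 1
        return (np.average(Y[treated], weights=w[treated])
                - np.average(Y[~treated], weights=w[~treated]))

    def ordinary_bootstrap_se(X, D, Y, B=200, seed=0):
        """Ordinary (unit-resampling) bootstrap standard error of the IPTW ATE."""
        rng = np.random.default_rng(seed)
        n = len(Y)
        ates = []
        for _ in range(B):
            idx = rng.integers(0, n, n)      # resample units with replacement
            ates.append(iptw_ate(X[idx], D[idx], Y[idx]))
        return np.std(ates, ddof=1)

    # toy data with a confounded treatment and a unit treatment effect
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 3))
    D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
    Y = 1.0 * D + X @ np.array([0.5, -0.3, 0.2]) + rng.standard_normal(500)
    print(iptw_ate(X, D, Y), ordinary_bootstrap_se(X, D, Y))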
arXiv link: http://arxiv.org/abs/2109.00171v1
Look Who's Talking: Interpretable Machine Learning for Assessing Italian SMEs Credit Default
attention to Machine Learning algorithms due to their power to solve complex
learning tasks. In the field of firms' default prediction, however, the lack of
interpretability has prevented the extensive adoption of the black-box type of
models. To overcome this drawback and maintain the high performances of
black-boxes, this paper relies on a model-agnostic approach. Accumulated Local
Effects and Shapley values are used to shape the predictors' impact on the
likelihood of default and rank them according to their contribution to the
model outcome. Prediction is achieved by two Machine Learning algorithms
(eXtreme Gradient Boosting and FeedForward Neural Network) compared with three
standard discriminant models. Results show that, in our analysis of the Italian
small and medium-sized enterprise manufacturing industry, the eXtreme Gradient
Boosting algorithm delivers the highest overall classification power without
giving up a rich interpretation framework.
arXiv link: http://arxiv.org/abs/2108.13914v2
Wild Bootstrap for Instrumental Variables Regressions with Weak and Few Clusters
in the framework of a small number of large clusters in which the number of
clusters is viewed as fixed and the number of observations for each cluster
diverges to infinity. We first show that the wild bootstrap Wald test, with or
without using the cluster-robust covariance estimator, controls size
asymptotically up to a small error as long as the parameters of endogenous
variables are strongly identified in at least one of the clusters. Then, we
establish the required number of strong clusters for the test to have power
against local alternatives. We further develop a wild bootstrap Anderson-Rubin
test for the full-vector inference and show that it controls size
asymptotically up to a small error even under weak or partial identification in
all clusters. We illustrate the good finite sample performance of the new
inference methods using simulations and provide an empirical application to a
well-known dataset about US local labor markets.
arXiv link: http://arxiv.org/abs/2108.13707v5
Dynamic Selection in Algorithmic Decision-making
learning algorithms with endogenous data. In a contextual multi-armed bandit
model, a novel bias (self-fulfilling bias) arises because the endogeneity of
the data influences the choices of decisions, affecting the distribution of
future data to be collected and analyzed. We propose an
instrumental-variable-based algorithm to correct for the bias. It obtains true
parameter values and attains low (logarithmic-like) regret levels. We also
prove a central limit theorem for statistical inference. To establish the
theoretical properties, we develop a general technique that untangles the
interdependence between data and actions.
arXiv link: http://arxiv.org/abs/2108.12547v3
Revisiting Event Study Designs: Robust and Efficient Estimation
treatment adoption and heterogeneous causal effects. We show that conventional
regression-based estimators fail to provide unbiased estimates of relevant
estimands absent strong restrictions on treatment-effect homogeneity. We then
derive the efficient estimator addressing this challenge, which takes an
intuitive "imputation" form when treatment-effect heterogeneity is
unrestricted. We characterize the asymptotic behavior of the estimator, propose
tools for inference, and develop tests of the identifying assumptions. Our method
applies with time-varying controls, in triple-difference designs, and with
certain non-binary treatments. We show the practical relevance of our results
in a simulation study and an application. Studying the consumption response to
tax rebates in the United States, we find that the notional marginal propensity
to consume is between 8 and 11 percent in the first quarter - about half as
large as benchmark estimates used to calibrate macroeconomic models - and
predominantly occurs in the first month after the rebate.
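As a rough illustration of the imputation idea (fit unit and period fixed
effects on untreated observations only, impute the untreated potential
outcomes of treated observations, and average the differences), here is a
stylized Python sketch; it omits the paper's inference tools, tests, and
extensions, assumes every unit and period appears among the untreated
observations, and uses variable names of our own choosing.

    import numpy as np
    import pandas as pd

    def imputation_att(df):
        """Stylized imputation ATT; df has columns unit, time, y, treated (0/1)."""
        dummies = pd.get_dummies(df[["unit", "time"]].astype(str), drop_first=True)
        X = np.column_stack([np.ones(len(df)), dummies.to_numpy(dtype=float)])
        y = df["y"].to_numpy(dtype=float)
        d = df["treated"].to_numpy() == 1
        # fit unit and period fixed effects on untreated observations only
        beta, *_ = np.linalg.lstsq(X[~d], y[~d], rcond=None)
        # impute untreated potential outcomes for treated observations
        y0_hat = X[d] @ beta
        return float(np.mean(y[d] - y0_hat))

    # toy staggered-adoption panel with unit effects, a time trend, and ATT = 2
    rng = np.random.default_rng(0)
    units, periods, rows = 30, 10, []
    for i in range(units):
        adopt = rng.integers(4, periods + 3)     # some units never adopt
        for t in range(periods):
            treat = int(t >= adopt)
            y = 0.1 * i + 0.3 * t + 2.0 * treat + 0.5 * rng.standard_normal()
            rows.append((i, t, y, treat))
    df = pd.DataFrame(rows, columns=["unit", "time", "y", "treated"])
    print(imputation_att(df))    # should be close to 2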
arXiv link: http://arxiv.org/abs/2108.12419v5
Identification of Peer Effects using Panel Data
operating through unobserved individual heterogeneity. The results apply for
general network structures governing peer interactions and allow for correlated
effects. Identification hinges on a conditional mean restriction requiring
exogenous mobility of individuals between groups over time. We apply our method
to surgeon-hospital-year data to study take-up of keyhole surgery for cancer,
finding a positive effect of the average individual heterogeneity of other
surgeons practicing in the same hospital.
arXiv link: http://arxiv.org/abs/2108.11545v4
Double Machine Learning and Automated Confounder Selection -- A Cautionary Tale
automated variable selection in high-dimensional settings. Even though the
ability to deal with a large number of potential covariates can render
selection-on-observables assumptions more plausible, there is at the same time
a growing risk that endogenous variables are included, which would lead to the
violation of conditional independence. This paper demonstrates that DML is very
sensitive to the inclusion of only a few "bad controls" in the covariate space.
The resulting bias varies with the nature of the theoretical causal model,
which raises concerns about the feasibility of selecting control variables in a
data-driven way.
arXiv link: http://arxiv.org/abs/2108.11294v4
Continuous Treatment Recommendation with Deep Survival Dose Response Function
problems in settings with clinical survival data, which we call the Deep
Survival Dose Response Function (DeepSDRF). That is, we consider the problem of
learning the conditional average dose response (CADR) function solely from
historical data in which observed factors (confounders) affect both observed
treatment and time-to-event outcomes. The estimated treatment effect from
DeepSDRF enables us to develop recommender algorithms with the correction for
selection bias. We compared two recommender approaches based on random search
and reinforcement learning and found similar performance in terms of patient
outcome. We tested the DeepSDRF and the corresponding recommender on extensive
simulation studies and the eICU Research Institute (eRI) database. To the best
of our knowledge, this is the first time that causal models are used to address
the continuous treatment effect with observational data in a medical context.
arXiv link: http://arxiv.org/abs/2108.10453v5
Feasible Weighted Projected Principal Component Analysis for Factor Models with an Application to Bond Risk Premia
for factor models in which observable characteristics partially explain the
latent factors. This novel method provides more efficient and accurate
estimators than existing methods. To increase estimation efficiency, I take
into account both cross-sectional dependence and heteroskedasticity by using a
consistent estimator of the inverse error covariance matrix as the weight
matrix. To improve accuracy, I employ a projection approach using
characteristics because it removes noise components in high-dimensional factor
analysis. By using the FPPC method, estimators of the factors and loadings have
faster rates of convergence than those of the conventional factor analysis.
Moreover, I propose an FPPC-based diffusion index forecasting model. The
limiting distribution of the parameter estimates and the rate of convergence
for forecast errors are obtained. Using U.S. bond market and macroeconomic
data, I demonstrate that the proposed model outperforms models based on
conventional principal component estimators. I also show that the proposed
model performs well among a large group of machine learning techniques in
forecasting excess bond returns.
arXiv link: http://arxiv.org/abs/2108.10250v3
Inference in high-dimensional regression models without the exact or $L^p$ sparsity
models and high-dimensional IV regression models. Estimation is based on a
combined use of the orthogonal greedy algorithm, high-dimensional Akaike
information criterion, and double/debiased machine learning. The method of
inference for any low-dimensional subvector of high-dimensional parameters is
based on a root-$N$ asymptotic normality, which is shown to hold without
requiring the exact sparsity condition or the $L^p$ sparsity condition.
Simulation studies demonstrate superior finite-sample performance of this
proposed method over those based on the LASSO or the random forest, especially
under less sparse models. We illustrate an application to production analysis
with a panel of Chilean firms.
arXiv link: http://arxiv.org/abs/2108.09520v2
A Maximum Entropy Copula Model for Mixed Data: Representation, Estimation, and Applications
is proposed, which offers the following advantages: (i) it is valid for mixed
random vectors. By `mixed' we mean the method works for any combination of
discrete or continuous variables in a fully automated manner; (ii) it yields a
bona fide density estimate with interpretable parameters. By `bona fide' we mean
that the estimate is guaranteed to be a non-negative function that integrates to 1; and
(iii) it plays a unifying role in our understanding of a large class of
statistical methods. Our approach utilizes modern machinery of nonparametric
statistics to represent and approximate log-copula density function via
LP-Fourier transform. Several real-data examples are also provided to explore
the key theoretical and practical implications of the theory.
arXiv link: http://arxiv.org/abs/2108.09438v2
Regression Discontinuity Designs
non-experimental methods for causal inference and program evaluation. Over the
last two decades, statistical and econometric methods for RD analysis have
expanded and matured, and there is now a large number of methodological results
for RD identification, estimation, inference, and validation. We offer a
curated review of this methodological literature organized around the two most
popular frameworks for the analysis and interpretation of RD designs: the
continuity framework and the local randomization framework. For each framework,
we discuss three main topics: (i) designs and parameters, which focuses on
different types of RD settings and treatment effects of interest; (ii)
estimation and inference, which presents the most popular methods based on
local polynomial regression and analysis of experiments, as well as
refinements, extensions, and alternatives; and (iii) validation and
falsification, which summarizes an array of mostly empirical approaches to
support the validity of RD designs in practice.
arXiv link: http://arxiv.org/abs/2108.09400v2
Efficient Online Estimation of Causal Effects by Deciding What to Observe
available, each capturing a distinct subset of variables. While problem
formulations typically take the data as given, in practice, data acquisition
can be an ongoing process. In this paper, we aim to estimate any functional of
a probabilistic model (e.g., a causal effect) as efficiently as possible, by
deciding, at each time, which data source to query. We propose online moment
selection (OMS), a framework in which structural assumptions are encoded as
moment conditions. The optimal action at each step depends, in part, on the
very moments that identify the functional of interest. Our algorithms balance
exploration with choosing the best action as suggested by current estimates of
the moments. We propose two selection strategies: (1) explore-then-commit
(OMS-ETC) and (2) explore-then-greedy (OMS-ETG), proving that both achieve zero
asymptotic regret as assessed by MSE. We instantiate our setup for average
treatment effect estimation, where structural assumptions are given by a causal
graph and data sources may include subsets of mediators, confounders, and
instrumental variables.
arXiv link: http://arxiv.org/abs/2108.09265v2
A Theoretical Analysis of the Stationarity of an Unrestricted Autoregression Process
econometric processes relatively generically if they incorporate the
heterogeneity in dependence on times. This paper analyzes the stationarity of
an autoregressive process of dimension $k>1$ having a sequence of coefficients
$\beta$ multiplied by successively increasing powers of $0<\delta<1$. The
theorem gives the stationarity conditions as simple relations between the
coefficients, $k$, and $\delta$. Computationally, whether stationarity holds
depends on the parameters: the choice of $\delta$ sets the bounds
on $\beta$ and the number of time lags used for prediction in the model.
arXiv link: http://arxiv.org/abs/2108.09083v1
Causal Inference with Noncompliance and Unknown Interference
social network and they may not comply with the assigned treatments. In
particular, we suppose that the form of network interference is unknown to
researchers. To estimate meaningful causal parameters in this situation, we
introduce a new concept of exposure mapping, which summarizes potentially
complicated spillover effects into a fixed dimensional statistic of
instrumental variables. We investigate identification conditions for the
intention-to-treat effects and the average treatment effects for compliers,
while explicitly considering the possibility of misspecification of exposure
mapping. Based on our identification results, we develop nonparametric
estimation procedures via inverse probability weighting. Their asymptotic
properties, including consistency and asymptotic normality, are investigated
using an approximate neighborhood interference framework. For an empirical
illustration, we apply our method to experimental data on the anti-conflict
intervention school program. The proposed methods are readily available with
the companion R package latenetwork.
arXiv link: http://arxiv.org/abs/2108.07455v5
InfoGram and Admissible Machine Learning
algorithm with superior predictive power may not even be deployable, unless it
is admissible under the regulatory constraints. This has led to great interest
in developing fair, transparent and trustworthy ML methods. The purpose of this
article is to introduce a new information-theoretic learning framework
(admissible machine learning) and algorithmic risk-management tools (InfoGram,
L-features, ALFA-testing) that can guide an analyst to redesign off-the-shelf
ML methods to be regulatory compliant, while maintaining good prediction
accuracy. We have illustrated our approach using several real-data examples
from financial sectors, biomedical research, marketing campaigns, and the
criminal justice system.
arXiv link: http://arxiv.org/abs/2108.07380v2
Density Sharpening: Principles and Applications to Discrete Data Analysis
"Density Sharpening" and applies it to the analysis of discrete count data. The
underlying foundation is based on a new theory of nonparametric approximation
and smoothing methods for discrete distributions which play a useful role in
explaining and uniting a large class of applied statistical methods. The
proposed modeling framework is illustrated using several real applications,
from seismology to healthcare to physics.
arXiv link: http://arxiv.org/abs/2108.07372v3
Dimensionality Reduction and State Space Systems: Forecasting the US Treasury Yields Using Frequentist and Bayesian VARs
frequentist and Bayesian methods after first decomposing the yields of varying
maturities into their unobserved term structure factors. Then, I exploited the
structure of the state-space model to forecast the Treasury yields and compared
the forecast performance of each model using mean squared forecast error. Among
the frequentist methods, I applied the two-step Diebold-Li, two-step principal
components, and one-step Kalman filter approaches. Likewise, I imposed five
different priors in the Bayesian VARs: diffuse, Minnesota, natural conjugate,
independent normal-inverse-Wishart, and stochastic search variable
selection priors. After forecasting the Treasury yields for 9 different
forecast horizons, I found that the BVAR with Minnesota prior generally
minimizes the loss function. I augmented the above BVARs by including
macroeconomic variables and constructed impulse response functions with a
recursive ordering identification scheme. Finally, I fitted a sign-restricted
BVAR with dummy observations.
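To fix ideas about the two-step Diebold-Li step mentioned above, the Python
sketch below extracts level, slope, and curvature factors by cross-sectional
OLS on Nelson-Siegel loadings (with the conventional decay parameter 0.0609
for maturities in months) and forecasts each factor with an AR(1); the BVAR
priors and the rest of the comparison are not reproduced, and the simulated
yields are purely illustrative.

    import numpy as np

    def nelson_siegel_loadings(maturities, lam=0.0609):
        """Diebold-Li loadings for level, slope, curvature (maturities in months)."""
        m = np.asarray(maturities, dtype=float)
        slope = (1 - np.exp(-lam * m)) / (lam * m)
        curvature = slope - np.exp(-lam * m)
        return np.column_stack([np.ones_like(m), slope, curvature])

    def extract_factors(yields, maturities):
        """Step 1: cross-sectional OLS each period -> T x 3 factor matrix."""
        L = nelson_siegel_loadings(maturities)
        beta, *_ = np.linalg.lstsq(L, yields.T, rcond=None)   # 3 x T
        return beta.T

    def ar1_forecast(x):
        """Step 2: one-step AR(1) forecast of a single factor series."""
        X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
        c, phi = np.linalg.lstsq(X, x[1:], rcond=None)[0]
        return c + phi * x[-1]

    # toy usage with simulated yields (200 periods x 5 maturities, in months)
    rng = np.random.default_rng(0)
    mats = np.array([3, 12, 36, 60, 120])
    factors = np.cumsum(0.1 * rng.standard_normal((200, 3)), axis=0) + [5, -1, 1]
    yields = factors @ nelson_siegel_loadings(mats).T + 0.05 * rng.standard_normal((200, 5))
    F = extract_factors(yields, mats)
    print([ar1_forecast(F[:, j]) for j in range(3)])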
arXiv link: http://arxiv.org/abs/2108.06553v1
Evidence Aggregation for Treatment Choice
a certain local population of interest due to a lack of data, but does have
access to the publicized intervention studies performed for similar policies on
different populations. How should the planner make use of and aggregate this
existing evidence to make her policy decision? Following Manski (2020; Towards
Credible Patient-Centered Meta-Analysis, Epidemiology), we formulate
the planner's problem as a statistical decision problem with a social welfare
objective, and solve for an optimal aggregation rule under the minimax-regret
criterion. We investigate the analytical properties, computational feasibility,
and welfare regret performance of this rule. We apply the minimax regret
decision rule to two settings: whether to enact an active labor market policy
based on 14 randomized control trial studies; and whether to approve a drug
(Remdesivir) for COVID-19 treatment using a meta-database of clinical trials.
arXiv link: http://arxiv.org/abs/2108.06473v2
Identification of Incomplete Preferences
consumers' preferences are not necessarily complete even if only aggregate
choice data is available. Behavior is modeled using an upper and a lower
utility for each alternative so that non-comparability can arise. The
identification region places intuitive bounds on the probability distribution
of upper and lower utilities. We show that the existence of an instrumental
variable can be used to reject the hypothesis that the preferences of all
consumers are complete. We apply our methods to data from the 2018 mid-term
elections in Ohio.
arXiv link: http://arxiv.org/abs/2108.06282v4
A Unified Frequency Domain Cross-Validatory Approach to HAC Standard Error Estimation
obtain a heteroskedasticity and autocorrelation consistent (HAC) standard
error. This method enables model/tuning parameter selection across both
parametric and nonparametric spectral estimators simultaneously. The candidate
class for this approach consists of restricted maximum likelihood-based (REML)
autoregressive spectral estimators and lag-weights estimators with the Parzen
kernel. Additionally, an efficient technique for computing the REML estimators
of autoregressive models is provided. Through simulations, the reliability of
the FDCV method is demonstrated, comparing favorably with popular HAC
estimators such as Andrews-Monahan and Newey-West.
arXiv link: http://arxiv.org/abs/2108.06093v3
An Optimal Transport Approach to Estimating Causal Effects via Nonlinear Difference-in-Differences
multivariate counterfactual distributions in classical treatment and control
study designs with observational data. Our approach sheds a new light on
existing approaches like the changes-in-changes and the classical
semiparametric difference-in-differences estimator and generalizes them to
settings with multivariate heterogeneity in the outcomes. The main benefit of
this extension is that it allows for arbitrary dependence and heterogeneity in
the joint outcomes. We demonstrate its utility both on synthetic and real data.
In particular, we revisit the classical Card & Krueger dataset, examining the
effect of a minimum wage increase on employment in fast food restaurants; a
reanalysis with our method reveals that restaurants tend to substitute
full-time with part-time labor at a faster pace after a minimum wage increase.
A previous version of this work was entitled "An optimal transport approach to
causal inference."
arXiv link: http://arxiv.org/abs/2108.05858v2
Sparse Temporal Disaggregation
enable high-frequency estimates of key economic indicators, such as GDP.
Traditionally, such methods have relied on only a couple of high-frequency
indicator series to produce estimates. However, the prevalence of large, and
increasing, volumes of administrative and alternative data-sources motivates
the need for such methods to be adapted for high-dimensional settings. In this
article, we propose a novel sparse temporal-disaggregation procedure and
contrast this with the classical Chow-Lin method. We demonstrate the
performance of our proposed method through simulation study, highlighting
various advantages realised. We also explore its application to disaggregation
of UK gross domestic product data, demonstrating the method's ability to
operate when the number of potential indicators is greater than the number of
low-frequency observations.
arXiv link: http://arxiv.org/abs/2108.05783v2
Multiway empirical likelihood
for observations indexed by multiple sets of entities. We propose a novel
multiway empirical likelihood statistic that converges to a chi-square
distribution in the non-degenerate case, where the corresponding Hoeffding-type
decomposition is dominated by linear terms. Our methodology is related to the
notion of jackknife empirical likelihood but the leave-out pseudo values are
constructed by leaving columns or rows. We further develop a modified version
of our multiway empirical likelihood statistic, which converges to a chi-square
distribution regardless of the degeneracy, and discover its desirable
higher-order property compared to the t-ratio based on the conventional
Eicker-White type variance estimator. The proposed methodology is illustrated by several
important statistical problems, such as bipartite network, generalized
estimating equations, and three-way observations.
arXiv link: http://arxiv.org/abs/2108.04852v6
Weighted asymmetric least squares regression with fixed-effects
response, which is inadequate to summarize the variable relationships in the
presence of heteroscedasticity. In this paper, we adapt the asymmetric least
squares (expectile) regression to the fixed-effects model and propose a new
model: expectile regression with fixed effects (ERFE). The ERFE model
applies the within-transformation strategy to concentrate out the incidental
parameters and estimates the regressor effects on the expectiles of the response
distribution. The ERFE model captures the data heteroscedasticity and
eliminates any bias resulting from the correlation between the regressors and
the omitted factors. We derive the asymptotic properties of the ERFE
estimators and suggest robust estimators of their covariance matrix. Our
simulations show that the ERFE estimator is unbiased and outperforms its
competitors. Our real data analysis shows its ability to capture data
heteroscedasticity (see our R package, github.com/AmBarry/erfe).
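To fix ideas, here is a minimal Python sketch combining the within
transformation with an asymmetric least squares (expectile) fit via
iteratively reweighted least squares; it is not the erfe package's
implementation, omits the covariance estimators, and uses our own function
names and toy data.

    import numpy as np

    def within_transform(Z, groups):
        """Subtract group (fixed-effect) means from each column of Z."""
        Z = np.asarray(Z, dtype=float)
        out = Z.copy()
        for g in np.unique(groups):
            idx = groups == g
            out[idx] -= Z[idx].mean(axis=0)
        return out

    def expectile_regression(X, y, tau=0.75, n_iter=50):
        """Asymmetric least squares via iteratively reweighted least squares."""
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        for _ in range(n_iter):
            resid = y - X @ beta
            w = np.where(resid >= 0, tau, 1 - tau)   # asymmetric weights
            WX = X * w[:, None]
            beta = np.linalg.solve(X.T @ WX, WX.T @ y)
        return beta

    # toy fixed-effects panel with heteroscedastic errors
    rng = np.random.default_rng(0)
    n_units, T = 100, 8
    groups = np.repeat(np.arange(n_units), T)
    x = rng.standard_normal(n_units * T)
    alpha = np.repeat(rng.standard_normal(n_units), T)   # unit fixed effects
    y = alpha + 1.5 * x + (0.5 + 0.5 * np.abs(x)) * rng.standard_normal(n_units * T)
    Xw = within_transform(x[:, None], groups)
    yw = within_transform(y[:, None], groups)[:, 0]
    print(expectile_regression(Xw, yw, tau=0.75))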
arXiv link: http://arxiv.org/abs/2108.04737v1
Controlling for Unmeasured Confounding in Panel Data Using Minimal Bridge Functions: From Two-Way Fixed Effects to Factor Models
effects in panel data under a linear factor model with unmeasured confounders.
Compared to other methods tackling factor models such as synthetic controls and
matrix completion, our method does not require the number of time periods to
grow infinitely. Instead, we draw inspiration from the two-way fixed effect
model as a special case of the linear factor model, where a simple
difference-in-differences transformation identifies the effect. We show that
analogous, albeit more complex, transformations exist in the more general
linear factor model, providing a new means to identify the effect in that
model. In fact, many such transformations exist, called bridge functions, all
identifying the same causal effect estimand. This poses a unique challenge for
estimation and inference, which we solve by targeting the minimal bridge
function using a regularized estimation approach. We prove that our resulting
average causal effect estimator is root-N consistent and asymptotically normal,
and we provide asymptotically valid confidence intervals. Finally, we provide
extensions for the case of a linear factor model with time-varying unmeasured
confounders.
arXiv link: http://arxiv.org/abs/2108.03849v1
Improving Inference from Simple Instruments through Compliance Estimation
treatment effects in settings where receipt of treatment is not fully random,
but there exists an instrument that generates exogenous variation in treatment
exposure. While IV can recover consistent treatment effect estimates, they are
often noisy. Building upon earlier work in biostatistics (Joffe and Brensinger,
2003) and relating to an evolving literature in econometrics (including Abadie
et al., 2019; Huntington-Klein, 2020; Borusyak and Hull, 2020), we study how to
improve the efficiency of IV estimates by exploiting the predictable variation
in the strength of the instrument. In the case where both the treatment and
instrument are binary and the instrument is independent of baseline covariates,
we study weighting each observation according to its estimated compliance (that
is, its conditional probability of being affected by the instrument), which we
motivate from a (constrained) solution of the first-stage prediction problem
implicit to IV. The resulting estimator can leverage machine learning to
estimate compliance as a function of baseline covariates. We derive the
large-sample properties of a specific implementation of a weighted IV estimator
in the potential outcomes and local average treatment effect (LATE) frameworks,
and provide tools for inference that remain valid even when the weights are
estimated nonparametrically. With both theoretical results and a simulation
study, we demonstrate that compliance weighting meaningfully reduces the
variance of IV estimates when first-stage heterogeneity is present, and that
this improvement often outweighs any difference between the compliance-weighted
and unweighted IV estimands. These results suggest that in a variety of applied
settings, the precision of IV estimates can be substantially improved by
incorporating compliance estimation.
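As a concrete, simplified illustration of compliance weighting (not the
paper's exact estimator or inference procedure), the Python sketch below
estimates each unit's compliance score as the difference between fitted
treatment probabilities under Z = 1 and Z = 0, and plugs those scores in as
weights in a weighted Wald/IV estimator; the logistic models, clipping, and
toy data are our own choices.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def compliance_scores(X, Z, D):
        """Estimated compliance: P(D=1 | X, Z=1) - P(D=1 | X, Z=0)."""
        m1 = LogisticRegression(max_iter=1000).fit(X[Z == 1], D[Z == 1])
        m0 = LogisticRegression(max_iter=1000).fit(X[Z == 0], D[Z == 0])
        return m1.predict_proba(X)[:, 1] - m0.predict_proba(X)[:, 1]

    def weighted_iv(Y, D, Z, w):
        """Weighted Wald / just-identified IV estimate with observation weights w."""
        w = np.clip(w, 1e-3, None)              # keep weights strictly positive
        zbar = np.average(Z, weights=w)
        num = np.average((Z - zbar) * Y, weights=w)
        den = np.average((Z - zbar) * D, weights=w)
        return num / den

    # toy data: randomized instrument, compliance depends on covariates
    rng = np.random.default_rng(0)
    n = 5000
    X = rng.standard_normal((n, 2))
    Z = rng.binomial(1, 0.5, n)
    complier = rng.binomial(1, 1 / (1 + np.exp(-1.5 * X[:, 0])))
    always = rng.binomial(1, 0.2, n)            # takeup of non-compliers
    D = np.where(complier == 1, Z, always)
    Y = 2.0 * D + X[:, 1] + rng.standard_normal(n)   # treatment effect = 2
    w = compliance_scores(X, Z, D)
    print(weighted_iv(Y, D, Z, np.ones(n)), weighted_iv(Y, D, Z, w))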
arXiv link: http://arxiv.org/abs/2108.03726v1
A Theoretical Analysis of Logistic Regression and Bayesian Classifiers
regression and Bayesian classifiers in the case of exponential and
non-exponential families of distributions, yielding the following findings.
First, logistic regression is a less general representation of a Bayesian
classifier. Second, one should assume distributions for the classes in order to
correctly specify the logistic regression equations. Third, in specific cases, there
is no difference between predicted probabilities from correctly specified
generative Bayesian classifier and discriminative logistic regression.
arXiv link: http://arxiv.org/abs/2108.03715v1
Including the asymmetry of the Lorenz curve into measures of economic inequality
very sensitive to income differences at the tails of the distribution. The
widely used index of inequality can be adjusted to also measure distributional
asymmetry by attaching weights to the distances between the Lorenz curve and
the 45-degree line. The measure is equivalent to the Gini if the distribution
is symmetric. The alternative measure of inequality inherits good properties
from the Gini but is more sensitive to changes in the extremes of the income
distribution.
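The paper's exact weighting scheme is not reproduced here; the Python sketch
below computes the ordinary Gini coefficient from the empirical Lorenz curve
and an illustrative weighted variant that attaches a user-chosen weight
function to the distances between the Lorenz curve and the 45-degree line,
reducing to the Gini when the weight is constant.

    import numpy as np

    def _trapezoid(f, x):
        """Trapezoidal integral of f over the grid x."""
        return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

    def lorenz_curve(income):
        """Grid p in [0, 1] and Lorenz ordinates L(p) for a sample of incomes."""
        x = np.sort(np.asarray(income, dtype=float))
        p = np.arange(1, len(x) + 1) / len(x)
        L = np.cumsum(x) / x.sum()
        return np.concatenate([[0.0], p]), np.concatenate([[0.0], L])

    def weighted_inequality(income, weight_fn=lambda p: 2.0 * np.ones_like(p)):
        """Integrate weight(p) * (p - L(p)); the constant weight 2 gives the Gini."""
        p, L = lorenz_curve(income)
        return _trapezoid(weight_fn(p) * (p - L), p)

    rng = np.random.default_rng(0)
    income = rng.lognormal(mean=0.0, sigma=0.8, size=10_000)
    gini = weighted_inequality(income)                          # ordinary Gini
    upper_tail = weighted_inequality(income, lambda p: 4.0 * p) # upweights the top
    print(round(gini, 3), round(upper_tail, 3))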
arXiv link: http://arxiv.org/abs/2108.03623v2
Fully Modified Least Squares Cointegrating Parameter Estimation in Multicointegrated Systems
relationship among variables in a parametric vector autoregressive model that
introduces additional cointegrating links between these variables and partial
sums of the equilibrium errors. This paper departs from the parametric model,
using a semiparametric formulation that reveals the explicit role that
singularity of the long run conditional covariance matrix plays in determining
multicointegration. The semiparametric framework has the advantage that short
run dynamics do not need to be modeled and estimation by standard techniques
such as fully modified least squares (FM-OLS) on the original I(1) system is
straightforward. The paper derives FM-OLS limit theory in the multicointegrated
setting, showing how faster rates of convergence are achieved in the direction
of singularity and that the limit distribution depends on the distribution of
the conditional one-sided long run covariance estimator used in FM-OLS
estimation. Wald tests of restrictions on the regression coefficients have
nonstandard limit theory which depends on nuisance parameters in general. The
usual tests are shown to be conservative when the restrictions are isolated to
the directions of singularity and, under certain conditions, are invariant to
singularity otherwise. Simulations show that approximations derived in the
paper work well in finite samples. The findings are illustrated empirically in
an analysis of fiscal sustainability of the US government over the post-war
period.
arXiv link: http://arxiv.org/abs/2108.03486v1
Culling the herd of moments with penalized empirical likelihood
econometric estimation, but economic theory is mostly agnostic about moment
selection. While a large pool of valid moments can potentially improve
estimation efficiency, in the meantime a few invalid ones may undermine
consistency. This paper investigates the empirical likelihood estimation of
these moment-defined models in high-dimensional settings. We propose a
penalized empirical likelihood (PEL) estimation and establish its oracle
property with consistent detection of invalid moments. The PEL estimator is
asymptotically normally distributed, and a projected PEL procedure further
eliminates its asymptotic bias and provides more accurate normal approximation
to the finite sample behavior. Simulation exercises demonstrate excellent
numerical performance of these methods in estimation and inference.
arXiv link: http://arxiv.org/abs/2108.03382v4
Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist
issues and trade-offs, e.g., improving equality, productivity, or wellness, and
poses a complex mechanism design problem. A policy designer needs to consider
multiple objectives, policy levers, and behavioral responses from strategic
actors who optimize for their individual objectives. Moreover, real-world
policies should be explainable and robust to simulation-to-reality gaps, e.g.,
due to calibration issues. Existing approaches are often limited to a narrow
set of policy levers or objectives that are hard to measure, do not yield
explicit optimal policies, or do not consider strategic behavior, for example.
Hence, it remains challenging to optimize policy in real-world scenarios. Here
we show that the AI Economist framework enables effective, flexible, and
interpretable policy design using two-level reinforcement learning (RL) and
data-driven simulations. We validate our framework on optimizing the stringency
of US state policies and Federal subsidies during a pandemic, e.g., COVID-19,
using a simulation fitted to real data. We find that log-linear policies
trained using RL significantly improve social welfare, based on both public
health and economic outcomes, compared to past outcomes. Their behavior can be
explained, e.g., well-performing policies respond strongly to changes in
recovery and vaccination rates. They are also robust to calibration errors,
e.g., infection rates that are over- or underestimated. As yet, real-world
policymaking has not seen adoption of machine learning methods at large,
including RL and AI-driven simulations. Our results show the potential of AI to
guide policy design and improve social welfare amidst the complexity of the
real world.
arXiv link: http://arxiv.org/abs/2108.02904v1
Sparse Generalized Yule-Walker Estimation for Large Spatio-temporal Autoregressions with an Application to NO2 Satellite Data
time and space. The model consists of a spatio-temporal regression containing a
time lag and a spatial lag of the dependent variable. Unlike classical spatial
autoregressive models, we do not rely on a predetermined spatial interaction
matrix, but infer all spatial interactions from the data. Assuming sparsity, we
estimate the spatial and temporal dependence fully data-driven by penalizing a
set of Yule-Walker equations. This regularization can be left unstructured, but
we also propose customized shrinkage procedures when observations originate
from spatial grids (e.g. satellite images). Finite sample error bounds are
derived and estimation consistency is established in an asymptotic framework
wherein the sample size and the number of spatial units diverge jointly.
Exogenous variables can be included as well. A simulation exercise shows strong
finite sample performance compared to competing procedures. As an empirical
application, we model satellite measured NO2 concentrations in London. Our
approach delivers forecast improvements over a competitive benchmark and we
discover evidence for strong spatial interactions.
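The penalized Yule-Walker system and the customized shrinkage procedures are
not reproduced here; as a simplified illustration of recovering sparse spatial
interactions from data, the Python sketch below estimates each row of the
interaction matrix in y_t = A y_{t-1} + e_t by a node-wise Lasso regression on
the lagged vector.

    import numpy as np
    from sklearn.linear_model import Lasso

    def nodewise_lasso_var1(Y, alpha=0.05):
        """Sparse VAR(1) interaction matrix A, estimated row by row:
        regress each series y_{i,t} on the lagged vector y_{t-1} with a Lasso."""
        Ylag, Ynow = Y[:-1], Y[1:]
        A_hat = np.zeros((Y.shape[1], Y.shape[1]))
        for i in range(Y.shape[1]):
            fit = Lasso(alpha=alpha, fit_intercept=False).fit(Ylag, Ynow[:, i])
            A_hat[i] = fit.coef_
        return A_hat

    # toy data: N series on a line, each depending on itself and its left neighbour
    rng = np.random.default_rng(0)
    N, T = 20, 400
    A = 0.4 * np.eye(N) + 0.3 * np.eye(N, k=-1)
    Y = np.zeros((T, N))
    for t in range(1, T):
        Y[t] = Y[t - 1] @ A.T + rng.standard_normal(N)
    A_hat = nodewise_lasso_var1(Y)
    print(np.count_nonzero(A_hat), "nonzero entries estimated out of", N * N)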
arXiv link: http://arxiv.org/abs/2108.02864v2
Synthetic Controls for Experimental Design
units are large aggregate entities (e.g., markets), and only one or a small
number of units can be exposed to the treatment. In such settings,
randomization of the treatment may result in treated and control groups with
very different characteristics at baseline, inducing biases. We propose a
variety of experimental non-randomized synthetic control designs (Abadie,
Diamond and Hainmueller, 2010, Abadie and Gardeazabal, 2003) that select the
units to be treated, as well as the untreated units to be used as a control
group. Average potential outcomes are estimated as weighted averages of the
outcomes of treated units for potential outcomes with treatment, and weighted
averages of the outcomes of control units for potential outcomes without
treatment. We analyze the properties of estimators based on synthetic control
designs and propose new inferential techniques. We show that in experimental
settings with aggregate units, synthetic control designs can substantially
reduce estimation biases in comparison to randomization of the treatment.
arXiv link: http://arxiv.org/abs/2108.02196v5
Nested Pseudo Likelihood Estimation of Continuous-Time Dynamic Discrete Games
choice models (single-agent models and games) by adapting the nested pseudo
likelihood (NPL) estimator of Aguirregabiria and Mira (2002, 2007), developed
for discrete time models with discrete time data, to the continuous time case
with data sampled either discretely (i.e., uniformly-spaced snapshot data) or
continuously. We establish conditions for consistency and asymptotic normality
of the estimator, a local convergence condition, and, for single agent models,
a zero Jacobian property assuring local convergence. We carry out a series of
Monte Carlo experiments using an entry-exit game with five heterogeneous firms
to confirm the large-sample properties and demonstrate finite-sample bias
reduction via iteration. In our simulations we show that the convergence issues
documented for the NPL estimator in discrete time models are less likely to
affect comparable continuous-time models. We also show that there can be large
bias in economically-relevant parameters, such as the competitive effect and
entry cost, from estimating a misspecified discrete time model when in fact the
data generating process is a continuous time model.
arXiv link: http://arxiv.org/abs/2108.02182v2
Semiparametric Functional Factor Models with Bayesian Rank Selection
describes the typical shapes of the functions. However, these parametric
templates can incur significant bias, which undermines both utility and
interpretability. To correct for model misspecification, we augment the
parametric template with an infinite-dimensional nonparametric functional
basis. The nonparametric basis functions are learned from the data and
constrained to be orthogonal to the parametric template, which preserves
distinctness between the parametric and nonparametric terms. This distinctness
is essential to prevent functional confounding, which otherwise induces severe
bias for the parametric terms. The nonparametric factors are regularized with
an ordered spike-and-slab prior that provides consistent rank selection and
satisfies several appealing theoretical properties. The versatility of the
proposed approach is illustrated through applications to synthetic data, human
motor control data, and dynamic yield curve data. Relative to parametric and
semiparametric alternatives, the proposed semiparametric functional factor
model eliminates bias, reduces excessive posterior and predictive uncertainty,
and provides reliable inference on the effective number of nonparametric
terms--all with minimal additional computational costs.
arXiv link: http://arxiv.org/abs/2108.02151v3
Bayesian forecast combination using time-varying features
by constructing time-varying weights based on time series features, which is
called Feature-based Bayesian Forecasting Model Averaging (FEBAMA). Our
framework estimates weights in the forecast combination via Bayesian log
predictive scores, in which the optimal forecasting combination is determined
by time series features from historical information. In particular, we use an
automatic Bayesian variable selection method to add weight to the importance of
different features. To this end, our approach has better interpretability
compared to other black-box forecasting combination schemes. We apply our
framework to stock market data and M3 competition data. Within our framework,
a simple maximum-a-posteriori scheme outperforms benchmark methods, and
Bayesian variable selection can further enhance the accuracy for both point and
density forecasts.
arXiv link: http://arxiv.org/abs/2108.02082v3
Automated Identification of Climate Risk Disclosures in Annual Corporate Reports
effective in increasing climate risk disclosure in corporate reporting. We use
machine learning to automatically identify disclosures of five different types
of climate-related risks. For this purpose, we have created a dataset of over
120 manually-annotated annual reports by European firms. Applying our approach
to reporting of 337 firms over the last 20 years, we find that risk disclosure
is increasing. Disclosure of transition risks grows more dynamically than
physical risks, and there are marked differences across industries.
Country-specific dynamics indicate that regulatory environments potentially
have an important role to play for increasing disclosure.
arXiv link: http://arxiv.org/abs/2108.01415v1
Learning Causal Models from Conditional Moment Restrictions by Importance Weighting
restrictions. Unlike causal inference under unconditional moment restrictions,
conditional moment restrictions pose serious challenges for causal inference,
especially in high-dimensional settings. To address this issue, we propose a
method that transforms conditional moment restrictions to unconditional moment
restrictions through importance weighting, using a conditional density ratio
estimator. Using this transformation, we successfully estimate nonparametric
functions defined under conditional moment restrictions. Our proposed framework
is general and can be applied to a wide range of methods, including neural
networks. We analyze the estimation error, providing theoretical support for
our proposed method. In experiments, we confirm the soundness of our proposed
method.
arXiv link: http://arxiv.org/abs/2108.01312v2
Partial Identification and Inference for Conditional Distributions of Treatment Effects
treatment effects conditional on observable covariates. Since the conditional
distribution of treatment effects is not point identified without strong
assumptions, we obtain bounds on the conditional distribution of treatment
effects by using the Makarov bounds. We also consider the case where the
treatment is endogenous and propose two stochastic dominance assumptions to
tighten the bounds. We develop a nonparametric framework to estimate the bounds
and establish the asymptotic theory that is uniformly valid over the support of
treatment effects. An empirical example illustrates the usefulness of the
methods.
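For readers unfamiliar with them, the Makarov bounds on the distribution of the
treatment effect at a point delta can be computed from the two marginal outcome
distributions as in the Python sketch below (unconditional case, without
covariates, and without the stochastic dominance tightening proposed in the
paper); the grid construction and toy data are our own choices.

    import numpy as np

    def makarov_bounds(y1, y0, delta, grid_size=400):
        """Makarov bounds on P(Y1 - Y0 <= delta) from samples of the two marginals:
           lower = max(sup_y [F1(y) - F0(y - delta)], 0)
           upper = 1 + min(inf_y [F1(y) - F0(y - delta)], 0)."""
        y1, y0 = np.asarray(y1, float), np.asarray(y0, float)
        grid = np.linspace(min(y1.min(), y0.min() + delta),
                           max(y1.max(), y0.max() + delta), grid_size)
        F1 = np.searchsorted(np.sort(y1), grid, side="right") / len(y1)
        F0 = np.searchsorted(np.sort(y0), grid - delta, side="right") / len(y0)
        diff = F1 - F0
        return max(diff.max(), 0.0), 1.0 + min(diff.min(), 0.0)

    # toy example: only the two marginals are used, so the bounds can be wide
    rng = np.random.default_rng(0)
    y0 = rng.standard_normal(20_000)          # control outcomes
    y1 = rng.standard_normal(20_000) + 1.0    # treated outcomes
    for d in (0.0, 1.0, 2.0):
        print(d, makarov_bounds(y1, y0, delta=d))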
arXiv link: http://arxiv.org/abs/2108.00723v6
Implementing an Improved Test of Matrix Rank in Stata
test of Chen and Fang (2019) in linear instrumental variable regression models.
Existing rank tests employ critical values that may be too small, and hence may
not even be first-order valid in the sense that they may fail to control the
Type I error. By appealing to the bootstrap, they devise a test that overcomes
the deficiency of existing tests. The command bootranktest implements the
two-step version of their test, and also the analytic version if chosen. The
command also accommodates data with temporal and cluster dependence.
arXiv link: http://arxiv.org/abs/2108.00511v1
Semiparametric Estimation of Long-Term Treatment Effects
long delays. We develop semiparametric methods for combining the short-term
outcomes of experiments with observational measurements of short-term and
long-term outcomes, in order to estimate long-term treatment effects. We
characterize semiparametric efficiency bounds for various instances of this
problem. These calculations facilitate the construction of several estimators.
We analyze the finite-sample performance of these estimators with a simulation
calibrated to data from an evaluation of the long-term effects of a poverty
alleviation program.
arXiv link: http://arxiv.org/abs/2107.14405v5
Inference in heavy-tailed non-stationary multivariate time series
$N$-variate time series $y_{t}$, in the possible presence of heavy tails. We
propose a novel methodology which does not require any knowledge or estimation
of the tail index, or even knowledge as to whether certain moments (such as the
variance) exist or not, and develop an estimator of the number of stochastic
trends $m$ based on the eigenvalues of the sample second moment matrix of
$y_{t}$. We study the rates of such eigenvalues, showing that the first $m$
ones diverge, as the sample size $T$ passes to infinity, at a rate faster by
$O\left(T \right)$ than the remaining $N-m$ ones, irrespective of the tail
index. We thus exploit this eigen-gap by constructing, for each eigenvalue, a
test statistic which diverges to positive infinity or drifts to zero according
to whether the relevant eigenvalue belongs to the set of the first $m$
eigenvalues or not. We then construct a randomised statistic based on this,
using it as part of a sequential testing procedure, ensuring consistency of the
resulting estimator of $m$. We also discuss an estimator of the common trends
based on principal components and show that, up to an invertible linear
transformation, such estimator is consistent in the sense that the estimation
error is of smaller order than the trend itself. Finally, we also consider the
case in which we relax the standard assumption of i.i.d. innovations,
by allowing for heterogeneity of a very general form in the scale of the
innovations. A Monte Carlo study shows that the proposed estimator for $m$
performs particularly well, even in samples of small size. We complete the
paper by presenting four illustrative applications covering commodity prices,
interest rates data, long run PPP and cryptocurrency markets.
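The randomised statistics and the sequential testing procedure are not
reproduced here; the short Python sketch below merely illustrates the eigen-gap
that the estimator exploits, by simulating a panel with m common stochastic
trends and inspecting the eigenvalues of the sample second moment matrix
(taking the largest ratio of consecutive eigenvalues is only a heuristic, not
the paper's estimator).

    import numpy as np

    rng = np.random.default_rng(0)
    N, T, m = 10, 2000, 3
    # m common random-walk trends loaded on all N series, plus stationary noise
    trends = np.cumsum(rng.standard_normal((T, m)), axis=0)
    loadings = rng.standard_normal((m, N))
    y = trends @ loadings + rng.standard_normal((T, N))

    S = (y.T @ y) / T                    # sample second moment matrix of y_t
    eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]
    ratios = eigvals[:-1] / eigvals[1:]  # a large ratio marks the eigen-gap
    print(np.round(eigvals, 1))
    print("estimated number of trends:", int(np.argmax(ratios)) + 1)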
arXiv link: http://arxiv.org/abs/2107.13894v1
Machine Learning and Factor-Based Portfolio Optimization
that factors based on autoencoder neural networks exhibit a weaker relationship
with commonly used characteristic-sorted portfolios than popular dimensionality
reduction techniques. Machine learning methods also lead to covariance and
portfolio weight structures that diverge from simpler estimators.
Minimum-variance portfolios using latent factors derived from autoencoders and
sparse methods outperform simpler benchmarks in terms of risk minimization.
These effects are amplified for investors with an increased sensitivity to
risk-adjusted returns, during high volatility periods or when accounting for
tail risk.
arXiv link: http://arxiv.org/abs/2107.13866v1
Design-Robust Two-Way-Fixed-Effects Regression For Panel Data
with panel data in settings with general treatment patterns. Our approach
augments the popular two-way-fixed-effects specification with unit-specific
weights that arise from a model for the assignment mechanism. We show how to
construct these weights in various settings, including the staggered adoption
setting, where units opt into the treatment sequentially but permanently. The
resulting estimator converges to an average (over units and time) treatment
effect under the correct specification of the assignment model, even if the
fixed effect model is misspecified. We show that our estimator is more robust
than the conventional two-way estimator: it remains consistent if either the
assignment mechanism or the two-way regression model is correctly specified. In
addition, the proposed estimator performs better than the two-way-fixed-effect
estimator if the outcome model and assignment mechanism are locally
misspecified. This strong double robustness property underlines and quantifies
the benefits of modeling the assignment process and motivates using our
estimator in practice. We also discuss an extension of our estimator to handle
dynamic treatment effects.
arXiv link: http://arxiv.org/abs/2107.13737v3
Estimating high-dimensional Markov-switching VARs
autoregressions (MS-VARs) can be challenging or infeasible due to parameter
proliferation. To accommodate situations where dimensionality may be of
comparable order to or exceeds the sample size, we adopt a sparse framework and
propose two penalized maximum likelihood estimators with either the Lasso or
the smoothly clipped absolute deviation (SCAD) penalty. We show that both
estimators are estimation consistent, while the SCAD estimator also selects
relevant parameters with probability approaching one. A modified EM-algorithm
is developed for the case of Gaussian errors and simulations show that the
algorithm exhibits desirable finite sample performance. In an application to
short-horizon return predictability in the US, we estimate a 15-variable
2-state MS-VAR(1) and obtain the often reported counter-cyclicality in
predictability. The variable selection property of our estimators helps to
identify predictors that contribute strongly to predictability during economic
contractions but are otherwise irrelevant in expansions. Furthermore,
out-of-sample analyses indicate that large MS-VARs can significantly outperform
"hard-to-beat" predictors like the historical average.
arXiv link: http://arxiv.org/abs/2107.12552v1
A Unifying Framework for Testing Shape Restrictions
unifying framework for testing shape restrictions based on the Wald principle.
The test has asymptotic uniform size control and is uniformly consistent.
Second, we examine the applicability and usefulness of some prominent shape
enforcing operators in implementing our framework. In particular, in stark
contrast to its use in point and interval estimation, the rearrangement
operator is inapplicable due to a lack of convexity. The greatest convex
minorization and the least concave majorization are shown to enjoy the analytic
properties required to employ our framework. Third, we show that, even though
the projection operator may not be well defined or well behaved in general parameter
spaces such as those defined by uniform norms, one may nonetheless employ a
powerful distance-based test by applying our framework. Monte Carlo simulations
confirm that our test works well. We further showcase the empirical relevance
by investigating the relationship between weekly working hours and the annual
wage growth in the high-end labor market.
arXiv link: http://arxiv.org/abs/2107.12494v2
Semiparametric Estimation of Treatment Effects in Observational Studies with Heterogeneous Partial Interference
units are connected, and one unit's treatment and attributes may affect
another's treatment and outcome, violating the stable unit treatment value
assumption (SUTVA) and resulting in interference. To enable feasible estimation
and inference, many previous works assume exchangeability of interfering units
(neighbors). However, in many applications with distinctive units, interference
is heterogeneous and needs to be modeled explicitly. In this paper, we focus on
the partial interference setting, and only restrict units to be exchangeable
conditional on observable characteristics. Under this framework, we propose
generalized augmented inverse propensity weighted (AIPW) estimators for general
causal estimands that include heterogeneous direct and spillover effects. We
show that they are semiparametric efficient and robust to heterogeneous
interference as well as model misspecifications. We apply our methods to the
Add Health dataset to study the direct effects of alcohol consumption on
academic performance and the spillover effects of parental incarceration on
adolescent well-being.
arXiv link: http://arxiv.org/abs/2107.12420v3
Adaptive Estimation and Uniform Confidence Bands for Nonparametric Structural Functions and Elasticities
in nonparametric models using instrumental variables. The first is a
data-driven choice of sieve dimension for a popular class of sieve two-stage
least squares estimators. When implemented with this choice, estimators of both
the structural function $h_0$ and its derivatives (such as elasticities)
converge at the fastest possible (i.e., minimax) rates in sup-norm. The second
is for constructing uniform confidence bands (UCBs) for $h_0$ and its
derivatives. Our UCBs guarantee coverage over a generic class of
data-generating processes and contract at the minimax rate, possibly up to a
logarithmic factor. As such, our UCBs are asymptotically more efficient than
UCBs based on the usual approach of undersmoothing. As an application, we
estimate the elasticity of the intensive margin of firm exports in a
monopolistic competition model of international trade. Simulations illustrate
the good performance of our procedures in empirically calibrated designs. Our
results provide evidence against common parameterizations of the distribution
of unobserved firm heterogeneity.
arXiv link: http://arxiv.org/abs/2107.11869v3
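A minimal sketch of the sieve two-stage least squares building block that the
paper's data-driven rule tunes, with the sieve dimensions J and K fixed by hand
(hypothetical values; the adaptive dimension choice and the uniform confidence
bands are the paper's contribution and are not reproduced here).

```python
import numpy as np

def sieve_2sls(y, x, w, J=4, K=6):
    """Sieve 2SLS / NPIV estimate of h_0 in y = h_0(x) + u with E[u | w] = 0.

    Polynomial sieves of fixed dimension J (endogenous x) and K >= J
    (instrument w); returns a function evaluating h_hat at new points."""
    psi = np.vander(x, J, increasing=True)             # basis in the endogenous regressor
    b = np.vander(w, K, increasing=True)               # instrument basis
    first_stage = np.linalg.lstsq(b, psi, rcond=None)[0]
    psi_hat = b @ first_stage                          # projection of psi onto the instrument space
    beta = np.linalg.lstsq(psi_hat, y, rcond=None)[0]  # 2SLS = OLS of y on the projected basis
    return lambda x_new: np.vander(np.asarray(x_new, float), J, increasing=True) @ beta

# toy NPIV design: x is endogenous through the shared shock v, h_0(x) = sin(x)
rng = np.random.default_rng(1)
n = 2000
w = rng.normal(size=n)
v = rng.normal(size=n)
x = 0.8 * w + 0.5 * v + 0.3 * rng.normal(size=n)
y = np.sin(x) + v + 0.3 * rng.normal(size=n)
h_hat = sieve_2sls(y, x, w)
print(h_hat([-1.0, 0.0, 1.0]))                         # roughly sin evaluated at those points
```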
Federated Causal Inference in Heterogeneous Observational Data
individuals at multiple sites, where data is stored locally for each site. Due
to privacy constraints, individual-level data cannot be shared across sites;
the sites may also have heterogeneous populations and treatment assignment
mechanisms. Motivated by these considerations, we develop federated methods to
draw inference on the average treatment effects of combined data across sites.
Our methods first compute summary statistics locally using propensity scores
and then aggregate these statistics across sites to obtain point and variance
estimators of average treatment effects. We show that these estimators are
consistent and asymptotically normal. To achieve these asymptotic properties,
we find that the aggregation schemes need to account for the heterogeneity in
treatment assignments and in outcomes across sites. We demonstrate the validity
of our federated methods through a comparative study of two large medical
claims databases.
arXiv link: http://arxiv.org/abs/2107.11732v5
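A schematic sketch of the federated pipeline described above: each site computes
summary statistics locally (here a simple IPW estimate and its variance, as a
stand-in) and only those scalars are combined. The weighting below is a
simplification; the paper's aggregation schemes explicitly account for
heterogeneity in treatment assignments and outcomes across sites.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def site_summary(y, d, x):
    """Local (per-site) summary: an IPW effect estimate, its variance, and n.

    In a federated analysis only these scalars, not individual records,
    would leave the site."""
    ps = LogisticRegression(max_iter=1000).fit(x, d).predict_proba(x)[:, 1]
    psi = d * y / ps - (1 - d) * y / (1 - ps)
    return psi.mean(), psi.var(ddof=1) / len(y), len(y)

def aggregate(summaries, weighting="sample_size"):
    """Combine site-level estimates with sample-size or precision weights.

    A simplified stand-in for the paper's heterogeneity-aware aggregation."""
    est = np.array([s[0] for s in summaries])
    var = np.array([s[1] for s in summaries])
    n = np.array([s[2] for s in summaries], dtype=float)
    w = n / n.sum() if weighting == "sample_size" else (1 / var) / (1 / var).sum()
    return float(w @ est), float(w ** 2 @ var)

# three synthetic sites with different covariate and assignment distributions
rng = np.random.default_rng(2)
sites = []
for k in range(3):
    x = rng.normal(loc=0.3 * k, size=(1500, 2))
    d = rng.binomial(1, 1 / (1 + np.exp(-(x[:, 0] - 0.2 * k))))
    y = 1.5 * d + x[:, 1] + rng.normal(size=1500)
    sites.append(site_summary(y, d, x))
print(aggregate(sites))          # point estimate and variance of the combined effect
```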
The macroeconomic cost of climate volatility
on 133 countries between 1960 and 2019. We show that the conditional (ex ante)
volatility of annual temperatures increased steadily over time, rendering
climate conditions less predictable across countries, with important
implications for growth. Controlling for concomitant changes in temperatures, a
+1 degree C increase in temperature volatility causes on average a 0.3 percent
decline in GDP growth and a 0.7 percent increase in the volatility of GDP.
Unlike changes in average temperatures, changes in temperature volatility
affect both rich and poor countries.
arXiv link: http://arxiv.org/abs/2108.01617v2
Recent Developments in Inference: Practicalities for Applied Economics
errors and test statistics for statistical inference. While much of the focus
of the last two decades in economics has been on generating unbiased
coefficients, recent years have seen a variety of advancements in correcting for
non-standard standard errors. We synthesize these recent advances in addressing
challenges to conventional inference, like heteroskedasticity, clustering,
serial correlation, and testing multiple hypotheses. We also discuss recent
advancements in numerical methods, such as the bootstrap, wild bootstrap, and
randomization inference. We make three specific recommendations. First, applied
economists need to clearly articulate the challenges to statistical inference
that are present in data as well as the source of those challenges. Second,
modern computing power and statistical software mean that applied economists
have no excuse for not correctly calculating their standard errors and test
statistics. Third, because complicated sampling strategies and research designs
make it difficult to work out the correct formula for standard errors and test
statistics, we believe that in the applied economics profession it should
become standard practice to rely on asymptotic refinements to the distribution
of an estimator or test statistic via bootstrapping. Throughout, we reference
built-in and user-written Stata commands that allow one to quickly calculate
accurate standard errors and relevant test statistics.
arXiv link: http://arxiv.org/abs/2107.09736v1
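As one concrete illustration of the resampling methods discussed above, the
sketch below implements a pairs cluster bootstrap percentile interval for an OLS
coefficient in Python (the article itself points to Stata commands; this is a
generic illustration, and with few clusters a wild cluster bootstrap would
typically be preferred).

```python
import numpy as np

def cluster_pairs_bootstrap_ci(y, X, cluster, coef=1, B=999, alpha=0.05, seed=0):
    """Percentile confidence interval for one OLS coefficient, resampling
    whole clusters with replacement to preserve within-cluster dependence."""
    rng = np.random.default_rng(seed)
    ids = np.unique(cluster)
    stats = []
    for _ in range(B):
        draw = rng.choice(ids, size=len(ids), replace=True)
        idx = np.concatenate([np.flatnonzero(cluster == g) for g in draw])
        beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        stats.append(beta[coef])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# toy clustered design: 40 clusters of 25 observations sharing a common shock
rng = np.random.default_rng(3)
G, m = 40, 25
cluster = np.repeat(np.arange(G), m)
x = rng.normal(size=G * m)
y = 1.0 + 0.5 * x + np.repeat(rng.normal(size=G), m) + rng.normal(size=G * m)
X = np.column_stack([np.ones(G * m), x])
print(cluster_pairs_bootstrap_ci(y, X, cluster))   # interval for the slope coefficient
```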
Distributional Effects with Two-Sided Measurement Error: An Application to Intergenerational Income Mobility
parameters that depend on the joint distribution of an outcome and another
variable of interest ("treatment") in a setting with "two-sided" measurement
error -- that is, where both variables are possibly measured with error.
Examples of these parameters in the context of intergenerational income
mobility include transition matrices, rank-rank correlations, and the poverty
rate of children as a function of their parents' income, among others. Building
on recent work on quantile regression (QR) with measurement error in the
outcome (particularly, Hausman, Liu, Luo, and Palmer (2021)), we show that,
given (i) two linear QR models separately for the outcome and treatment
conditional on other observed covariates and (ii) assumptions about the
measurement error for each variable, one can recover the joint distribution of
the outcome and the treatment. Besides these conditions, our approach does not
require an instrument, repeated measurements, or distributional assumptions
about the measurement error. Using recent data from the 1997 National
Longitudinal Survey of Youth, we find that accounting for measurement error
notably reduces several estimates of intergenerational mobility parameters.
arXiv link: http://arxiv.org/abs/2107.09235v4
Mind the Income Gap: Bias Correction of Inequality Estimators in Small-Sized Samples
to an underestimation. This aspect deserves particular attention when
estimating inequality in small domains and performing small area estimation at
the area level. We propose a bias correction framework for a large class of
inequality measures comprising the Gini Index, the Generalized Entropy and the
Atkinson index families by accounting for complex survey designs. The proposed
methodology does not require any parametric assumption on income distribution,
being very flexible. A design-based performance evaluation of our proposal has
been carried out using EU-SILC data; the results show a noticeable bias
reduction for all the measures. Lastly, an illustrative application to small
area estimation confirms that ignoring ex-ante bias correction leads to model
misspecification.
arXiv link: http://arxiv.org/abs/2107.08950v3
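For context, the plug-in weighted Gini estimator whose small-sample downward
bias motivates the entry above is shown below; the data, weights, and toy
comparison are hypothetical, and the paper's bias correction itself is not
reproduced.

```python
import numpy as np

def weighted_gini(y, w=None):
    """Plug-in Gini index with optional survey weights, using the
    mean-absolute-difference form
        G = sum_ij w_i w_j |y_i - y_j| / (2 * (sum_i w_i)^2 * weighted mean).
    This is the naive estimator; the correction proposed in the paper is
    not implemented here."""
    y = np.asarray(y, float)
    w = np.ones_like(y) if w is None else np.asarray(w, float)
    ybar = np.average(y, weights=w)
    mad = np.sum(w[:, None] * w[None, :] * np.abs(y[:, None] - y[None, :]))
    return mad / (2 * w.sum() ** 2 * ybar)

# hypothetical lognormal "income" population: small samples understate inequality
rng = np.random.default_rng(4)
pop = rng.lognormal(mean=10.0, sigma=0.8, size=100_000)
print("avg Gini, n=25 :", np.mean([weighted_gini(rng.choice(pop, 25)) for _ in range(500)]))
print("Gini, n=2000   :", weighted_gini(rng.choice(pop, 2000)))
```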
Decoupling Shrinkage and Selection for the Bayesian Quantile Regression
continuous priors to Bayesian Quantile Regression (BQR). The procedure follows
two steps: In the first step, we shrink the quantile regression posterior
through state-of-the-art continuous priors, and in the second step, we sparsify
the posterior through an efficient variant of the adaptive lasso, the signal
adaptive variable selection (SAVS) algorithm. We propose a new variant of the
SAVS that automates the choice of penalisation through quantile-specific loss
functions that are valid in high dimensions. We show in large-scale
simulations that our selection procedure decreases bias irrespective of the
true underlying degree of sparsity in the data, compared to the un-sparsified
regression posterior. We apply our two-step approach to a high dimensional
growth-at-risk (GaR) exercise. The prediction accuracy of the un-sparsified
posterior is retained while yielding interpretable quantile-specific variable
selection results. Our procedure can be used to communicate to policymakers
which variables drive downside risk to the macroeconomy.
arXiv link: http://arxiv.org/abs/2107.08498v1
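The baseline SAVS step has a simple closed form, as commonly stated: a
coefficient-wise soft-thresholding with the adaptive penalty mu_j =
|beta_j|^{-2}. The sketch below shows only that baseline operator, not the
quantile-specific penalisation variant proposed in the paper.

```python
import numpy as np

def savs(beta_post_mean, X):
    """Baseline SAVS sparsification of a posterior-mean coefficient vector.

    Coefficient-wise soft-thresholding with the adaptive penalty
    mu_j = |beta_j|^{-2} (the usual default); the paper instead proposes
    quantile-specific penalties, which are not implemented here."""
    b = np.asarray(beta_post_mean, float)
    col_ss = np.sum(np.asarray(X, float) ** 2, axis=0)          # ||X_j||^2
    mu = 1.0 / np.maximum(np.abs(b), 1e-12) ** 2
    return np.sign(b) * np.maximum(np.abs(b) * col_ss - mu, 0.0) / col_ss

# tiny coefficients are zeroed out, large ones are left nearly untouched
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 5))
print(savs([1.2, -0.8, 0.01, 0.0, 0.003], X))
```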
Hamiltonian Monte Carlo for Regression with High-Dimensional Categorical Data
high-dimensional categorical data like text and surveys. We demonstrate the
effectiveness of Hamiltonian Monte Carlo (HMC) with parallelized automatic
differentiation for analyzing such data in a computationally efficient and
methodologically sound manner. Our new model, Supervised Topic Model with
Covariates, shows that carefully modeling this type of data can have
significant implications on conclusions compared to a simpler, frequently used,
yet methodologically problematic, two-step approach. A simulation study and
revisiting Bandiera et al. (2020)'s study of executive time use demonstrate
these results. The approach accommodates thousands of parameters and does not
require custom algorithms specific to each model, making it accessible to
applied researchers.
arXiv link: http://arxiv.org/abs/2107.08112v2
Flexible Covariate Adjustments in Regression Discontinuity Designs
their specifications to increase the precision of their estimates. In this
paper, we propose a novel class of estimators that use such covariate
information more efficiently than existing methods and can accommodate many
covariates. Our estimators are simple to implement and involve running a
standard RD analysis after subtracting a function of the covariates from the
original outcome variable. We characterize the function of the covariates that
minimizes the asymptotic variance of these estimators. We also show that the
conventional RD framework gives rise to a special robustness property which
implies that the optimal adjustment function can be estimated flexibly via
modern machine learning techniques without affecting the first-order properties
of the final RD estimator. We demonstrate our methods' scope for efficiency
improvements by reanalyzing data from a large number of recently published
empirical studies.
arXiv link: http://arxiv.org/abs/2107.07942v5
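A stripped-down sketch of the idea above: estimate a flexible function of the
covariates (here a cross-fitted random forest, a hypothetical choice), subtract
its prediction from the outcome, and run a standard local linear RD on the
adjusted outcome. The variance-minimizing adjustment characterized in the paper
is not implemented.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

def local_linear_rd(y, running, cutoff=0.0, h=0.5):
    """Sharp RD estimate: difference of local linear intercepts at the cutoff,
    triangular kernel, fixed bandwidth h (no bias correction or robust CI)."""
    est = {}
    for side, mask in [("right", running >= cutoff), ("left", running < cutoff)]:
        r = running[mask] - cutoff
        k = np.clip(1 - np.abs(r) / h, 0, None)      # triangular kernel weights
        Z = np.column_stack([np.ones(mask.sum()), r])
        ZtW = Z.T * k                                 # Z' diag(k)
        est[side] = np.linalg.solve(ZtW @ Z, ZtW @ y[mask])[0]
    return est["right"] - est["left"]

def covariate_adjusted_rd(y, running, covariates, h=0.5):
    """Run the RD on the outcome minus a cross-fitted prediction from covariates."""
    adj = cross_val_predict(RandomForestRegressor(n_estimators=200, random_state=0),
                            covariates, y, cv=5)
    return local_linear_rd(y - adj, running, h=h)

# toy sharp design with a jump of 0.5 at zero and prognostic covariates z
rng = np.random.default_rng(6)
n = 4000
r = rng.uniform(-1, 1, n)
z = rng.normal(size=(n, 3))
y = 0.5 * (r >= 0) + r + z @ np.array([1.0, 0.5, 0.2]) + 0.5 * rng.normal(size=n)
print(local_linear_rd(y, r), covariate_adjusted_rd(y, r, z))
```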
Subspace Shrinkage in Conjugate Bayesian Vector Autoregressions
either a large Vector Autoregression (VAR) or a factor model. In this paper, we
develop methods for combining the two using a subspace shrinkage prior.
Subspace priors shrink towards a class of functions rather than directly
forcing the parameters of a model towards some pre-specified location. We
develop a conjugate VAR prior which shrinks towards the subspace which is
defined by a factor model. Our approach allows for estimating the strength of
the shrinkage as well as the number of factors. After establishing the
theoretical properties of our proposed prior, we carry out simulations and
apply it to US macroeconomic data. Using simulations we show that our framework
successfully detects the number of factors. In a forecasting exercise involving
a large macroeconomic data set we find that combining VARs with factor models
using our prior can lead to forecast improvements.
arXiv link: http://arxiv.org/abs/2107.07804v1
Generalized Covariance Estimator
errors. This class of processes includes the standard Vector Autoregressive
(VAR) model, the nonfundamental structural VAR, the mixed causal-noncausal
models, as well as nonlinear dynamic models such as the (multivariate) ARCH-M
model. For estimation of processes in this class, we propose the Generalized
Covariance (GCov) estimator, which is obtained by minimizing a residual-based
multivariate portmanteau statistic as an alternative to the Generalized Method
of Moments. We derive the asymptotic properties of the GCov estimator and of
the associated residual-based portmanteau statistic. Moreover, we show that the
GCov estimators are semi-parametrically efficient and the residual-based
portmanteau statistics are asymptotically chi-square distributed. The finite
sample performance of the GCov estimator is illustrated in a simulation study.
The estimator is also applied to a dynamic model of cryptocurrency prices.
arXiv link: http://arxiv.org/abs/2107.06979v1
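The principle of minimizing a residual-based portmanteau statistic can be
illustrated on a toy univariate AR(1): stack the residuals with a nonlinear
transform (their squares, here) and minimize the sum of squared autocovariance
terms over a few lags. This is a schematic univariate illustration under these
simplifications, not the multivariate GCov estimator of the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def portmanteau(resid_matrix, H=5):
    """T * sum over lags h=1..H of Tr(R_h R_0^{-1} R_h' R_0^{-1}), where the
    R_h are autocovariances of the columns of resid_matrix."""
    Z = resid_matrix - resid_matrix.mean(axis=0)
    T = len(Z)
    R0_inv = np.linalg.inv(Z.T @ Z / T)
    stat = 0.0
    for h in range(1, H + 1):
        Rh = Z[h:].T @ Z[:-h] / T
        stat += np.trace(Rh @ R0_inv @ Rh.T @ R0_inv)
    return T * stat

def gcov_ar1(y, H=5):
    """GCov-style estimate of an AR(1) coefficient: pick theta so that the
    residuals e_t = y_t - theta * y_{t-1} and their squares are as serially
    uncorrelated as possible (no moment conditions, unlike GMM)."""
    def objective(theta):
        e = y[1:] - theta * y[:-1]
        return portmanteau(np.column_stack([e, e ** 2]), H=H)
    return minimize_scalar(objective, bounds=(-0.99, 0.99), method="bounded").x

# simulate an AR(1) with heavy-tailed errors; the estimate should be near 0.6
rng = np.random.default_rng(7)
y = np.zeros(3000)
e = rng.standard_t(df=5, size=3000)
for t in range(1, 3000):
    y[t] = 0.6 * y[t - 1] + e[t]
print(gcov_ar1(y))
```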
Time Series Estimation of the Dynamic Effects of Disaster-Type Shock
primitive shocks are mutually independent. First, a framework is proposed to
accommodate a disaster-type variable with infinite variance into a SVAR. We
show that the least squares estimates of the SVAR are consistent but have
non-standard asymptotics. Second, the disaster shock is identified as the
component with the largest kurtosis and whose impact effect is negative. An
estimator that is robust to infinite variance is used to recover the mutually
independent components. Third, an independence test on the residuals
pre-whitened by the Choleski decomposition is proposed to test the restrictions
imposed on a SVAR. The test can be applied whether the data have fat or thin
tails, and to over as well as exactly identified models. Three applications are
considered. In the first, the independence test is used to shed light on the
conflicting evidence regarding the role of uncertainty in economic
fluctuations. In the second, disaster shocks are shown to have short term
economic impact arising mostly from feedback dynamics. The third uses the
framework to study the dynamic effects of economic shocks post-COVID.
arXiv link: http://arxiv.org/abs/2107.06663v3
Financial Return Distributions: Past, Present, and COVID-19
cryptocurrencies, and contracts for differences (CFDs) representing stock
indices, stock shares, and commodities. Based on recent data from the years
2017--2020, we model tails of the return distributions at different time scales
by using power-law, stretched exponential, and $q$-Gaussian functions. We focus
on the fitted function parameters and how they change over the years by
comparing our results with those from earlier studies and find that, on the
time horizons of up to a few minutes, the so-called "inverse-cubic power-law"
still constitutes an appropriate global reference. However, we no longer
observe the hypothesized universal constant acceleration of the market time
flow that was manifested before in an ever faster convergence of empirical
return distributions towards the normal distribution. Our results do not
exclude such a scenario but, rather, suggest that some other short-term
processes related to a current market situation alter market dynamics and may
mask this scenario. Real market dynamics is associated with a continuous
alternation of different regimes with different statistical properties. An
example is the COVID-19 pandemic outburst, which had an enormous yet short-time
impact on financial markets. We also point out that two factors -- speed of the
market time flow and the asset cross-correlation magnitude -- while related
(the larger the speed, the larger the cross-correlations on a given time
scale), act in opposite directions with regard to the return distribution
tails, which can affect the expected distribution convergence to the normal
distribution.
arXiv link: http://arxiv.org/abs/2107.06659v1
MinP Score Tests with an Inequality Constrained Parameter Space
restricted by the null hypothesis, which often is much simpler than models
defined under the alternative hypothesis. This is typically so when the
alternative hypothesis involves inequality constraints. However, existing score
tests address only jointly testing all parameters of interest; a leading
example is testing all ARCH parameters or variances of random coefficients
being zero or not. In such testing problems, rejection of the null hypothesis
does not provide evidence about which specific elements of the parameter of
interest should be rejected. This paper proposes a class of one-sided score
tests for testing a model parameter that is subject to inequality constraints.
The proposed tests are
constructed based on the minimum of a set of $p$-values. The minimand includes
the $p$-values for testing individual elements of the parameter of interest using
individual scores. It may be extended to include a $p$-value of existing score
tests. We show that our tests perform better than/or perform as good as
existing score tests in terms of joint testing, and has furthermore the added
benefit of allowing for simultaneously testing individual elements of parameter
of interest. The added benefit is appealing in the sense that it can identify a
model without estimating it. We illustrate our tests in linear regression
models, ARCH and random coefficient models. A detailed simulation study is
provided to examine the finite sample performance of the proposed tests, and we
find that they perform well, as expected.
arXiv link: http://arxiv.org/abs/2107.06089v1
Testability of Reverse Causality Without Exogenous Variation
the absence of exogenous variation, such as in the form of instrumental
variables. Instead of relying on exogenous variation, we achieve testability by
imposing relatively weak model restrictions and exploiting that a dependence of
residual and purported cause is informative about the causal direction. Our
main assumption is that the true functional relationship is nonlinear and that
error terms are additively separable. We extend previous results by
incorporating control variables and allowing heteroskedastic errors. We build
on reproducing kernel Hilbert space (RKHS) embeddings of probability
distributions to test conditional independence and demonstrate the efficacy in
detecting the causal direction in both Monte Carlo simulations and an
application to German survey data.
arXiv link: http://arxiv.org/abs/2107.05936v2
Identification of Average Marginal Effects in Fixed Effects Dynamic Discrete Choice Models
because they cannot identify average marginal effects (AMEs) in short panels.
The common argument is that identifying AMEs requires knowledge of the
distribution of unobserved heterogeneity, but this distribution is not
identified in a fixed effects model with a short panel. In this paper, we
derive identification results that contradict this argument. In a panel data
dynamic logit model, and for $T$ as small as three, we prove the point
identification of different AMEs, including causal effects of changes in the
lagged dependent variable or the last choice's duration. Our proofs are
constructive and provide simple closed-form expressions for the AMEs in terms
of probabilities of choice histories. We illustrate our results using Monte
Carlo experiments and with an empirical application of a dynamic structural
model of consumer brand choice with state dependence.
arXiv link: http://arxiv.org/abs/2107.06141v2
Inference on Individual Treatment Effects in Nonseparable Triangular Models
binary instrumental variable, Vuong and Xu (2017) established identification
results for individual treatment effects (ITEs) under the rank invariance
assumption. Using their approach, Feng, Vuong, and Xu (2019) proposed a
uniformly consistent kernel estimator for the density of the ITE that utilizes
estimated ITEs. In this paper, we establish the asymptotic normality of the
density estimator of Feng, Vuong, and Xu (2019) and show that the ITE
estimation errors have a non-negligible effect on the asymptotic distribution
of the estimator. We propose asymptotically valid standard errors that account
for ITE estimation, as well as a bias correction. Furthermore, we develop
uniform confidence bands for the density of the ITE using the jackknife
multiplier or nonparametric bootstrap critical values.
arXiv link: http://arxiv.org/abs/2107.05559v4
A Lucas Critique Compliant SVAR model with Observation-driven Time-varying Parameters
with the Lucas Critique, structural shocks drive both the evolution of the
macro variables and the dynamics of the VAR parameters. Contrary to existing
approaches where parameters follow a stochastic process with random and
exogenous shocks, our observation-driven specification allows the evolution of
the parameters to be driven by realized past structural shocks, thus opening
the possibility to gauge the impact of observed shocks and hypothetical policy
interventions on the future evolution of the economic system.
arXiv link: http://arxiv.org/abs/2107.05263v2
Inference for the proportional odds cumulative logit model with monotonicity constraints for ordinal predictors and ordinal response
model for an ordinal response. Ordinality of predictors can be incorporated by
monotonicity constraints for the corresponding parameters. It is shown that
estimators defined by optimization, such as maximum likelihood estimators, for
an unconstrained model and for parameters in the interior set of the parameter
space of a constrained model are asymptotically equivalent. This is used in
order to derive asymptotic confidence regions and tests for the constrained
model, involving simple modifications for finite samples. The finite sample
coverage probability of the confidence regions is investigated by simulation.
Tests concern the effect of individual variables, monotonicity, and a specified
monotonicity direction. The methodology is applied to real data related to the
assessment of school performance.
arXiv link: http://arxiv.org/abs/2107.04946v4
Machine Learning for Financial Forecasting, Planning and Analysis: Recent Developments and Pitfalls
forecasting, planning and analysis (FP&A). Machine learning appears well
suited to support FP&A with the highly automated extraction of information
from large amounts of data. However, because most traditional machine learning
techniques focus on forecasting (prediction), we discuss the particular care
that must be taken to avoid the pitfalls of using them for planning and
resource allocation (causal inference). While the naive application of machine
learning usually fails in this context, the recently developed double machine
learning framework can address causal questions of interest. We review the
current literature on machine learning in FP&A and illustrate in a simulation
study how machine learning can be used for both forecasting and planning. We
also investigate how forecasting and planning improve as the number of data
points increases.
arXiv link: http://arxiv.org/abs/2107.04851v1
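To make the forecasting-versus-planning distinction concrete, the sketch below
shows a generic partialling-out double machine learning estimator with
cross-fitting (a textbook version with hypothetical learners and data, not
necessarily the specification used in the article's simulation study).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

def dml_plr(y, d, x, cv=5):
    """Partialling-out double ML: cross-fit E[y|x] and E[d|x] with a flexible
    learner, then regress the outcome residuals on the treatment residuals
    (a Neyman-orthogonal moment, so nuisance errors matter only at second order)."""
    learner = RandomForestRegressor(n_estimators=300, random_state=0)
    y_res = y - cross_val_predict(learner, x, y, cv=cv)
    d_res = d - cross_val_predict(learner, x, d, cv=cv)
    theta = (d_res @ y_res) / (d_res @ d_res)
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi ** 2) / (np.mean(d_res ** 2) ** 2 * len(y)))
    return theta, se

# confounded toy data: a purely predictive regression of y on d would be biased,
# while the orthogonalized estimate targets the causal coefficient (1.0)
rng = np.random.default_rng(8)
x = rng.normal(size=(4000, 5))
d = np.sin(x[:, 0]) + 0.5 * rng.normal(size=4000)
y = 1.0 * d + np.sin(x[:, 0]) + x[:, 1] ** 2 + rng.normal(size=4000)
print(dml_plr(y, d, x))
```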
Estimation and Inference in Factor Copula Models with Exogenous Covariates
estimable from exogenous information. Point estimation and inference are based
on a simulated methods of moments (SMM) approach with non-overlapping
simulation draws. Consistency and limiting normality of the estimator is
established and the validity of bootstrap standard errors is shown. Doing so,
previous results from the literature are verified under low-level conditions
imposed on the individual components of the factor structure. Monte Carlo
evidence confirms the accuracy of the asymptotic theory in finite samples and
an empirical application illustrates the usefulness of the model to explain the
cross-sectional dependence between stock returns.
arXiv link: http://arxiv.org/abs/2107.03366v4
Dynamic Ordered Panel Logit Models
effects. The main contribution of the paper is to construct a set of valid
moment conditions that are free of the fixed effects. The moment functions can
be computed using four or more periods of data, and the paper presents
sufficient conditions for the moment conditions to identify the common
parameters of the model, namely the regression coefficients, the autoregressive
parameters, and the threshold parameters. The availability of moment conditions
suggests that these common parameters can be estimated using the generalized
method of moments, and the paper documents the performance of this estimator
using Monte Carlo simulations and an empirical illustration to self-reported
health status using the British Household Panel Survey.
arXiv link: http://arxiv.org/abs/2107.03253v4
Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy
2020 US Census, enhancing the privacy of respondents while potentially reducing
the precision of economic analysis. To investigate whether this trade-off is
inevitable, we formulate a semiparametric model of causal inference with high
dimensional corrupted data. We propose a procedure for data cleaning,
estimation, and inference with data cleaning-adjusted confidence intervals. We
prove consistency and Gaussian approximation by finite sample arguments, with a
rate of $n^{-1/2}$ for semiparametric estimands that degrades gracefully for
nonparametric estimands. Our key assumption is that the true covariates are
approximately low rank, which we interpret as approximate repeated measurements
and empirically validate. Our analysis provides nonasymptotic theoretical
contributions to matrix completion, statistical learning, and semiparametric
statistics. Calibrated simulations verify the coverage of our data cleaning
adjusted confidence intervals and demonstrate the relevance of our results for
Census-derived data.
arXiv link: http://arxiv.org/abs/2107.02780v6
Shapes as Product Differentiation: Neural Network Embedding in the Analysis of Markets for Fonts
thus high-dimensional (e.g., design, text). Instead of treating unstructured
attributes as unobservables in economic models, quantifying them can be
important to answer interesting economic questions. To propose an analytical
framework for these types of products, this paper considers one of the simplest
design products (fonts) and investigates merger and product differentiation using
an original dataset from the world's largest online marketplace for fonts. We
quantify font shapes by constructing embeddings from a deep convolutional
neural network. Each embedding maps a font's shape onto a low-dimensional
vector. In the resulting product space, designers are assumed to engage in
Hotelling-type spatial competition. From the image embeddings, we construct two
alternative measures that capture the degree of design differentiation. We then
study the causal effects of a merger on the merging firm's creative decisions
using the constructed measures in a synthetic control method. We find that the
merger causes the merging firm to increase the visual variety of font design.
Notably, such effects are not captured when using traditional measures for
product offerings (e.g., specifications and the number of products) constructed
from structured data.
arXiv link: http://arxiv.org/abs/2107.02739v2
Gravity models of networks: integrating maximum-entropy and econometric approaches
among world countries. Characterizing both the local link weights (observed
trade volumes) and the global network structure (large-scale topology) of the
WTW via a single model is still an open issue. While the traditional Gravity
Model (GM) successfully replicates the observed trade volumes by employing
macroeconomic properties such as GDP and geographic distance, it,
unfortunately, predicts a fully connected network, thus returning a completely
unrealistic topology of the WTW. To overcome this problem, two different
classes of models have been introduced in econometrics and statistical physics.
Econometric approaches interpret the traditional GM as the expected value of a
probability distribution that can be chosen arbitrarily and tested against
alternative distributions. Statistical physics approaches construct
maximum-entropy probability distributions of (weighted) graphs from a chosen
set of measurable structural constraints and test distributions resulting from
different constraints. Here we compare and integrate the two approaches by
considering a class of maximum-entropy models that can incorporate
macroeconomic properties used in standard econometric models. We find that the
integrated approach achieves a better performance than the purely econometric
one. These results suggest that the maximum-entropy construction can serve as a
viable econometric framework wherein extensive and intensive margins can be
separately controlled for, by combining topological constraints and dyadic
macroeconomic variables.
arXiv link: http://arxiv.org/abs/2107.02650v3
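For reference, the traditional econometric gravity model mentioned above is
often estimated by Poisson pseudo-maximum likelihood on log GDPs and log
distance; a minimal statsmodels sketch with hypothetical column names is below.
Its fitted values are strictly positive for every dyad, which is exactly the
dense-topology limitation that the maximum-entropy integration addresses.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def gravity_ppml(df):
    """Traditional gravity model by Poisson pseudo-maximum likelihood on
    hypothetical dyadic columns 'trade', 'gdp_origin', 'gdp_dest', 'distance'.
    Fitted trade is strictly positive for every dyad, i.e. the GM alone
    cannot reproduce a sparse trade topology."""
    X = sm.add_constant(np.log(df[["gdp_origin", "gdp_dest", "distance"]]))
    return sm.GLM(df["trade"], X, family=sm.families.Poisson()).fit()

# synthetic dyads, just to show the call pattern
rng = np.random.default_rng(9)
n = 500
df = pd.DataFrame({
    "gdp_origin": rng.lognormal(3.0, 1.0, n),
    "gdp_dest": rng.lognormal(3.0, 1.0, n),
    "distance": rng.lognormal(7.0, 0.5, n),
})
mu = (df["gdp_origin"] * df["gdp_dest"] / df["distance"]).to_numpy()
df["trade"] = rng.poisson(10 * mu / mu.mean())
print(gravity_ppml(df).params)
```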
Difference-in-Differences with a Continuous Treatment
treatment. We show that treatment effect on the treated-type parameters can be
identified under a generalized parallel trends assumption that is similar to
the binary treatment setup. However, interpreting differences in these
parameters across different values of the treatment can be particularly
challenging due to selection bias that is not ruled out by the parallel trends
assumption. We discuss alternative, typically stronger, assumptions that
alleviate these challenges. We also provide a variety of treatment effect
decomposition results, highlighting that parameters associated with popular
linear two-way fixed-effects specifications can be hard to interpret,
even when there are only two time periods. We introduce alternative
estimation procedures that do not suffer from these drawbacks and show in an
application that they can lead to different conclusions.
arXiv link: http://arxiv.org/abs/2107.02637v7
Inference for Low-Rank Models
parameter matrix that can be well-approximated by a “spiked low-rank matrix.”
A spiked low-rank matrix has rank that grows slowly compared to its dimensions
and nonzero singular values that diverge to infinity. We show that this
framework covers a broad class of latent-variable models, which can
accommodate matrix completion problems, factor models, varying coefficient
models, and heterogeneous treatment effects. For inference, we apply a
procedure that relies on an initial nuclear-norm penalized estimation step
followed by two ordinary least squares regressions. We consider the framework
of estimating incoherent eigenvectors and use a rotation argument to argue that
the eigenspace estimation is asymptotically unbiased. Using this framework we
show that our procedure provides asymptotically normal inference and achieves
the semiparametric efficiency bound. We illustrate our framework by providing
low-level conditions for its application in a treatment effects context where
treatment assignment might be strongly dependent.
arXiv link: http://arxiv.org/abs/2107.02602v2
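The initial nuclear-norm penalized step has a closed form when the full matrix
is observed: soft-thresholding of singular values. The sketch below shows only
that generic building block with a hypothetical penalty choice; the subsequent
least squares regressions and the inference results are the paper's
contribution and are not reproduced.

```python
import numpy as np

def svd_soft_threshold(Y, lam):
    """Closed-form solution of  min_M 0.5*||Y - M||_F^2 + lam*||M||_*  :
    soft-threshold the singular values of Y (the prox of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

# noisy rank-2 matrix: thresholding at roughly the noise level gives a
# low-rank initial estimate (the paper follows such a step with OLS regressions)
rng = np.random.default_rng(10)
n, p, r = 200, 100, 2
M0 = 3.0 * rng.normal(size=(n, r)) @ rng.normal(size=(r, p))
Y = M0 + rng.normal(size=(n, p))
lam = 1.1 * (np.sqrt(n) + np.sqrt(p))          # a common noise-level-based choice
M_hat = svd_soft_threshold(Y, lam)
print(np.linalg.matrix_rank(M_hat), np.linalg.norm(M_hat - M0) / np.linalg.norm(M0))
```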
Big Data Information and Nowcasting: Consumption and Investment from Bank Transactions in Turkey
Garanti BBVA Bank transactions to mimic domestic private demand. Particularly,
we replicate the quarterly national accounts aggregate consumption and
investment (gross fixed capital formation) and its bigger components (Machinery
and Equipment and Construction) in real time for the case of Turkey. To
validate the usefulness of the information derived from these indicators, we
test their ability to nowcast Turkish GDP using different nowcasting models.
The results are successful and confirm the usefulness of Consumption and
Investment banking transactions for nowcasting purposes. The big data
information is most valuable at the beginning of the nowcasting process, when
traditional hard data are scarce. This makes it especially relevant for
countries where statistical release lags are longer, such as emerging markets.
arXiv link: http://arxiv.org/abs/2107.03299v1
Partial Identification and Inference in Duration Models with Endogenous Censoring
endogenous censoring. Many kinds of duration models, such as the accelerated
failure time model, proportional hazard model, and mixed proportional hazard
model, can be viewed as transformation models. We allow the censoring of a
duration outcome to be arbitrarily correlated with observed covariates and
unobserved heterogeneity. We impose no parametric restrictions on either the
transformation function or the distribution function of the unobserved
heterogeneity. In this setting, we develop bounds on the regression parameters
and the transformation function, which are characterized by conditional moment
inequalities involving U-statistics. We provide inference methods for them by
constructing an inference approach for conditional moment inequality models in
which the sample analogs of moments are U-statistics. We apply the proposed
inference methods to evaluate the effect of heart transplants on patients'
survival time using data from the Stanford Heart Transplant Study.
arXiv link: http://arxiv.org/abs/2107.00928v1
Feasible Implied Correlation Matrices from Factor Structures
applications, including factor-based asset pricing, forecasting stock-price
movements or pricing index options. With a focus on non-FX markets, this paper
defines necessary conditions for option-implied correlation matrices to be
mathematically and economically feasible and argues that existing models are
typically not capable of guaranteeing this. To overcome this difficulty, the
paper addresses the problem from the underlying factor structure and introduces
two approaches to solve it. Under the quantitative approach, the puzzle is
reformulated into a nearest correlation matrix problem which can be used either
as a stand-alone estimate or to re-establish positive-semi-definiteness of any
other model's estimate. From an economic approach, it is discussed how expected
correlations between stocks and risk factors (like CAPM, Fama-French) can be
translated into a feasible implied correlation matrix. Empirical experiments
are carried out on monthly option data of the S&P 100 and S&P 500 index
(1996-2020).
arXiv link: http://arxiv.org/abs/2107.00427v1
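The nearest-correlation-matrix step mentioned above is commonly attacked by
alternating projections between the positive semidefinite cone and the
unit-diagonal set (Higham's algorithm adds a Dykstra correction for exactness).
The simplified sketch below omits that correction, so it returns a feasible
matrix that is only approximately the nearest one.

```python
import numpy as np

def near_correlation(A, n_iter=500, tol=1e-10):
    """Alternating projections onto the PSD cone and the unit-diagonal set.

    Simplified (no Dykstra correction), so the output is a feasible
    correlation-like matrix but not guaranteed to be the exact nearest one."""
    X = (A + A.T) / 2
    for _ in range(n_iter):
        w, V = np.linalg.eigh(X)
        X_psd = (V * np.clip(w, 0, None)) @ V.T     # projection onto the PSD cone
        X_new = X_psd.copy()
        np.fill_diagonal(X_new, 1.0)                # restore the unit diagonal
        if np.linalg.norm(X_new - X) < tol:
            return X_new
        X = X_new
    return X

# a symmetric unit-diagonal "implied correlation" estimate that is not PSD
A = np.array([[1.0, 0.9, 0.7],
              [0.9, 1.0, -0.9],
              [0.7, -0.9, 1.0]])
C = near_correlation(A)
print("smallest eigenvalue:", np.linalg.eigvalsh(C).min())
print(np.round(C, 3))
```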
A conditional independence test for causality in econometrics
of a multivariate regression. However, it is rarely used in practice since it
requires identifying multiple conditionally independent instruments, which is
often impossible. We propose a heuristic test which relaxes the independence
requirement. We then show how to apply this heuristic test on a price-demand
and a firm loan-productivity problem. We conclude that the test is informative
when the variables are linearly related with Gaussian additive noise, but it
can be misleading in other contexts. Still, we believe that the test can be a
useful concept for falsifying a proposed control set.
arXiv link: http://arxiv.org/abs/2107.09765v1
National-scale electricity peak load forecasting: Traditional, machine learning, or hybrid model?
electrification, the importance of accurate peak load forecasting is
increasing. Traditional peak load forecasting has been conducted through time
series-based models; however, recently, new models based on machine or deep
learning are being introduced. This study performs a comparative analysis to
determine the most accurate peak load-forecasting model for Korea, by comparing
the performance of time series, machine learning, and hybrid models. Seasonal
autoregressive integrated moving average with exogenous variables (SARIMAX) is
used for the time series model. Artificial neural network (ANN), support vector
regression (SVR), and long short-term memory (LSTM) are used for the machine
learning models. SARIMAX-ANN, SARIMAX-SVR, and SARIMAX-LSTM are used for the
hybrid models. The results indicate that the hybrid models exhibit significant
improvement over the SARIMAX model. The LSTM-based models outperformed the
others; the single and hybrid LSTM models did not exhibit a significant
performance difference. In the case of Korea's highest peak load in 2019, the
predictive power of the LSTM model proved to be greater than that of the
SARIMAX-LSTM model. The LSTM, SARIMAX-SVR, and SARIMAX-LSTM models outperformed
the current time series-based forecasting model used in Korea. Thus, Korea's
peak load-forecasting performance can be improved by including machine learning
or hybrid models.
arXiv link: http://arxiv.org/abs/2107.06174v1
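As a reference point for the time-series benchmark above, a SARIMAX model with
an exogenous regressor can be fit in a few lines with statsmodels. The daily
data, the temperature-deviation regressor, and the (1,0,1)x(1,0,1,7) order
below are purely hypothetical choices for illustration, not the study's
specification.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# synthetic daily "peak load" with a weekly cycle and a temperature effect
rng = np.random.default_rng(11)
idx = pd.date_range("2018-01-01", periods=730, freq="D")
t = np.arange(730)
temp = 10 + 12 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0, 2, 730)
load = (60 + 5 * np.sin(2 * np.pi * t / 7)
        + 0.8 * np.abs(temp - 18)            # heating/cooling demand proxy
        + rng.normal(0, 1.5, 730))
y = pd.Series(load, index=idx)
X = pd.DataFrame({"abs_temp_dev": np.abs(temp - 18)}, index=idx)

# hypothetical (1,0,1)x(1,0,1,7) specification; in practice chosen by AIC/diagnostics
model = SARIMAX(y.iloc[:-30], exog=X.iloc[:-30], order=(1, 0, 1),
                seasonal_order=(1, 0, 1, 7), trend="c")
res = model.fit(disp=False)
forecast = res.get_forecast(steps=30, exog=X.iloc[-30:])
print(forecast.predicted_mean.head())
```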
A Note on the Topology of the First Stage of 2SLS with Many Instruments
approximated using asymptotic theories. Two main asymptotic constructions have
been used to characterize the presence of many instruments. The first assumes
that the number of instruments increases with the sample size. I demonstrate
that in this case, one of the key assumptions used in the asymptotic
construction may imply that the number of “effective” instruments should be
finite, resulting in an internal contradiction. The second asymptotic
representation considers that the number of instrumental variables (IVs) may be
finite, infinite, or even a continuum. The number does not change with the
sample size. In this scenario, the regularized estimator obtained depends on
the topology imposed on the set of instruments as well as on a regularization
parameter. These restrictions may induce a bias or restrict the set of
admissible instruments. However, the assumptions are internally coherent. The
limitations of many IVs asymptotic assumptions provide support for finite
sample distributional studies to better understand the behavior of many IV
estimators.
arXiv link: http://arxiv.org/abs/2106.15003v1
The Role of Contextual Information in Best Arm Identification
contextual (covariate) information is available in stochastic bandits. Although
we can use contextual information in each round, we are interested in the
marginalized mean reward over the contextual distribution. Our goal is to
identify the best arm with a minimal number of samplings under a given value of
the error rate. We show the instance-specific sample complexity lower bounds
for the problem. Then, we propose a context-aware version of the
"Track-and-Stop" strategy, wherein the proportion of the arm draws tracks the
set of optimal allocations and prove that the expected number of arm draws
matches the lower bound asymptotically. We demonstrate that contextual
information can be used to improve the efficiency of the identification of the
best marginalized mean reward compared with the results of Garivier & Kaufmann
(2016). We experimentally confirm that context information contributes to
faster best-arm identification.
arXiv link: http://arxiv.org/abs/2106.14077v3
Nonparametric inference on counterfactuals in first-price auctions
private values, we develop nonparametric estimators for several policy-relevant
targets, such as the bidder's surplus and auctioneer's revenue under
counterfactual reserve prices. Motivated by the linearity of these targets in
the quantile function of bidders' values, we propose an estimator of the latter
and derive its Bahadur-Kiefer expansion. This makes it possible to construct
uniform confidence bands and test complex hypotheses about the auction design.
Using the data on U.S. Forest Service timber auctions, we test whether setting
zero reserve prices in these auctions was revenue maximizing.
arXiv link: http://arxiv.org/abs/2106.13856v3
Constrained Classification and Policy Learning
support vector machines, and deep neural networks, utilize surrogate loss
techniques to circumvent the computational complexity of minimizing empirical
classification risk. These techniques are also useful for causal policy
learning problems, since estimation of individualized treatment rules can be
cast as a weighted (cost-sensitive) classification problem. Consistency of the
surrogate loss approaches studied in Zhang (2004) and Bartlett et al. (2006)
crucially relies on the assumption of correct specification, meaning that the
specified set of classifiers is rich enough to contain a first-best classifier.
This assumption is, however, less credible when the set of classifiers is
constrained by interpretability or fairness, leaving the applicability of
surrogate loss based algorithms unknown in such second-best scenarios. This
paper studies consistency of surrogate loss procedures under a constrained set
of classifiers without assuming correct specification. We show that in the
setting where the constraint restricts the classifier's prediction set only,
hinge losses (i.e., $\ell_1$-support vector machines) are the only surrogate
losses that preserve consistency in second-best scenarios. If the constraint
additionally restricts the functional form of the classifier, consistency of a
surrogate loss approach is not guaranteed even with hinge loss. We therefore
characterize conditions for the constrained set of classifiers that can
guarantee consistency of hinge risk minimizing classifiers. Exploiting our
theoretical results, we develop robust and computationally attractive hinge
loss based procedures for a monotone classification problem.
arXiv link: http://arxiv.org/abs/2106.12886v2
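A minimal instance of the weighted (cost-sensitive) classification view of
policy learning with a hinge loss: label each unit by the sign of an estimated
treatment-effect score, weight it by the score's magnitude, and fit a linear
SVM. The scores, data, and learner below are hypothetical; the constrained,
second-best analysis is the paper's contribution and is not reproduced.

```python
import numpy as np
from sklearn.svm import SVC

def hinge_policy_learning(score, X):
    """Weighted hinge-loss classification view of policy learning.

    score : estimated individual treatment-effect scores (e.g., AIPW scores);
    the sign gives the label (treat / do not treat), |score| the weight."""
    labels = np.where(score > 0, 1, -1)
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X, labels, sample_weight=np.abs(score))
    return clf

# toy example in which the oracle policy treats whenever x1 + x2 > 0
rng = np.random.default_rng(12)
X = rng.normal(size=(2000, 2))
tau = X[:, 0] + X[:, 1]                            # individual treatment effects
score = tau + rng.normal(size=2000)                # noisy effect scores
policy = hinge_policy_learning(score, X)
print("agreement with oracle:", np.mean((policy.predict(X) == 1) == (tau > 0)))
```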
Variational Bayes in State Space Models: Inferential and Predictive Accuracy
applied variational Bayes methods across a range of state space models. The
results demonstrate that, in terms of accuracy on fixed parameters, there is a
clear hierarchy in terms of the methods, with approaches that do not
approximate the states yielding superior accuracy over methods that do. We also
document numerically that the inferential discrepancies between the various
methods often yield only small discrepancies in predictive accuracy over small
out-of-sample evaluation periods. Nevertheless, in certain settings, these
predictive discrepancies can become meaningful over a longer out-of-sample
period. This finding indicates that the invariance of predictive results to
inferential inaccuracy, which has been an oft-touted point made by
practitioners seeking to justify the use of variational inference, is not
ubiquitous and must be assessed on a case-by-case basis.
arXiv link: http://arxiv.org/abs/2106.12262v3
Discovering Heterogeneous Treatment Effects in Regression Discontinuity Designs
treatment effect heterogeneity in sharp and fuzzy regression discontinuity (RD)
designs. We develop a criterion for building an honest “regression
discontinuity tree”, where each leaf contains the RD estimate of a treatment
conditional on the values of some pre-treatment covariates. It is a priori
unknown which covariates are relevant for capturing treatment effect
heterogeneity, and it is the task of the algorithm to discover them, without
invalidating inference, while employing a nonparametric estimator with expected
MSE optimal bandwidth. We study the performance of the method through Monte
Carlo simulations and apply it to uncover various sources of heterogeneity in
the impact of attending a better secondary school in Romania.
arXiv link: http://arxiv.org/abs/2106.11640v4
On Testing Equal Conditional Predictive Ability Under Measurement Error
However, forecast comparisons are often based on mismeasured proxy variables
for the true target. We introduce the concept of exact robustness to
measurement error for loss functions and fully characterize this class of loss
functions as the Bregman class. For such exactly robust loss functions,
forecast loss differences are on average unaffected by the use of proxy
variables and, thus, inference on conditional predictive ability can be carried
out as usual. Moreover, we show that more precise proxies give predictive
ability tests higher power in discriminating between competing forecasts.
Simulations illustrate the different behavior of exactly robust and non-robust
loss functions. An empirical application to US GDP growth rates demonstrates
that it is easier to discriminate between forecasts issued at different
horizons if a better proxy for GDP growth is used.
arXiv link: http://arxiv.org/abs/2106.11104v1
On the Use of Two-Way Fixed Effects Models for Policy Evaluation During Pandemics
fixed effects (FE) models to assess the impact of mitigation policies on health
outcomes. Building on the SIRD model of disease transmission, I show that FE
models tend to be misspecified for three reasons. First, despite misleading
common trends in the pre-treatment period, the parallel trends assumption
generally does not hold. Second, heterogeneity in infection rates and infected
populations across regions cannot be accounted for by region-specific fixed
effects, nor by conditioning on observable time-varying confounders. Third,
epidemiological theory predicts heterogeneous treatment effects across regions
and over time. Via simulations, I find that the bias resulting from model
misspecification can be substantial, in magnitude and sometimes in sign.
Overall, my results caution against the use of FE models for mitigation policy
evaluation.
arXiv link: http://arxiv.org/abs/2106.10949v1
A Neural Frequency-Severity Model and Its Application to Insurance Claims
and severity models for predicting insurance claims. The proposed model is able
to capture nonlinear relationships in explanatory variables by characterizing
the logarithmic mean functions of frequency and severity distributions as
neural networks. Moreover, a potential dependence between the claim frequency
and severity can be incorporated. In particular, the paper provides analytic
formulas for the mean and variance of the total claim cost, making our model
ideal for many applications such as pricing insurance contracts and computing
the pure premium.
A simulation study demonstrates that our method successfully recovers nonlinear
features of explanatory variables as well as the dependency between frequency
and severity. Then, this paper uses a French auto insurance claim dataset to
illustrate that the proposed model is superior to the existing methods in
fitting and predicting the claim frequency, severity, and the total claim loss.
Numerical results indicate that the proposed model helps in maintaining the
competitiveness of an insurer by accurately predicting insurance claims and
avoiding adverse selection.
arXiv link: http://arxiv.org/abs/2106.10770v3
Semiparametric inference for partially linear regressions with Box-Cox transformation
Robinson (1988) with a Box-Cox transformed dependent variable is studied.
Transformation regression models are widely used in applied econometrics to
avoid misspecification. In addition, a partially linear semiparametric model is
an intermediate strategy that tries to balance advantages and disadvantages of
a fully parametric model and nonparametric models. A combination of
transformation and partially linear semiparametric model is, thus, a natural
strategy. The model parameters are estimated by a semiparametric extension of
the so-called smooth minimum distance (SmoothMD) approach proposed by Lavergne
and Patilea (2013). SmoothMD is suitable for models defined by conditional
moment conditions and allows the variance of the error terms to depend on the
covariates. In addition, here we allow for infinite-dimensional nuisance
parameters. The asymptotic behavior of the new SmoothMD estimator is studied
under general conditions and new inference methods are proposed. A simulation
experiment illustrates the performance of the methods for finite samples.
arXiv link: http://arxiv.org/abs/2106.10723v1
Generalized Spatial and Spatiotemporal ARCH Models
conditional heteroscedasticity (GARCH) models are widely applied statistical
tools for modelling volatility clusters (i.e., periods of increased or
decreased risk). In contrast, modelling spatial dependence in the conditional
second moments has so far received little attention, and only a few models have
been proposed for modelling local clusters of increased risk. In this paper, we
introduce a novel spatial GARCH process in a
unified spatial and spatiotemporal GARCH framework, which also covers all
previously proposed spatial ARCH models, exponential spatial GARCH, and
time-series GARCH models. In contrast to previous spatiotemporal and time
series models, this spatial GARCH allows for instantaneous spill-overs across
all spatial units. For this common modelling framework, estimators are derived
based on a non-linear least-squares approach. Eventually, the use of the model
is demonstrated by a Monte Carlo simulation study and by an empirical example
that focuses on real estate prices from 1995 to 2014 across the ZIP-Code areas
of Berlin. A spatial autoregressive model is applied to the data to illustrate
how locally varying model uncertainties (e.g., due to latent regressors) can be
captured by the spatial GARCH-type models.
arXiv link: http://arxiv.org/abs/2106.10477v1
Scalable Econometrics on Big Data -- The Logistic Regression on Spark
tools designed to handle huge amounts of data efficiently are rapidly becoming
widely accessible. However, conventional statistical and econometric tools
still lack fluency when dealing with such large datasets. This paper dives into
econometrics on big datasets, specifically focusing on the logistic regression
on Spark. We review the robustness of the functions available in Spark to fit
logistic regression and introduce a package that we developed in PySpark which
returns the statistical summary of the logistic regression, necessary for
statistical inference.
arXiv link: http://arxiv.org/abs/2106.10341v1
Set coverage and robust policy
regions may cover the whole identified set with a prescribed probability, to
which we will refer as set coverage, or they may cover each of its point with a
prescribed probability, to which we will refer as point coverage. Since set
coverage implies point coverage, confidence regions satisfying point coverage
are generally preferred on the grounds that they may be more informative. The
object of this note is to describe a decision problem in which, contrary to
received wisdom, point coverage is clearly undesirable.
arXiv link: http://arxiv.org/abs/2106.09784v1
Economic Nowcasting with Long Short-Term Memory Artificial Neural Networks (LSTM)
in a variety of fields and disciplines in recent years. Their impact on
economics, however, has been comparatively muted. One type of ANN, the long
short-term memory network (LSTM), is particularly well suited to deal with
economic time-series. Here, the architecture's performance and characteristics
are evaluated in comparison with the dynamic factor model (DFM), currently a
popular choice in the field of economic nowcasting. LSTMs are found to produce
superior results to DFMs in the nowcasting of three separate variables; global
merchandise export values and volumes, and global services exports. Further
advantages include their ability to handle large numbers of input features in a
variety of time frequencies. A disadvantage is the inability to ascribe
contributions of input features to model outputs, common to all ANNs. In order
to facilitate continued applied research of the methodology by avoiding the
need for any knowledge of deep-learning libraries, an accompanying Python
library was developed using PyTorch, https://pypi.org/project/nowcast-lstm/.
arXiv link: http://arxiv.org/abs/2106.08901v1
Comparisons of Australian Mental Health Distributions
are obtained to assess how the mental health status of the population has
changed over time and to compare the mental health status of female/male and
indigenous/non-indigenous population subgroups. First- and second-order
stochastic dominance are used to compare distributions, with results presented
in terms of the posterior probability of dominance and the posterior
probability of no dominance. Our results suggest that mental health has
deteriorated in recent years, that males' mental health status is better than
that of females, and that non-indigenous health status is better than that of
the indigenous population.
arXiv link: http://arxiv.org/abs/2106.08047v1
Dynamic Asymmetric Causality Tests with an Application
one variable on the current value of another one when all other pertinent
information is accounted for, is increasingly utilized in empirical research of
time-series data in different scientific disciplines. A relatively recent
extension of this approach allows for potential asymmetric impacts, since this
is harmonious with the way reality operates in many cases according to
Hatemi-J (2012). The current paper maintains that it is also important to
account for potential changes in the parameters when asymmetric causation
tests are conducted, as there are a number of reasons why the potential causal
connection between variables may change across time. The paper therefore
extends static asymmetric causality tests by making them dynamic via the use of
subsamples. An application is also provided consistent with
measurable definitions of economic or financial bad as well as good news and
their potential interaction across time.
arXiv link: http://arxiv.org/abs/2106.07612v2
Sensitivity of LATE Estimates to Violations of the Monotonicity Assumption
treatment effect estimates to potential violations of the monotonicity
assumption of Imbens and Angrist (1994). We parameterize the degree to which
monotonicity is violated using two sensitivity parameters: the first one
determines the share of defiers in the population, and the second one measures
differences in the distributions of outcomes between compliers and defiers. For
each pair of values of these sensitivity parameters, we derive sharp bounds on
the outcome distributions of compliers in the first-order stochastic dominance
sense. We identify the robust region that is the set of all values of
sensitivity parameters for which a given empirical conclusion, e.g. that the
local average treatment effect is positive, is valid. Researchers can assess
the credibility of their conclusion by evaluating whether all the plausible
sensitivity parameters lie in the robust region. We obtain confidence sets for
the robust region through a bootstrap procedure and illustrate the sensitivity
analysis in an empirical application. We also extend this framework to analyze
treatment effects of the entire population.
arXiv link: http://arxiv.org/abs/2106.06421v1
An Interpretable Neural Network for Parameter Inference
been constrained by the lack of interpretability of model outcomes. This paper
proposes a generative neural network architecture - the parameter encoder
neural network (PENN) - capable of estimating local posterior distributions for
the parameters of a regression model. The parameters fully explain predictions
in terms of the inputs and permit visualization, interpretation and inference
in the presence of complex heterogeneous effects and feature dependencies. The
use of Bayesian inference techniques offers an intuitive mechanism to
regularize local parameter estimates towards a stable solution, and to reduce
noise-fitting in settings of limited data availability. The proposed neural
network is particularly well-suited to applications in economics and finance,
where parameter inference plays an important role. An application to an asset
pricing problem demonstrates how the PENN can be used to explore nonlinear risk
dynamics in financial markets, and to compare empirical nonlinear effects to
behavior posited by financial theory.
arXiv link: http://arxiv.org/abs/2106.05536v1
Panel Data with Unknown Clusters
inference methods that allow for dependence within observations. However, they
require researchers to know the cluster structure ex ante. We propose a
procedure to help researchers discover clusters in panel data. Our method is
based on thresholding an estimated long-run variance-covariance matrix and
requires the panel to be large in the time dimension, but imposes no lower
bound on the number of units. We show that our procedure recovers the true
clusters with high probability with no assumptions on the cluster structure.
The estimated clusters are of independent interest, but they can also be used
in approximate randomization tests or with conventional cluster-robust
covariance estimators. The resulting procedures control size and have good
power.
arXiv link: http://arxiv.org/abs/2106.05503v4
Estimation of Optimal Dynamic Treatment Assignment Rules under Policy Constraints
individuals receive sequential interventions over multiple stages. We study
estimation of an optimal dynamic treatment regime that guides the optimal
treatment assignment for each individual at each stage based on their history.
We propose an empirical welfare maximization approach in this dynamic
framework, which estimates the optimal dynamic treatment regime using data from
an experimental or quasi-experimental study while satisfying exogenous
constraints on policies. The paper proposes two estimation methods: one solves
the treatment assignment problem sequentially through backward induction, and
the other solves the entire problem simultaneously across all stages. We
establish finite-sample upper bounds on worst-case average welfare regrets for
these methods and show their optimal $n^{-1/2}$ convergence rates. We also
modify the simultaneous estimation method to accommodate intertemporal
budget/capacity constraints.
arXiv link: http://arxiv.org/abs/2106.05031v5
Contamination Bias in Linear Regressions
flexible enough to purge omitted variable bias. We show that these regressions
generally fail to estimate convex averages of heterogeneous treatment effects
-- instead, estimates of each treatment's effect are contaminated by non-convex
averages of the effects of other treatments. We discuss three estimation
approaches that avoid such contamination bias, including the targeting of
easiest-to-estimate weighted average effects. A re-analysis of nine empirical
applications finds economically and statistically meaningful contamination bias
in observational studies; contamination bias in experimental studies is more
limited due to smaller variability in propensity scores.
arXiv link: http://arxiv.org/abs/2106.05024v5
Automatically Differentiable Random Coefficient Logistic Demand Estimation
as an automatically differentiable moment function, including the incorporation
of numerical safeguards proposed in the literature. This allows gradient-based
frequentist and quasi-Bayesian estimation using the Continuously Updating
Estimator (CUE). Drawing from the machine learning literature, we outline
hitherto under-utilized best practices in both frequentist and Bayesian
estimation techniques. Our Monte Carlo experiments compare the performance of
CUE, 2S-GMM, and LTE estimation. Preliminary findings indicate that the CUE
estimated using LTE and frequentist optimization has a lower bias but higher
MAE compared to the traditional 2-Stage GMM (2S-GMM) approach. We also find
that using credible intervals from MCMC sampling for the non-linear parameters
together with frequentist analytical standard errors for the concentrated out
linear parameters provides empirical coverage closest to the nominal level. The
accompanying admest Python package provides a platform for replication and
extensibility.
arXiv link: http://arxiv.org/abs/2106.04636v1
Testing Monotonicity of Mean Potential Outcomes in a Continuous Treatment with High-Dimensional Data
literature also considers continuously distributed treatments. We propose a
Cram\'{e}r-von Mises-type test for testing whether the mean potential outcome
given a specific treatment has a weakly monotonic relationship with the
treatment dose under a weak unconfoundedness assumption. In a nonseparable
structural model, applying our method amounts to testing monotonicity of the
average structural function in the continuous treatment of interest. To
flexibly control for a possibly high-dimensional set of covariates in our
testing approach, we propose a double debiased machine learning estimator that
accounts for covariates in a data-driven way. We show that the proposed test
controls asymptotic size and is consistent against any fixed alternative. These
theoretical findings are supported by Monte Carlo simulations. As an
empirical illustration, we apply our test to the Job Corps study and reject a
weakly negative relationship between the treatment (hours in academic and
vocational training) and labor market performance among relatively low
treatment values.
arXiv link: http://arxiv.org/abs/2106.04237v3
Modeling Portfolios with Leptokurtic and Dependent Risk Factors
distributed as Gram-Charlier (GC) expansions of the Gaussian law, has been
conceived. GC expansions prove effective when dealing with moderately
leptokurtic data. In order to cover the case of possibly severe leptokurtosis,
the so-called GC-like expansions have been devised by reshaping parent
leptokurtic distributions by means of orthogonal polynomials specific to them.
In this paper, we focus on the hyperbolic-secant (HS) law as parent
distribution whose GC-like expansions fit with kurtosis levels up to 19.4. A
portfolio distribution has been obtained with risk factors modeled as GC-like
expansions of the HS law, which duly account for excess kurtosis. Empirical
evidence of the workings of the approach is included.
arXiv link: http://arxiv.org/abs/2106.04218v1
Superconsistency of Tests in High Dimensions
the global null hypothesis of no effect are routinely applied in practice
before more specialized analysis is carried out. Although a plethora of
aggregate tests is available, each test has its strengths but also its blind
spots. In a Gaussian sequence model, we study whether it is possible to obtain
a test with substantially better consistency properties than the likelihood
ratio (i.e., Euclidean norm based) test. We establish an impossibility result,
showing that in the high-dimensional framework we consider, the set of
alternatives for which a test may improve upon the likelihood ratio test --
that is, its superconsistency points -- is always asymptotically negligible in
a relative volume sense.
arXiv link: http://arxiv.org/abs/2106.03700v3
On the "mementum" of Meme Stocks
evidence that these stocks display common stylized facts on the dynamics of
price, trading volume, and social media activity. Using a regime-switching
cointegration model, we identify the meme stock "mementum" which exhibits a
different characterization with respect to other stocks with high volumes of
activity (persistent and not) on social media. Understanding these properties
helps investors and market authorities in their decisions.
arXiv link: http://arxiv.org/abs/2106.03691v1
Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling
estimated by the Polyak-Ruppert averaging procedure of stochastic gradient
descent (SGD) algorithms. We leverage insights from time series regression in
econometrics and construct asymptotically pivotal statistics via random
scaling. Our approach is fully operational with online data and is rigorously
underpinned by a functional central limit theorem. Our proposed inference
method has a couple of key advantages over the existing methods. First, the
test statistic is computed in an online fashion with only SGD iterates and the
critical values can be obtained without any resampling methods, thereby
allowing for efficient implementation suitable for massive online data. Second,
there is no need to estimate the asymptotic variance and our inference method
is shown to be robust to changes in the tuning parameters for SGD algorithms in
simulation experiments with synthetic data.
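As a rough illustration of the ingredients described above (not the paper's own code), the sketch below runs Polyak-Ruppert averaged SGD on a simulated linear regression and accumulates a random-scaling matrix online from partial sums of the iterates; the data-generating process and step-size rule are assumptions made only for this example.

```python
import numpy as np

# Minimal sketch (not the paper's code): Polyak-Ruppert averaged SGD for a
# simulated linear regression, with a random-scaling matrix accumulated online
# from partial sums of the iterates. The DGP and step-size rule are assumptions
# made only for this illustration.
rng = np.random.default_rng(0)
d, n = 3, 20_000
theta_true = np.array([1.0, -0.5, 2.0])

theta = np.zeros(d)     # current SGD iterate
S = np.zeros(d)         # running sum of iterates, so S / s = theta_bar_s
A = np.zeros((d, d))    # sum_s s^2 * theta_bar_s theta_bar_s'
b = np.zeros(d)         # sum_s s^2 * theta_bar_s
c = 0.0                 # sum_s s^2

for s in range(1, n + 1):
    x = rng.normal(size=d)
    y = x @ theta_true + rng.normal()
    theta -= 0.5 * s ** -0.51 * (x @ theta - y) * x   # Robbins-Monro step
    S += theta
    tb = S / s                                        # Polyak-Ruppert average so far
    A += s**2 * np.outer(tb, tb)
    b += s**2 * tb
    c += float(s**2)

theta_bar = S / n
# Random-scaling matrix V = n^{-2} * sum_s s^2 (theta_bar_s - theta_bar)(...)' ,
# expanded so that it can be formed in a single online pass.
V = (A - np.outer(b, theta_bar) - np.outer(theta_bar, b)
     + c * np.outer(theta_bar, theta_bar)) / n**2
t_stats = np.sqrt(n) * theta_bar / np.sqrt(np.diag(V))   # tests of theta_j = 0
# Critical values come from a nonstandard mixed limit tabulated in the
# literature, not from the standard normal distribution.
print(theta_bar, t_stats)
```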
arXiv link: http://arxiv.org/abs/2106.03156v3
Linear Rescaling to Accurately Interpret Logarithms
interprets a linear change of \(p\) in \(\ln(X)\) as a \((1+p)\) proportional
change in \(X\), which is only accurate for small values of \(p\). I suggest
base-\((1+p)\) logarithms, where \(p\) is chosen ahead of time. A one-unit
change in \(\log_{1+p}(X)\) is exactly equivalent to a \((1+p)\) proportional
change in \(X\). This avoids an approximation applied too broadly, makes exact
interpretation easier and less error-prone, improves approximation quality when
approximations are used, makes the change of interest a one-log-unit change
like other regression variables, and reduces error from the use of
\(\log(1+X)\).
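A worked numerical check of the exact interpretation claimed above, with \(p = 0.10\) chosen purely for illustration:

```python
import numpy as np

# With p = 0.10, a one-unit change in log_{1.1}(X) corresponds exactly to a
# 10% proportional change in X, whereas a 0.10 change in ln(X) corresponds to
# a factor exp(0.10) ~ 1.105 rather than 1.10.
p = 0.10
X = 200.0

log_base = np.log(X) / np.log(1 + p)          # log_{1+p}(X) via change of base
X_up_one_unit = (1 + p) ** (log_base + 1)     # add one log_{1+p} unit
print(X_up_one_unit / X)                      # exactly 1.10

print(np.exp(0.10))                           # ~1.1052: the usual ln-based approximation error
```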
arXiv link: http://arxiv.org/abs/2106.03070v3
Truthful Self-Play
state representation without any supervision. Evolutionary frameworks such as
self-play converge to bad local optima in case of multi-agent reinforcement
learning in non-cooperative partially observable environments with
communication due to information asymmetry. Our proposed framework is a simple
modification of self-play inspired by mechanism design, also known as {\em
reverse game theory}, to elicit truthful signals and make the agents
cooperative. The key idea is to add imaginary rewards using the peer prediction
method, i.e., a mechanism for evaluating the validity of information exchanged
between agents in a decentralized environment. Numerical experiments with
predator-prey, traffic junction, and StarCraft tasks demonstrate the
state-of-the-art performance of our framework.
arXiv link: http://arxiv.org/abs/2106.03007v6
Learning Treatment Effects in Panels with General Intervention Patterns
question. The following is a fundamental version of this problem: Let $M^*$ be
a low rank matrix and $E$ be a zero-mean noise matrix. For a `treatment' matrix
$Z$ with entries in $\{0,1\}$ we observe the matrix $O$ with entries $O_{ij} :=
M^*_{ij} + E_{ij} + T_{ij} Z_{ij}$ where $T_{ij}$ are
unknown, heterogeneous treatment effects. The problem requires that we estimate the
average treatment effect $\tau^* := \sum_{ij} T_{ij} Z_{ij} /
\sum_{ij} Z_{ij}$. The synthetic control paradigm provides an approach to
estimating $\tau^*$ when $Z$ places support on a single row. This paper extends
that framework to allow rate-optimal recovery of $\tau^*$ for general $Z$, thus
broadly expanding its applicability. Our guarantees are the first of their type
in this general setting. Computational experiments on synthetic and real-world
data show a substantial advantage over competing estimators.
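A minimal sketch of the observation model and target parameter as stated in the abstract; the dimensions, noise scale, and treatment pattern are placeholders, the matrix-completion estimator itself is not implemented, and only a naive difference in means is shown for contrast.

```python
import numpy as np

# Sketch of the observation model from the abstract; sizes and noise scale are
# assumptions for illustration only.
rng = np.random.default_rng(1)
n_units, n_periods, rank = 50, 40, 3

U = rng.normal(size=(n_units, rank))
V = rng.normal(size=(n_periods, rank))
M_star = U @ V.T                                     # low-rank baseline outcomes
E = 0.5 * rng.normal(size=M_star.shape)              # zero-mean noise
Z = (rng.random(M_star.shape) < 0.2).astype(float)   # general treatment pattern
T = 1.0 + 0.3 * rng.normal(size=M_star.shape)        # heterogeneous effects

O = M_star + E + T * Z                               # observed panel

tau_star = (T * Z).sum() / Z.sum()                   # target average treatment effect
naive = O[Z == 1].mean() - O[Z == 0].mean()          # ignores the low-rank structure
print(tau_star, naive)
```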
arXiv link: http://arxiv.org/abs/2106.02780v2
Change-Point Analysis of Time Series with Evolutionary Spectra
stationary time series. We focus on series with a bounded spectral density that
changes smoothly under the null hypothesis but exhibits change-points or becomes
less smooth under the alternative. We address two local problems. The first is
the detection of discontinuities (or breaks) in the spectrum at unknown dates
and frequencies. The second involves abrupt yet continuous changes in the
spectrum over a short time period at an unknown frequency without signifying a
break. Both problems can be cast into changes in the degree of smoothness of
the spectral density over time. We consider estimation and minimax-optimal
testing. We determine the optimal rate for the minimax distinguishable
boundary, i.e., the minimum break magnitude such that we are able to uniformly
control type I and type II errors. We propose a novel procedure for the
estimation of the change-points based on a wild sequential top-down algorithm
and show its consistency under shrinking shifts and possibly growing number of
change-points. Our method can be used across many fields and a companion
program is made available in popular software packages.
arXiv link: http://arxiv.org/abs/2106.02031v3
Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits
example using contextual bandits. Historical data of this type can be used to
evaluate other treatment assignment policies to guide future innovation or
experiments. However, policy evaluation is challenging if the target policy
differs from the one used to collect data, and popular estimators, including
doubly robust (DR) estimators, can be plagued by bias, excessive variance, or
both. In particular, when the pattern of treatment assignment in the collected
data looks little like the pattern generated by the policy to be evaluated, the
importance weights used in DR estimators explode, leading to excessive
variance.
In this paper, we improve the DR estimator by adaptively weighting
observations to control its variance. We show that a t-statistic based on our
improved estimator is asymptotically normal under certain conditions, allowing
us to form confidence intervals and test hypotheses. Using synthetic data and
public benchmarks, we provide empirical evidence for our estimator's improved
accuracy and inferential properties relative to existing alternatives.
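For orientation, here is a minimal sketch of the standard doubly robust value estimator that serves as the paper's starting point; the logging policy, outcome model, and target policy are simulated placeholders, and the adaptive weighting proposed in the paper is not shown.

```python
import numpy as np

# Standard doubly robust (DR) off-policy value estimator on simulated bandit
# data; everything below is a placeholder chosen only for illustration.
rng = np.random.default_rng(2)
n, k = 5_000, 3                                    # samples, number of arms

x = rng.normal(size=n)                             # context
e = np.full((n, k), 1.0 / k)                       # logging propensities (uniform here)
a = rng.integers(0, k, size=n)                     # logged actions
y = 1.0 + 0.5 * x * (a == 1) + rng.normal(size=n)  # observed rewards

# Crude outcome model: per-arm mean reward, ignoring the context.
mu_hat = np.stack([np.full(n, y[a == j].mean()) for j in range(k)], axis=1)
pi = np.zeros((n, k))                              # deterministic target policy
pi[np.arange(n), (x > 0).astype(int)] = 1.0

# DR value: plug-in term plus importance-weighted residual correction.
plug_in = (pi * mu_hat).sum(axis=1)
correction = pi[np.arange(n), a] / e[np.arange(n), a] * (y - mu_hat[np.arange(n), a])
v_dr = np.mean(plug_in + correction)
print(v_dr)
```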
arXiv link: http://arxiv.org/abs/2106.02029v2
Retrospective causal inference via matrix completion, with an evaluation of the effect of European integration on cross-border employment
settings with later-treated and always-treated units, but no never-treated
units. We use the observed outcomes to impute the counterfactual outcomes of
the later-treated using a matrix completion estimator. We propose a novel
propensity-score and elapsed-time weighting of the estimator's objective
function to correct for between-group differences in the distributions of
observed covariates and unobserved fixed effects, and in elapsed time since treatment.
Our methodology is motivated by studying the effect of two milestones of
European integration -- the Free Movement of Persons and the Schengen Agreement
-- on the share of cross-border workers in sending border regions. We apply the
proposed method to the European Labour Force Survey (ELFS) data and provide
evidence that opening the border almost doubled the probability of working
beyond the border in Eastern European regions.
arXiv link: http://arxiv.org/abs/2106.00788v1
A Simple and General Debiased Machine Learning Theorem with Finite Sample Guarantees
sample splitting to calculate confidence intervals for functionals, i.e. scalar
summaries, of machine learning algorithms. For example, an analyst may desire
the confidence interval for a treatment effect estimated with a neural network.
We provide a nonasymptotic debiased machine learning theorem that encompasses
any global or local functional of any machine learning algorithm that satisfies
a few simple, interpretable conditions. Formally, we prove consistency,
Gaussian approximation, and semiparametric efficiency by finite sample
arguments. The rate of convergence is $n^{-1/2}$ for global functionals, and it
degrades gracefully for local functionals. Our results culminate in a simple
set of conditions that an analyst can use to translate modern learning theory
rates into traditional statistical inference. The conditions reveal a general
double robustness property for ill-posed inverse problems.
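As one concrete instance of the kind of functional covered, the sketch below computes a cross-fitted (sample-split) debiased estimate of an average treatment effect with off-the-shelf learners; the data-generating process and the choice of random forests are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

# Generic cross-fitted debiased (AIPW-style) estimate of an average treatment
# effect; the simulated DGP and learners are illustrative assumptions.
rng = np.random.default_rng(3)
n = 4_000
X = rng.normal(size=(n, 5))
p = 1 / (1 + np.exp(-X[:, 0]))
D = rng.binomial(1, p)
Y = 1.0 * D + X[:, 0] + rng.normal(size=n)

scores = np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    m = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[train], D[train])
    g1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(
        X[train][D[train] == 1], Y[train][D[train] == 1])
    g0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(
        X[train][D[train] == 0], Y[train][D[train] == 0])
    e = np.clip(m.predict_proba(X[test])[:, 1], 0.01, 0.99)
    mu1, mu0 = g1.predict(X[test]), g0.predict(X[test])
    scores[test] = (mu1 - mu0
                    + D[test] * (Y[test] - mu1) / e
                    - (1 - D[test]) * (Y[test] - mu0) / (1 - e))

ate, se = scores.mean(), scores.std(ddof=1) / np.sqrt(n)
print(f"ATE ~ {ate:.3f} +/- {1.96 * se:.3f}")
```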
arXiv link: http://arxiv.org/abs/2105.15197v3
Regression-Adjusted Estimation of Quantile Treatment Effects under Covariate-Adaptive Randomizations
usually contain extra covariates in addition to the strata indicators. We
propose to incorporate these additional covariates via auxiliary regressions in
the estimation and inference of unconditional quantile treatment effects (QTEs)
under CARs. We establish the consistency and limit distribution of the
regression-adjusted QTE estimator and prove that the use of multiplier
bootstrap inference is non-conservative under CARs. The auxiliary regression
may be estimated parametrically, nonparametrically, or via regularization when
the data are high-dimensional. Even when the auxiliary regression is
misspecified, the proposed bootstrap inferential procedure still achieves the
nominal rejection probability in the limit under the null. When the auxiliary
regression is correctly specified, the regression-adjusted estimator achieves
the minimum asymptotic variance. We also discuss forms of adjustments that can
improve the efficiency of the QTE estimators. The finite sample performance of
the new estimation and inferential methods is studied in simulations and an
empirical application to a well-known dataset on the effect of expanding access
to basic bank accounts on savings is reported.
arXiv link: http://arxiv.org/abs/2105.14752v4
Asset volatility forecasting: The optimal decay parameter in the EWMA model
competitive volatility estimator, where its main strength relies on computation
simplicity, especially in a multi-asset scenario, due to dependency only on the
decay parameter, $\lambda$. But what is the best choice of $\lambda$ in the
EWMA volatility model? Through a large time-series data set of historical
returns of the top US large-cap companies, we empirically test the forecasting
performance of the EWMA approach under different time horizons and varying the
decay parameter. Using a rolling window scheme, the out-of-sample performance
of the variance-covariance matrix is computed following two approaches. First,
if we look for a fixed decay parameter for the full sample, the results are in
agreement with the RiskMetrics suggestion for 1-month forecasting. In addition,
we provide the full-sample optimal decay parameter for the weekly and bi-weekly
forecasting horizon cases, confirming two facts: i) the optimal value is a
function of the forecasting horizon, and ii) for shorter forecasting horizons,
short-term memory gains importance. Second, we also evaluate the
forecasting performance of EWMA, but this time using the optimal time-varying
decay parameter that minimizes the in-sample variance-covariance estimator,
arriving at better accuracy than the use of a fixed full-sample optimal
parameter.
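A minimal sketch of the RiskMetrics-style EWMA variance recursion and a grid search over the decay parameter; the returns are simulated placeholders and the one-step-ahead squared-error loss is only one reasonable way to pick $\lambda$.

```python
import numpy as np

# EWMA variance recursion and a simple grid search over lambda; the return
# series is simulated, and the loss (squared error against realized squared
# returns) is an illustrative choice.
rng = np.random.default_rng(4)
r = 0.01 * rng.standard_t(df=6, size=2_000)        # daily returns (simulated)

def ewma_var(returns, lam):
    sigma2 = np.empty_like(returns)
    sigma2[0] = returns.var()
    for t in range(1, len(returns)):
        sigma2[t] = lam * sigma2[t - 1] + (1 - lam) * returns[t - 1] ** 2
    return sigma2

grid = np.arange(0.80, 0.995, 0.005)
losses = [np.mean((r[1:] ** 2 - ewma_var(r, lam)[1:]) ** 2) for lam in grid]
print("lambda minimizing one-step-ahead squared error:", grid[int(np.argmin(losses))])
```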
arXiv link: http://arxiv.org/abs/2105.14382v1
Crime and Mismeasured Punishment: Marginal Treatment Effect with Misclassification
is misclassified. I explore two restrictions, allowing for dependence between
the instrument and the misclassification decision. If the signs of the
derivatives of the propensity scores are equal, I identify the MTE sign. If
those derivatives are similar, I bound the MTE. To illustrate, I analyze the
impact of alternative sentences (fines and community service v. no punishment)
on recidivism in Brazil, where Appeals processes generate misclassification.
The estimated misclassification bias may be as large as 10% of the largest
possible MTE, and the bounds contain the correctly estimated MTE.
arXiv link: http://arxiv.org/abs/2106.00536v7
Specification tests for GARCH processes
variance function in GARCH models when the true parameter may lie on the
boundary of the parameter space. The test statistics considered are of
Kolmogorov-Smirnov and Cram\'{e}r-von Mises type, and are based on a certain
empirical process marked by centered squared residuals. The limiting
distributions of the test statistics are not free from (unknown) nuisance
parameters, and hence critical values cannot be tabulated. A novel bootstrap
procedure is proposed to implement the tests; it is shown to be asymptotically
valid under general conditions, irrespective of the presence of nuisance
parameters on the boundary. The proposed bootstrap approach is based on
shrinking of the parameter estimates used to generate the bootstrap sample
toward the boundary of the parameter space at a proper rate. It is simple to
implement and fast in applications, as the associated test statistics have
simple closed form expressions. A simulation study demonstrates that the new
tests: (i) have excellent finite sample behavior in terms of empirical
rejection probabilities under the null as well as under the alternative; (ii)
provide a useful complement to existing procedures based on Ljung-Box type
approaches. Two data examples are considered to illustrate the tests.
arXiv link: http://arxiv.org/abs/2105.14081v1
Identification and Estimation of Partial Effects in Nonlinear Semiparametric Panel Models
with unrestricted unobserved individual heterogeneity, such as a binary
response panel model with fixed effects and logistic errors as a special case.
This lack of point identification occurs despite the identification of these
models' common coefficients. We provide a unified framework to establish the
point identification of various partial effects in a wide class of nonlinear
semiparametric models under an index sufficiency assumption on the unobserved
heterogeneity, even when the error distribution is unspecified and
non-stationary. This assumption does not impose parametric restrictions on the
unobserved heterogeneity and idiosyncratic errors. We also present partial
identification results when the support condition fails. We then propose
three-step semiparametric estimators for APEs, average structural functions,
and average marginal effects, and show their consistency and asymptotic
normality. Finally, we illustrate our approach in a study of determinants of
married women's labor supply.
arXiv link: http://arxiv.org/abs/2105.12891v6
A Structural Model of Business Card Exchange Networks
diffusion and new business creation. To understand the determinants of how
these networks are formed in the first place, we analyze a unique dataset of
business card exchanges among a sample of over 240,000 users of Eight, a
multi-platform contact management and professional social networking tool for
individuals. We develop a structural model of network formation with
strategic interactions, and we estimate users' payoffs that depend on the
composition of business relationships, as well as indirect business
interactions. We allow heterogeneity of users in both observable and
unobservable characteristics to affect how relationships form and are
maintained. The model's stationary equilibrium delivers a likelihood that is a
mixture of exponential random graph models that we can characterize in
closed-form. We overcome several econometric and computational challenges in
estimation, by exploiting a two-step estimation procedure, variational
approximations and minorization-maximization methods. Our algorithm is
scalable, highly parallelizable and makes efficient use of computer memory to
allow estimation in massive networks. We show that users' payoffs display
homophily in several dimensions, e.g., location; furthermore, users' unobservable
characteristics also display homophily.
arXiv link: http://arxiv.org/abs/2105.12704v3
A data-driven approach to beating SAA out-of-sample
sometimes have a higher out-of-sample expected reward than the Sample Average
Approximation (SAA), there is no guarantee. In this paper, we introduce a class
of Distributionally Optimistic Optimization (DOO) models, and show that it is
always possible to "beat" SAA out-of-sample if we consider not just worst-case
(DRO) models but also best-case (DOO) ones. We also show, however, that this
comes at a cost: Optimistic solutions are more sensitive to model error than
either worst-case or SAA optimizers, and hence are less robust. Moreover,
calibrating the worst- or best-case model to outperform SAA may be difficult
when data is limited.
arXiv link: http://arxiv.org/abs/2105.12342v3
Measuring Financial Advice: aligning client elicited and revealed risk
determine a suitable portfolio of assets that will allow clients to reach their
investment objectives. Financial institutions assign risk ratings to each
security they offer, and those ratings are used to guide clients and advisors
to choose an investment portfolio risk that suits their stated risk tolerance.
This paper compares client Know Your Client (KYC) profile risk allocations to
their investment portfolio risk selections using a value-at-risk discrepancy
methodology. Value-at-risk is used to measure elicited and revealed risk to
show whether clients are over-risked or under-risked, whether changes in KYC
risk lead to changes in portfolio configuration, and whether cash flow affects a client's
portfolio risk. We demonstrate the effectiveness of value-at-risk at measuring
clients' elicited and revealed risk on a dataset provided by a private Canadian
financial dealership of over $50,000$ accounts for over $27,000$ clients and
$300$ advisors. By measuring both elicited and revealed risk using the same
measure, we can determine how well a client's portfolio aligns with their
stated goals. We believe that using value-at-risk to measure client risk
provides valuable insight to advisors to ensure that their practice is KYC
compliant, to better tailor their client portfolios to stated goals,
communicate advice to clients to either align their portfolios to stated goals
or refresh their goals, and to monitor changes to the clients' risk positions
across their practice.
arXiv link: http://arxiv.org/abs/2105.11892v1
Vector autoregression models with skewness and heavy tails
during recessions and crises can hardly be explained by a Gaussian structural
shock. There is evidence that the distribution of macroeconomic variables is
skewed and heavy tailed. In this paper, we contribute to the literature by
extending a vector autoregression (VAR) model to account for a more realistic
assumption of the multivariate distribution of the macroeconomic variables. We
propose a general class of generalized hyperbolic skew Student's t distribution
with stochastic volatility for the error term in the VAR model that allows us
to take into account skewness and heavy tails. Tools for Bayesian inference and
model selection using a Gibbs sampler are provided. In an empirical study, we
present evidence of skewness and heavy tails for monthly macroeconomic
variables. The analysis also gives a clear message that skewness should be
taken into account for better predictions during recessions and crises.
arXiv link: http://arxiv.org/abs/2105.11182v1
Inference for multi-valued heterogeneous treatment effects when the number of treated units is small
treatment effects in a multi-valued treatment framework where the number of
units in the treatment arms can be small and do not grow with the sample size.
We accomplish this by casting the model as a semi-/non-parametric conditional
quantile model and using known finite sample results about the law of the
indicator function that defines the conditional quantile. Our framework allows
for structural functions that are non-additively separable, with flexible
functional forms and heteroskedasticity in the residuals, and it also encompasses
commonly used designs like difference-in-differences. We study the finite sample
behavior of our test in a Monte Carlo study and we also apply our results to
assessing the effect of weather events on GDP growth.
arXiv link: http://arxiv.org/abs/2105.10965v1
Identification and Estimation of a Partially Linear Regression Model using Network Data: Inference and an Application to Network Peer Effects
estimators of Auerbach (2019a). Section 1 contains results about the large
sample properties of the estimators from Section 2 of Auerbach (2019a). Section
2 considers some extensions to the model. Section 3 provides an application to
estimating network peer effects. Section 4 shows the results from some
simulations.
arXiv link: http://arxiv.org/abs/2105.10002v1
Two Sample Unconditional Quantile Effect
effects (UQE) in a data combination model. The UQE measures the effect of a
marginal counterfactual change in the unconditional distribution of a covariate
on quantiles of the unconditional distribution of a target outcome. Under rank
similarity and conditional independence assumptions, we provide a set of
identification results for UQEs when the target covariate is continuously
distributed and when it is discrete, respectively. Based on these
identification results, we propose semiparametric estimators and establish
their large sample properties under primitive conditions. Applying our method
to a variant of Mincer's earnings function, we study the counterfactual
quantile effect of actual work experience on income.
arXiv link: http://arxiv.org/abs/2105.09445v1
Multiply Robust Causal Mediation Analysis with Continuous Treatments
causal effects of a treatment or exposure on an outcome of interest. Mediation
analysis offers a rigorous framework for identifying and estimating these
causal effects. For binary treatments, efficient estimators for the direct and
indirect effects are presented by Tchetgen Tchetgen and Shpitser (2012) based
on the influence function of the parameter of interest. These estimators
possess desirable properties such as multiple-robustness and asymptotic
normality while allowing for slower than root-n rates of convergence for the
nuisance parameters. However, in settings involving continuous treatments,
these influence function-based estimators are not readily applicable without
making strong parametric assumptions. In this work, utilizing a
kernel-smoothing approach, we propose an estimator suitable for settings with
continuous treatments inspired by the influence function-based estimator of
Tchetgen Tchetgen and Shpitser (2012). Our proposed approach employs
cross-fitting, relaxing the smoothness requirements on the nuisance functions
and allowing them to be estimated at slower rates than the target parameter.
Additionally, similar to influence function-based estimators, our proposed
estimator is multiply robust and asymptotically normal, allowing for inference
in settings where parametric assumptions may not be justified.
arXiv link: http://arxiv.org/abs/2105.09254v3
Trading-off Bias and Variance in Stratified Experiments and in Matching Studies, Under a Boundedness Condition on the Magnitude of the Treatment Effect
composed of $S$ groups or units, when one has unbiased estimators of each
group's conditional average treatment effect (CATE). These conditions are met
in stratified experiments and in matching studies. I assume that each CATE is
bounded in absolute value by $B$ standard deviations of the outcome, for some
known $B$. This restriction may be appealing: outcomes are often standardized
in applied work, so researchers can use available literature to determine a
plausible value for $B$. I derive, across all linear combinations of the CATEs'
estimators, the minimax estimator of the ATE. In two stratified experiments, my
estimator has a worst-case mean-squared error half that of the commonly used
strata-fixed-effects estimator. In a matching study with limited overlap, my
estimator achieves 56% of the precision gains of a commonly used trimming
estimator, and has a worst-case mean-squared error 11 times smaller.
arXiv link: http://arxiv.org/abs/2105.08766v6
Incorporating Social Welfare in Program-Evaluation and Treatment Choice
of outcome distributions as `social welfare' and ignores program impacts on
unobserved utilities. We show how to incorporate aggregate utility within
econometric program evaluation and optimal treatment targeting for a
heterogeneous population. In the practically important setting of
discrete choice, under unrestricted preference heterogeneity and
income effects, the indirect-utility distribution becomes a closed-form
functional of average demand. This enables nonparametric cost-benefit analysis
of policy interventions and their optimal targeting based on planners'
redistributional preferences. For ordered/continuous choice,
utility distributions can be bounded. Our methods are illustrated with Indian
survey data on private tuition, where income paths of usage-maximizing
subsidies differ significantly from welfare-maximizing ones.
arXiv link: http://arxiv.org/abs/2105.08689v2
Identification robust inference for moments based analysis of linear dynamic panel data models
non-linear moment conditions, as proposed by Arellano and Bond (1991), Arellano
and Bover (1995), Blundell and Bond (1998) and Ahn and Schmidt (1995) for the
linear dynamic panel data model, do not separately identify the autoregressive
parameter when its true value is close to one and the variance of the initial
observations is large. We prove that combinations of these moment conditions,
however, do so when there are more than three time series observations. This
identification then solely results from a set of so-called robust moment
conditions. These robust moments are spanned by the combined difference, level
and non-linear moment conditions and only depend on differenced data. We show
that, when only the robust moments contain identifying information on the
autoregressive parameter, the discriminatory power of the Kleibergen (2005) LM
test using the combined moments is identical to the largest rejection
frequencies that can be obtained from solely using the robust moments. This
shows that the KLM test implicitly uses the robust moments when only they
contain information on the autoregressive parameter.
arXiv link: http://arxiv.org/abs/2105.08346v1
Double robust inference for continuous updating GMM
hypotheses specified on the pseudo-true value of the structural parameters in
the generalized method of moments. The pseudo-true value is defined as the
minimizer of the population continuous updating objective function and equals
the true value of the structural parameter in the absence of
misspecification. The (bounding) chi-squared limiting
distribution of the DRLM statistic is robust to both misspecification and weak
identification of the structural parameters, hence its name. To emphasize its
importance for applied work, we use the DRLM test to analyze the return on
education, which is often perceived to be weakly identified, using data from
Card (1995) where misspecification occurs in case of treatment heterogeneity;
and to analyze the risk premia associated with risk factors proposed in Adrian
et al. (2014) and He et al. (2017), where both misspecification and weak
identification need to be addressed.
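For reference, a minimal sketch of the continuous updating GMM objective in a simple simulated linear IV model; the DRLM statistic itself is not implemented here, and the data-generating process is a placeholder.

```python
import numpy as np

# Continuous updating GMM objective for a simple linear IV model: the weighting
# matrix is re-evaluated at every candidate parameter value.
rng = np.random.default_rng(5)
n = 2_000
z = rng.normal(size=(n, 3))                      # instruments
u = rng.normal(size=n)
x = z @ np.array([0.4, 0.3, 0.2]) + 0.5 * u + rng.normal(size=n)
y = 1.5 * x + u                                  # structural equation, beta = 1.5

def cue_objective(beta):
    g = z * (y - beta * x)[:, None]              # moment contributions g_i(beta)
    g_bar = g.mean(axis=0)
    V = np.cov(g, rowvar=False)                  # moment covariance at beta
    return n * g_bar @ np.linalg.solve(V, g_bar)

grid = np.linspace(1.0, 2.0, 201)
print("CUE minimizer ~", grid[int(np.argmin([cue_objective(b) for b in grid]))])
```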
arXiv link: http://arxiv.org/abs/2105.08345v1
Choice Set Confounding in Discrete Choice
discrete choice models from data of selections (choices) made by individuals
from a discrete set of alternatives (the choice set). While there are many
models for individual preferences, existing learning methods overlook how
choice set assignment affects the data. Often, the choice set itself is
influenced by an individual's preferences; for instance, a consumer choosing a
product from an online retailer is often presented with options from a
recommender system that depend on information about the consumer's preferences.
Ignoring these assignment mechanisms can mislead choice models into making
biased estimates of preferences, a phenomenon that we call choice set
confounding; we demonstrate the presence of such confounding in widely-used
choice datasets.
To address this issue, we adapt methods from causal inference to the discrete
choice setting. We use covariates of the chooser for inverse probability
weighting and/or regression controls, accurately recovering individual
preferences in the presence of choice set confounding under certain
assumptions. When such covariates are unavailable or inadequate, we develop
methods that take advantage of structured choice set assignment to improve
prediction. We demonstrate the effectiveness of our methods on real-world
choice data, showing, for example, that accounting for choice set confounding
makes choices observed in hotel booking and commute transportation more
consistent with rational utility-maximization.
arXiv link: http://arxiv.org/abs/2105.07959v2
Putting a Compass on the Map of Elections
visualizes a set of 800 elections generated from various statistical cultures.
While similar elections are grouped together on this map, there is no obvious
interpretation of the elections' positions. We provide such an interpretation
by introducing four canonical "extreme" elections, acting as a compass on the
map. We use them to analyze both a dataset provided by Szufa et al. and a
number of real-life elections. In effect, we find a new variant of the Mallows
model and show that it captures real-life scenarios particularly well.
arXiv link: http://arxiv.org/abs/2105.07815v1
Using social network and semantic analysis to analyze online travel forums and forecast tourism demand
and companies operating in the tourism industry. In this research, we applied
methods and tools of social network and semantic analysis to study
user-generated content retrieved from online communities which interacted on
the TripAdvisor travel forum. We analyzed the forums of 7 major European
capital cities, over a period of 10 years, collecting more than 2,660,000
posts, written by about 147,000 users. We present a new methodology of analysis
of tourism-related big data and a set of variables which could be integrated
into traditional forecasting models. We implemented Factor Augmented
Autoregressive and Bridge models with social network and semantic variables
which often led to a better forecasting performance than univariate models and
models based on Google Trends data. Forum language complexity and the
centralization of the communication network, i.e. the presence of eminent
contributors, were the variables that contributed more to the forecasting of
international airport arrivals.
arXiv link: http://arxiv.org/abs/2105.07727v1
Classifying variety of customer's online engagement for churn prediction with mixed-penalty logistic regression
decision-making tools for preventing customer attrition (churn) in customer
relationship management (CRM). Focusing on a CRM dataset with several different
categories of factors that impact customer heterogeneity (i.e., usage of
self-care service channels, duration of service, and responsiveness to
marketing actions), we provide new predictive analytics of customer churn rate
based on a machine learning method that enhances the classification of logistic
regression by adding a mixed penalty term. The proposed penalized logistic
regression can prevent overfitting when dealing with big data and minimize the
loss function when balancing the cost from the median (absolute value) and mean
(squared value) regularization. We show the analytical properties of the
proposed method and its computational advantage in this research. In addition,
we investigate the performance of the proposed method with a CRM data set (that
has a large number of features) under different settings by efficiently
eliminating the disturbance of (1) least important features and (2) sensitivity
from the minority (churn) class. Our empirical results confirm the expected
performance of the proposed method in full compliance with the common
classification criteria (i.e., accuracy, precision, and recall) for evaluating
machine learning methods.
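One concrete reading of a mixed absolute-value/squared penalty is an elastic-net-penalized logistic regression; the sketch below fits such a model on synthetic features standing in for CRM engagement variables (the actual dataset and penalty weights used in the paper may differ).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Elastic-net-penalized logistic regression as one reading of a mixed
# (L1 + L2) penalty for churn classification, on synthetic stand-in features.
rng = np.random.default_rng(6)
n, p = 5_000, 20
X = rng.normal(size=(n, p))
logits = X[:, 0] - 0.5 * X[:, 1] + 0.25 * X[:, 2]
churn = rng.binomial(1, 1 / (1 + np.exp(-logits)))

clf = LogisticRegression(
    penalty="elasticnet", solver="saga", l1_ratio=0.5,  # mix of L1 and L2 terms
    C=1.0, max_iter=5_000, class_weight="balanced",     # reweight the minority (churn) class
).fit(X, churn)
print((np.abs(clf.coef_) > 1e-6).sum(), "features retained of", p)
```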
arXiv link: http://arxiv.org/abs/2105.07671v2
Uniform Inference on High-dimensional Spatial Panel Networks
estimator, regularized for dimension reduction and subsequently debiased to
correct for shrinkage bias (referred to as a debiased-regularized estimator),
for inference on large-scale spatial panel networks. In particular, the network
structure, which incorporates a flexible sparse deviation that can be regarded
either as a latent component or as a misspecification of a predetermined
adjacency matrix, is estimated using a debiased machine learning approach. The
theoretical analysis establishes the consistency and asymptotic normality of
our proposed estimator, taking into account general temporal and spatial
dependencies inherent in the data-generating processes. A primary contribution
of our study is the development of a uniform inference theory, which enables
hypothesis testing on the parameters of interest, including zero or non-zero
elements in the network structure. Additionally, the asymptotic properties of
the estimator are derived for both linear and nonlinear moments. Simulations
demonstrate the superior performance of our proposed approach. Finally, we
apply our methodology to investigate the spatial network effects of stock
returns.
arXiv link: http://arxiv.org/abs/2105.07424v5
Cohort Shapley value for algorithmic fairness
in game theory that does not use any unobserved and potentially impossible
feature combinations. We use it to evaluate algorithmic fairness, using the
well-known COMPAS recidivism data as our example. This approach allows one to
identify for each individual in a data set the extent to which they were
adversely or beneficially affected by their value of a protected attribute such
as their race. The method can do this even if race was not one of the original
predictors and even if it does not have access to a proprietary algorithm that
has made the predictions. The grounding in game theory lets us define aggregate
variable importance for a data set consistently with its per subject
definitions. We can investigate variable importance for multiple quantities of
interest in the fairness literature including false positive predictions.
arXiv link: http://arxiv.org/abs/2105.07168v1
Policy Evaluation during a Pandemic
response to the Covid-19 pandemic. Evaluating the effects of these policies,
both on the number of Covid-19 cases and on other economic outcomes, is a
key ingredient for policymakers to be able to determine which policies are most
effective as well as the relative costs and benefits of particular policies. In
this paper, we consider the relative merits of common identification strategies
that exploit variation in the timing of policies across different locations by
checking whether the identification strategies are compatible with leading
epidemic models in the epidemiology literature. We argue that unconfoundedness
type approaches, that condition on the pre-treatment "state" of the pandemic,
are likely to be more useful for evaluating policies than
difference-in-differences type approaches due to the highly nonlinear spread of
cases during a pandemic. For difference-in-differences, we further show that a
version of this problem continues to exist even when one is interested in
understanding the effect of a policy on other economic outcomes when those
outcomes also depend on the number of Covid-19 cases. We propose alternative
approaches that are able to circumvent these issues. We apply our proposed
approach to study the effect of state level shelter-in-place orders early in
the pandemic.
arXiv link: http://arxiv.org/abs/2105.06927v2
Characterization of the probability and information entropy of a process with an exponentially increasing sample space and its application to the Broad Money Supply
p(x0) = 1. Consider x0 to have a discrete uniform distribution over the integer
interval [1, s], where the size of the sample space (s) = 1, in the initial
state, such that p(x0) = 1. What is the probability of x0 and the associated
information entropy (H), as s increases exponentially? If the sample space
expansion occurs at an exponential rate (rate constant = lambda) with time (t),
then applying time scaling, such that T = lambda x t, gives p(x0|T) = exp(-T) and
H(T) = T. The characterization has also been extended to include exponential
expansion by means of simultaneous, independent processes, as well as the more
general multi-exponential case. The methodology was applied to the expansion of
the broad money supply of US$ over the period 2001-2019, as a real-world
example. At any given time, the information entropy is related to the rate at
which the sample space is expanding. In the context of the expansion of the
broad money supply, the information entropy could be considered to be related
to the "velocity" of the expansion of the money supply.
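A quick numerical check of the stated relations, treating the sample-space size as continuous for simplicity.

```python
import numpy as np

# For a uniform distribution on {1, ..., s} with s = exp(T), the probability of
# the initial state is p = 1/s = exp(-T) and the natural-log (Shannon) entropy
# is H = ln(s) = T.
for T in [0.5, 1.0, 2.0, 3.0]:
    s = np.exp(T)       # exponentially expanded sample-space size
    p = 1.0 / s         # probability of x0 under the uniform law
    H = np.log(s)       # entropy of the uniform distribution, in nats
    print(T, p, np.exp(-T), H)
```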
arXiv link: http://arxiv.org/abs/2105.14193v1
Dynamic Portfolio Allocation in High Dimensions using Sparse Risk Factors
predictions up to high-dimensions using a dynamic risk factor model. Our
approach increases parsimony via time-varying sparsity on factor loadings and
is able to sequentially learn the use of constant or time-varying parameters
and volatilities. We show in a dynamic portfolio allocation problem with 452
stocks from the S&P 500 index that our dynamic risk factor model is able to
produce more stable and sparse predictions, achieving not just considerable
portfolio performance improvements but also higher utility gains for the
mean-variance investor compared to the traditional Wishart benchmark and the
passive investment on the market index.
arXiv link: http://arxiv.org/abs/2105.06584v2
Generalized Autoregressive Moving Average Models with GARCH Errors
series is the generalized autoregressive moving average (GARMA) model, which
specifies an ARMA structure for the conditional mean process of the underlying
time series. However, in many applications one often encounters conditional
heteroskedasticity. In this paper we propose a new class of models, referred to
as GARMA-GARCH models, that jointly specify both the conditional mean and
conditional variance processes of a general non-Gaussian time series. Under the
general modeling framework, we propose three specific models, as examples, for
proportional time series, nonnegative time series, and skewed and heavy-tailed
financial time series. Maximum likelihood estimator (MLE) and quasi Gaussian
MLE (GMLE) are used to estimate the parameters. Simulation studies and three
applications are used to demonstrate the properties of the models and the
estimation procedures.
arXiv link: http://arxiv.org/abs/2105.05532v1
Robust Inference on Income Inequality: $t$-Statistic Based Approaches
in economics and finance often face the difficulty that the data is
heterogeneous, heavy-tailed or correlated in some unknown fashion. The paper
focuses on applications of the recently developed t-statistic based
robust inference approaches in the analysis of inequality measures and their
comparisons under the above problems. Following the approaches, in particular,
a robust large sample test on equality of two parameters of interest (e.g., a
test of equality of inequality measures in two regions or countries considered)
is conducted as follows: The data in the two samples dealt with is partitioned
into fixed numbers $q_1, q_2\ge 2$ (e.g., $q_1=q_2=2, 4, 8$) of groups, the
parameters (inequality measures dealt with) are estimated for each group, and
inference is based on a standard two-sample $t$-test with the resulting $q_1,
q_2$ group estimators. Robust $t$-statistic approaches result in valid
inference under general conditions that group estimators of parameters (e.g.,
inequality measures) considered are asymptotically independent, unbiased and
Gaussian of possibly different variances, or weakly converge, at an arbitrary
rate, to independent scale mixtures of normal random variables. These
conditions are typically satisfied in empirical applications even under
pronounced heavy-tailedness and heterogeneity and possible dependence in
observations. The methods dealt with in the paper complement and compare
favorably with other inference approaches available in the literature. The use
of robust inference approaches is illustrated by an empirical analysis of
income inequality measures and their comparisons across different regions in
Russia.
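A minimal sketch of the group-based two-sample procedure described above, using the Gini coefficient as the inequality measure and Welch's t-test on the group estimates; the heavy-tailed income draws are simulated placeholders.

```python
import numpy as np
from scipy import stats

# Group-based robust two-sample comparison of an inequality measure (here the
# Gini coefficient), following the partition-and-t-test recipe described above.
def gini(x):
    x = np.sort(np.asarray(x))
    n = x.size
    return (2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum())

rng = np.random.default_rng(7)
region_a = rng.pareto(3.0, size=8_000) + 1.0       # heavy-tailed incomes, region A
region_b = rng.pareto(2.5, size=8_000) + 1.0       # heavier tail, region B

q1 = q2 = 4                                        # fixed number of groups per sample
g_a = [gini(chunk) for chunk in np.array_split(rng.permutation(region_a), q1)]
g_b = [gini(chunk) for chunk in np.array_split(rng.permutation(region_b), q2)]

# Two-sample t-test on the q1 + q2 group estimates, allowing unequal variances.
t_stat, p_val = stats.ttest_ind(g_a, g_b, equal_var=False)
print(t_stat, p_val)
```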
arXiv link: http://arxiv.org/abs/2105.05335v2
Efficient Peer Effects Estimators with Group Effects
individual's outcomes are linear in the group mean outcome and characteristics,
and group effects are random. Our specification is motivated by the moment
conditions imposed in Graham (2008). We show that these moment conditions can be
cast in terms of a linear random group effects model and lead to a class of GMM
estimators that are generally identified as long as there is sufficient
variation in group size. We also show that our class of GMM estimators contains
a Quasi Maximum Likelihood estimator (QMLE) for the random group effects model,
as well as the Wald estimator of Graham (2008) and the within estimator of Lee
(2007) as special cases. Our identification results extend insights in Graham
(2008) that show how assumptions about random group effects as well as variation
in group size can be used to overcome the reflection problem in identifying
peer effects. Our QMLE and GMM estimators accommodate additional covariates and
are valid in situations with a large but finite number of different group sizes
or types. Because our estimators are general moment based procedures, using
instruments other than binary group indicators in estimation is
straightforward. Our QMLE estimator accommodates group-level covariates in the spirit
of Mundlak and Chamberlain and offers an alternative to fixed effects
specifications. Monte Carlo simulations show that the bias of the QMLE
estimator decreases with the number of groups and the variation in group size,
and increases with group size. We also prove the consistency and asymptotic
normality of the estimator under reasonable assumptions.
arXiv link: http://arxiv.org/abs/2105.04330v2
The Local Approach to Causal Inference under Network Interference
outcomes depend on how agents are linked in a social or economic network. Such
network interference describes a large literature on treatment spillovers,
social interactions, social learning, information diffusion, disease and
financial contagion, social capital formation, and more. Our approach works by
first characterizing how an agent is linked in the network using the
configuration of other agents and connections nearby as measured by path
distance. The impact of a policy or treatment assignment is then learned by
pooling outcome data across similarly configured agents. We demonstrate the
approach by deriving finite-sample bounds on the mean-squared error of a
k-nearest-neighbor estimator for the average treatment response as well as
proposing an asymptotically valid test for the hypothesis of policy
irrelevance.
arXiv link: http://arxiv.org/abs/2105.03810v5
Difference-in-Differences Estimation with Spatial Spillovers
When the effects of treatment cross over borders, classical
difference-in-differences estimation produces biased estimates for the average
treatment effect. In this paper, I introduce a potential outcomes framework to
model spillover effects and decompose the estimate's bias into two parts: (1) the
control group no longer identifies the counterfactual trend because their
outcomes are affected by treatment and (2) changes in treated units' outcomes
reflect the effect of their own treatment status and the effect from the
treatment status of 'close' units. I propose conditions for non-parametric
identification that can remove both sources of bias and semi-parametrically
estimate the spillover effects themselves including in settings with staggered
treatment timing. To highlight the importance of spillover effects, I revisit
analyses of three place-based interventions.
arXiv link: http://arxiv.org/abs/2105.03737v3
Machine Collaboration
collaboration (MaC), using a collection of base machines for prediction tasks.
Unlike bagging/stacking (a parallel & independent framework) and boosting (a
sequential & top-down framework), MaC is a type of circular & interactive
learning framework. The circular & interactive feature helps the base machines
to transfer information circularly and update their structures and parameters
accordingly. The theoretical result on the risk bound of the estimator from MaC
reveals that the circular & interactive feature can help MaC reduce risk via a
parsimonious ensemble. We conduct extensive experiments on MaC using both
simulated data and 119 benchmark real datasets. The results demonstrate that in
most cases, MaC performs significantly better than several other
state-of-the-art methods, including classification and regression trees, neural
networks, stacking, and boosting.
arXiv link: http://arxiv.org/abs/2105.02569v3
Policy Learning with Adaptively Collected Data
wide variety of applications including healthcare, digital recommendations, and
online education. The growing policy learning literature focuses on settings
where the data collection rule stays fixed throughout the experiment. However,
adaptive data collection is becoming more common in practice, from two primary
sources: 1) data collected from adaptive experiments that are designed to
improve inferential efficiency; 2) data collected from production systems that
progressively evolve an operational policy to improve performance over time
(e.g. contextual bandits). Yet adaptivity complicates the optimal policy
identification ex post, since samples are dependent, and each treatment may not
receive enough observations for each type of individual. In this paper, we make
initial research inquiries into addressing the challenges of learning the
optimal policy with adaptively collected data. We propose an algorithm based on
generalized augmented inverse propensity weighted (AIPW) estimators, which
non-uniformly reweight the elements of a standard AIPW estimator to control
worst-case estimation variance. We establish a finite-sample regret upper bound
for our algorithm and complement it with a regret lower bound that quantifies
the fundamental difficulty of policy learning with adaptive data. When equipped
with the best weighting scheme, our algorithm achieves minimax rate optimal
regret guarantees even with diminishing exploration. Finally, we demonstrate
our algorithm's effectiveness using both synthetic data and public benchmark
datasets.
arXiv link: http://arxiv.org/abs/2105.02344v2
Stock Price Forecasting in Presence of Covid-19 Pandemic and Evaluating Performances of Machine Learning Models for Time-Series Forecasting
the need for price forecasting has become more critical. We investigated the
forecast performance of four models including Long-Short Term Memory, XGBoost,
Autoregression, and Last Value on stock prices of Facebook, Amazon, Tesla,
Google, and Apple during the COVID-19 pandemic to understand the accuracy and
predictability of the models in this highly volatile period. To train the
models, the data of all stocks are split into train and test datasets. The test
dataset spans January 2020 to April 2021, which covers the COVID-19
pandemic period. The results show that the Autoregression and Last value models
have higher accuracy in predicting the stock prices because of the strong
correlation between the previous day and the next day's price value.
Additionally, the results suggest that the machine learning models (Long-Short
Term Memory and XGBoost) are not performing as well as Autoregression models
when the market experiences high volatility.
arXiv link: http://arxiv.org/abs/2105.02785v1
A Modified Randomization Test for the Level of Clustering
Given concerns about correlation across individuals, it is common to group
observations into clusters and conduct inference treating observations across
clusters as roughly independent. However, a researcher who has chosen to
cluster at the county level may be unsure of their decision, given knowledge
that observations are independent across states. This paper proposes a modified
randomization test as a robustness check for the chosen level of clustering in
a linear regression setting. Existing tests require either the number of states
or number of counties to be large. Our method is designed for settings with few
states and few counties. While the method is conservative, it has competitive
power in settings that may be relevant to empirical work.
arXiv link: http://arxiv.org/abs/2105.01008v2
A nonparametric instrumental approach to endogeneity in competing risks models
competing risks and random right censoring. The endogeneity issue is solved
using a discrete instrumental variable. We show that the competing risks model
generates a non-parametric quantile instrumental regression problem. The
cause-specific cumulative incidence, the cause-specific hazard and the
subdistribution hazard can be recovered from the regression function. A
distinguishing feature of the model is that censoring and competing risks
prevent identification at some quantiles. We characterize the set of quantiles
for which exact identification is possible and give partial identification
results for other quantiles. We outline an estimation procedure and discuss its
properties. The finite sample performance of the estimator is evaluated through
simulations. We apply the proposed method to the Health Insurance Plan of
Greater New York experiment.
arXiv link: http://arxiv.org/abs/2105.00946v1
Identification and Estimation of Average Causal Effects in Fixed Effects Logit Models
such as average marginal or treatment effects, in fixed effects logit models
with short panels. Relating the identified set of these effects to an extremal
moment problem, we first show how to obtain sharp bounds on such effects
simply, without any optimization. We also consider even simpler outer bounds,
which, contrary to the sharp bounds, do not require any first-step
nonparametric estimators. We build confidence intervals based on these two
approaches and show their asymptotic validity. Monte Carlo simulations suggest
that both approaches work well in practice, the second being typically
competitive in terms of interval length. Finally, we show that our method is
also useful to measure treatment effect heterogeneity.
arXiv link: http://arxiv.org/abs/2105.00879v5
A model of inter-organizational network formation
among ties while studying tie formation is one of the key challenges in this
area of research. We address this challenge using an equilibrium framework
where firms' decisions to form links with other firms are modeled as a
strategic game. In this game, firms weigh the costs and benefits of
establishing a relationship with other firms and form ties if their net payoffs
are positive. We characterize the equilibrium networks as exponential random
graphs (ERGM), and we estimate the firms' payoffs using a Bayesian approach. To
demonstrate the usefulness of our approach, we apply the framework to a
co-investment network of venture capital firms in the medical device industry.
The equilibrium framework allows researchers to draw economic interpretation
from parameter estimates of the ERGM Model. We learn that firms rely on their
joint partners (transitivity) and prefer to form ties with firms similar to
themselves (homophily). These results hold after controlling for the
interdependence among ties. Another critical advantage of a structural
approach is that it allows us to simulate the effects of economic shocks or
policy counterfactuals. We test two such policy shocks, namely, firm entry and
regulatory change. We show how new firms' entry or a regulatory shock of
minimum capital requirements increases the co-investment network's density and
clustering.
arXiv link: http://arxiv.org/abs/2105.00458v1
Local Average and Marginal Treatment Effects with a Misclassified Treatment
effects (LATE and MTE) with a misclassified binary treatment variable. We
derive bounds on the (generalized) LATE and exploit its relationship with the
MTE to further bound the MTE. Indeed, under some standard assumptions, the MTE
is a limit of the ratio of the variation in the conditional expectation of the
observed outcome given the instrument to the variation in the true propensity
score, which is partially identified. We characterize the identified set for
the propensity score, and then for the MTE. We show that our LATE bounds are
tighter than the existing bounds and that the sign of the MTE is locally
identified under some mild regularity conditions. We use our MTE bounds to
derive bounds on other commonly used parameters in the literature and
illustrate the practical relevance of our derived bounds through numerical and
empirical results.
arXiv link: http://arxiv.org/abs/2105.00358v8
Automatic Debiased Machine Learning via Riesz Regression
regressions. Machine learning can be used to estimate such parameters. However,
estimators based on machine learners can be severely biased by regularization
and/or model selection. Debiased machine learning uses Neyman orthogonal
estimating equations to reduce such biases. Debiased machine learning generally
requires estimation of unknown Riesz representers. A primary innovation of this
paper is to provide Riesz regression estimators of Riesz representers that
depend on the parameter of interest, rather than explicit formulae, and that
can employ any machine learner, including neural nets and random forests.
End-to-end algorithms emerge where the researcher chooses the parameter of
interest and the machine learner and the debiasing follows automatically.
Another innovation here is debiased machine learners of parameters depending on
generalized regressions, including high-dimensional generalized linear models.
An empirical example of automatic debiased machine learning using neural nets
is given. We find in Monte Carlo examples that automatic debiasing sometimes
performs better than debiasing via inverse propensity scores and never worse.
Finite sample mean square error bounds for Riesz regression estimators and
asymptotic theory are also given.
arXiv link: http://arxiv.org/abs/2104.14737v3
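As a minimal, non-cross-fitted sketch of the Riesz-regression idea described in the entry above, the snippet below estimates the Riesz representer for the ATE functional by minimizing the empirical loss E_n[a(X)^2 - 2 m(W; a)] over a hand-rolled linear basis, then plugs it into the debiased moment. The basis, simulated data, and random-forest outcome learner are illustrative assumptions; the paper's estimators use cross-fitting and allow generic machine learners for the representer itself.

```python
# Sketch of Riesz regression for the ATE functional (illustration only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 3))                        # covariates
e = 1 / (1 + np.exp(-z[:, 0]))                     # true propensity score
d = rng.binomial(1, e)                             # binary treatment
y = d * 1.0 + z.sum(axis=1) + rng.normal(size=n)   # outcome, true ATE = 1

def basis(d, z):
    """Dictionary b(d, z): intercept, covariates, and treatment interactions."""
    zc = np.column_stack([np.ones(len(z)), z])
    return np.column_stack([zc, d[:, None] * zc])

b = basis(d, z)
b1, b0 = basis(np.ones(n), z), basis(np.zeros(n), z)

# Riesz regression: minimize E_n[a(X)^2 - 2 m(W; a)] with a(X) = b(X)'beta,
# where m(W; g) = g(1, Z) - g(0, Z) for the ATE.  Closed-form minimizer:
beta = np.linalg.solve(b.T @ b / n, (b1 - b0).mean(axis=0))
alpha_hat = b @ beta                               # estimated Riesz representer

# Any regression learner for the outcome; here a random forest.
g = RandomForestRegressor(n_estimators=200, random_state=0).fit(
    np.column_stack([d, z]), y)
g_obs = g.predict(np.column_stack([d, z]))
g1 = g.predict(np.column_stack([np.ones(n), z]))
g0 = g.predict(np.column_stack([np.zeros(n), z]))

# Debiased (Neyman-orthogonal) moment: plug-in plus Riesz-weighted residual.
ate_hat = np.mean(g1 - g0 + alpha_hat * (y - g_obs))
print(f"debiased ATE estimate: {ate_hat:.3f} (true value 1.0)")
```

The closed-form solve is just least squares for the representer coefficients; in the paper's setting the same loss can instead be minimized over neural nets or forests.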
Nonparametric Difference-in-Differences in Repeated Cross-Sections with Continuous Treatments
treatment using a new difference-in-difference strategy. Our approach allows
for endogeneity of the treatment, and employs repeated cross-sections. It
requires an exogenous change over time which affects the treatment in a
heterogeneous way, stationarity of the distribution of unobservables and a rank
invariance condition on the time trend. On the other hand, we do not impose any
functional form restrictions or an additive time trend, and our approach is invariant to
the scaling of the dependent variable. Under our conditions, the time trend can
be identified using a control group, as in the binary difference-in-differences
literature. In our scenario, however, this control group is defined by the
data. We then identify average and quantile treatment effect parameters. We
develop corresponding nonparametric estimators and study their asymptotic
properties. Finally, we apply our results to the effect of disposable income on
consumption.
arXiv link: http://arxiv.org/abs/2104.14458v2
Generalized Linear Models with Structured Sparsity Estimators
Linear Models. Structured sparsity estimators in the least squares loss were
recently introduced by Stucky and van de Geer (2018) for fixed designs and
normal errors. We extend their results to debiased structured sparsity
estimators with Generalized Linear Model based loss. Structured sparsity
estimation means penalized loss functions with a possible sparsity structure
used in the chosen norm. These include weighted group lasso, lasso and norms
generated from convex cones. A significant difficulty is that it is not clear
how to prove the two required oracle inequalities. The first one is for the
initial penalized Generalized Linear Model estimator. Since it is not clear how
a particular feasible weighted nodewise regression fits into an oracle
inequality for the penalized Generalized Linear Model, we need a second oracle
inequality to obtain oracle bounds for the approximate inverse of the sample
estimate of the second-order partial derivative of the Generalized Linear Model loss.
Our contributions are fivefold: 1. We generalize the existing oracle
inequality results in penalized Generalized Linear Models by proving the
underlying conditions rather than assuming them. One of the key issues is the
proof of a sample one-point margin condition and its use in an oracle
inequality. 2. Our results cover even non-sub-Gaussian errors and regressors.
3. We provide a feasible weighted nodewise regression proof which generalizes
the results in the literature from a simple l_1 norm to norms generated from
convex cones. 4. We note that the norms used in feasible nodewise regression
proofs should be weaker than or equal to the norms in the penalized Generalized
Linear Model loss. 5. We can debias the first-step estimator by obtaining an
approximate inverse of the singular sample second-order partial derivative of
the Generalized Linear Model loss.
arXiv link: http://arxiv.org/abs/2104.14371v1
Loss-Based Variational Bayes Prediction
a large number of parameters and is robust to model misspecification. Given a
class of high-dimensional (but parametric) predictive models, this new approach
constructs a posterior predictive using a variational approximation to a
generalized posterior that is directly focused on predictive accuracy. The
theoretical behavior of the new prediction approach is analyzed and a form of
optimality demonstrated. Applications to both simulated and empirical data
using high-dimensional Bayesian neural network and autoregressive mixture
models demonstrate that the approach provides more accurate results than
various alternatives, including misspecified likelihood-based predictions.
arXiv link: http://arxiv.org/abs/2104.14054v2
Sequential Search Models: A Pairwise Maximum Rank Approach
product quality, which can be correlated with endogenous observable
characteristics (such as price) and endogenous search cost variables (such as
product rankings in online search intermediaries); and (2) do not require
researchers to know the true distribution of the match value between consumers
and products. A likelihood approach to estimate such models gives biased
results. Therefore, I propose a new estimator -- pairwise maximum rank (PMR)
estimator -- for both preference and search cost parameters. I show that the
PMR estimator is consistent using only data on consumers' search order among
one pair of products rather than data on consumers' full consideration set or
final purchase. Additionally, the PMR estimator can be used to test for the true
match value distribution in the data. In the empirical application, I apply the
PMR estimator to quantify the effect of rankings in Expedia hotel search using
two samples of the data set, to which consumers are randomly assigned. I find
the position effect to be $0.11-$0.36, and the effect estimated using the
sample with randomly generated rankings is close to the effect estimated using
the sample with endogenous rankings. Moreover, I find that the true match value
distribution in the data is unlikely to be N(0,1). Likelihood estimation
ignoring endogeneity gives an upward bias of at least $1.17; misspecification
of match value distribution as N(0,1) gives an upward bias of at least $2.99.
arXiv link: http://arxiv.org/abs/2104.13865v2
Changepoint detection in random coefficient autoregressive models
changepoints in the deterministic part of the autoregressive parameter in a
Random Coefficient AutoRegressive (RCA) sequence. In order to ensure the
ability to detect breaks at sample endpoints, we thoroughly study weighted
CUSUM statistics, analysing the asymptotics for virtually all possible weighting
schemes, including the standardised CUSUM process (for which we derive a
Darling-Erdős theorem) and even heavier weights (studying the so-called Rényi
statistics). Our results are valid irrespective of whether the sequence is
stationary or not, and no prior knowledge of stationarity or lack thereof is
required. Technically, our results require strong approximations which, in the
nonstationary case, are entirely new. Similarly, we allow for
heteroskedasticity of unknown form in both the error term and in the stochastic
part of the autoregressive coefficient, proposing a family of test statistics
which are robust to heteroskedasticity, without requiring any prior knowledge
as to the presence or type thereof. Simulations show that our procedures work
very well in finite samples. We complement our theory with applications to
financial, economic and epidemiological time series.
arXiv link: http://arxiv.org/abs/2104.13440v1
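For intuition on the weighting schemes mentioned in the entry above, here is a generic weighted CUSUM statistic for a break in the mean of a sequence. It is only a sketch: the paper's statistics are built from estimates of the RCA autoregressive parameter and use carefully derived critical values, neither of which is reproduced here.

```python
# Generic weighted CUSUM for a change in mean (illustration only).
import numpy as np

def weighted_cusum(x, gamma=0.25):
    """max_k |S_k - (k/n) S_n| / (sigma * sqrt(n) * (k/n * (1 - k/n))^gamma)."""
    n = len(x)
    s = np.cumsum(x)
    k = np.arange(1, n)
    num = np.abs(s[:-1] - (k / n) * s[-1])
    weight = (k / n * (1 - k / n)) ** gamma
    sigma = x.std(ddof=1)
    return np.max(num / (sigma * np.sqrt(n) * weight))

rng = np.random.default_rng(0)
x_break = np.concatenate([rng.normal(0, 1, 300), rng.normal(0.8, 1, 200)])
print(weighted_cusum(x_break))                  # large value: break around t = 300
print(weighted_cusum(rng.normal(0, 1, 500)))    # no break: noticeably smaller
```

Larger gamma puts more weight near the sample endpoints, which is exactly the regime the Rényi-type statistics are designed for.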
A model of multiple hypothesis testing
are appropriate when. This paper provides an economic foundation for these
practices designed to capture leading examples, such as regulatory approval on
the basis of clinical trials. In studies of multiple treatments or
sub-populations, adjustments may be appropriate depending on scale economies in
the research production function, with control of classical notions of compound
errors emerging in some but not all cases. In studies with multiple outcomes,
indexing is appropriate and adjustments to test levels may be appropriate if
the intended audience is heterogeneous. Data on actual costs in the drug
approval process suggest both that some adjustment is warranted in that setting
and that standard procedures may be overly conservative.
arXiv link: http://arxiv.org/abs/2104.13367v8
Algorithm as Experiment: Machine Learning, Market Design, and Policy Eligibility Rules
develop a treatment-effect estimator using algorithmic decisions as instruments
for a class of stochastic and deterministic algorithms. Our estimator is
consistent and asymptotically normal for well-defined causal effects. A special
case of our setup is multidimensional regression discontinuity designs with
complex boundaries. We apply our estimator to evaluate the Coronavirus Aid,
Relief, and Economic Security Act, which allocated many billions of dollars'
worth of relief funding to hospitals via an algorithmic rule. The funding is
shown to have little effect on COVID-19-related hospital activities. Naive
estimates exhibit selection bias.
arXiv link: http://arxiv.org/abs/2104.12909v6
Valid Heteroskedasticity Robust Testing
technique in econometric practice. Choosing the right critical value, however,
is not simple at all: conventional critical values based on asymptotics often
lead to severe size distortions; and so do existing adjustments including the
bootstrap. To avoid these issues, we suggest using the smallest size-controlling
critical values, the generic existence of which we prove in this article for
the commonly used test statistics. Furthermore, sufficient and often also
necessary conditions for their existence are given that are easy to check.
Granted their existence, these critical values are the canonical choice: larger
critical values result in unnecessary power loss, whereas smaller critical
values lead to over-rejections under the null hypothesis, make spurious
discoveries more likely, and thus are invalid. We suggest algorithms to
numerically determine the proposed critical values and provide implementations
in accompanying software. Finally, we numerically study the behavior of the
proposed testing procedures, including their power properties.
arXiv link: http://arxiv.org/abs/2104.12597v3
Weak Instrumental Variables: Limitations of Traditional 2SLS and Exploring Alternative Instrumental Variable Estimators
decades as a tool for causal inference, particularly amongst empirical
researchers. This paper makes three contributions. First, we provide a detailed
theoretical discussion on the properties of the standard two-stage least
squares estimator in the presence of weak instruments and introduce and derive
two alternative estimators. Second, we conduct Monte-Carlo simulations to
compare the finite-sample behavior of the different estimators, particularly in
the weak-instruments case. Third, we apply the estimators to a real-world
context; we employ the different estimators to calculate returns to schooling.
arXiv link: http://arxiv.org/abs/2104.12370v1
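Below is a small Monte Carlo in the spirit of the comparison described in the entry above, showing how standard 2SLS becomes badly biased when the first stage is weak. The data-generating process, instrument strength, and use of the median across replications are illustrative assumptions, not the paper's design.

```python
# Weak-instrument bias of 2SLS in a toy simulation (illustration only).
import numpy as np

rng = np.random.default_rng(0)

def tsls(y, x, z):
    """Two-stage least squares with one endogenous regressor and one instrument."""
    Z = np.column_stack([np.ones_like(z), z])
    X = np.column_stack([np.ones_like(x), x])
    x_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]    # first-stage fitted values
    return np.linalg.lstsq(x_hat, y, rcond=None)[0][1]  # second-stage slope

def median_estimate(pi, n=200, beta=1.0, n_sim=2000):
    est = []
    for _ in range(n_sim):
        z = rng.normal(size=n)
        u = rng.normal(size=n)                  # shared error creates endogeneity
        x = pi * z + u + rng.normal(size=n)
        y = beta * x + u + rng.normal(size=n)
        est.append(tsls(y, x, z))
    return np.median(est)

print("strong instrument (pi = 1.0): ", round(median_estimate(1.0), 2))   # near 1
print("weak instrument   (pi = 0.05):", round(median_estimate(0.05), 2))  # biased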
Interference, Bias, and Variance in Two-Sided Marketplace Experimentation: Guidance for Platforms
an intervention before launching it platform-wide. A typical approach is to
randomize individuals into the treatment group, which receives the
intervention, and the control group, which does not. The platform then compares
the performance in the two groups to estimate the effect if the intervention
were launched to everyone. We focus on two common experiment types, where the
platform randomizes individuals either on the supply side or on the demand
side. The resulting estimates of the treatment effect in these experiments are
typically biased: because individuals in the market compete with each other,
individuals in the treatment group affect those in the control group and vice
versa, creating interference. We develop a simple tractable market model to
study bias and variance in these experiments with interference. We focus on two
choices available to the platform: (1) Which side of the platform should it
randomize on (supply or demand)? (2) What proportion of individuals should be
allocated to treatment? We find that both choices affect the bias and variance
of the resulting estimators but in different ways. The bias-optimal choice of
experiment type depends on the relative amounts of supply and demand in the
market, and we discuss how a platform can use market data to select the
experiment type. Importantly, we find that in many circumstances choosing the
bias-optimal experiment type has little effect on variance. On the other hand,
the choice of treatment proportion can induce a bias-variance tradeoff, where
the bias-minimizing proportion increases variance. We discuss how a platform
can navigate this tradeoff and best choose the treatment proportion, using a
combination of modeling as well as contextual knowledge about the market, the
risk of the intervention, and reasonable effect sizes of the intervention.
arXiv link: http://arxiv.org/abs/2104.12222v1
Performance of Empirical Risk Minimization for Linear Regression with Dependent Data
minimization for large-dimensional linear regression. We generalize existing
results by allowing the data to be dependent and heavy-tailed. The analysis
covers both the cases of identically and heterogeneously distributed
observations. Our analysis is nonparametric in the sense that the relationship
between the regressand and the regressors is not specified. The main results of
this paper show that the empirical risk minimizer achieves the optimal
performance (up to a logarithmic factor) in a dependent data setting.
arXiv link: http://arxiv.org/abs/2104.12127v5
Hermite Polynomial-based Valuation of American Options with General Jump-Diffusion Processes
American options. The scheme is based on Hermite polynomial expansions of the
transition density of the underlying asset dynamics and the early exercise
premium representation of the American option price. The advantages of the
proposed approach are threefold. First, our approach does not require the
transition density and characteristic functions of the underlying asset
dynamics to be attainable in closed form. Second, our approach is fast and
accurate, while the prices and exercise policy can be jointly produced. Third,
our approach has a wide range of applications. We show that the proposed
approximations of the price and optimal exercise boundary converge to the true
ones. We also provide a numerical method based on a step function to implement
our proposed approach. Applications to nonlinear mean-reverting models, double
mean-reverting models, Merton's and Kou's jump-diffusion models are presented
and discussed.
arXiv link: http://arxiv.org/abs/2104.11870v1
Correlated Dynamics in Marketing Sensitivities
brands, and other marketing mix elements is fundamental to a wide swath of
marketing problems. An important but understudied aspect of this problem is the
dynamic nature of these sensitivities, which change over time and vary across
individuals. Prior work has developed methods for capturing such dynamic
heterogeneity within product categories, but neglected the possibility of
correlated dynamics across categories. In this work, we introduce a framework
to capture such correlated dynamics using a hierarchical dynamic factor model,
where individual preference parameters are influenced by common cross-category
dynamic latent factors, estimated through Bayesian nonparametric Gaussian
processes. We apply our model to grocery purchase data, and find that a
surprising degree of dynamic heterogeneity can be accounted for by only a few
global trends. We also characterize the patterns in how consumers'
sensitivities evolve across categories. Managerially, the proposed framework
not only enhances predictive accuracy by leveraging cross-category data, but
also enables more precise estimation of quantities of interest, like price
elasticity.
arXiv link: http://arxiv.org/abs/2104.11702v2
Robust decision-making under risk and ambiguity
as a stand-in for the truth when studying the model's implications for optimal
decision-making. This practice ignores model ambiguity, exposes the decision
problem to misspecification, and ultimately leads to post-decision
disappointment. Using statistical decision theory, we develop a framework to
explore, evaluate, and optimize robust decision rules that explicitly account
for estimation uncertainty. We show how to operationalize our analysis by
studying robust decisions in a stochastic dynamic investment model in which a
decision-maker directly accounts for uncertainty in the model's transition
dynamics.
arXiv link: http://arxiv.org/abs/2104.12573v4
Investigating farming efficiency through a two stage analytical approach: Application to the agricultural sector in Northern Oman
farming efficiency. In the first stage, data envelopment analysis is employed
to estimate the efficiency of the farms and conduct slack and scale economies
analyses. In the second stage, we propose a stochastic model to identify
potential sources of inefficiency. The latter model integrates within a unified
structure all variables, including inputs, outputs and contextual factors. As
an application ground, we use a sample of 60 farms from the Batinah coastal
region, an agricultural area representing more than 53 per cent of the total
cropped area of Oman. The findings of the study emphasize the interdependence
of groundwater salinity, irrigation technology and the operational efficiency
of a farm; the key recommendations are more tightly regulated water consumption
and a readjustment of government subsidy policies.
arXiv link: http://arxiv.org/abs/2104.10943v1
Identification of Peer Effects with Miss-specified Peer Groups: Missing Data and Group Uncertainty
misspecification. Two leading cases are missing data and peer group
uncertainty. Missing data can take the form of some individuals being entirely
absent from the data. The researcher need not have any information on missing
individuals and need not even know that they are missing. We show that peer
effects are nevertheless identifiable under mild restrictions on the
probabilities of observing individuals, and propose a GMM estimator to estimate
the peer effects. In practice this means that the researcher need only have
access to an individual level sample with group identifiers. Group uncertainty
arises when the relevant peer group for the outcome under study is unknown. We
show that peer effects are nevertheless identifiable if the candidate groups
are nested within one another and propose a non-linear least squares estimator.
We conduct a Monte-Carlo experiment to demonstrate our identification results
and the performance of the proposed estimators, and apply our method to study
peer effects in the career decisions of junior lawyers.
arXiv link: http://arxiv.org/abs/2104.10365v5
Automatic Double Machine Learning for Continuous Treatment Effects
nonparametric estimator of continuous treatment effects. Specifically, we
estimate the average dose-response function - the expected value of an outcome
of interest at a particular level of the treatment. We utilize tools from
both the double debiased machine learning (DML) and the automatic double
machine learning (ADML) literatures to construct our estimator. Our estimator
utilizes a novel debiasing method that leads to desirable theoretical stability and
balancing properties. In simulations our estimator performs well compared to
current methods.
arXiv link: http://arxiv.org/abs/2104.10334v1
Backtesting Systemic Risk Forecasts using Multi-Objective Elicitability
finance, macroeconomics and by regulatory bodies. Despite their importance, we
show that they fail to be elicitable and identifiable. This renders forecast
comparison and validation, commonly summarised as `backtesting', impossible.
The novel notion of multi-objective elicitability solves this problem.
Specifically, we propose Diebold--Mariano type tests utilising two-dimensional
scores equipped with the lexicographic order. We illustrate the test decisions
by an easy-to-apply traffic-light approach. We apply our traffic-light approach
to DAX 30 and S&P 500 returns, and infer some recommendations for regulators.
arXiv link: http://arxiv.org/abs/2104.10673v4
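For context on the forecast comparisons in the entry above, the snippet below computes the standard one-dimensional Diebold-Mariano statistic, which is the building block that the paper extends to two-dimensional scores ordered lexicographically. The simulated losses and the omission of a HAC variance correction (the losses here are i.i.d. by construction) are illustrative assumptions.

```python
# Standard (one-dimensional) Diebold-Mariano comparison of two forecasts.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
loss_a = rng.normal(1.0, 0.3, n) ** 2            # forecast A's squared-error losses
loss_b = rng.normal(1.1, 0.3, n) ** 2            # forecast B's losses (slightly worse)

d = loss_a - loss_b                              # loss differential
dm_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))
p_value = 2 * (1 - stats.norm.cdf(abs(dm_stat)))
print(round(dm_stat, 2), round(p_value, 3))      # negative statistic favours A
```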
CATE meets ML -- The Conditional Average Treatment Effect and Machine Learning
- prediction and estimation are two sides of the same coin. As it turns out,
machine learning methods are the tool for generalized prediction models.
Combined with econometric theory, they allow us to estimate not only the
average but a personalized treatment effect - the conditional average treatment
effect (CATE). In this tutorial, we give an overview of novel methods, explain
them in detail, and apply them via Quantlets in real data applications. We
study the effect that microcredit availability has on the amount of money
borrowed and if 401(k) pension plan eligibility has an impact on net financial
assets, as two empirical examples. The presented toolbox of methods contains
meta-learners, like the Doubly-Robust, R-, T- and X-learner, and methods that
are specially designed to estimate the CATE like the causal BART and the
generalized random forest. In both the microcredit and 401(k) examples, we find
a positive treatment effect for all observations but conflicting evidence of
treatment effect heterogeneity. An additional simulation study, where the true
treatment effect is known, allows us to compare the different methods and to
observe patterns and similarities.
arXiv link: http://arxiv.org/abs/2104.09935v2
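As a minimal illustration of the meta-learner idea surveyed in the tutorial above, here is a T-learner for the CATE: fit separate outcome models on treated and control units and difference their predictions. The random-forest learners and simulated data are illustrative assumptions, not the tutorial's Quantlet code.

```python
# T-learner sketch for the conditional average treatment effect (CATE).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 4000
x = rng.normal(size=(n, 5))
d = rng.binomial(1, 0.5, size=n)                       # randomized treatment
tau = 1 + x[:, 0]                                      # heterogeneous true effect
y = x.sum(axis=1) + d * tau + rng.normal(size=n)

# Fit one outcome model per treatment arm, then difference the predictions.
m1 = RandomForestRegressor(n_estimators=300, random_state=0).fit(x[d == 1], y[d == 1])
m0 = RandomForestRegressor(n_estimators=300, random_state=0).fit(x[d == 0], y[d == 0])
cate_hat = m1.predict(x) - m0.predict(x)

print("mean CATE (true ~ 1.0):", cate_hat.mean().round(2))
print("corr with true effect: ", np.corrcoef(cate_hat, tau)[0, 1].round(2))
```

The doubly robust, R-, and X-learners mentioned in the abstract refine this basic recipe by also modelling the propensity score and residualizing, which typically improves precision.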
Deep Reinforcement Learning in a Monetary Model
general equilibrium models. Agents are represented by deep artificial neural
networks and learn to solve their dynamic optimisation problem by interacting
with the model environment, of which they have no a priori knowledge. Deep
reinforcement learning offers a flexible yet principled way to model bounded
rationality within this general class of models. We apply our proposed approach
to a classical model from the adaptive learning literature in macroeconomics
which looks at the interaction of monetary and fiscal policy. We find that,
contrary to adaptive learning, the artificially intelligent household can solve
the model in all policy regimes.
arXiv link: http://arxiv.org/abs/2104.09368v2
Estimating and Improving Dynamic Treatment Regimes With a Time-Varying Instrumental Variable
data is challenging as some degree of unmeasured confounding is often expected.
In this work, we develop a framework of estimating properly defined "optimal"
DTRs with a time-varying instrumental variable (IV) when unmeasured covariates
confound the treatment and outcome, rendering the potential outcome
distributions only partially identified. We derive a novel Bellman equation
under partial identification, use it to define a generic class of estimands
(termed IV-optimal DTRs), and study the associated estimation problem. We then
extend the IV-optimality framework to tackle the policy improvement problem,
delivering IV-improved DTRs that are guaranteed to perform no worse and
potentially better than a pre-specified baseline DTR. Importantly, our
IV-improvement framework opens up the possibility of strictly improving upon
DTRs that are optimal under the no unmeasured confounding assumption (NUCA). We
demonstrate via extensive simulations the superior performance of IV-optimal
and IV-improved DTRs over the DTRs that are optimal only under the NUCA. In a
real data example, we embed retrospective observational registry data into a
natural, two-stage experiment with noncompliance using a time-varying IV and
estimate useful IV-optimal DTRs that assign mothers to high-level or low-level
neonatal intensive care units based on their prognostic variables.
arXiv link: http://arxiv.org/abs/2104.07822v1
A robust specification test in linear panel data models
testing procedures that result in unstable test statistics and unreliable
inferences depending on the distortion in parameter estimates. Despite the
adverse effects of outliers in panel data models, only a few robust testing
procedures are available for model specification. In this
paper, a new weighted likelihood based robust specification test is proposed to
determine the appropriate approach in panel data including individual-specific
components. The proposed test has been shown to have the same asymptotic
distribution as that of most commonly used Hausman's specification test under
null hypothesis of random effects specification. The finite sample properties
of the robust testing procedure are illustrated by means of Monte Carlo
simulations and an economic-growth data from the member countries of the
Organisation for Economic Co-operation and Development. Our results reveal that
the robust specification test exhibits improved performance in terms of size and
power in the presence of contamination.
arXiv link: http://arxiv.org/abs/2104.07723v1
BERT based freedom to operate patent analysis
analysis and patent searches. According to the method, BERT is fine-tuned by
training patent descriptions to the independent claims. Each description
represents an invention which is protected by the corresponding claims. Such a
trained BERT should be able to identify or rank freedom-to-operate-relevant
patents based on a short description of an invention or product. We tested the
method by training BERT on the patent class G06T1/00 and applied the trained
BERT to five inventions classified in G06T1/60, described via DOCDB abstracts.
The DOCDB abstracts are available on ESPACENET of the European Patent Office.
arXiv link: http://arxiv.org/abs/2105.00817v2
Selecting Penalty Parameters of High-Dimensional M-Estimators using Bootstrapping after Cross-Validation
$\ell_{1}$-penalized M-estimators in high dimensions, which we refer to as
bootstrapping after cross-validation. We derive rates of convergence for the
corresponding $\ell_1$-penalized M-estimator and also for the
post-$\ell_1$-penalized M-estimator, which refits the non-zero entries of the
former estimator without penalty in the criterion function. We demonstrate via
simulations that our methods are not dominated by cross-validation in terms of
estimation errors and can outperform cross-validation in terms of inference. As
an empirical illustration, we revisit Fryer Jr (2019), who investigated racial
differences in police use of force, and confirm his findings.
arXiv link: http://arxiv.org/abs/2104.04716v5
Identification of Dynamic Panel Logit Models with Fixed Effects
with fixed effects is related to the truncated moment problem from the
mathematics literature. We use this connection to show that the identified set
for structural parameters and functionals of the distribution of latent
individual effects can be characterized by a finite set of conditional moment
equalities subject to a certain set of shape constraints on the model
parameters. In addition to providing a general approach to identification, the
new characterization can deliver informative bounds in cases where competing
methods deliver no identifying restrictions, and can deliver point
identification in cases where competing methods deliver partial identification.
We then present an estimation and inference procedure that uses semidefinite
programming methods, is applicable with continuous or discrete covariates, and
can be used for models that are either point- or partially-identified. Finally,
we illustrate our identification result with a number of examples, and provide
an empirical application to employment dynamics using data from the National
Longitudinal Survey of Youth.
arXiv link: http://arxiv.org/abs/2104.04590v3
Average Direct and Indirect Causal Effects under Interference
in the potential outcomes model for causal inference under cross-unit
interference. Our definition is analogous to the standard definition of the
average direct effect, and can be expressed without needing to compare outcomes
across multiple randomized experiments. We show that the proposed indirect
effect satisfies a decomposition theorem whereby, in a Bernoulli trial, the sum
of the average direct and indirect effects always corresponds to the effect of
a policy intervention that infinitesimally increases treatment probabilities.
We also consider a number of parametric models for interference, and find that
our non-parametric indirect effect remains a natural estimand when re-expressed
in the context of these models.
arXiv link: http://arxiv.org/abs/2104.03802v4
Predicting Inflation with Recurrent Neural Networks
inflation. This is an appealing model for time series as it processes each time
step sequentially and explicitly learns dynamic dependencies. The paper also
explores the dimension reduction capability of the model to uncover
economically-meaningful factors that can explain the inflation process. Results
from an exercise with US data indicate that the estimated neural nets present
competitive, but not outstanding, performance against common benchmarks
(including other machine learning models). The LSTM in particular is found to
perform well at long horizons and during periods of heightened macroeconomic
uncertainty. Interestingly, LSTM-implied factors present high correlation with
business cycle indicators, informing on the usefulness of such signals as
inflation predictors. The paper also sheds light on the impact of network
initialization and architecture on forecast performance.
arXiv link: http://arxiv.org/abs/2104.03757v2
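For readers unfamiliar with the model class in the entry above, here is a minimal PyTorch sketch of an LSTM one-step-ahead forecaster on a synthetic series standing in for inflation. The lag length, architecture, and training loop are illustrative assumptions and not the paper's specification.

```python
# Minimal LSTM one-step-ahead forecaster (illustration only).
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(0, 0.1, 600)).astype(np.float32)  # stand-in series

lags = 12
X = np.stack([series[i:i + lags] for i in range(len(series) - lags)])
y = series[lags:]
X = torch.tensor(X).unsqueeze(-1)          # shape (samples, lags, 1 feature)
y = torch.tensor(y).unsqueeze(-1)

class LSTMForecaster(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])     # forecast from the last hidden state

model = LSTMForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
for epoch in range(200):                    # full-batch training for simplicity
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

print("in-sample MSE:", float(loss))
```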
Min(d)ing the President: A text analytic approach to measuring tax news
Consequently, estimating their macroeconomic effects requires identification of
such signals. We propose a novel text analytic approach for transforming
textual information into an economically meaningful time series. Using this
method, we create a tax news measure from all publicly available post-war
communications of U.S. presidents. Our measure predicts the direction and size
of future tax changes and contains signals not present in previously considered
(narrative) measures of tax changes. We investigate the effects of tax news and
find that, for long anticipation horizons, pre-implementation effects lead
initially to contractions in output.
arXiv link: http://arxiv.org/abs/2104.03261v3
DoubleML -- An Object-Oriented Implementation of Double Machine Learning in Python
learning framework of Chernozhukov et al. (2018) for a variety of causal
models. It contains functionalities for valid statistical inference on causal
parameters when the estimation of nuisance parameters is based on machine
learning methods. The object-oriented implementation of DoubleML provides a
high flexibility in terms of model specifications and makes it easily
extendable. The package is distributed under the MIT license and relies on core
libraries from the scientific Python ecosystem: scikit-learn, numpy, pandas,
scipy, statsmodels and joblib. Source code, documentation and an extensive user
guide can be found at https://github.com/DoubleML/doubleml-for-py and
https://docs.doubleml.org.
arXiv link: http://arxiv.org/abs/2104.03220v2
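A minimal usage sketch of the package described above, estimating a partially linear regression on one of its built-in simulated datasets. The learner choice is arbitrary, and because keyword names for the nuisance learners have changed across package versions, they are passed positionally here.

```python
# Minimal DoubleML usage sketch for a partially linear regression model.
from sklearn.ensemble import RandomForestRegressor
from doubleml import DoubleMLPLR
from doubleml.datasets import make_plr_CCDDHNR2018

dml_data = make_plr_CCDDHNR2018(n_obs=500)      # built-in simulated DoubleMLData
dml_plr = DoubleMLPLR(dml_data,
                      RandomForestRegressor(),  # nuisance learner for the outcome
                      RandomForestRegressor())  # nuisance learner for the treatment
dml_plr.fit()
print(dml_plr.summary)   # point estimate, standard error, confidence interval
```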
Bootstrap Inference for Hawkes and General Point Processes
model is predominantly based on asymptotic approximations for likelihood-based
estimators and tests. As an alternative, and to improve finite sample
performance, this paper considers bootstrap-based inference for interval
estimation and testing. Specifically, for a wide class of point process models
we consider a novel bootstrap scheme labeled 'fixed intensity bootstrap' (FIB),
where the conditional intensity is kept fixed across bootstrap repetitions. The
FIB, which is very simple to implement and fast in practice, extends previous
ideas from the bootstrap literature on time series in discrete time, where the
so-called 'fixed design' and 'fixed volatility' bootstrap schemes have been shown to
be particularly useful and effective. We compare the FIB with the classic
recursive bootstrap, which is here labeled 'recursive intensity bootstrap'
(RIB). In RIB algorithms, the intensity is stochastic in the bootstrap world
and implementation of the bootstrap is more involved, due to its sequential
structure. For both bootstrap schemes, we provide new bootstrap (asymptotic)
theory which allows one to assess bootstrap validity, and propose a
'non-parametric' approach based on resampling time-changed transformations of
the original waiting times. We also establish the link between the proposed
bootstraps for point process models and the related autoregressive conditional
duration (ACD) models. Lastly, we show the effectiveness of the different bootstrap
schemes in finite samples through a set of detailed Monte Carlo experiments,
and provide applications to both financial data and social media data to
illustrate the proposed methodology.
arXiv link: http://arxiv.org/abs/2104.03122v2
The Proper Use of Google Trends in Forecasting Models
free tools used by forecasters both in academics and in the private and public
sectors. Many papers, from several different fields, conclude that Google
Trends improves forecast accuracy. However, what seems to be widely unknown is
that each sample of Google search data differs from the others, even for the
same search term, dates and location. This means that it is possible to reach
arbitrary conclusions merely by chance. This paper aims to
show why and when it can become a problem and how to overcome this obstacle.
arXiv link: http://arxiv.org/abs/2104.03065v3
Minimax Kernel Machine Learning for a Class of Doubly Robust Functionals with Application to Proximal Causal Inference
could be used to obtain doubly robust moment functions for the corresponding
parameters. However, that class does not include the IF of parameters for which
the nuisance functions are solutions to integral equations. Such parameters are
particularly important in the field of causal inference, specifically in the
recently proposed proximal causal inference framework of Tchetgen Tchetgen et
al. (2020), which allows for estimating the causal effect in the presence of
latent confounders. In this paper, we first extend the class of Robins et al.
to include doubly robust IFs in which the nuisance functions are solutions to
integral equations. Then we demonstrate that the double robustness property of
these IFs can be leveraged to construct estimating equations for the nuisance
functions, which enables us to solve the integral equations without resorting
to parametric models. We frame the estimation of the nuisance functions as a
minimax optimization problem. We provide convergence rates for the nuisance
functions and conditions required for asymptotic linearity of the estimator of
the parameter of interest. The experimental results demonstrate that our proposed
methodology leads to robust and high-performance estimators for average causal
effect in the proximal causal inference framework.
arXiv link: http://arxiv.org/abs/2104.02929v3
Revisiting the empirical fundamental relationship of traffic flow for highways using a causal econometric approach
fitting a regression curve to a cloud of observations of traffic variables.
Such estimates, however, may suffer from the confounding/endogeneity bias due
to omitted variables such as driving behaviour and weather. To this end, this
paper adopts a causal approach to obtain an unbiased estimate of the
fundamental flow-density relationship using traffic detector data. In
particular, we apply a Bayesian non-parametric spline-based regression approach
with instrumental variables to adjust for the aforementioned confounding bias.
The proposed approach is benchmarked against standard curve-fitting methods in
estimating the flow-density relationship for three highway bottlenecks in the
United States. Our empirical results suggest that the saturated (or
hypercongested) regime of the estimated flow-density relationship using
correlational curve fitting methods may be severely biased, which in turn leads
to biased estimates of important traffic control inputs such as capacity and
capacity-drop. We emphasise that our causal approach is based on the physical
laws of vehicle movement in a traffic stream as opposed to a demand-supply
framework adopted in the economics literature. By doing so, we also aim to
reconcile the engineering and economics approaches to this empirical problem.
Our results, thus, have important implications both for traffic engineers and
transport economists.
arXiv link: http://arxiv.org/abs/2104.02399v1
Identification and Estimation in Many-to-one Two-sided Matching without Transfers
utilities, e.g., college admissions, we study conditions under which
preferences of both sides are identified with data on one single market.
Regardless of whether the market is centralized or decentralized, assuming that
the observed matching is stable, we show nonparametric identification of
preferences of both sides under certain exclusion restrictions. To take our
results to the data, we use Monte Carlo simulations to evaluate different
estimators, including the ones that are directly constructed from the
identification. We find that a parametric Bayesian approach with a Gibbs
sampler works well in realistically sized problems. Finally, we illustrate our
methodology in decentralized admissions to public and private schools in Chile
and conduct a counterfactual analysis of an affirmative action policy.
arXiv link: http://arxiv.org/abs/2104.02009v3
Local Projections vs. VARs: Lessons From Thousands of DGPs
Autoregression (VAR) estimators of structural impulse responses across
thousands of data generating processes, designed to mimic the properties of the
universe of U.S. macroeconomic data. Our analysis considers various
identification schemes and several variants of LP and VAR estimators, employing
bias correction, shrinkage, or model averaging. A clear bias-variance trade-off
emerges: LP estimators have lower bias than VAR estimators, but they also have
substantially higher variance at intermediate and long horizons. Bias-corrected
LP is the preferred method if and only if the researcher overwhelmingly
prioritizes bias. For researchers who also care about precision, VAR methods
are the most attractive -- Bayesian VARs at short and long horizons, and
least-squares VARs at intermediate and long horizons.
arXiv link: http://arxiv.org/abs/2104.00655v4
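To make the LP side of the comparison above concrete, the snippet below estimates a Jordà-style local projection of an impulse response on a simulated AR(1) with an observed shock. The data-generating process, single lag control, and horizons are illustrative assumptions; none of the study's identification schemes or estimator variants are reproduced.

```python
# Local projection impulse responses on a simulated AR(1) (illustration only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T, rho = 500, 0.7
shock = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = rho * y[t - 1] + shock[t]            # true IRF at horizon h is rho**h

horizons = range(9)
irf = []
for h in horizons:
    # Regress y_{t+h} on the shock at t, controlling for one lag of y.
    yh = y[h + 1:]
    X = sm.add_constant(np.column_stack([shock[1:T - h], y[:T - h - 1]]))
    irf.append(sm.OLS(yh, X).fit().params[1])

print(np.round(irf, 2))
print(np.round([rho ** h for h in horizons], 2))   # true impulse response
```

Each horizon is a separate regression, which is what drives the low-bias, high-variance behaviour of LP relative to iterating a fitted VAR.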
Normalizations and misspecification in skill formation models
formation and the optimal timing of interventions. In this paper, I provide new
identification results for these models and investigate the effects of
seemingly innocuous scale and location restrictions on parameters of interest.
To do so, I first characterize the identified set of all parameters without
these additional restrictions and show that important policy-relevant
parameters are point identified under weaker assumptions than commonly used in
the literature. The implications of imposing standard scale and location
restrictions depend on how the model is specified, but they generally impact
the interpretation of parameters and may affect counterfactuals. Importantly,
with the popular CES production function, commonly used scale restrictions fix
identified parameters and lead to misspecification. Consequently, simply
changing the units of measurements of observed variables might yield
ineffective investment strategies and misleading policy recommendations. I show
how existing estimators can easily be adapted to solve these issues. As a
byproduct, this paper also presents a general and formal definition of when
restrictions are truly normalizations.
arXiv link: http://arxiv.org/abs/2104.00473v4
Universal Prediction Band via Semi-Definite Programming
heteroscedastic prediction bands for uncertainty quantification, with or
without any user-specified predictive model. Our approach provides an
alternative to the now-standard conformal prediction for uncertainty
quantification, with novel theoretical insights and computational advantages.
The data-adaptive prediction band is universally applicable with minimal
distributional assumptions, has strong non-asymptotic coverage properties, and
is easy to implement using standard convex programs. Our approach can be viewed
as a novel variance interpolation with confidence and further leverages
techniques from semi-definite programming and sum-of-squares optimization.
Theoretical and numerical performances for the proposed approach for
uncertainty quantification are analyzed.
arXiv link: http://arxiv.org/abs/2103.17203v3
Forecasting open-high-low-close data contained in candlestick chart
is of great practical importance, as exemplified by applications in the field
of finance. Typically, the existence of inherent constraints in OHLC data
poses a great challenge to its prediction; e.g., forecasting models may yield
unrealistic values if these constraints are ignored. To address this, a novel
transformation approach is proposed to relax these constraints, along with its
explicit inverse transformation, which ensures that the forecasting models yield
meaningful open-high-low-close values. A flexible and efficient framework for
forecasting the OHLC data is also provided. As an example, the detailed
procedure of modelling the OHLC data via the vector auto-regression (VAR) model
and vector error correction (VEC) model is given. The new approach has high
practical utility on account of its flexibility, simple implementation and
straightforward interpretation. Extensive simulation studies are performed to
assess the effectiveness and stability of the proposed approach. Three
financial data sets of the Kweichow Moutai, CSI 100 index and 50 ETF of Chinese
stock market are employed to document the empirical effect of the proposed
methodology.
arXiv link: http://arxiv.org/abs/2104.00581v1
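To illustrate the kind of unconstraining transformation described above, here is one possible bijection (assuming low > 0 and strict inequalities low < open, close < high) that maps a candle into four unconstrained reals, together with its explicit inverse. The paper's specific transformation may differ; this is only a sketch of the idea.

```python
# One possible unconstraining transformation for OHLC data (illustration only).
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def to_unconstrained(o, h, l, c):
    return np.array([np.log(l),                  # level
                     np.log(h - l),              # positive range
                     logit((o - l) / (h - l)),   # open's position within the range
                     logit((c - l) / (h - l))])  # close's position within the range

def to_ohlc(y):
    l = np.exp(y[0])
    h = l + np.exp(y[1])
    o = l + sigmoid(y[2]) * (h - l)
    c = l + sigmoid(y[3]) * (h - l)
    return o, h, l, c

ohlc = (101.2, 103.5, 100.4, 102.9)
y = to_unconstrained(*ohlc)
print(np.round(to_ohlc(y), 4))   # round-trips to the original candle
```

A VAR or VEC model can then be fitted to the transformed series, and forecasts mapped back through the inverse always satisfy the candle constraints.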
Dimension reduction of open-high-low-close data in candlestick chart based on pseudo-PCA
of finance and the object of investigation in various technical analyses. As
more and more features of OHLC data are collected, the issue of extracting
their useful information in a comprehensible way for visualization and easy
interpretation must be resolved. The inherent constraints of OHLC data also
pose a challenge for this issue. This paper proposes a novel approach to
characterize the features of OHLC data in a dataset and then performs dimension
reduction, which integrates the feature information extraction method and
principal component analysis. We refer to it as the pseudo-PCA method.
Specifically, we first propose a new way to represent the OHLC data, which
removes the inherent constraints and facilitates further analysis.
Moreover, there is a one-to-one match between the original OHLC data and its
feature-based representations, which means that the analysis of the
feature-based data can be reversed to the original OHLC data. Next, we develop
the pseudo-PCA procedure for OHLC data, which can effectively identify
important information and perform dimension reduction. Finally, the
effectiveness and interpretability of the proposed method are investigated
through finite simulations and the spot data of China's agricultural product
market.
arXiv link: http://arxiv.org/abs/2103.16908v1
Mobility Functional Areas and COVID-19 Spread
Functional Areas (MFAs), i.e., the geographic zones highly interconnected
according to the analysis of mobile positioning data. The MFAs do not coincide
necessarily with administrative borders as they are built observing natural
human mobility and, therefore, they can be used to inform, in a bottom-up
approach, local transportation, spatial planning, health and economic policies.
After presenting the methodology behind the MFAs, this study focuses on the
link between the COVID-19 pandemic and the MFAs in Austria. It emerges that the
MFAs registered an average number of infections statistically larger than the
areas in the rest of the country, suggesting the usefulness of the MFAs in the
context of targeted re-escalation policy responses to this health crisis. The
MFAs dataset is openly available to other scholars for further analyses.
arXiv link: http://arxiv.org/abs/2103.16894v2
On a Standard Method for Measuring the Natural Rate of Interest
Unbiased Estimation (MUE) cannot recover the signal-to-noise ratio of interest
from their Stage 2 model. Moreover, their implementation of the structural
break regressions which are used as an auxiliary model in MUE deviates from
Stock and Watson's (1998) formulation. This leads to spuriously large estimates
of the signal-to-noise parameter $\lambda _{z}$ and thereby an excessive
downward trend in the other factor $z_{t}$ and the natural rate. I provide a
correction to the Stage 2 model specification and the implementation of the
structural break regressions in MUE. This correction is quantitatively
important. It results in substantially smaller point estimates of $\lambda
_{z}$, which affects the severity of the downward trend in the other factor $z_{t}$.
For the US, the estimate of $\lambda _{z}$ shrinks from $0.040$ to $0.013$ and
is statistically highly insignificant. For the Euro Area, the UK and Canada,
the MUE point estimates of $\lambda _{z}$ are exactly zero. Natural rate
estimates from HLW's model using the correct Stage 2 MUE implementation are up
to 100 basis points larger than originally computed.
arXiv link: http://arxiv.org/abs/2103.16452v2
Empirical Welfare Maximization with Constraints
select welfare program eligibility policies based on data. This paper extends
EWM by allowing for uncertainty in estimating the budget needed to implement
the selected policy, in addition to its welfare. Due to the additional
estimation error, I show there exist no rules that achieve the highest welfare
possible while satisfying a budget constraint uniformly over a wide range of
DGPs. This differs from the setting without a budget constraint where
uniformity is achievable. I propose an alternative trade-off rule and
illustrate it with Medicaid expansion, a setting with imperfect take-up and
varying program costs.
arXiv link: http://arxiv.org/abs/2103.15298v2
Divide-and-Conquer: A Distributed Hierarchical Factor Approach to Modeling Large-Scale Time Series Data
high-dimensional, large-scale heterogeneous time series data using distributed
computing. The new method employs a multiple-fold dimension reduction procedure
using Principal Component Analysis (PCA) and shows great promise for modeling
large-scale data that cannot be stored nor analyzed by a single machine. Each
computer at the basic level performs a PCA to extract common factors among the
time series assigned to it and transfers those factors to one and only one node
of the second level. Each 2nd-level computer collects the common factors from
its subordinates and performs another PCA to select the 2nd-level common
factors. This process is repeated until the central server is reached, which
collects common factors from its direct subordinates and performs a final PCA
to select the global common factors. The noise terms of the 2nd-level
approximate factor model are the unique common factors of the 1st-level
clusters. We focus on the case of 2 levels in our theoretical derivations, but
the idea can easily be generalized to any finite number of hierarchies. We
discuss some clustering methods when the group memberships are unknown and
introduce a new diffusion index approach to forecasting. We further extend the
analysis to unit-root nonstationary time series. Asymptotic properties of the
proposed method are derived for the diverging dimension of the data in each
computing unit and the sample size $T$. We use both simulated data and real
examples to assess the performance of the proposed method in finite samples,
and compare our method with the commonly used ones in the literature concerning
the forecastability of extracted factors.
arXiv link: http://arxiv.org/abs/2103.14626v1
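A toy two-level version of the hierarchical scheme described above: each "machine" extracts principal-component factors from its own block of series, and a central node runs a final PCA on the collected level-1 factors. The dimensions, factor counts, and block assignments are simplified assumptions for illustration.

```python
# Two-level distributed factor extraction via PCA (toy illustration).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
T, n_blocks, series_per_block = 200, 4, 50
global_factor = rng.normal(size=(T, 1))

level1_factors = []
for b in range(n_blocks):
    block_factor = rng.normal(size=(T, 1))           # block-specific factor
    loadings_g = rng.normal(size=(1, series_per_block))
    loadings_b = rng.normal(size=(1, series_per_block))
    x = global_factor @ loadings_g + block_factor @ loadings_b \
        + 0.5 * rng.normal(size=(T, series_per_block))
    # Level-1 "machine": keep the first two principal components of its block.
    level1_factors.append(PCA(n_components=2).fit_transform(x))

# Central node: final PCA on the stacked level-1 factors.
stacked = np.hstack(level1_factors)
global_est = PCA(n_components=1).fit_transform(stacked)

corr = np.corrcoef(global_est[:, 0], global_factor[:, 0])[0, 1]
print("correlation with true global factor:", round(abs(corr), 2))
```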
Addressing spatial dependence in technical efficiency estimation: A Spatial DEA frontier approach
production-frontier based on Data Envelopment Analysis (DEA) when dealing with
decision-making units whose economic performances are correlated with those of
the neighbors (spatial dependence). To illustrate the bias reduction that the
SpDEA provides with respect to standard DEA methods, an analysis of the
regional production frontiers for the NUTS-2 European regions during the period
2000-2014 was carried out. The estimated SpDEA scores show a bimodal
distribution not detected by the standard DEA estimates. The results confirm
the crucial role of space, offering important new insights on both the causes
of regional disparities in labour productivity and the observed polarization of
the European distribution of per capita income.
arXiv link: http://arxiv.org/abs/2103.14063v1
Testing for threshold effects in the TARMA framework
specification against its threshold ARMA extension. We derive the asymptotic
distribution of the test statistics both under the null hypothesis and
contiguous local alternatives. Moreover, we prove the consistency of the tests.
The Monte Carlo study shows that the tests enjoy good finite-sample properties,
are robust against model misspecification and their performance is not
affected if the order of the model is unknown. The tests present a low
computational burden and do not suffer from some of the drawbacks that affect
the quasi-likelihood ratio setting. Lastly, we apply our tests to a time series
of standardized tree-ring growth indexes, an application that may lead to new
research in climate studies.
arXiv link: http://arxiv.org/abs/2103.13977v1
A perturbed utility route choice model
a utility maximizing assignment of flow across an entire network under a flow
conservation constraint. Substitution between routes depends on how much they
overlap. The model is estimated considering the full set of route
alternatives, and no choice set generation is required. Nevertheless,
estimation requires only linear regression and is very fast. Predictions from
the model can be computed using convex optimization, and computation is
straightforward even for large networks. We estimate and validate the model
using a large dataset comprising 1,337,096 GPS traces of trips in the Greater
Copenhagen road network.
arXiv link: http://arxiv.org/abs/2103.13784v3
Phase transition of the monotonicity assumption in learning local average treatment effects
a binary treatment. The traditional LATE approach assumes the monotonicity
condition stating that there are no defiers (or compliers). Since this
condition is not always obvious, we investigate the sensitivity and testability
of this condition. In particular, we focus on the question: does a slight
violation of monotonicity lead to a small problem or a big problem? We find a
phase transition for the monotonicity condition. On one side of the boundary of the
phase transition, it is easy to learn the sign of LATE, and on the other side of
the boundary, it is impossible to learn the sign of LATE. Unfortunately, the
impossible side of the phase transition includes data-generating processes
under which the proportion of defiers tends to zero. This boundary of phase
transition is explicitly characterized in the case of binary outcomes. Outside
a special case, it is impossible to test whether the data-generating process is
on the nice side of the boundary. However, in the special case that the
non-compliance is almost one-sided, such a test is possible. We also provide
simple alternatives to monotonicity.
arXiv link: http://arxiv.org/abs/2103.13369v1
An investigation of higher order moments of empirical financial data and the implications to risk
financial time series when we truncate a large data set into smaller and
smaller subsets, referred to below as time windows. We look at the effect of
the economic environment on the behaviour of higher order moments in these time
windows. We observe two different scaling relations of higher order moments
when the data subsets' length decreases; one for longer time windows and
another for the shorter time windows. These scaling relations drastically
change when the time window encompasses a financial crisis. We also observe a
qualitative change of higher order standardised moments compared to the
Gaussian values in response to a shrinking time window. We extend this analysis
to incorporate the effects these scaling relations have upon risk. We decompose
the return series within these time windows and carry out a Value-at-Risk
calculation. In doing so, we observe the manifestation of the scaling relations
through the change in the Value-at-Risk level. Moreover, we model the observed
scaling laws by analysing the hierarchy of rare events on higher order moments.
arXiv link: http://arxiv.org/abs/2103.13199v3
Identification at the Zero Lower Bound
identify the causal effects of monetary policy. Identification depends on the
extent to which the ZLB limits the efficacy of monetary policy. I propose a
simple way to test the efficacy of unconventional policies, modelled via a
`shadow rate'. I apply this method to U.S. monetary policy using a
three-equation SVAR model of inflation, unemployment and the federal funds
rate. I reject the null hypothesis that unconventional monetary policy has no
effect at the ZLB, but find some evidence that it is not as effective as
conventional monetary policy.
arXiv link: http://arxiv.org/abs/2103.12779v2
What Do We Get from Two-Way Fixed Effects Regressions? Implications from Numerical Equivalence
effects (TWFE) regressions, allowing for general scalar treatments with
non-staggered designs and time-varying covariates. Building on the numerical
equivalence between TWFE and pooled first-difference regressions, I decompose
the TWFE coefficient into a weighted average of first-difference coefficients
across varying horizons, clarifying contributions of short-run versus long-run
changes. Causal interpretation of the TWFE coefficient requires common trends
assumptions for all time horizons, conditional on changes, not levels, of
time-varying covariates. I develop diagnostic procedures to assess this
assumption's plausibility across different horizons, extending beyond recent
literature's focus on binary, staggered treatments.
arXiv link: http://arxiv.org/abs/2103.12374v9
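For reference, the object the paper above decomposes is the coefficient from a regression with unit and time fixed effects. The sketch below runs such a TWFE regression on a simulated non-staggered panel with a scalar treatment via the statsmodels formula interface; the panel, treatment process, and clustering choice are illustrative assumptions, and the paper's decomposition diagnostics are not implemented here.

```python
# Two-way fixed effects regression on a simulated panel (illustration only).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_units, n_periods = 50, 10
df = pd.DataFrame([(i, t) for i in range(n_units) for t in range(n_periods)],
                  columns=["unit", "time"])
unit_fe = rng.normal(size=n_units)[df["unit"]]
time_fe = rng.normal(size=n_periods)[df["time"]]
df["d"] = rng.normal(size=len(df)) + 0.5 * unit_fe        # scalar treatment
df["y"] = 2.0 * df["d"] + unit_fe + time_fe + rng.normal(size=len(df))

twfe = smf.ols("y ~ d + C(unit) + C(time)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit"]})
print(twfe.params["d"], twfe.bse["d"])    # TWFE coefficient on the treatment
```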
Uncovering Bias in Order Assignment
random sequence. In such circumstances, it is often desirable to test whether
such randomization indeed obtains, yet this problem has received very limited
attention in the literature. This paper articulates the key features of this
problem and presents three "untargeted" tests that require no a priori
information from the analyst. These methods are used to analyze the order in
which lottery numbers are drawn in Powerball, the order in which contestants
perform on American Idol, and the order of candidates on primary election
ballots in Texas and West Virginia. In this last application, multiple
deviations from full randomization are detected, with potentially serious
political and legal consequences. The form these deviations take varies,
depending on institutional factors, which sometimes necessitates the use of
tests that exchange power for increased robustness.
arXiv link: http://arxiv.org/abs/2103.11952v2
PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance and Classification using Augmented SBERT
patent-to-patent (p2p) technological similarity, and presents a hybrid
framework for leveraging the resulting p2p similarity for applications such as
semantic search and automated patent classification. We create embeddings using
Sentence-BERT (SBERT) based on patent claims. We leverage SBERT's efficiency in
creating embedding distance measures to map p2p similarity in large sets of
patent data. We deploy our framework for classification with a simple K-Nearest
Neighbors (KNN) model that predicts the Cooperative Patent Classification (CPC) of
a patent based on the class assignment of the K patents with the highest p2p
similarity. We thereby validate that the p2p similarity captures patents'
technological features in terms of CPC overlap, and at the same time demonstrate the
usefulness of this approach for automatic patent classification based on text
data. Furthermore, the presented classification framework is simple and the
results easy to interpret and evaluate by end-users. In the out-of-sample model
validation, we are able to perform a multi-label prediction of all assigned CPC
classes on the subclass (663) level on 1,492,294 patents with an accuracy of
54% and F1 score > 66%, which suggests that our model outperforms the current
state-of-the-art in text-based multi-label and multi-class patent
classification. We furthermore discuss the applicability of the presented
framework for semantic IP search, patent landscaping, and technology
intelligence. We finally point towards a future research agenda for leveraging
multi-source patent embeddings, their appropriateness across applications, as
well as to improve and validate patent embeddings by creating domain-expert
curated Semantic Textual Similarity (STS) benchmark datasets.
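A minimal sketch of the general pipeline the abstract describes, assuming the sentence-transformers and scikit-learn packages, an off-the-shelf all-MiniLM-L6-v2 backbone rather than the authors' fine-tuned model, and tiny toy claims with single CPC labels in place of the real multi-label patent data:

```python
# pip install sentence-transformers scikit-learn
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import KNeighborsClassifier

# toy stand-ins for patent claims and their CPC labels (hypothetical)
claims = [
    "A battery electrode comprising a lithium compound.",
    "A method for wireless transmission of data packets.",
    "An electrolyte composition for rechargeable cells.",
    "An antenna array for beamforming in mobile networks.",
]
cpc_labels = ["H01M", "H04W", "H01M", "H04W"]

model = SentenceTransformer("all-MiniLM-L6-v2")   # any SBERT backbone works here
X = model.encode(claims)                          # dense claim embeddings

# KNN over cosine distance in embedding space approximates p2p similarity search
knn = KNeighborsClassifier(n_neighbors=2, metric="cosine")
knn.fit(X, cpc_labels)

query = ["A cathode material with improved lithium-ion capacity."]
print(knn.predict(model.encode(query)))           # expected: ['H01M']
```

The real framework is multi-label over 663 CPC subclasses; this single-label toy only illustrates the embed-then-KNN idea.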
arXiv link: http://arxiv.org/abs/2103.11933v3
Robust Orthogonal Machine Learning of Treatment Effects
"what if" problems in decision-making. In causal learning, it is
central to seek methods to estimate the average treatment effect (ATE) from
observational data. Double/Debiased Machine Learning (DML) is one of the most
prevalent methods to estimate the ATE. However, DML estimators can suffer from
an error-compounding issue and even give extreme estimates when the
propensity scores are close to 0 or 1. Previous studies have overcome this
issue through some empirical tricks such as propensity score trimming, yet none
of the existing works solves it from a theoretical standpoint. In this paper,
we propose a Robust Causal Learning (RCL) method to offset the
deficiencies of DML estimators. Theoretically, the RCL estimators i) satisfy
the (higher-order) orthogonal condition and are as consistent and
doubly robust as the DML estimators, and ii) get rid of the error-compounding
issue. Empirically, the comprehensive experiments show that: i) the RCL
estimators give more stable estimations of the causal parameters than DML; ii)
the RCL estimators outperform traditional estimators and their variants when
applying different machine learning models on both simulation and benchmark
datasets, as well as on a synthetic consumer credit dataset generated by a WGAN.
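For context, a bare-bones cross-fitted doubly robust (AIPW) ATE estimator of the kind used in the DML literature is sketched below with scikit-learn on simulated data; it is not the proposed RCL estimator, and the propensity-score clipping near the end is precisely the kind of empirical trimming the paper seeks to avoid.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
n, p = 2000, 5
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-X[:, 0]))              # true propensity score
D = rng.binomial(1, e)
Y = 2.0 * D + X[:, 0] + rng.normal(size=n)  # true ATE = 2

psi = np.zeros(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    # nuisance functions are fit on the training fold only (cross-fitting)
    m1 = RandomForestRegressor(n_estimators=200).fit(X[train][D[train] == 1], Y[train][D[train] == 1])
    m0 = RandomForestRegressor(n_estimators=200).fit(X[train][D[train] == 0], Y[train][D[train] == 0])
    g = RandomForestClassifier(n_estimators=200).fit(X[train], D[train])

    ehat = np.clip(g.predict_proba(X[test])[:, 1], 0.01, 0.99)  # ad hoc propensity trimming
    mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
    psi[test] = (mu1 - mu0
                 + D[test] * (Y[test] - mu1) / ehat
                 - (1 - D[test]) * (Y[test] - mu0) / (1 - ehat))

ate = psi.mean()
se = psi.std(ddof=1) / np.sqrt(n)
print(f"AIPW ATE estimate: {ate:.3f} (se {se:.3f})")
```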
arXiv link: http://arxiv.org/abs/2103.11869v2
A Powerful Subvector Anderson Rubin Test in Linear Instrumental Variables Regression with Conditional Heteroskedasticity
structural parameter vector in the linear instrumental variables (IVs) model.
Guggenberger et al. (2019), GKM19 from now on, introduce a subvector
Anderson-Rubin (AR) test with data-dependent critical values that has
asymptotic size equal to nominal size for a parameter space that allows for
arbitrary strength or weakness of the IVs and has uniformly nonsmaller power
than the projected AR test studied in Guggenberger et al. (2012). However,
GKM19 imposes the restrictive assumption of conditional homoskedasticity. The
main contribution here is to robustify the procedure in GKM19 to arbitrary
forms of conditional heteroskedasticity. We first adapt the method in GKM19 to
a setup where a certain covariance matrix has an approximate Kronecker product
(AKP) structure, which nests conditional homoskedasticity. The new test equals
this adaptation when the data are consistent with the AKP structure, as decided
by a model selection procedure. Otherwise, the test equals the AR/AR test in Andrews
(2017) that is fully robust to conditional heteroskedasticity but less powerful
than the adapted method. We show theoretically that the new test has asymptotic
size bounded by the nominal size and document improved power relative to the
AR/AR test in a wide array of Monte Carlo simulations when the covariance
matrix is not too far from AKP.
arXiv link: http://arxiv.org/abs/2103.11371v4
On Spurious Causality, CO2, and Global Temperature
Reports) use information flows (Liang, 2008, 2014) to establish causality from
various forcings to global temperature. We show that the formulas being used
hinge on a simplifying assumption that is nearly always rejected by the data.
We propose an adequate measure of information flow based on Vector
Autoregressions, and find that most results in Stips et al. (2016) cannot be
corroborated. We then discuss which modeling choices (e.g., the choice of
CO2 series and assumptions about simultaneous relationships) may help in
extracting credible estimates of causal flows and the transient climate
response simply by looking at the joint dynamics of two climatic time series.
arXiv link: http://arxiv.org/abs/2103.10605v1
Feasible IV Regression without Excluded Instruments
significantly weaker than the conventional IV's in at least two respects: (1)
consistent estimation without excluded instruments is possible, provided
endogenous covariates are non-linearly mean-dependent on exogenous covariates,
and (2) endogenous covariates may be uncorrelated with but mean-dependent on
instruments. These remarkable properties notwithstanding, multiplicative-kernel
ICM estimators suffer from diminished identification strength, large bias, and
severe size distortions even for a moderately sized instrument vector. This
paper proposes a computationally fast linear ICM estimator that better
preserves identification strength in the presence of multiple instruments and a
test of the ICM relevance condition. Monte Carlo simulations demonstrate a
considerably better size control in the presence of multiple instruments and a
favourably competitive performance in general. An empirical example illustrates
the practical usefulness of the estimator, where estimates remain plausible
when no excluded instrument is used.
arXiv link: http://arxiv.org/abs/2103.09621v4
DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R
framework of Chernozhukov et al. (2018). It provides functionalities to
estimate parameters in causal models based on machine learning methods. The
double machine learning framework consists of three key ingredients: Neyman
orthogonality, high-quality machine learning estimation and sample splitting.
Estimation of nuisance components can be performed by various state-of-the-art
machine learning methods that are available in the mlr3 ecosystem. DoubleML
makes it possible to perform inference in a variety of causal models, including
partially linear and interactive regression models and their extensions to
instrumental variable estimation. The object-oriented implementation of
DoubleML enables a high flexibility for the model specification and makes it
easily extendable. This paper serves as an introduction to the double machine
learning framework and the R package DoubleML. In reproducible code examples
with simulated and real data sets, we demonstrate how DoubleML users can
perform valid inference based on machine learning methods.
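The package itself is object-oriented R code built on mlr3 learners; as a language-neutral illustration of the three ingredients listed above (orthogonal score, ML nuisance estimation, sample splitting), here is a hand-rolled Python sketch of cross-fitted estimation in a partially linear model with a partialling-out score, using scikit-learn learners and a simulated DGP rather than the DoubleML API.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
n, p = 1000, 10
X = rng.normal(size=(n, p))
D = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=n)       # treatment
Y = 0.5 * D + np.cos(X[:, 0]) + rng.normal(size=n)        # true theta = 0.5

u = np.zeros(n)  # residuals of Y given X
v = np.zeros(n)  # residuals of D given X
for train, test in KFold(5, shuffle=True, random_state=0).split(X):
    ml_l = GradientBoostingRegressor().fit(X[train], Y[train])   # E[Y | X]
    ml_m = GradientBoostingRegressor().fit(X[train], D[train])   # E[D | X]
    u[test] = Y[test] - ml_l.predict(X[test])
    v[test] = D[test] - ml_m.predict(X[test])

theta = (v @ u) / (v @ v)             # Neyman-orthogonal (partialling-out) estimate
se = np.sqrt(np.mean((u - theta * v) ** 2 * v ** 2)) / (v @ v) * np.sqrt(n)
print(f"theta_hat = {theta:.3f} (se {se:.3f})")
```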
arXiv link: http://arxiv.org/abs/2103.09603v6
Simultaneous Decorrelation of Matrix Time Series
time series to alleviate the difficulties in modeling and forecasting matrix
time series when $p$ and/or $q$ are large. The resulting transformed matrix
assumes a block structure consisting of several small matrices, and those small
matrix series are uncorrelated across all times. Hence an overall parsimonious
model is achieved by modelling each of those small matrix series separately
without the loss of information on the linear dynamics. Such a parsimonious
model often has better forecasting performance, even when the underlying true
dynamics deviates from the assumed uncorrelated block structure after
transformation. The uniform convergence rates of the estimated transformation
are derived, which vindicate an important virtue of the proposed bilinear
transformation, i.e. it is technically equivalent to the decorrelation of a
vector time series of dimension max$(p,q)$ instead of $p\times q$. The proposed
method is illustrated numerically via both simulated and real data examples.
arXiv link: http://arxiv.org/abs/2103.09411v2
Estimating the Long-Term Effects of Novel Treatments
effects of novel treatments, while only having historical data of older
treatment options. We assume access to a long-term dataset where only past
treatments were administered and a short-term dataset where novel treatments
have been administered. We propose a surrogate based approach where we assume
that the long-term effect is channeled through a multitude of available
short-term proxies. Our work combines three major recent techniques in the
causal machine learning literature: surrogate indices, dynamic treatment effect
estimation and double machine learning, in a unified pipeline. We show that our
method is consistent and provides root-n asymptotically normal estimates under
a Markovian assumption on the data and the observational policy. We use a
dataset from a major corporation that includes customer investments over a
three-year period to create a semi-synthetic data distribution where the major
qualitative properties of the real dataset are preserved. We evaluate the
performance of our method and discuss practical challenges of deploying our
formal methodology and how to address them.
arXiv link: http://arxiv.org/abs/2103.08390v2
Mixture composite regression models with multi-type feature selection
claim severity modelling. Claim severity modelling poses several challenges
such as multimodality, heavy-tailedness and systematic effects in data. We
tackle this modelling problem by studying a mixture composite regression model
for simultaneous modeling of attritional and large claims, and for considering
systematic effects in both the mixture components as well as the mixing
probabilities. For model fitting, we present a group-fused regularization
approach that allows us to select the explanatory variables which
significantly impact the mixing probabilities and the different mixture
components, respectively. We develop an asymptotic theory for this regularized
estimation approach, and fitting is performed using a novel Generalized
Expectation-Maximization algorithm. We exemplify our approach on a real motor
insurance data set.
arXiv link: http://arxiv.org/abs/2103.07200v2
Finding Subgroups with Significant Treatment Effects
to estimate the causal effects of interventions on outcomes of interest. Yet
these outcomes are often noisy, and estimated overall effects can be small or
imprecise. Nevertheless, we may still be able to produce reliable evidence of
the efficacy of an intervention by finding subgroups with significant effects.
In this paper, we propose a machine-learning method that is specifically
optimized for finding such subgroups in noisy data. Unlike available methods
for personalized treatment assignment, our tool is fundamentally designed to
take significance testing into account: it produces a subgroup that is chosen
to maximize the probability of obtaining a statistically significant positive
treatment effect. We provide a computationally efficient implementation using
decision trees and demonstrate its gain over selecting subgroups based on
positive (estimated) treatment effects. Compared to standard tree-based
regression and classification tools, this approach tends to yield higher power
in detecting subgroups affected by the treatment.
arXiv link: http://arxiv.org/abs/2103.07066v2
Estimating the causal effect of an intervention in a time series setting: the C-ARIMA approach
effect of an intervention as a contrast of potential outcomes. In recent years,
several methods have been developed under the RCM to estimate causal effects in
time series settings. None of these makes use of ARIMA models, which are
instead very common in the econometrics literature. In this paper, we propose a
novel approach, C-ARIMA, to define and estimate the causal effect of an
intervention in a time series setting under the RCM. We first formalize the
assumptions enabling the definition, the estimation and the attribution of the
effect to the intervention; we then check the validity of the proposed method
with an extensive simulation study, comparing its performance against a
standard intervention analysis approach. In the empirical application, we use
C-ARIMA to assess the causal effect of a permanent price reduction on
supermarket sales. The CausalArima R package provides an implementation of our
proposed approach.
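The implementation lives in the CausalArima R package; purely to illustrate the underlying idea (fit an ARIMA on pre-intervention data, forecast the post-intervention window, and read the effect off the gap between observed and counterfactual values), here is a hedged Python sketch using statsmodels on a toy sales series, which is not the paper's exact estimator or inference procedure.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
n_pre, n_post = 200, 30
trend = 100 + np.cumsum(rng.normal(scale=0.2, size=n_pre + n_post))
sales = trend + rng.normal(scale=1.0, size=n_pre + n_post)
sales[n_pre:] += 5.0                              # permanent level shift after the intervention
y = pd.Series(sales)

# fit an ARIMA on the pre-intervention window only
fit = ARIMA(y.iloc[:n_pre], order=(1, 1, 1)).fit()

# counterfactual: forecast the post-intervention path absent the intervention
counterfactual = fit.get_forecast(steps=n_post).predicted_mean

effect = y.iloc[n_pre:].to_numpy() - counterfactual.to_numpy()
print(f"estimated average post-intervention effect: {effect.mean():.2f}")
```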
arXiv link: http://arxiv.org/abs/2103.06740v3
Regression based thresholds in principal loading analysis
variables which have only a small distorting effect on the covariance matrix.
As a special case, principal loading analysis discards variables that are not
correlated with the remaining ones. In multivariate linear regression, on the
other hand, predictors that are correlated neither with the remaining
predictors nor with the dependent variables have regression coefficients equal
to zero. Hence, if the goal is to select a number of predictors, variables that
do not correlate are discarded, as is also done in principal loading analysis.
The two methods select the same variables not only in this special case of zero
correlation, however. We contribute conditions under which both methods share
the same variable selection. Further, we extend those conditions to provide a
choice for the threshold in principal loading analysis, which so far has only
followed recommendations based on simulation results.
arXiv link: http://arxiv.org/abs/2103.06691v3
Convergence of Computed Dynamic Models with Unbounded Shock
the shock is unbounded. Most dynamic economic models lack a closed-form
solution. As such, approximate solutions by numerical methods are utilized.
Since the researcher cannot directly evaluate the exact policy function and the
associated exact likelihood, it is imperative that the approximate likelihood
converge asymptotically to the exact likelihood -- and that the conditions of
convergence be known -- in order to justify and validate its usage. In this
regard, Fernandez-Villaverde, Rubio-Ramirez, and Santos (2006) show convergence
of the likelihood, when the shock has compact support. However, compact support
implies that the shock is bounded, which is not an assumption met in most
dynamic economic models, e.g., with normally distributed shocks. This paper
provides theoretical justification for most dynamic models used in the
literature by showing the conditions for convergence of the approximate
invariant measure obtained from numerical simulations to the exact invariant
measure, thus providing the conditions for convergence of the likelihood.
arXiv link: http://arxiv.org/abs/2103.06483v1
Causal inference with misspecified exposure mappings: separating definitions and assumptions
units interact in experiments. Current methods require experimenters to use the
same exposure mappings both to define the effect of interest and to impose
assumptions on the interference structure. However, the two roles rarely
coincide in practice, and experimenters are forced to make the often
questionable assumption that their exposures are correctly specified. This
paper argues that the two roles exposure mappings currently serve can, and
typically should, be separated, so that exposures are used to define effects
without necessarily assuming that they are capturing the complete causal
structure in the experiment. The paper shows that this approach is practically
viable by providing conditions under which exposure effects can be precisely
estimated when the exposures are misspecified. Some important questions remain
open.
arXiv link: http://arxiv.org/abs/2103.06471v2
More Robust Estimators for Instrumental-Variable Panel Designs, With An Application to the Effect of Imports from China on US Employment
non-convex combinations of location-and-period-specific treatment effects.
Thus, those regressions could be biased if effects are heterogeneous. We
propose an alternative instrumental-variable correlated-random-coefficient
(IV-CRC) estimator, that is more robust to heterogeneous effects. We revisit
Autor et al. (2013), who use a first-difference two-stage least-squares
regression to estimate the effect of imports from China on US manufacturing
employment. Their regression estimates a highly non-convex combination of
effects. Our more robust IV-CRC estimate is small and insignificant. Though
its confidence interval is wide, it differs significantly from the
first-difference two-stage least-squares estimate.
arXiv link: http://arxiv.org/abs/2103.06437v10
Optimal Targeting in Fundraising: A Causal Machine-Learning Approach
goods. We combine a field experiment and a causal machine-learning approach to
increase a charity's fundraising effectiveness. The approach optimally targets
a fundraising instrument to individuals whose expected donations exceed
solicitation costs. Our results demonstrate that machine-learning-based optimal
targeting allows the charity to substantially increase donations net of
fundraising costs relative to uniform benchmarks in which either everybody or
no one receives the gift. To that end, it (a) should direct its fundraising
efforts to a subset of past donors and (b) never address individuals who were
previously asked but never donated. Further, we show that the benefits of
machine-learning-based optimal targeting materialize even when the charity only
exploits publicly available geospatial information or applies the estimated
optimal targeting rule to later fundraising campaigns conducted in similar
samples. We conclude that charities not engaging in optimal targeting waste
significant resources.
arXiv link: http://arxiv.org/abs/2103.10251v3
Extension of the Lagrange multiplier test for error cross-section independence to large panels with non normal errors
independence in a large panel model where both the number of cross-sectional
units n and the number of time series observations T can be large. The first
contribution of the paper is an extension of the test in two directions: first,
a new asymptotic normality result is derived in a simultaneous limiting scheme
where the two dimensions (n, T) tend to infinity with comparable magnitudes;
second, the result is valid for general error distributions (not necessarily
normal). The second contribution of the paper is a new test
statistic based on the sum of the fourth powers of cross-section correlations
from OLS residuals, instead of their squares used in the Lagrange multiplier
statistic. This new test is generally more powerful, and the improvement is
particularly visible against alternatives with weak or sparse cross-section
dependence. A simulation study and a real data analysis are provided to
demonstrate the advantages of the extended Lagrange multiplier test and the
power-enhanced test in comparison with existing procedures.
arXiv link: http://arxiv.org/abs/2103.06075v1
Portfolio risk allocation through Shapley value
scheme for risk allocation among non-orthogonal risk factors is a natural way
of interpreting the contribution made by each of such factors to overall
portfolio risk. We discuss a Shapley value scheme for allocating risk to
non-orthogonal greeks in a portfolio of derivatives. Such a situation arises,
for example, when using a stochastic volatility model to capture option
volatility smile. We also show that Shapley value allows for a natural method
of interpreting components of enterprise risk measures such as VaR and ES. For
all applications discussed, we derive explicit formulas and/or numerical
algorithms to calculate the allocations.
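To make the allocation scheme concrete, the following small sketch (not taken from the paper) computes Shapley contributions of three correlated risk factors to total portfolio risk, with the coalition risk measure taken as the standard deviation of the summed factor P&L; the factor names, covariance matrix, and choice of risk measure are illustrative assumptions.

```python
import numpy as np
from itertools import combinations
from math import factorial

# toy covariance of three correlated risk-factor P&L contributions (hypothetical)
factors = ["delta", "vega", "rates"]
cov = np.array([[4.0, 1.2, 0.5],
                [1.2, 2.0, 0.3],
                [0.5, 0.3, 1.0]])

def risk(subset):
    """Risk of a coalition: standard deviation of the summed factor P&L."""
    if not subset:
        return 0.0
    idx = list(subset)
    return np.sqrt(cov[np.ix_(idx, idx)].sum())

n = len(factors)
shapley = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for k in range(n):
        for S in combinations(others, k):
            w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            shapley[i] += w * (risk(S + (i,)) - risk(S))

total = risk(tuple(range(n)))
for name, s in zip(factors, shapley):
    print(f"{name}: {s:.3f}")
print(f"sum of allocations = {shapley.sum():.3f}, total portfolio risk = {total:.3f}")
```

By construction the allocations sum to the total portfolio risk, which is the efficiency property that makes the Shapley scheme attractive for risk attribution.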
arXiv link: http://arxiv.org/abs/2103.05453v1
Root-n-consistent Conditional ML estimation of dynamic panel logit models with fixed effects
Likelihood (CML) estimator for all the common parameters in the panel logit
AR(p) model with strictly exogenous covariates and fixed effects. Our CML
estimator (CMLE) converges in probability faster and is more easily computed
than the kernel-weighted CMLE of Honor\'e and Kyriazidou (2000). Next, we
propose a root-n-consistent CMLE for the coefficients of the exogenous
covariates only. We also discuss new CMLEs for the panel logit AR(p) model
without covariates. Finally, we propose CMLEs for multinomial dynamic panel
logit models with and without covariates. All CMLEs are asymptotically normally
distributed.
arXiv link: http://arxiv.org/abs/2103.04973v6
Approximate Bayesian inference and forecasting in huge-dimensional multi-country VARs
multi-country datasets. However, the number of estimated parameters can be
enormous, leading to computational and statistical issues. In this paper, we
develop fast Bayesian methods for estimating PVARs using integrated rotated
Gaussian approximations. We exploit the fact that domestic information is often
more important than international information and group the coefficients
accordingly. Fast approximations are used to estimate the latter while the
former are estimated with precision using Markov chain Monte Carlo techniques.
We illustrate, using a huge model of the world economy, that our approach
produces competitive forecasts quickly.
arXiv link: http://arxiv.org/abs/2103.04944v2
On a log-symmetric quantile tobit model applied to female labor supply data
the economic literature. This model assumes normality for the error
distribution and is not recommended for cases where positive skewness is
present. Moreover, in regression analysis, it is well-known that a quantile
regression approach allows us to study the influences of the explanatory
variables on the dependent variable considering different quantiles. Therefore,
we propose in this paper a quantile tobit regression model based on
quantile-based log-symmetric distributions. The proposed methodology allows us
to model data with positive skewness (which is not suitable for the classic
tobit model), and to study the influence of the quantiles of interest, in
addition to accommodating heteroscedasticity. The model parameters are
estimated using the maximum likelihood method and an elaborate Monte Carlo
study is performed to evaluate the performance of the estimates. Finally, the
proposed methodology is illustrated using two female labor supply data sets.
The results show that the proposed log-symmetric quantile tobit model has a
better fit than the classic tobit model.
arXiv link: http://arxiv.org/abs/2103.04449v1
The impact of online machine-learning methods on long-term investment decisions and generator utilization in electricity markets
reduce the chances of issues such as load frequency control problems and
electricity blackouts. To gain a better understanding of the load that is
likely to be required over the next 24h, estimations under uncertainty are
needed. This is especially difficult in a decentralized electricity market with
many micro-producers which are not under central control.
In this paper, we investigate the impact of eleven offline learning and five
online learning algorithms used to predict the electricity demand profile over
the next 24h. We achieve this through integration within the long-term agent-based
model, ElecSim. Through the prediction of electricity demand profile over the
next 24h, we can simulate the predictions made for a day-ahead market. Once we
have made these predictions, we sample from the residual distributions and
perturb the electricity market demand using the simulation, ElecSim. This
enables us to understand the impact of errors on the long-term dynamics of a
decentralized electricity market.
We show we can reduce the mean absolute error by 30% using an online
algorithm when compared to the best offline algorithm, whilst reducing the
tendered national grid reserve required. This reduction in national
grid reserves leads to savings in costs and emissions. We also show that large
errors in prediction accuracy have a disproportionate effect on investments made
over a 17-year time frame, as well as on the electricity mix.
arXiv link: http://arxiv.org/abs/2103.04327v1
Asymptotic Theory for IV-Based Reinforcement Learning with Potential Endogeneity
all), and then data analysis is carried out. However, with the advancement of
digital technology, decision-makers constantly analyze past data and generate
new data through their decisions. We model this as a Markov decision process
and show that the dynamic interaction between data generation and data analysis
leads to a new type of bias -- reinforcement bias -- that exacerbates the
endogeneity problem in standard data analysis. We propose a class of
instrumental variable (IV)-based reinforcement learning (RL) algorithms to correct for the
bias and establish their theoretical properties by incorporating them into a
stochastic approximation (SA) framework. Our analysis accommodates
iterate-dependent Markovian structures and, therefore, can be used to study RL
algorithms with policy improvement. We also provide formulas for inference on
optimal policies of the IV-RL algorithms. These formulas highlight how
intertemporal dependencies of the Markovian environment affect the inference.
arXiv link: http://arxiv.org/abs/2103.04021v3
Autocalibration and Tweedie-dominance for Insurance Pricing with Machine Learning
learning methods for insurance pricing. Often in practice, there are
nevertheless endless debates about the choice of the right loss function to be
used to train the machine learning model, as well as about the appropriate
metric to assess the performances of competing models. Also, the sum of fitted
values can depart from the observed totals to a large extent and this often
confuses actuarial analysts. The lack of balance inherent to training models by
minimizing deviance outside the familiar GLM with canonical link setting has
been empirically documented in W\"uthrich (2019, 2020) who attributes it to the
early stopping rule in gradient descent methods for model fitting. The present
paper aims to further study this phenomenon when learning proceeds by
minimizing Tweedie deviance. It is shown that minimizing deviance involves a
trade-off between the integral of weighted differences of lower partial moments
and the bias measured on a specific scale. Autocalibration is then proposed as
a remedy. This new method to correct for bias adds an extra local GLM step to
the analysis. Theoretically, it is shown that it implements the autocalibration
concept in pure premium calculation and ensures that balance also holds on a
local scale, not only at portfolio level as with existing bias-correction
techniques. The convex order appears to be the natural tool to compare
competing models, shedding new light on the diagnostic graphs and associated
metrics proposed by Denuit et al. (2019).
arXiv link: http://arxiv.org/abs/2103.03635v2
Modeling tail risks of inflation using unobserved component quantile regressions
(TVP) quantile regression (QR) models featuring conditional heteroskedasticity.
I use data augmentation schemes to render the model conditionally Gaussian and
develop an efficient Gibbs sampling algorithm. Regularization of the
high-dimensional parameter space is achieved via flexible dynamic shrinkage
priors. A simple version of TVP-QR based on an unobserved component model is
applied to dynamically trace the quantiles of the distribution of inflation in
the United States, the United Kingdom and the euro area. In an out-of-sample
forecast exercise, I find the proposed model to be competitive and perform
particularly well for higher-order and tail forecasts. A detailed analysis of
the resulting predictive distributions reveals that they are sometimes skewed
and occasionally feature heavy tails.
arXiv link: http://arxiv.org/abs/2103.03632v2
Prediction of financial time series using LSTM and data denoising methods
dealing with the non-stationary and nonlinear characteristics of high-frequency
financial time series data, especially its weak generalization ability, this
paper proposes an ensemble method based on data denoising methods, including
the wavelet transform (WT) and singular spectrum analysis (SSA), and a long
short-term memory neural network (LSTM), to build a data prediction model. The
financial time series is decomposed and reconstructed by WT and SSA for
denoising, yielding a smooth sequence that retains the effective information.
The smoothed sequence is then fed into the LSTM to obtain predictions. With the
Dow Jones Industrial Average index (DJIA) as the research object, the
five-minute closing prices of the DJIA are divided into short-term (1 hour),
medium-term (3 hours) and long-term (6 hours) horizons. Based on root mean
square error (RMSE),
mean absolute error (MAE), mean absolute percentage error (MAPE) and absolute
percentage error standard deviation (SDAPE), the experimental results show that
in the short-term, medium-term and long-term, data denoising can greatly
improve the accuracy and stability of the prediction, and can effectively
improve the generalization ability of LSTM prediction model. As WT and SSA can
extract useful information from the original sequence and avoid overfitting,
the hybrid model can better grasp the sequence pattern of the closing price of
the DJIA. Moreover, the WT-LSTM model outperforms both the benchmark LSTM model
and the SSA-LSTM model.
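To make the denoising step concrete, here is a minimal wavelet-thresholding sketch using PyWavelets on a toy price series; the db4 wavelet, decomposition level, and universal soft threshold are illustrative choices, the SSA variant is omitted, and the denoised series would then be fed to the LSTM.

```python
# pip install PyWavelets
import numpy as np
import pywt

rng = np.random.default_rng(10)
n = 1024
price = 100 + np.cumsum(rng.normal(scale=0.3, size=n))     # toy 5-minute closing prices
noisy = price + rng.normal(scale=0.8, size=n)

# multi-level wavelet decomposition
coeffs = pywt.wavedec(noisy, "db4", level=4)

# universal soft threshold, with the noise level estimated from the finest detail scale
sigma = np.median(np.abs(coeffs[-1])) / 0.6745
thr = sigma * np.sqrt(2 * np.log(n))
coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]

denoised = pywt.waverec(coeffs, "db4")[:n]
print("RMSE noisy   :", np.sqrt(np.mean((noisy - price) ** 2)).round(3))
print("RMSE denoised:", np.sqrt(np.mean((denoised - price) ** 2)).round(3))
```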
arXiv link: http://arxiv.org/abs/2103.03505v1
Extremal points of Lorenz curves and applications to inequality analysis
compute the maximal $L^1$-distance between Lorenz curves with given values of
their Gini coefficients. As an application we introduce a bidimensional index
that simultaneously measures relative inequality and dissimilarity between two
populations. This proposal employs the Gini indices of the variables and an
$L^1$-distance between their Lorenz curves. The index takes values in a
right-angled triangle, two of whose sides characterize perfect relative
inequality, expressed by the Lorenz ordering between the underlying
distributions. Further, the hypotenuse represents maximal distance between the
two distributions. As a consequence, we construct a chart to graphically either
track the evolution of (relative) inequality and distance between two income
distributions over time or compare the distribution of income of a specific
population between a fixed time point and a range of years. We prove
the mathematical results behind the above claims and provide a full description
of the asymptotic properties of the plug-in estimator of this index. Finally,
we apply the proposed bidimensional index to several real EU-SILC income
datasets to illustrate its performance in practice.
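As a simple numerical companion (not the paper's estimator or its asymptotic theory), the sketch below computes empirical Gini coefficients and the L1 distance between two empirical Lorenz curves evaluated on a common grid; the simulated lognormal income samples are purely illustrative.

```python
import numpy as np

def lorenz(sample, grid):
    """Empirical Lorenz curve L(p) evaluated on a grid of population shares p."""
    x = np.sort(sample)
    cum = np.insert(np.cumsum(x) / x.sum(), 0, 0.0)
    p = np.linspace(0.0, 1.0, len(cum))
    return np.interp(grid, p, cum)

def integral(values, grid):
    # trapezoidal rule on a uniform grid
    dx = grid[1] - grid[0]
    return ((values[:-1] + values[1:]) / 2.0).sum() * dx

grid = np.linspace(0.0, 1.0, 1001)

def gini(sample):
    # Gini = 1 - 2 * area under the Lorenz curve
    return 1.0 - 2.0 * integral(lorenz(sample, grid), grid)

rng = np.random.default_rng(4)
income_a = rng.lognormal(mean=10.0, sigma=0.5, size=5000)
income_b = rng.lognormal(mean=10.0, sigma=0.9, size=5000)

l1 = integral(np.abs(lorenz(income_a, grid) - lorenz(income_b, grid)), grid)
print(f"Gini A = {gini(income_a):.3f}, Gini B = {gini(income_b):.3f}, "
      f"L1 distance between Lorenz curves = {l1:.4f}")
```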
arXiv link: http://arxiv.org/abs/2103.03286v1
High-dimensional estimation of quadratic variation based on penalized realized variance
the quadratic variation (QV) of a high-dimensional continuous It\^{o}
semimartingale. We adapt the principal idea of regularization from linear
regression to covariance estimation in a continuous-time high-frequency
setting. We show that under a nuclear norm penalization, the PRV is computed by
soft-thresholding the eigenvalues of realized variance (RV). It therefore
encourages sparsity of singular values or, equivalently, low rank of the
solution. We prove our estimator is minimax optimal up to a logarithmic factor.
We derive a concentration inequality, which reveals that the rank of PRV is --
with a high probability -- the number of non-negligible eigenvalues of the QV.
Moreover, we also provide the associated non-asymptotic analysis for the spot
variance. We suggest an intuitive data-driven bootstrap procedure to select the
shrinkage parameter. Our theory is supplemented by a simulation study and an
empirical application. The PRV detects about three to five factors in the equity
market, with a notable rank decrease during times of distress in financial
markets. This is consistent with most standard asset pricing models, where a
limited amount of systematic factors driving the cross-section of stock returns
are perturbed by idiosyncratic errors, rendering the QV -- and also RV -- of
full rank.
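The central computational step, soft-thresholding the eigenvalues of the realized variance matrix, is easy to sketch; in the toy example below the simulated factor structure, the dimensions, and the fixed shrinkage parameter are illustrative assumptions that stand in for the paper's data-driven bootstrap selection.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n_obs, k = 30, 390, 3                      # dimension, intraday returns, true factors
B = rng.normal(size=(d, k))                   # factor loadings
returns = (rng.normal(size=(n_obs, k)) @ B.T
           + 0.1 * rng.normal(size=(n_obs, d))) / np.sqrt(n_obs)

RV = returns.T @ returns                      # realized variance matrix

def penalized_rv(rv, lam):
    """Soft-threshold the eigenvalues of RV (nuclear-norm penalization)."""
    w, V = np.linalg.eigh(rv)
    w_shrunk = np.maximum(w - lam, 0.0)
    return V @ np.diag(w_shrunk) @ V.T, w_shrunk

lam = 0.05                                    # shrinkage parameter (illustrative)
prv, eig = penalized_rv(RV, lam)
print("rank of RV :", np.linalg.matrix_rank(RV))
print("rank of PRV:", int((eig > 0).sum()))
```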
arXiv link: http://arxiv.org/abs/2103.03237v1
Factor-Based Imputation of Missing Values and Covariances in Panel Data of Large Dimensions
than not, values in some entries of the data matrix are missing. Various
methods have been proposed to handle missing observations in a few variables.
We exploit the factor structure in panel data of large dimensions. Our
tall-project algorithm first estimates the factors from a
tall block in which data for all rows are observed, and projections of
variable-specific length are then used to estimate the factor loadings. A
missing value is imputed as the estimated common component which we show is
consistent and asymptotically normal without further iteration. Implications
for using imputed data in factor augmented regressions are then discussed.
To compensate for the downward bias in covariance matrices created by the
omitted noise term when a data point is not observed, we overlay the imputed data
with re-sampled idiosyncratic residuals many times and use the average of the
covariances to estimate the parameters of interest. Simulations show that the
procedures have desirable finite sample properties.
arXiv link: http://arxiv.org/abs/2103.03045v3
Theory of Evolutionary Spectra for Heteroskedasticity and Autocorrelation Robust Inference in Possibly Misspecified and Nonstationary Models
autocorrelation robust (HAR) inference when the data may not satisfy
second-order stationarity. Nonstationarity is a common feature of economic time
series which may arise either from parameter variation or model
misspecification. In such a context, the theories that support HAR inference
are either not applicable or do not provide accurate approximations. HAR tests
standardized by existing long-run variance estimators then may display size
distortions and little or no power. This issue can be more severe for methods
that use long bandwidths (i.e., fixed-b HAR tests). We introduce a class of
nonstationary processes that have a time-varying spectral representation which
evolves continuously except at a finite number of time points. We present an
extension of the classical heteroskedasticity and autocorrelation consistent
(HAC) estimators that applies two smoothing procedures. One is over the lagged
autocovariances, akin to classical HAC estimators, and the other is over time.
The latter element is important to flexibly account for nonstationarity. We
name them double kernel HAC (DK-HAC) estimators. We show the consistency of the
estimators and obtain an optimal DK-HAC estimator under the mean squared error
(MSE) criterion. Overall, HAR tests standardized by the proposed DK-HAC
estimators are competitive with fixed-b HAR tests, when the latter work well,
with regards to size control even when there is strong dependence. Notably, in
those empirically relevant situations in which previous HAR tests are
undersized and have little or no power, the DK-HAC estimator leads to tests
that have good size and power.
arXiv link: http://arxiv.org/abs/2103.02981v2
Modeling Macroeconomic Variations After COVID-19
months changed the time series properties of the data in ways that make many
pre-covid forecasting models inadequate. It also creates a new problem for
estimation of economic factors and dynamic causal effects because the
variations around the outbreak can be interpreted as outliers, as shifts to the
distribution of existing shocks, or as addition of new shocks. I take the
latter view and use covid indicators as controls to 'de-covid' the data prior
to estimation. I find that economic uncertainty remains high at the end of 2020
even though real economic activity has recovered and covid uncertainty has
receded. Dynamic responses of variables to shocks in a VAR similar in magnitude
and shape to the ones identified before 2020 can be recovered by directly or
indirectly modeling covid and treating it as exogenous. These responses to
economic shocks are distinctly different from those to a covid shock which are
much larger but shorter lived. Disentangling the two types of shocks can be
important in macroeconomic modeling post-covid.
arXiv link: http://arxiv.org/abs/2103.02732v4
Prewhitened Long-Run Variance Estimation Robust to Nonstationarity
(LRV) estimator for the construction of standard errors robust to
autocorrelation and heteroskedasticity that can be used for hypothesis testing
in a variety of contexts including the linear regression model. Existing
methods either are theoretically valid only under stationarity and have poor
finite-sample properties under nonstationarity (i.e., fixed-b methods), or are
theoretically valid under the null hypothesis but lead to tests that are not
consistent under nonstationary alternative hypothesis (i.e., both fixed-b and
traditional HAC estimators). The proposed estimator accounts explicitly for
nonstationarity, unlike previous prewhitened procedures which are known to be
unreliable, and leads to tests with accurate null rejection rates and good
monotonic power. We also establish MSE bounds for LRV estimation that are
sharper than previously established and use them to determine the
data-dependent bandwidths.
arXiv link: http://arxiv.org/abs/2103.02235v3
Slow-Growing Trees
(SGT), which uses a learning rate to tame CART's greedy algorithm. SGT exploits
the view that CART is an extreme case of an iterative weighted least squares
procedure. Moreover, a unifying view of Boosted Trees (BT) and Random Forests
(RF) is presented. Greedy ML algorithms' outcomes can be improved using either
"slow learning" or diversification. SGT applies the former to estimate a single
deep tree, and Booging (bagging stochastic BT with a high learning rate) uses
the latter with additive shallow trees. The performance of this tree ensemble
quaternity (Booging, BT, SGT, RF) is assessed on simulated and real regression
tasks.
arXiv link: http://arxiv.org/abs/2103.01926v2
Theory of Low Frequency Contamination from Nonstationarity and Misspecification: Consequences for HAR Inference
long memory effects) induced by general nonstationarity for estimates such as
the sample autocovariance and the periodogram, and deduce consequences for
heteroskedasticity and autocorrelation robust (HAR) inference. We present
explicit expressions for the asymptotic bias of these estimates. We distinguish
cases where this contamination only occurs as a small-sample problem and cases
where the contamination continues to hold asymptotically. We show theoretically
that nonparametric smoothing over time is robust to low frequency
contamination. Our results provide new insights on the debate between
consistent versus inconsistent long-run variance (LRV) estimation. Existing LRV
estimators tend to be inflated when the data are nonstationary. This results
in HAR tests that can be undersized and exhibit dramatic power losses. Our
theory indicates that long-bandwidth or fixed-b HAR tests suffer more from low
frequency contamination relative to HAR tests based on HAC estimators, whereas
recently introduced double kernel HAC estimators do not suffer from this
problem. Finally, we present second-order Edgeworth expansions under
nonstationarity about the distribution of HAC and DK-HAC estimators and about
the corresponding t-test in the linear regression model.
arXiv link: http://arxiv.org/abs/2103.01604v3
Network Cluster-Robust Inference
network, researchers often partition the network into clusters in order to
apply cluster-robust inference methods. Existing such methods require clusters
to be asymptotically independent. Under mild conditions, we prove that, for
this requirement to hold for network-dependent data, it is necessary and
sufficient that clusters have low conductance, the ratio of edge boundary size
to volume. This yields a simple measure of cluster quality. We find in
simulations that when clusters have low conductance, cluster-robust methods
control size better than HAC estimators. However, for important classes of
networks lacking low-conductance clusters, the former can exhibit substantial
size distortion. To determine the number of low-conductance clusters and
construct them, we draw on results in spectral graph theory that connect
conductance to the spectrum of the graph Laplacian. Based on these results, we
propose to use the spectrum to determine the number of low-conductance clusters
and spectral clustering to construct them.
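A small sketch, assuming networkx and scikit-learn rather than the paper's own code, of the two quantities involved: spectral clustering to form candidate clusters and the conductance of each cluster as the diagnostic of cluster quality; the planted-partition network is illustrative.

```python
import networkx as nx
from sklearn.cluster import SpectralClustering

# toy network with two well-separated communities (illustrative)
G = nx.planted_partition_graph(l=2, k=50, p_in=0.2, p_out=0.005, seed=0)
A = nx.to_numpy_array(G)

n_clusters = 2
labels = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                            random_state=0).fit_predict(A)

for c in range(n_clusters):
    S = {i for i, lab in enumerate(labels) if lab == c}
    phi = nx.conductance(G, S)   # edge-boundary size relative to cluster volume
    print(f"cluster {c}: size={len(S)}, conductance={phi:.3f}")
```

Low conductance values for all clusters indicate the cluster-robust approximation is likely to be reliable; values close to one signal the kind of network for which the paper documents size distortions.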
arXiv link: http://arxiv.org/abs/2103.01470v4
Some Finite Sample Properties of the Sign Test
First, we show that the sign test is unbiased with independent, non-identically
distributed data for both one-sided and two-sided hypotheses. The proof for the
two-sided case is based on a novel argument that relates the derivatives of the
power function to a regular bipartite graph. Unbiasedness then follows from the
existence of perfect matchings on such graphs. Second, we provide a simple
theoretical counterexample to show that the sign test over-rejects when the
data exhibits correlation. Our results can be useful for understanding the
properties of approximate randomization tests in settings with few clusters.
arXiv link: http://arxiv.org/abs/2103.01412v2
Standing on the Shoulders of Machine Learning: Can We Improve Hypothesis Testing?
upon modern computational power and classification models from machine
learning. We show that a simple classification algorithm such as a boosted
decision stump can be used to fully recover the size-power trade-off for
any single test statistic. This recovery implies an equivalence, under certain
conditions, between the basic building block of modern machine learning and
hypothesis testing. Second, we show that more complex algorithms such as the
random forest and gradient boosted machine can serve as mapping functions in
place of the traditional null distribution. This allows for multiple test
statistics and other information to be evaluated simultaneously and thus form a
pseudo-composite hypothesis test. Moreover, we show how practitioners can make
explicit the relative costs of Type I and Type II errors to contextualize the
test into a specific decision framework. To illustrate this approach we revisit
the case of testing for unit roots, a difficult problem in time series
econometrics for which existing tests are known to exhibit low power. Using a
simulation framework common to the literature we show that this approach can
improve upon the overall accuracy of the traditional unit root test(s) by
seventeen percentage points, and the sensitivity by thirty-six percentage points.
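The core idea, that a classifier trained to separate statistics simulated under the null from those simulated under an alternative traces out a size-power trade-off, can be sketched as follows; the AR(1) unit-root design, the simple Dickey-Fuller-type statistic, and the in-sample ROC evaluation are illustrative simplifications, not the paper's simulation framework.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_curve

rng = np.random.default_rng(6)

def df_stat(y):
    """Simple Dickey-Fuller-type t-statistic for rho = 1 (no constant, illustrative)."""
    ylag, ylead = y[:-1], y[1:]
    rho = (ylag @ ylead) / (ylag @ ylag)
    resid = ylead - rho * ylag
    se = np.sqrt(resid @ resid / (len(ylag) - 1) / (ylag @ ylag))
    return (rho - 1.0) / se

def simulate(phi, T=100):
    y = np.zeros(T)
    e = rng.normal(size=T)
    for t in range(1, T):
        y[t] = phi * y[t - 1] + e[t]
    return df_stat(y)

n_sim = 2000
stats_h0 = np.array([simulate(1.00) for _ in range(n_sim)])   # unit root (null)
stats_h1 = np.array([simulate(0.95) for _ in range(n_sim)])   # stationary alternative

X = np.r_[stats_h0, stats_h1].reshape(-1, 1)
labels = np.r_[np.zeros(n_sim), np.ones(n_sim)]

# AdaBoost's default base learner is a depth-1 decision stump
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, labels)
fpr, tpr, _ = roc_curve(labels, clf.predict_proba(X)[:, 1])

# the ROC curve (fpr, tpr) traces the size-power trade-off of this "test"
idx = np.searchsorted(fpr, 0.05)
print("approximate power at 5% size:", round(float(tpr[idx]), 3))
```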
arXiv link: http://arxiv.org/abs/2103.01368v1
Dynamic covariate balancing: estimating treatment effects over time with potential local projections
panel data settings when treatments change dynamically over time.
We propose a method that allows for (i) treatments to be assigned dynamically
over time based on high-dimensional covariates, past outcomes and treatments;
(ii) outcomes and time-varying covariates to depend on treatment trajectories;
(iii) heterogeneity of treatment effects.
Our approach recursively projects potential outcomes' expectations on past
histories. It then controls the bias by balancing dynamically observable
characteristics. We study the asymptotic and numerical properties of the
estimator and illustrate the benefits of the procedure in an empirical
application.
arXiv link: http://arxiv.org/abs/2103.01280v4
The Kernel Trick for Nonlinear Factor Modeling
the common dynamics in a large panel of data with a few latent variables, or
factors, thus alleviating the curse of dimensionality. Despite its popularity
and widespread use for various applications ranging from genomics to finance,
this methodology has predominantly remained linear. This study estimates
factors nonlinearly through the kernel method, which allows flexible
nonlinearities while still avoiding the curse of dimensionality. We focus on
factor-augmented forecasting of a single time series in a high-dimensional
setting, known as diffusion index forecasting in the macroeconomics literature.
Our main contribution is twofold. First, we show that the proposed estimator is
consistent and nests the linear PCA estimator, as well as some nonlinear
estimators introduced in the literature, as specific examples. Second, our
empirical application to a classical macroeconomic dataset demonstrates that
this approach can offer substantial advantages over mainstream methods.
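A compact sketch of kernel-based diffusion index forecasting on simulated data, assuming scikit-learn; the RBF kernel, the number of components, and the one-step-ahead regression are illustrative choices rather than the paper's exact estimator.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
T, N, r = 300, 60, 2
F = rng.normal(size=(T, r))                                  # latent factors
X = np.tanh(F @ rng.normal(size=(r, N))) + 0.3 * rng.normal(size=(T, N))
y = F[:, 0] + 0.5 * F[:, 1] + 0.3 * rng.normal(size=T)       # target series

# extract nonlinear factors from the predictor panel
kpca = KernelPCA(n_components=4, kernel="rbf", gamma=1.0 / N)
Fhat = kpca.fit_transform(X)

# diffusion-index style forecast: regress y_{t+1} on estimated factors at t
reg = LinearRegression().fit(Fhat[:-1], y[1:])
y_next = reg.predict(Fhat[[-1]])
print("one-step-ahead forecast:", round(float(y_next[0]), 3))
```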
arXiv link: http://arxiv.org/abs/2103.01266v1
Can Machine Learning Catch the COVID-19 Recession?
for the UK, labeled UK-MD and comparable to similar datasets for the US and
Canada, it seems the most promising avenue for forecasting during the pandemic
is to allow for general forms of nonlinearity by using machine learning (ML)
methods. But not all nonlinear ML methods are alike. For instance, some do not
allow for extrapolation (like regular trees and forests) and some do (when
complemented with linear dynamic components). This and other crucial aspects of
ML-based forecasting in unprecedented times are studied in an extensive
pseudo-out-of-sample exercise.
arXiv link: http://arxiv.org/abs/2103.01201v1
BERT based patent novelty search by training claims to their own description
description. By applying this method, BERT learns suitable descriptions for
claims. Such a trained BERT (claim-to-description BERT) could be able to
identify novelty-relevant descriptions for patents. In addition, we introduce a
new scoring scheme, relevance scoring or novelty scoring, to process the output
of BERT in a meaningful way. We tested the method on patent applications by
training BERT on the first claims of patents and corresponding descriptions.
BERT's output has been processed according to the relevance score and the
results compared with the cited X documents in the search reports. The test
showed that BERT has scored some of the cited X documents as highly relevant.
arXiv link: http://arxiv.org/abs/2103.01126v4
Structural models for policy-making: Coping with parametric uncertainty
based on estimated parameters as a stand-in for the true parameters. This
practice ignores uncertainty in the counterfactual policy predictions of the
model. We develop a generic approach that deals with parametric uncertainty
using uncertainty sets and frames model-informed policy-making as a decision
problem under uncertainty. The seminal human capital investment model by Keane
and Wolpin (1997) provides a well-known, influential, and empirically-grounded
test case. We document considerable uncertainty in the model's policy
predictions and highlight the resulting policy recommendations obtained from
using different formal rules of decision-making under uncertainty.
arXiv link: http://arxiv.org/abs/2103.01115v4
Extracting Complements and Substitutes from Sales Data: A Network Perspective
concepts in retail and marketing. Qualitatively, two products are said to be
substitutable if a customer can replace one product by the other, while they
are complementary if they tend to be bought together. In this article, we take
a network perspective to help automatically identify complements and
substitutes from sales transaction data. Starting from a bipartite
product-purchase network representation, with both transaction nodes and
product nodes, we develop appropriate null models to infer significant
relations, either complements or substitutes, between products, and design
measures based on random walks to quantify their importance. The resulting
unipartite networks between products are then analysed with community detection
methods, in order to find groups of similar products for the different types of
relationships. The results are validated by combining observations from a
real-world basket dataset with the existing product hierarchy, as well as a
large-scale flavour compound and recipe dataset.
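A toy sketch of the network construction, assuming networkx: build the bipartite transaction-product graph, project it onto products with co-purchase counts as weights, and run community detection; the basket data are invented, and raw co-purchase weights replace the paper's null-model significance filtering and random-walk measures.

```python
import networkx as nx
from networkx.algorithms import bipartite, community

# toy baskets: transaction id -> purchased products (illustrative)
baskets = {
    "t1": ["pasta", "tomato sauce", "parmesan"],
    "t2": ["pasta", "tomato sauce"],
    "t3": ["chips", "salsa", "beer"],
    "t4": ["chips", "beer"],
    "t5": ["pasta", "parmesan"],
}

B = nx.Graph()
B.add_nodes_from(baskets, bipartite=0)                      # transaction nodes
products = {p for items in baskets.values() for p in items}
B.add_nodes_from(products, bipartite=1)                     # product nodes
B.add_edges_from((t, p) for t, items in baskets.items() for p in items)

# project onto products; edge weights count shared transactions (co-purchases)
P = bipartite.weighted_projected_graph(B, products)

# community detection on the resulting product-product network
groups = community.greedy_modularity_communities(P, weight="weight")
for g in groups:
    print(sorted(g))
```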
arXiv link: http://arxiv.org/abs/2103.02042v2
Panel semiparametric quantile regression neural network for electricity consumption forecasting
long-term deepening of reform and opening up. However, owing to complex regional
economic, social and natural conditions, electricity resources are not evenly
distributed, which accounts for the electricity deficiency in some regions of
China. It is desirable to develop a robust electricity forecasting model.
Motivated by this, we propose a Panel Semiparametric Quantile Regression
Neural Network (PSQRNN) that combines artificial neural networks with
semiparametric quantile regression. The PSQRNN can explore potential linear
and nonlinear relationships among the variables, interpret the unobserved
provincial heterogeneity, and maintain the interpretability of parametric
models simultaneously. The PSQRNN is trained by combining penalized
quantile regression with LASSO, ridge regression and the backpropagation algorithm.
To evaluate the prediction accuracy, an empirical analysis is conducted to
analyze provincial electricity consumption in China from 1999 to 2018 under
three scenarios. One finds that the PSQRNN model performs better for
electricity consumption forecasting when economic and climatic factors are
taken into account. Finally, forecasts of provincial electricity consumption in
China for the next five years (2019-2023) are reported.
arXiv link: http://arxiv.org/abs/2103.00711v1
On the Subbagging Estimation for Massive Data
approaches for big data analysis with memory constraints of computers.
Specifically, for the whole dataset with size $N$, $m_N$ subsamples are
randomly drawn, and each subsample with a subsample size $k_N\ll N$ to meet the
memory constraint is sampled uniformly without replacement. Aggregating the
estimators of $m_N$ subsamples can lead to subbagging estimation. To analyze
the theoretical properties of the subbagging estimator, we adapt the incomplete
$U$-statistics theory with an infinite order kernel to allow overlapping drawn
subsamples in the sampling procedure. Utilizing this novel theoretical
framework, we demonstrate that via a proper hyperparameter selection of $k_N$
and $m_N$, the subbagging estimator can achieve $\sqrt{N}$-consistency and
asymptotic normality under the condition $(k_N m_N)/N \to \alpha \in (0,\infty]$.
Compared to the full sample estimator, we theoretically show that the
$\sqrt{N}$-consistent subbagging estimator has an inflation rate of $1/\alpha$
in its asymptotic variance. Simulation experiments are presented to demonstrate
the finite sample performances. An American airline dataset is analyzed to
illustrate that the subbagging estimate is numerically close to the full sample
estimate, and can be computationally fast under the memory constraint.
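A minimal sketch of the subbagging scheme described above, with the sample mean standing in for a generic estimator; the dataset, subsample size $k_N$, and number of subsamples $m_N$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 1_000_000
data = rng.exponential(scale=2.0, size=N)          # full dataset (true mean = 2)

k_N = 10_000        # subsample size chosen to respect the memory constraint
m_N = 200           # number of subsamples, so (k_N * m_N) / N = alpha = 2

estimates = []
for _ in range(m_N):
    idx = rng.choice(N, size=k_N, replace=False)   # uniform sampling without replacement
    estimates.append(data[idx].mean())             # estimator computed on one subsample

subbagging_estimate = np.mean(estimates)           # aggregate across subsamples
full_sample_estimate = data.mean()
print(f"subbagging: {subbagging_estimate:.4f}  full sample: {full_sample_estimate:.4f}")
```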
arXiv link: http://arxiv.org/abs/2103.00631v1
Algorithmic subsampling under multiway clustering
sketching) for multiway cluster dependent data. We establish a new uniform weak
law of large numbers and a new central limit theorem for the multiway
algorithmic subsample means. Consequently, we discover an additional advantage
of algorithmic subsampling: it allows for robustness against potential
degeneracy, and even non-Gaussian degeneracy, of the asymptotic distribution
under multiway clustering. Simulation studies support this novel result, and
demonstrate that inference with the algorithmic subsampling entails more
accuracy than that without the algorithmic subsampling. Applying these basic
asymptotic theories, we derive the consistency and the asymptotic normality for
the multiway algorithmic subsampling generalized method of moments estimator
and for the multiway algorithmic subsampling M-estimator. We illustrate an
application to scanner data.
arXiv link: http://arxiv.org/abs/2103.00557v4
Confronting Machine Learning With Financial Research
learning for financial research. Machine learning algorithms have been
developed for certain data environments which substantially differ from the one
we encounter in finance. Not only do difficulties arise due to some of the
idiosyncrasies of financial markets, but there is also a fundamental tension between the
underlying paradigm of machine learning and the research philosophy in
financial economics. Given the peculiar features of financial markets and the
empirical framework within social science, various adjustments have to be made
to the conventional machine learning methodology. We discuss some of the main
challenges of machine learning in finance and examine how these could be
accounted for. Despite some of the challenges, we argue that machine learning
could be unified with financial research to become a robust complement to the
econometrician's toolbox. Moreover, we discuss the various applications of
machine learning in the research process such as estimation, empirical
discovery, testing, causal inference and prediction.
arXiv link: http://arxiv.org/abs/2103.00366v2
Forecasting high-frequency financial time series: an adaptive learning approach with the order book data
with the past studies on the order book and high-frequency data, with
applications to hypothesis testing. In line with the past literature, we
produce brackets of summary statistics from the high-frequency bid and ask
data in the CSI 300 Index Futures market and aim to forecast the one-step-ahead
prices. Traditional time series issues, e.g. ARIMA order selection and
stationarity, together with potential financial applications, are covered in the
exploratory data analysis, which paves the way for the adaptive learning model. By
designing and running the learning model, we find that it performs well compared
to the top fixed models, and that some variants can improve the forecasting
accuracy by being more stable and resilient to non-stationarity. Applications to hypothesis
testing are shown with a rolling window, and further potential applications to
finance and statistics are outlined.
arXiv link: http://arxiv.org/abs/2103.00264v1
Simultaneous Bandwidths Determination for DK-HAC Estimators and Long-Run Variance Estimation in Nonparametric Settings
double kernel heteroskedasticity and autocorrelation consistent (DK-HAC)
estimators. In addition to the usual smoothing over lagged autocovariances for
classical HAC estimators, the DK-HAC estimator also applies smoothing over the
time direction. We obtain the optimal bandwidths that jointly minimize the
global asymptotic MSE criterion and discuss the trade-off between bias and
variance with respect to smoothing over lagged autocovariances and over time.
Unlike the MSE results of Andrews (1991), we establish how nonstationarity
affects the bias-variance trade-off. We use the plug-in approach to construct
data-dependent bandwidths for the DK-HAC estimators and compare them with the
DK-HAC estimators from Casini (2021) that use data-dependent bandwidths
obtained from a sequential MSE criterion. The former performs better in terms
of size control, especially with stationary and close to stationary data.
Finally, we consider long-run variance estimation under the assumption that the
series is a function of a nonparametric estimator rather than of a
semiparametric estimator that enjoys the usual $T^{1/2}$ rate of convergence.
Thus, we also establish the validity of consistent long-run variance estimation
in nonparametric parameter estimation settings.
arXiv link: http://arxiv.org/abs/2103.00060v1
Permutation Tests at Nonparametric Rates
exact size in finite samples, but they fail to control size for testing
equality of parameters that summarize each distribution. This paper proposes
permutation tests for equality of parameters that are estimated at root-$n$ or
slower rates. Our general framework applies to both parametric and
nonparametric models, with two samples or one sample split into two subsamples.
Our tests have correct size asymptotically while preserving exact size in
finite samples when distributions are equal. They have no loss in local
asymptotic power compared to tests that use asymptotic critical values. We
propose confidence sets with correct coverage in large samples that also have
exact coverage in finite samples if distributions are equal up to a
transformation. We apply our theory to four commonly-used hypothesis tests of
nonparametric functions evaluated at a point. Lastly, simulations show good
finite sample properties, and two empirical examples illustrate our tests in
practice.
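For orientation, a generic two-sample permutation test of equality of a parameter (here the mean, via a studentized difference) is sketched below; it illustrates the class of procedures the paper studies but not its specific construction, which also covers slower-than-root-$n$ nonparametric estimands.

```python
import numpy as np

def perm_test(x, y, stat, n_perm=5000, seed=0):
    """Two-sample permutation test: p-value of |stat| under random relabeling."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    n = len(x)
    observed = stat(x, y)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        count += abs(stat(perm[:n], perm[n:])) >= abs(observed)
    return (count + 1) / (n_perm + 1)

def t_stat(x, y):
    # studentized difference in means
    return (x.mean() - y.mean()) / np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))

rng = np.random.default_rng(9)
x = rng.normal(0.0, 1.0, size=80)
y = rng.normal(0.4, 2.0, size=120)    # unequal variances: the two distributions differ
print("permutation p-value:", perm_test(x, y, t_stat))
```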
arXiv link: http://arxiv.org/abs/2102.13638v3
General Bayesian time-varying parameter VARs for predicting government bond yields
in the coefficients is determined by a simple stochastic process such as a
random walk. While such models are capable of capturing a wide range of dynamic
patterns, the true nature of time variation might stem from other sources, or
arise from different laws of motion. In this paper, we propose a flexible TVP
VAR that assumes the TVPs to depend on a panel of partially latent covariates.
The latent parts of these covariates differ in their state dynamics and thus
capture smoothly evolving or abruptly changing coefficients. To determine which
of these covariates are important, and thus to decide on the appropriate state
evolution, we introduce Bayesian shrinkage priors to perform model selection.
As an empirical application, we forecast the US term structure of interest
rates and show that our approach performs well relative to a set of competing
models. We then show how the model can be used to explain structural breaks in
coefficients related to the US yield curve.
arXiv link: http://arxiv.org/abs/2102.13393v1
Online Multi-Armed Bandits with Adaptive Inference
conduct inference on the true mean reward of each arm based on data collected
so far at each step. However, since the arms are adaptively selected--thereby
yielding non-iid data--conducting inference accurately is not straightforward.
In particular, sample averaging, which is used in the family of UCB and
Thompson sampling (TS) algorithms, is not a good choice, as it suffers
from bias and a lack of good statistical properties (e.g. asymptotic
normality). Our thesis in this paper is that more sophisticated inference
schemes that take into account the adaptive nature of the sequentially
collected data can unlock further performance gains, even though both UCB and
TS type algorithms are optimal in the worst case. In particular, we propose a
variant of TS-style algorithms--which we call doubly adaptive TS--that
leverages recent advances in causal inference and adaptively reweights the
terms of a doubly robust estimator on the true mean reward of each arm. Through
20 synthetic domain experiments and a semi-synthetic experiment based on data
from an A/B test of a web service, we demonstrate that using an adaptive
inferential scheme (while still retaining the exploration efficacy of TS)
provides clear benefits in online decision making: the proposed DATS algorithm
has superior empirical performance to existing baselines (UCB and TS) in terms
of regret and sample complexity in identifying the best arm. In addition, we
also provide a finite-time regret bound of doubly adaptive TS that matches (up
to log factors) those of UCB and TS algorithms, thereby establishing that its
improved practical benefits do not come at the expense of worst-case
optimality.
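A stylized Python sketch of the main ingredient described above: inside a Thompson-sampling loop, each arm's mean is estimated with adaptively reweighted doubly robust scores rather than a plain sample average. The crude normal posterior, the Monte Carlo approximation of the assignment probabilities, the clipping, and the square-root weights are simplified placeholders and do not reproduce the DATS algorithm as specified in the paper.

import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.3, 0.5])
K, T = len(true_means), 5000
sums, wsum = np.zeros(K), np.zeros(K)        # weighted doubly robust score accumulators
mu_hat, counts = np.zeros(K), np.zeros(K)

for t in range(T):
    # Thompson-style draw from a crude normal posterior around the current estimate
    draws = rng.normal(mu_hat, 1.0 / np.sqrt(counts + 1.0))
    # assignment probabilities approximated by Monte Carlo (placeholder for the exact ones)
    sim = rng.normal(mu_hat, 1.0 / np.sqrt(counts + 1.0), size=(200, K))
    probs = np.bincount(sim.argmax(axis=1), minlength=K) / 200
    probs = np.clip(probs, 0.01, None)
    probs /= probs.sum()
    a = int(draws.argmax())
    y = rng.normal(true_means[a], 1.0)
    counts[a] += 1
    for k in range(K):
        dr = mu_hat[k] + (k == a) / probs[k] * (y - mu_hat[k])   # doubly robust score
        w = np.sqrt(probs[k])                                    # adaptive (variance-stabilizing) weight
        sums[k] += w * dr
        wsum[k] += w
    mu_hat = sums / np.maximum(wsum, 1e-12)

print("estimated arm means:", np.round(mu_hat, 3))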
arXiv link: http://arxiv.org/abs/2102.13202v2
A Control Function Approach to Estimate Panel Data Binary Response Model
model in a triangular system with multiple unobserved heterogeneities. The CFs
are the expected values of the heterogeneity terms in the reduced form
equations conditional on the histories of the endogenous and the exogenous
variables. The method requires weaker restrictions compared to CF methods with
similar imposed structures. If the support of endogenous regressors is large,
average partial effects are point-identified even when instruments are
discrete. Bounds are provided when the support assumption is violated. An
application and Monte Carlo experiments compare several alternative methods
with ours.
arXiv link: http://arxiv.org/abs/2102.12927v2
Next Generation Models for Portfolio Risk Management: An Approach Using Financial Big Data
address potential information loss. The proposed model takes advantage of
financial big data to incorporate out-of-target-portfolio information that may
be missed when one considers the Value at Risk (VaR) measures only from certain
assets of the portfolio. We investigate how the curse of dimensionality can be
overcome in the use of financial big data and discuss where and when benefits
occur from a large number of assets. In this regard, the proposed approach is
the first to suggest the use of financial big data to improve the accuracy of
risk analysis. We compare the proposed model with benchmark approaches and
empirically show that the use of financial big data improves small portfolio
risk analysis. Our findings are useful for portfolio managers and financial
regulators, who may seek an innovation to improve the accuracy of portfolio
risk estimation.
arXiv link: http://arxiv.org/abs/2102.12783v3
Quasi-maximum likelihood estimation of break point in high-dimensional factor models
a single structural break in factor loadings at a common unknown date. First,
we propose a quasi-maximum likelihood (QML) estimator of the change point based
on the second moments of factors, which are estimated by principal component
analysis. We show that the QML estimator performs consistently when the
covariance matrix of the pre- or post-break factor loading, or both, is
singular. When the loading matrix undergoes a rotational type of change while
the number of factors remains constant over time, the QML estimator incurs a
stochastically bounded estimation error. In this case, we establish an
asymptotic distribution of the QML estimator. The simulation results validate
the feasibility of this estimator when used in finite samples. In addition, we
demonstrate empirical applications of the proposed method by applying it to
estimate the break points in a U.S. macroeconomic dataset and a stock return
dataset.
arXiv link: http://arxiv.org/abs/2102.12666v3
Overnight GARCH-Itô Volatility Models
to incorporate high-frequency realized volatilities and better capture market
dynamics. However, because high-frequency trading data are not available during
the close-to-open period, the volatility models often ignore volatility
information over the close-to-open period and thus may suffer from loss of
important information relevant to market dynamics. In this paper, to account
for whole-day market dynamics, we propose an overnight volatility model based
on It\^o diffusions to accommodate two different instantaneous volatility
processes for the open-to-close and close-to-open periods. We develop a
weighted least squares method to estimate model parameters for two different
periods and investigate its asymptotic properties. We conduct a simulation
study to check the finite sample performance of the proposed model and method.
Finally, we apply the proposed approaches to real trading data.
arXiv link: http://arxiv.org/abs/2102.13467v2
Inference in Incomplete Models
identifying assumptions. We show the equivalence of several natural
formulations of correct specification, which we take as our null hypothesis.
From a natural empirical version of the latter, we derive a Kolmogorov-Smirnov
statistic for Choquet capacity functionals, which we use to construct our test.
We derive the limiting distribution of our test statistic under the null, and
show that our test is consistent against certain classes of alternatives. When
the model is given in parametric form, the test can be inverted to yield
confidence regions for the identified parameter set. The approach can be
applied to the estimation of models with sample selection, censored observables
and to games with multiple equilibria.
arXiv link: http://arxiv.org/abs/2102.12257v1
Set Identification in Models with Multiple Equilibria
of models with multiple equilibria in pure or mixed strategies. It is shown
that in the case of Shapley regular normal form games, the identified set is
characterized by the inclusion of the true data distribution within the core of
a Choquet capacity, which is interpreted as the generalized likelihood of the
model. In turn, this inclusion is characterized by a finite set of inequalities
and efficient and easily implementable combinatorial methods are described to
check them. In all normal form games, the identified set is characterized in
terms of the value of a submodular or convex optimization program. Efficient
algorithms are then given and compared to check inclusion of a parameter in
this identified set. The latter are illustrated with family bargaining games
and oligopoly entry games.
arXiv link: http://arxiv.org/abs/2102.12249v1
Deep Video Prediction for Time Series Forecasting
this work, we address the challenge of predicting prices evolution among
multiple potentially interacting financial assets. A solution to this problem
has obvious importance for governments, banks, and investors. Statistical
methods such as Auto Regressive Integrated Moving Average (ARIMA) are widely
applied to these problems. In this paper, we propose to approach economic time
series forecasting of multiple financial assets in a novel way via video
prediction. Given past prices of multiple potentially interacting financial
assets, we aim to predict the prices evolution in the future. Instead of
treating the snapshot of prices at each time point as a vector, we spatially
layout these prices in 2D as an image, such that we can harness the power of
CNNs in learning a latent representation for these financial assets. Thus, the
history of these prices becomes a sequence of images, and our goal becomes
predicting future images. We build on a state-of-the-art video prediction
method for forecasting future images. Our experiments involve the prediction
task of the price evolution of nine financial assets traded in U.S. stock
markets. The proposed method outperforms baselines including ARIMA, Prophet,
and variations of the proposed method, demonstrating the benefits of harnessing
the power of CNNs in the problem of economic time series forecasting.
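The sketch below illustrates only the data transformation described above: multi-asset prices are laid out on a small 2D grid so that each time step becomes an image and the history becomes a sequence of frames suitable for a video-prediction network (the network itself is not reproduced). The 3x3 layout and the synthetic random-walk prices are placeholders.

import numpy as np

# prices: T time steps x 9 assets (synthetic random walks as a stand-in for market data)
rng = np.random.default_rng(0)
T, n_assets = 240, 9
prices = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal((T, n_assets)), axis=0))

# lay the 9 assets out on a 3x3 grid, so each time step becomes one "image"
frames = prices.reshape(T, 3, 3)

# build (input sequence, next frame) pairs for a video-prediction style model
seq_len = 16
X = np.stack([frames[t:t + seq_len] for t in range(T - seq_len)])   # (N, seq_len, 3, 3)
y = np.stack([frames[t + seq_len] for t in range(T - seq_len)])     # (N, 3, 3)
print(X.shape, y.shape)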
arXiv link: http://arxiv.org/abs/2102.12061v2
Hierarchical Regularizers for Mixed-Frequency Vector Autoregressions
variables recorded at different frequencies. However, as the number of series
and high-frequency observations per low-frequency period grow, MF-VARs suffer
from the "curse of dimensionality". We curb this curse through a regularizer
that permits hierarchical sparsity patterns by prioritizing the inclusion of
coefficients according to the recency of the information they contain.
Additionally, we investigate the presence of nowcasting relations by sparsely
estimating the MF-VAR error covariance matrix. We study predictive Granger
causality relations in an MF-VAR for the U.S. economy and construct a coincident
indicator of GDP growth. Supplementary Materials for this article are available
online.
arXiv link: http://arxiv.org/abs/2102.11780v2
Non-stationary GARCH modelling for fitting higher order moments of financial series within moving time windows
moments for different companies' stock prices. When we assume a Gaussian
conditional distribution, we fail to capture any empirical data when fitting
the first three even moments of financial time series. We show instead that a
double Gaussian conditional probability distribution better captures the higher
order moments of the data. To demonstrate this point, we construct regions
(phase diagrams), in the fourth and sixth order standardised moment space,
where a GARCH(1,1) model can be used to fit these moments and compare them with
the corresponding moments from empirical data for different sectors of the
economy. We found that the ability of the GARCH model with a double Gaussian
conditional distribution to fit higher order moments is dictated by the time
window our data spans. We can only fit data collected within specific time
window lengths and only with certain parameters of the conditional double
Gaussian distribution. In order to incorporate the non-stationarity of
financial series, we assume that the parameters of the GARCH model have time
dependence.
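A minimal Python sketch of the ingredients discussed above: a GARCH(1,1) process whose standardized innovations follow a two-component ("double") Gaussian mixture, together with the fourth and sixth standardized moments computed over moving windows. The parameter values and window length are illustrative and not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def simulate_garch_mixture(T, omega=1e-5, alpha=0.08, beta=0.9, p=0.9, s1=0.8, s2=2.0):
    # GARCH(1,1) returns whose standardized innovations follow a two-component
    # Gaussian mixture, scaled so the mixture has unit variance.
    scale = np.sqrt(p * s1**2 + (1 - p) * s2**2)
    r, h = np.zeros(T), np.zeros(T)
    h[0] = omega / (1 - alpha - beta)
    for t in range(1, T):
        h[t] = omega + alpha * r[t - 1]**2 + beta * h[t - 1]
        comp = s1 if rng.random() < p else s2
        r[t] = np.sqrt(h[t]) * (comp / scale) * rng.standard_normal()
    return r

def standardized_moments(x, orders=(4, 6)):
    z = (x - x.mean()) / x.std()
    return {k: float(np.mean(z**k)) for k in orders}

r = simulate_garch_mixture(20000)
window = 2500
for start in range(0, len(r) - window + 1, window):
    print(start, standardized_moments(r[start:start + window]))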
arXiv link: http://arxiv.org/abs/2102.11627v4
Bridging factor and sparse models
low-dimensional structure in high-dimensions. However, they are seemingly
mutually exclusive. We propose a lifting method that combines the merits of
these two models in a supervised learning methodology that allows for
efficiently exploring all the information in high-dimensional datasets. The
method is based on a flexible model for high-dimensional panel data, called
factor-augmented regression model with observable and/or latent common factors,
as well as idiosyncratic components. This model not only includes both
principal component regression and sparse regression as specific models but
also significantly weakens the cross-sectional dependence and facilitates model
selection and interpretability. The method consists of several steps and a
novel test for (partial) covariance structure in high dimensions to infer the
remaining cross-section dependence at each step. We develop the theory for the
model and demonstrate the validity of the multiplier bootstrap for testing a
high-dimensional (partial) covariance structure. The theory is supported by a
simulation study and applications.
arXiv link: http://arxiv.org/abs/2102.11341v4
Misguided Use of Observed Covariates to Impute Missing Covariates in Conditional Prediction: A Shrinkage Problem
missing data. However, applications of imputation often lack a firm foundation
in statistical theory. This paper originated when we were unable to find
analysis substantiating claims that imputation of missing data has good
frequentist properties when data are missing at random (MAR). We focused on the
use of observed covariates to impute missing covariates when estimating
conditional means of the form E(y|x, w). Here y is an outcome whose
realizations are always observed, x is a covariate whose realizations are
always observed, and w is a covariate whose realizations are sometimes
unobserved. We examine the probability limit of simple imputation estimates of
E(y|x, w) as sample size goes to infinity. We find that these estimates are not
consistent when covariate data are MAR. To the contrary, the estimates suffer
from a shrinkage problem. They converge to points intermediate between the
conditional mean of interest, E(y|x, w), and the mean E(y|x) that conditions
only on x. We use a type of genotype imputation to illustrate.
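The shrinkage phenomenon can be reproduced in a few lines. In the Python sketch below (a self-contained simulation, not the genotype application), w is imputed by a draw from its correct conditional distribution given x, and a local average of y over the imputed data at a point (x, w) lands between E(y|x,w) and E(y|x). The data-generating process, the 50% missingness rate, and the bin width are placeholder choices.

import numpy as np

rng = np.random.default_rng(0)
n = 400_000
x = rng.integers(0, 2, n).astype(float)        # binary covariate, always observed
w = x + rng.standard_normal(n)                 # covariate that is sometimes missing
y = x + 2 * w + rng.standard_normal(n)         # true E(y|x,w) = x + 2w

miss = rng.random(n) < 0.5                     # 50% of w missing at random

# stochastic imputation from the (correct) conditional distribution w | x ~ N(x, 1)
w_imp = w.copy()
w_imp[miss] = x[miss] + rng.standard_normal(miss.sum())

# nonparametric estimate of E(y | x=1, w=2) using the imputed data: a local average
sel = (x == 1) & (np.abs(w_imp - 2.0) < 0.1)
print("imputation-based estimate:", round(y[sel].mean(), 2))
print("target E(y|x=1,w=2) = 5.0, shrinkage limit E(y|x=1) = 3.0")
# the estimate lands near 4.0, between the conditional mean of interest and E(y|x)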
arXiv link: http://arxiv.org/abs/2102.11334v1
Estimating Sibling Spillover Effects with Unobserved Confounding Using Gain-Scores
health-related sibling spillover effects, or the effect of one individual's
exposure on their sibling's outcome. The health and health care of family
members may be inextricably confounded by unobserved factors, rendering
identification of spillover effects within families particularly challenging.
We demonstrate a gain-score regression method for identifying
exposure-to-outcome spillover effects within sibling pairs in a linear fixed
effects framework. The method can identify the exposure-to-outcome spillover
effect if only one sibling's exposure affects the other's outcome; and it
identifies the difference between the spillover effects if both siblings'
exposures affect the others' outcomes. The method fails in the presence of
outcome-to-exposure spillover and outcome-to-outcome spillover. Analytic
results and Monte Carlo simulations demonstrate the method and its limitations.
To exercise this method, we estimate the spillover effect of a child's preterm
birth on an older sibling's literacy skills, measured by the Phonological
Awareness Literacy Screening-Kindergarten test. We analyze 20,010 sibling
pairs from a population-wide, Wisconsin-based (United States) birth cohort.
Without covariate adjustment, we estimate that preterm birth modestly decreases
an older sibling's test score (-2.11 points; 95% confidence interval: -3.82,
-0.40 points). In conclusion, gain-scores are a promising strategy for
identifying exposure-to-outcome spillovers in sibling pairs while controlling
for sibling-invariant unobserved confounding in linear settings.
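A stylized Python simulation of the gain-score idea in a linear setting with a one-way exposure-to-outcome spillover: differencing the two siblings' outcomes removes the shared unobserved family confounder, while a naive regression remains confounded. The data-generating process and the way the spillover is recovered from the two gain-score coefficients are illustrative, not the authors' exact specification.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000                                            # sibling pairs
u = rng.standard_normal(n)                             # shared unobserved family confounder
a1 = (u + rng.standard_normal(n) > 0).astype(float)    # sibling 1's exposure
a2 = (u + rng.standard_normal(n) > 0).astype(float)    # sibling 2's exposure
beta, spill = 1.0, 0.5                                 # direct effect and one-way spillover (a1 -> y2)
y1 = beta * a1 + u + rng.standard_normal(n)
y2 = beta * a2 + spill * a1 + u + rng.standard_normal(n)

# naive regression of y2 on both exposures is confounded by u
Xn = np.column_stack([np.ones(n), a1, a2])
print("naive coef on a1:", np.round(np.linalg.lstsq(Xn, y2, rcond=None)[0][1], 3))

# gain-score regression: differencing the outcomes removes the shared confounder;
# for this DGP, g = (spill - beta)*a1 + beta*a2 + noise, so spill = coef(a1) + coef(a2)
g = y2 - y1
bg = np.linalg.lstsq(Xn, g, rcond=None)[0]
print("spillover recovered as coef(a1) + coef(a2):", np.round(bg[1] + bg[2], 3))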
arXiv link: http://arxiv.org/abs/2102.11150v3
Kernel Ridge Riesz Representers: Generalization, Mis-specification, and the Counterfactual Effective Dimension
effects, based on the idea of balancing covariates for the treated group and
untreated group in feature space, often with ridge regularization. Previous
works on the classical kernel ridge balancing weights have certain limitations:
(i) not articulating generalization error for the balancing weights, (ii)
typically requiring correct specification of features, and (iii) justifying
Gaussian approximation for only average effects.
I interpret kernel balancing weights as kernel ridge Riesz representers
(KRRR) and address these limitations via a new characterization of the
counterfactual effective dimension. KRRR is an exact generalization of kernel
ridge regression and kernel ridge balancing weights. I prove strong properties
similar to kernel ridge regression: population $L_2$ rates controlling
generalization error, and a standalone closed form solution that can
interpolate. The framework relaxes the stringent assumption that the underlying
regression model is correctly specified by the features. It extends Gaussian
approximation beyond average effects to heterogeneous effects, justifying
confidence sets for causal functions. I use KRRR to quantify uncertainty for
heterogeneous treatment effects, by age, of 401(k) eligibility on assets.
arXiv link: http://arxiv.org/abs/2102.11076v4
Cointegrated Solutions of Unit-Root VARs: An Extended Representation Theorem
A specific algebraic technique is devised to recover stationarity from the
solution of the model in the form of a cointegrating transformation. Closed
forms of the results of interest are derived for integrated processes up to the
4th order. An extension to higher-order processes turns out to be within
reach of an induction argument.
arXiv link: http://arxiv.org/abs/2102.10626v1
A Novel Multi-Period and Multilateral Price Index
a multi-period or a multilateral framework, is presented. The index turns out
to be the generalized least squares solution of a regression model linking
values and quantities of the commodities. The index reference basket, which is
the union of the pairwise intersections of the baskets of all countries/periods,
has broader coverage than extant indices. The properties of the index
are investigated and updating formulas established. Applications to both real
and simulated data provide evidence of the better index performance in
comparison with extant alternatives.
arXiv link: http://arxiv.org/abs/2102.10528v1
Estimation and Inference by Stochastic Optimization: Three Examples
resampled Newton-Raphson (rNR) and resampled quasi-Newton (rqN) algorithms
which speed-up estimation and bootstrap inference for structural models. An
empirical application to BLP shows that computation time decreases from nearly
5 hours with the standard bootstrap to just over 1 hour with rNR, and only 15
minutes using rqN. A first Monte-Carlo exercise illustrates the accuracy of the
method for estimation and inference in a probit IV regression. A second
exercise additionally illustrates statistical efficiency gains relative to
standard estimation for simulation-based estimation using a dynamic panel
regression example.
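A toy logistic-regression version of the resampling idea, assuming the gist is that each Newton step is computed on a fresh bootstrap sample so that post-burn-in iterates can be used both for the point estimate and for bootstrap-style dispersion. The number of iterations, the burn-in, and the way draws are summarized are placeholders, not the rNR/rqN algorithms as specified in the paper.

import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, d - 1))])
theta_true = np.array([0.5, -1.0, 1.5])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ theta_true))).astype(float)

def newton_step(theta, Xb, yb):
    # one Newton-Raphson step for the logistic log-likelihood
    p = 1 / (1 + np.exp(-Xb @ theta))
    grad = Xb.T @ (yb - p)
    hess = -(Xb * (p * (1 - p))[:, None]).T @ Xb
    return theta - np.linalg.solve(hess, grad)

# resampled Newton-Raphson: each iteration takes one Newton step on a fresh
# bootstrap sample, so the trajectory of iterates carries both the estimate
# and its sampling variability
theta = np.zeros(d)
draws = []
for it in range(300):
    idx = rng.integers(0, n, n)                 # bootstrap resample
    theta = newton_step(theta, X[idx], y[idx])
    if it >= 100:                               # discard burn-in iterations
        draws.append(theta.copy())
draws = np.array(draws)
print("estimate:", np.round(draws.mean(axis=0), 3))
print("bootstrap-style std errors:", np.round(draws.std(axis=0), 3))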
arXiv link: http://arxiv.org/abs/2102.10443v1
Logarithmic Regret in Feature-based Dynamic Pricing
prices for highly differentiated products with applications in digital
marketing, online sales, real estate and so on. The problem was formally
studied as an online learning problem [Javanmard & Nazerzadeh, 2019] where a
seller needs to propose prices on the fly for a sequence of $T$ products based
on their features $x$ while having a small regret relative to the best --
"omniscient" -- pricing strategy she could have come up with in hindsight. We
revisit this problem and provide two algorithms (EMLP and ONSP) for stochastic
and adversarial feature settings, respectively, and prove the optimal
$O(d\log T)$ regret bounds for both. In comparison, the best existing results
are $O\left(\min\left\{\frac{1}{\lambda_{\min}^2}\log T,
\sqrt{T}\right\}\right)$ and $O(T^{2/3})$ respectively, with $\lambda_{\min}$
being the smallest eigenvalue of $E[xx^T]$ that could be arbitrarily
close to $0$. We also prove an $\Omega(\sqrt{T})$ information-theoretic lower
bound for a slightly more general setting, which demonstrates that
"knowing-the-demand-curve" leads to an exponential improvement in feature-based
dynamic pricing.
arXiv link: http://arxiv.org/abs/2102.10221v2
Monitoring the pandemic: A fractional filter for the COVID-19 contact rate
of a Susceptible-Infected-Recovered (SIR) model. From observable data on
confirmed, recovered, and deceased cases, a noisy measurement for the contact
rate can be constructed. To filter out measurement errors and seasonality, a
novel unobserved components (UC) model is set up. It specifies the log contact
rate as a latent, fractionally integrated process of unknown integration order.
The fractional specification reflects key characteristics of aggregate social
behavior such as strong persistence and gradual adjustments to new information.
A computationally simple modification of the Kalman filter is introduced and is
termed the fractional filter. It allows one to estimate UC models with richer
long-run dynamics, and provides a closed-form expression for the prediction
error of UC models. Based on the latter, a conditional-sum-of-squares (CSS)
estimator for the model parameters is set up that is shown to be consistent and
asymptotically normally distributed. The resulting contact rate estimates for
several countries are well in line with the chronology of the pandemic, and
allow one to identify different contact regimes generated by policy interventions.
As the fractional filter is shown to provide precise contact rate estimates at
the end of the sample, it bears great potential for monitoring the pandemic in
real time.
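The measurement-construction step described above can be sketched as follows (the fractional filter and the CSS estimator are not reproduced). Under a discrete-time SIR approximation, new confirmed cases satisfy roughly Delta C_t = beta_t * S_t * I_t / N, so a noisy log contact rate can be backed out from cumulative confirmed, recovered, and deceased counts; the toy counts below are placeholders for the actual country-level data.

import numpy as np

def contact_rate_measurement(confirmed, recovered, deceased, population):
    # Noisy contact-rate measurement from cumulative case counts, based on a
    # discrete-time SIR approximation: new infections ~ beta_t * S_t * I_t / N.
    confirmed = np.asarray(confirmed, dtype=float)
    removed = np.asarray(recovered, dtype=float) + np.asarray(deceased, dtype=float)
    active = confirmed - removed                      # I_t
    susceptible = population - confirmed              # S_t
    new_cases = np.diff(confirmed)
    beta = population * new_cases / np.maximum(susceptible[:-1] * active[:-1], 1.0)
    return np.log(np.maximum(beta, 1e-8))             # log contact rate, to be filtered

# toy cumulative counts (placeholders for real data)
t = np.arange(60)
confirmed = 1000 * (1 + t) ** 1.8
recovered = 0.6 * np.concatenate([[0] * 10, confirmed[:-10]])
deceased = 0.02 * np.concatenate([[0] * 10, confirmed[:-10]])
print(contact_rate_measurement(confirmed, recovered, deceased, 80_000_000)[:5])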
arXiv link: http://arxiv.org/abs/2102.10067v1
Approximate Bayes factors for unit root testing
testing in financial time series. We propose a convenient approximation of the
Bayes factor in terms of the Bayesian Information Criterion as a
straightforward and effective strategy for testing the unit root hypothesis.
Our approximate approach relies on few assumptions, is of general
applicability, and preserves a satisfactory error rate. Among its advantages,
it does not require the prior distribution on the model's parameters to be
specified. Our simulation study and empirical application on real exchange
rates show great accordance between the suggested simple approach and both
Bayesian and non-Bayesian alternatives.
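A crude illustration of the BIC-based approximation in a stripped-down AR(1) setting, with deterministic terms, lag-order selection, and priors all omitted; the specific construction in the paper may differ.

import numpy as np

def gaussian_bic(resid, k):
    # BIC for a Gaussian model with k free parameters and residuals resid
    n = len(resid)
    sigma2 = np.mean(resid ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return k * np.log(n) - 2 * loglik

def approx_bayes_factor_unit_root(y):
    # Approximate Bayes factor of the stationary AR(1) model against the
    # unit-root (random walk) model, via exp((BIC_0 - BIC_1)/2).
    dy, ylag = y[1:], y[:-1]
    bic0 = gaussian_bic(dy, k=1)                       # M0: unit root imposed, only the error variance
    rho = np.sum(ylag * y[1:]) / np.sum(ylag ** 2)     # M1: y_t = rho * y_{t-1} + e_t
    bic1 = gaussian_bic(y[1:] - rho * ylag, k=2)       # rho and the error variance
    return np.exp(0.5 * (bic0 - bic1))                 # >1 favours the stationary model

rng = np.random.default_rng(0)
rw = np.cumsum(rng.standard_normal(500))               # a random walk
ar = np.zeros(500)
for t in range(1, 500):
    ar[t] = 0.8 * ar[t - 1] + rng.standard_normal()
print("BF against unit root, random-walk data:", round(approx_bayes_factor_unit_root(rw), 3))
print("BF against unit root, AR(0.8) data:    ", round(approx_bayes_factor_unit_root(ar), 2))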
arXiv link: http://arxiv.org/abs/2102.10048v2
Spatial Correlation Robust Inference
many forms of spatial correlation. The interval has the familiar `estimator
plus and minus a standard error times a critical value' form, but we propose
new methods for constructing the standard error and the critical value. The
standard error is constructed using population principal components from a
given `worst-case' spatial covariance model. The critical value is chosen to
ensure coverage in a benchmark parametric model for the spatial correlations.
The method is shown to control coverage in large samples whenever the spatial
correlation is weak, i.e., with average pairwise correlations that vanish as
the sample size gets large. We also provide results on correct coverage in a
restricted but nonparametric class of strong spatial correlations, as well as
on the efficiency of the method. In a design calibrated to match economic
activity in U.S. states the method outperforms previous suggestions for
spatially robust inference about the population mean.
arXiv link: http://arxiv.org/abs/2102.09353v1
Deep Structural Estimation: With an Application to Option Pricing
surrogate of an economic model with deep neural networks. Our methodology
alleviates the curse of dimensionality and speeds up the evaluation and
parameter estimation by orders of magnitudes, which significantly enhances
one's ability to conduct analyses that require frequent parameter
re-estimation. As an empirical application, we compare two popular option
pricing models (the Heston and the Bates model with double-exponential jumps)
against a non-parametric random forest model. We document that: a) the Bates
model produces better out-of-sample pricing on average, but both structural
models fail to outperform random forest for large areas of the volatility
surface; b) random forest is more competitive at short horizons (e.g., 1-day),
for short-dated options (with less than 7 days to maturity), and on days with
poor liquidity; c) both structural models outperform random forest in
out-of-sample delta hedging; d) the Heston model's relative performance has
deteriorated significantly after the 2008 financial crisis.
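The surrogate idea can be sketched with off-the-shelf tools: sample parameter and contract points, price them once with the (slow) structural model, and train a network that maps inputs to prices for fast repeated evaluation. In the sketch below Black-Scholes stands in for the Heston/Bates models and a small scikit-learn MLP stands in for the deep surrogate, so it illustrates the workflow rather than the paper's implementation.

import numpy as np
from scipy.stats import norm
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def bs_call(s, k, tau, sigma, r=0.01):
    # Black-Scholes call price, standing in for a slower structural pricing model
    d1 = (np.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return s * norm.cdf(d1) - k * np.exp(-r * tau) * norm.cdf(d2)

# sample (parameter, contract) points and "expensive" model prices once, offline
n = 20_000
s = np.full(n, 100.0)
k = rng.uniform(80, 120, n)
tau = rng.uniform(0.05, 1.0, n)
sigma = rng.uniform(0.1, 0.6, n)
X = np.column_stack([k / s, tau, sigma])
y = bs_call(s, k, tau, sigma)

# the surrogate: a small MLP mapping inputs to prices, cheap to evaluate repeatedly
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
surrogate.fit(X, y)
test = np.array([[1.05, 0.5, 0.3]])
print("surrogate:", round(float(surrogate.predict(test)[0]), 3),
      " model:", round(float(bs_call(100.0, 105.0, 0.5, 0.3)), 3))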
arXiv link: http://arxiv.org/abs/2102.09209v1
On the implementation of Approximate Randomization Tests in Linear Models with a Small Number of Clusters
randomization tests developed in Canay, Romano, and Shaikh (2017) when
specialized to linear regressions with clustered data. An important feature of
the methodology is that it applies to settings in which the number of clusters
is small -- even as small as five. We provide a step-by-step algorithmic
description of how to implement the test and construct confidence intervals for
the parameter of interest. In doing so, we additionally present three novel
results concerning the methodology: we show that the method admits an
equivalent implementation based on weighted scores; we show the test and
confidence intervals are invariant to whether the test statistic is studentized
or not; and we prove convexity of the confidence intervals for scalar
parameters. We also articulate the main requirements underlying the test,
emphasizing in particular common pitfalls that researchers may encounter.
Finally, we illustrate the use of the methodology with two applications that
further illuminate these points. The companion R and Stata packages
facilitate the implementation of the methodology and the replication of the
empirical exercises.
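A bare-bones Python version of the sign-change randomization test for the case where each cluster delivers its own estimate of the parameter of interest; the studentization choices, the handling of covariates, and the confidence-interval inversion implemented in the companion R and Stata packages are not reproduced here.

import numpy as np
from itertools import product

def sign_change_test(cluster_estimates, beta_null=0.0):
    # Approximate randomization test: recompute a studentized statistic under
    # all sign changes of the centered cluster-level estimates.
    s = np.asarray(cluster_estimates, dtype=float) - beta_null
    t_obs = abs(s.mean() / s.std(ddof=1))
    count, total = 0, 0
    for signs in product([-1.0, 1.0], repeat=len(s)):
        f = s * np.array(signs)
        count += abs(f.mean() / f.std(ddof=1)) >= t_obs - 1e-12
        total += 1
    return count / total

# toy example: 6 clusters, each delivering its own OLS slope estimate
rng = np.random.default_rng(0)
betas = 0.4 + 0.3 * rng.standard_normal(6)
print("randomization p-value for H0: beta = 0:", sign_change_test(betas, 0.0))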
arXiv link: http://arxiv.org/abs/2102.09058v4
Big Data meets Causal Survey Research: Understanding Nonresponse in the Recruitment of a Mixed-mode Online Panel
their research as digitization makes it much easier to construct
high-dimensional (or "big") data sets through tools such as online surveys and
mobile applications. Machine learning methods are able to handle such data, and
they have been successfully applied to solve predictive problems.
However, in many situations, survey statisticians want to learn about
causal relationships to draw conclusions and be able to transfer the
findings of one survey to another. Standard machine learning methods provide
biased estimates of such relationships. We introduce into survey statistics the
double machine learning approach, which gives approximately unbiased estimators
of causal parameters, and show how it can be used to analyze survey nonresponse
in a high-dimensional panel setting.
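A generic cross-fitted partialling-out sketch for a partially linear model, with random forests as placeholder nuisance learners; the paper's panel nonresponse application and its specific learners and moment functions are not reproduced. Here y plays the role of a nonresponse-type outcome, d the variable whose causal effect is of interest, and X the high-dimensional auxiliary data.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_partially_linear(y, d, X, n_folds=5, seed=0):
    # Cross-fitted DML estimate of theta in y = theta*d + g(X) + e
    y_res, d_res = np.zeros(len(y)), np.zeros(len(d))
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        m_y = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X[train], y[train])
        m_d = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X[train], d[train])
        y_res[test] = y[test] - m_y.predict(X[test])
        d_res[test] = d[test] - m_d.predict(X[test])
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    se = np.sqrt(np.mean((d_res * (y_res - theta * d_res)) ** 2) / len(y)) / np.mean(d_res ** 2)
    return theta, se

# toy data with a nonlinear confounding function g(X)
rng = np.random.default_rng(0)
n, p = 2000, 20
X = rng.standard_normal((n, p))
d = X[:, 0] + 0.5 * rng.standard_normal(n)
y = 0.7 * d + np.sin(X[:, 1]) + X[:, 2] ** 2 + rng.standard_normal(n)
theta, se = dml_partially_linear(y, d, X)
print(f"theta_hat = {theta:.3f}, se = {se:.3f}")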
arXiv link: http://arxiv.org/abs/2102.08994v1
Adaptive Doubly Robust Estimator from Non-stationary Logging Policy under a Convergence of Average Probability
and multi-armed bandit algorithms, have garnered attention in various
applications, such as social experiments, clinical trials, and online
advertisement optimization. This paper considers estimating the mean outcome of
an action from samples obtained in adaptive experiments. In causal inference,
the mean outcome of an action has a crucial role, and the estimation is an
essential task, where the average treatment effect estimation and off-policy
value estimation are its variants. In adaptive experiments, the probability of
choosing an action (logging policy) is allowed to be sequentially updated based
on past observations. Due to this logging policy depending on the past
observations, the samples are often not independent and identically distributed
(i.i.d.), making developing an asymptotically normal estimator difficult. A
typical approach for this problem is to assume that the logging policy
converges to a time-invariant function. However, this assumption is restrictive
in various applications, such as when the logging policy fluctuates or becomes
zero at some periods. To mitigate this limitation, we propose another
assumption that the average logging policy converges to a time-invariant
function and show the doubly robust (DR) estimator's asymptotic normality.
Under the assumption, the logging policy itself can fluctuate or be zero for
some actions. We also show the empirical properties by simulations.
arXiv link: http://arxiv.org/abs/2102.08975v2
Testing for Nonlinear Cointegration under Heteroskedasticity
nonlinear cointegration in the presence of variance breaks. We build on
cointegration test approaches under heteroskedasticity (Cavaliere and Taylor,
2006, Journal of Time Series Analysis) and nonlinearity, serial correlation,
and endogeneity (Choi and Saikkonen, 2010, Econometric Theory) to propose a
bootstrap test and prove its consistency. A Monte Carlo study shows the
approach to have satisfactory finite-sample properties in a variety of
scenarios. We provide an empirical application to the environmental Kuznets
curves (EKC), finding that the cointegration test provides little evidence for
the EKC hypothesis. Additionally, we examine a nonlinear relation between the
US money demand and the interest rate, finding that our test does not reject
the null of a smooth transition cointegrating relation.
arXiv link: http://arxiv.org/abs/2102.08809v5
On-Demand Transit User Preference Analysis using Hybrid Choice Models
transit (FRT) services into on-demand transit (ODT) services, there exists a
strong need for a comprehensive evaluation of the effects of this shift on the
users. Such an analysis can help the municipalities and service providers to
design and operate more convenient, attractive, and sustainable transit
solutions. To understand the user preferences, we developed three hybrid choice
models: integrated choice and latent variable (ICLV), latent class (LC), and
latent class integrated choice and latent variable (LC-ICLV) models. We used
these models to analyze the public transit user's preferences in Belleville,
Ontario, Canada. Hybrid choice models were estimated using a rich dataset that
combined the actual level of service attributes obtained from Belleville's ODT
service and self-reported usage behaviour obtained from a revealed preference
survey of the ODT users. The latent class models divided the users into two
groups with different travel behaviour and preferences. The results showed that
the captive user's preference for ODT service was significantly affected by the
number of unassigned trips, in-vehicle time, and main travel mode before the
ODT service started. On the other hand, the non-captive user's service
preference was significantly affected by the Time Sensitivity and the Online
Service Satisfaction latent variables, as well as the performance of the ODT
service and trip purpose. This study highlights the importance of improving the
reliability and performance of the ODT service and outlines directions for
reducing operational costs by updating the required fleet size and assigning
more vehicles for work-related trips.
arXiv link: http://arxiv.org/abs/2102.08256v2
LATE for History
historical phenomenon or leverage this persistence to identify causal
relationships of interest in the present. In this chapter, we analyze the
implications of allowing for heterogeneous treatment effects in these studies.
We delineate their common empirical structure, argue that heterogeneous
treatment effects are likely in their context, and propose minimal abstract
models that help interpret results and guide the development of empirical
strategies to uncover the mechanisms generating the effects.
arXiv link: http://arxiv.org/abs/2102.08174v1
A Unified Framework for Specification Tests of Continuous Treatment Effect Models
treatment effect models. We assume a general residual function, which includes
the average and quantile treatment effect models as special cases. The null
models are identified under the unconfoundedness condition and contain a
nonparametric weighting function. We propose a test statistic for the null
model in which the weighting function is estimated by solving an expanding set
of moment equations. We establish the asymptotic distributions of our test
statistic under the null hypothesis and under fixed and local alternatives. The
proposed test statistic is shown to be more efficient than that constructed
from the true weighting function and can detect local alternatives deviating
from the null models at the rate of $O(N^{-1/2})$. A simulation method is
provided to approximate the null distribution of the test statistic.
Monte-Carlo simulations show that our test exhibits a satisfactory
finite-sample performance, and an application shows its practical value.
arXiv link: http://arxiv.org/abs/2102.08063v2
Constructing valid instrumental variables in generalized linear causal models from directed acyclic graphs
variables can deal with unobserved sources of variable errors, variable
omissions, and sampling bias, and still arrive at consistent estimates of
average treatment effects. The only problem is to find the valid instruments.
Using the definition of Pearl (2009) of valid instrumental variables, a formal
condition for validity can be stated for variables in generalized linear causal
models. The condition can be applied in two different ways: As a tool for
constructing valid instruments, or as a foundation for testing whether an
instrument is valid. When perfectly valid instruments are not found, the
squared bias of the IV-estimator induced by an imperfectly valid instrument --
estimated with bootstrapping -- can be added to its empirical variance in a
mean-square-error-like reliability measure.
arXiv link: http://arxiv.org/abs/2102.08056v1
Entropy methods for identifying hedonic models
First, it makes use of Queyranne's reformulation of a hedonic model in the
discrete case as a network flow problem in order to provide a proof of
existence and integrality of a hedonic equilibrium and efficient computation of
hedonic prices. Second, elaborating on entropic methods developed in Galichon
and Salani\'{e} (2014), this paper proposes a new identification strategy for
hedonic models in a single market. This methodology allows one to introduce
heterogeneities in both consumers' and producers' attributes and to recover
producers' profits and consumers' utilities based on the observation of
production and consumption patterns and the set of hedonic prices.
arXiv link: http://arxiv.org/abs/2102.07491v1
A Distance Covariance-based Estimator
condition of conventional instrumental variable (IV) methods, allowing
endogenous covariates to be weakly correlated, uncorrelated, or even
mean-independent, though not independent of instruments. As a result, the
estimator can exploit the maximum number of relevant instruments in any given
empirical setting. Identification is feasible without excludability, and the
disturbance term does not need to possess finite moments. Identification is
achieved under a weak conditional median independence condition on pairwise
differences in disturbances, along with mild regularity conditions.
Furthermore, the estimator is shown to be consistent and asymptotically normal.
The relevance condition required for identification is shown to be testable.
arXiv link: http://arxiv.org/abs/2102.07008v3
Statistical Power for Estimating Treatment Effects Using Difference-in-Differences and Comparative Interrupted Time Series Designs with Variation in Treatment Timing
for commonly used difference-in-differences (DID) and comparative interrupted
time series (CITS) panel data estimators. The main contribution is to
incorporate variation in treatment timing into the analysis. The power formulas
also account for other key design features that arise in practice:
autocorrelated errors, unequal measurement intervals, and clustering due to the
unit of treatment assignment. We consider power formulas for both
cross-sectional and longitudinal models and allow for covariates. An
illustrative power analysis provides guidance on appropriate sample sizes. The
key finding is that accounting for treatment timing increases required sample
sizes. Further, DID estimators have considerably more power than standard CITS
and ITS estimators. An available Shiny R dashboard performs the sample size
calculations for the considered estimators.
arXiv link: http://arxiv.org/abs/2102.06770v2
Linear programming approach to nonparametric inference under shape restrictions: with an application to regression kink designs
regression functions under shape constraints. This method can be implemented
via a linear programming, and it is thus computationally appealing. We
illustrate a usage of our proposed method with an application to the regression
kink design (RKD). Econometric analyses based on the RKD often suffer from wide
confidence intervals due to slow convergence rates of nonparametric derivative
estimators. We demonstrate that economic models and structures motivate shape
restrictions, which in turn contribute to shrinking the confidence interval for
an analysis of the causal effects of unemployment insurance benefits on
unemployment durations.
arXiv link: http://arxiv.org/abs/2102.06586v1
Identification and Inference Under Narrative Restrictions
restrictions', which are inequality restrictions on functions of the structural
shocks in specific periods. These restrictions raise novel problems related to
identification and inference, and there is currently no frequentist procedure
for conducting inference in these models. We propose a solution that is valid
from both Bayesian and frequentist perspectives by: 1) formalizing the
identification problem under narrative restrictions; 2) correcting a feature of
the existing (single-prior) Bayesian approach that can distort inference; 3)
proposing a robust (multiple-prior) Bayesian approach that is useful for
assessing and eliminating the posterior sensitivity that arises in these models
due to the likelihood having flat regions; and 4) showing that the robust
Bayesian approach has asymptotic frequentist validity. We illustrate our
methods by estimating the effects of US monetary policy under a variety of
narrative restrictions.
arXiv link: http://arxiv.org/abs/2102.06456v1
Inference on two component mixtures under tail restrictions
two-component mixtures and we show that they are nonparametrically point
identified by a combination of an exclusion restriction and tail restrictions.
Our identification analysis suggests simple closed-form estimators of the
component distributions and mixing proportions, as well as a specification
test. We derive their asymptotic properties using results on tail empirical
processes and we present a simulation study that documents their finite-sample
performance.
arXiv link: http://arxiv.org/abs/2102.06232v1
Interactive Network Visualization of Opioid Crisis Related Data- Policy, Pharmaceutical, Training, and More
by evidence from linking and analyzing multiple data sources. This paper
discusses how 20 available resources can be combined to answer pressing public
health questions related to the crisis. It presents a network view based on
U.S. geographical units and other standard concepts, crosswalked to communicate
the coverage and interlinkage of these resources. These opioid-related datasets
can be grouped by four themes: (1) drug prescriptions, (2) opioid related
harms, (3) opioid treatment workforce, jobs, and training, and (4) drug policy.
An interactive network visualization was created and is freely available
online; it lets users explore key metadata, relevant scholarly works, and data
interlinkages in support of informed decision making through data analysis.
arXiv link: http://arxiv.org/abs/2102.05596v1
Sharp Sensitivity Analysis for Inverse Propensity Weighting via Quantile Balancing
treatment effects from observational data. However, its correctness relies on
the untestable (and frequently implausible) assumption that all confounders
have been measured. This paper introduces a robust sensitivity analysis for IPW
that estimates the range of treatment effects compatible with a given amount of
unobserved confounding. The estimated range converges to the narrowest possible
interval (under the given assumptions) that must contain the true treatment
effect. Our proposal is a refinement of the influential sensitivity analysis by
Zhao, Small, and Bhattacharya (2019), which we show gives bounds that are too
wide even asymptotically. This analysis is based on new partial identification
results for Tan (2006)'s marginal sensitivity model.
arXiv link: http://arxiv.org/abs/2102.04543v3
Assessing Sensitivity of Machine Learning Predictions. A Novel Toolbox with an Application to Financial Literacy
potential unobserved predictors. This paper proposes a general algorithm that
assesses how the omission of an unobserved variable with high explanatory power
could affect the predictions of the model. Moreover, the algorithm extends the
usage of machine learning from pointwise predictions to inference and
sensitivity analysis. In the application, we show how the framework can be
applied to data with inherent uncertainty, such as students' scores in a
standardized assessment on financial literacy. First, using Bayesian Additive
Regression Trees (BART), we predict students' financial literacy scores (FLS)
for a subgroup of students with missing FLS. Then, we assess the sensitivity of
predictions by comparing the predictions and performance of models with and
without a highly explanatory synthetic predictor. We find no significant
difference in the predictions and performances of the augmented (i.e., the
model with the synthetic predictor) and the original model. This evidence sheds
light on the stability of the predictive model used in the application. The
proposed methodology can be used, above and beyond our motivating empirical
example, in a wide range of machine learning applications in social and health
sciences.
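A rough sketch of the comparison described above: fit the model with and without an added synthetic predictor of high explanatory power and compare predictions and out-of-sample performance. How the synthetic predictor is constructed in the paper is not reproduced; below it is simply the outcome plus noise, and a gradient boosting regressor stands in for BART.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, p = 3000, 10
X = rng.standard_normal((n, p))
score = 60 + 8 * X[:, 0] - 5 * X[:, 1] + 4 * rng.standard_normal(n)   # observed "scores"

# synthetic predictor with high explanatory power (here, the outcome plus noise)
synthetic = score + 2 * rng.standard_normal(n)

train, test = np.arange(0, 2000), np.arange(2000, n)
base = GradientBoostingRegressor(random_state=0).fit(X[train], score[train])
aug = GradientBoostingRegressor(random_state=0).fit(
    np.column_stack([X, synthetic])[train], score[train])

pred_base = base.predict(X[test])
pred_aug = aug.predict(np.column_stack([X, synthetic])[test])
print("mean absolute shift in predictions:", round(np.mean(np.abs(pred_aug - pred_base)), 3))
print("RMSE base vs augmented:",
      round(np.sqrt(np.mean((pred_base - score[test]) ** 2)), 3),
      round(np.sqrt(np.mean((pred_aug - score[test]) ** 2)), 3))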
arXiv link: http://arxiv.org/abs/2102.04382v1
Duality in dynamic discrete-choice models
identification and estimation of discrete choice models which we call the Mass
Transport Approach (MTA). We show that the conditional choice probabilities and
the choice-specific payoffs in these models are related in the sense of
conjugate duality, and that the identification problem is a mass transport
problem. Based on this, we propose a new two-step estimator for these models;
interestingly, the first step of our estimator involves solving a linear
program which is identical to the classic assignment (two-sided matching) game
of Shapley and Shubik (1971). The application of convex-analytic tools to
dynamic discrete choice models, and the connection with two-sided matching
models, is new in the literature.
arXiv link: http://arxiv.org/abs/2102.06076v2
Extreme dependence for multivariate data
between two random vectors which relies on the extremality of the
cross-covariance matrix between these two vectors. Using a partial ordering on
the cross-covariance matrices, we also generalize the notion of positive upper
dependence. We then propose a means to quantify the strength of the dependence
between two given multivariate series and to increase this strength while
preserving the marginal distributions. This allows for the design of
stress-tests of the dependence between two sets of financial variables, that
can be useful in portfolio management or derivatives pricing.
arXiv link: http://arxiv.org/abs/2102.04461v1
Dilation bootstrap
identified models of general form. The region is obtained by inverting a test
of internal consistency of the econometric structure. We develop a dilation
bootstrap methodology to deal with sampling uncertainty without reference to
the hypothesized economic structure. It requires bootstrapping the quantile
process for univariate data and a novel generalization of the latter to higher
dimensions. Once the dilation is chosen to control the confidence level, the
unknown true distribution of the observed data can be replaced by the known
empirical distribution and confidence regions can then be obtained as in
Galichon and Henry (2011) and Beresteanu, Molchanov and Molinari (2011).
arXiv link: http://arxiv.org/abs/2102.04457v1
Optimal transportation and the falsifiability of incompletely specified economic models
based on a sample of their observable components. It is shown that, when the
restrictions implied by the economic theory are insufficient to identify the
unknown quantities of the structure, the duality of optimal transportation with
zero-one cost function delivers interpretable and operational formulations of
the hypothesis of specification correctness from which tests can be constructed
to falsify the model.
arXiv link: http://arxiv.org/abs/2102.04162v2
A test of non-identifying restrictions and confidence regions for partially identified parameters
theoretical restrictions on the relationship between economic variables, which
do not necessarily identify the data generating process. The restrictions can
be derived from any model of interactions, allowing censoring and multiple
equilibria. When the restrictions are parameterized, the test can be inverted
to yield confidence regions for partially identified parameters, thereby
complementing other proposals, primarily Chernozhukov et al. [Chernozhukov, V.,
Hong, H., Tamer, E., 2007. Estimation and confidence regions for parameter sets
in econometric models. Econometrica 75, 1243-1285].
arXiv link: http://arxiv.org/abs/2102.04151v1
A note on global identification in structural vector autoregressions
literature, Rubio-Ramirez, Waggoner, and Zha (2010, `Structural Vector
Autoregressions: Theory of Identification and Algorithms for Inference,' Review
of Economic Studies) shows a necessary and sufficient condition for equality
restrictions to globally identify the structural parameters of a SVAR. The
simplest form of the necessary and sufficient condition shown in Theorem 7 of
Rubio-Ramirez et al (2010) checks the number of zero restrictions and the ranks
of particular matrices without requiring knowledge of the true value of the
structural or reduced-form parameters. However, this note shows by
counterexample that this condition is not sufficient for global identification.
Analytical investigation of the counterexample clarifies why their sufficiency
claim breaks down. The problem with the rank condition is that it allows for
the possibility that restrictions are redundant, in the sense that one or more
restrictions may be implied by other restrictions, in which case the implied
restriction contains no identifying information. We derive a modified necessary
and sufficient condition for SVAR global identification and clarify how it can
be assessed in practice.
arXiv link: http://arxiv.org/abs/2102.04048v2
Inference under Covariate-Adaptive Randomization with Imperfect Compliance
covariate-adaptive randomization (CAR) and imperfect compliance of a binary
treatment. In this context, we study inference on the LATE. As in Bugni et al.
(2018,2019), CAR refers to randomization schemes that first stratify according
to baseline covariates and then assign treatment status so as to achieve
"balance" within each stratum. In contrast to these papers, however, we allow
participants of the RCT to endogenously decide to comply or not with the
assigned treatment status.
We study the properties of an estimator of the LATE derived from a "fully
saturated" IV linear regression, i.e., a linear regression of the outcome on
all indicators for all strata and their interaction with the treatment
decision, with the latter instrumented with the treatment assignment. We show
that the proposed LATE estimator is asymptotically normal, and we characterize
its asymptotic variance in terms of primitives of the problem. We provide
consistent estimators of the standard errors and asymptotically exact
hypothesis tests. In the special case when the target proportion of units
assigned to each treatment does not vary across strata, we can also consider
two other estimators of the LATE, including the one based on the "strata fixed
effects" IV linear regression, i.e., a linear regression of the outcome on
indicators for all strata and the treatment decision, with the latter
instrumented with the treatment assignment.
Our characterization of the asymptotic variance of the LATE estimators allows
us to understand the influence of the parameters of the RCT. We use this to
propose strategies to minimize their asymptotic variance in a hypothetical RCT
based on data from a pilot study. We illustrate the practical relevance of
these results using a simulation study and an empirical application based on
Dupas et al. (2018).
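As a concrete illustration of the simpler of the estimators mentioned above, the sketch below computes the "strata fixed effects" IV estimator by two-stage least squares on simulated data with covariate-adaptive assignment and one-sided noncompliance. The fully saturated estimator, the asymptotic variance formulas, and the variance-minimizing design strategies are not reproduced, and the data-generating process is a placeholder.

import numpy as np

def tsls(y, X, Z):
    # Basic two-stage least squares: regress X on Z, then y on the fitted X
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(Xhat, y, rcond=None)[0]

# toy RCT with 4 strata, covariate-adaptive assignment, and imperfect compliance
rng = np.random.default_rng(0)
n = 20_000
strata = rng.integers(0, 4, n)
S = np.eye(4)[strata]                               # stratum indicators
z = np.zeros(n)
for s in range(4):                                  # assign exactly half within each stratum
    idx = np.where(strata == s)[0]
    z[rng.permutation(idx)[: len(idx) // 2]] = 1.0
complier = rng.random(n) < 0.6                      # 60% compliers, rest never-takers
d = np.where(complier, z, 0.0)                      # endogenous take-up
late = 2.0
y = late * d + strata + rng.standard_normal(n)

# "strata fixed effects" IV regression: outcome on strata dummies and D,
# with D instrumented by the assignment Z
X = np.column_stack([S, d])
Zm = np.column_stack([S, z])
print("LATE estimate:", round(tsls(y, X, Zm)[-1], 3))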
arXiv link: http://arxiv.org/abs/2102.03937v3
Identification of Matching Complementarities: A Geometric Viewpoint
matching surplus function and we show how the estimation problem can be solved
by the introduction of a generalized entropy function over the set of
matchings.
arXiv link: http://arxiv.org/abs/2102.03875v1
Applications of Machine Learning in Document Digitisation
availability of data directly impacts the quality and extent of conclusions and
insights. In particular, larger and more detailed datasets provide convincing
answers even to complex research questions. The main problem is that 'large and
detailed' usually implies 'costly and difficult', especially when the data
medium is paper and books. Human operators and manual transcription have been
the traditional approach for collecting historical data. We instead advocate
the use of modern machine learning techniques to automate the digitisation
process. We give an overview of the potential for applying machine digitisation
for data collection through two illustrative applications. The first
demonstrates that unsupervised layout classification applied to raw scans of
nurse journals can be used to construct a treatment indicator. Moreover, it
allows an assessment of assignment compliance. The second application uses
attention-based neural networks for handwritten text recognition in order to
transcribe age and birth and death dates from a large collection of Danish
death certificates. We describe each step in the digitisation pipeline and
provide implementation insights.
arXiv link: http://arxiv.org/abs/2102.03239v1
Hypothetical bias in stated choice experiments: Part II. Macro-scale analysis of literature and effectiveness of bias mitigation methods
experiments (CEs). It presents a bibliometric analysis and summary of empirical
evidence of their effectiveness. The paper follows the review of empirical
evidence on the existence of HB presented in Part I of this study. While the
number of CE studies has rapidly increased since 2010, the critical issue of HB
has been studied in only a small fraction of CE studies. The present review
includes both ex-ante and ex-post bias mitigation methods. Ex-ante bias
mitigation methods include cheap talk, real talk, consequentiality scripts,
solemn oath scripts, opt-out reminders, budget reminders, honesty priming,
induced truth telling, indirect questioning, time to think and pivot designs.
Ex-post methods include follow-up certainty calibration scales, respondent
perceived consequentiality scales, and revealed-preference-assisted estimation.
It is observed that the use of mitigation methods markedly varies across
different sectors of applied economics. The existing empirical evidence points
to their overall effectiveness in reducing HB, although there is some variation.
The paper further discusses how each mitigation method can counter a certain
subset of HB sources. Considering the prevalence of HB in CEs and the
effectiveness of bias mitigation methods, it is recommended that implementation
of at least one bias mitigation method (or a suitable combination where
possible) becomes standard practice in conducting CEs. Mitigation method(s)
suited to the particular application should be implemented to ensure that
inferences and subsequent policy decisions are as free of HB as possible.
arXiv link: http://arxiv.org/abs/2102.02945v1
Hypothetical bias in stated choice experiments: Part I. Integrative synthesis of empirical evidence and conceptualisation of external validity
fundamental issue in relation to the use of hypothetical survey methods.
Whether or to what extent choices of survey participants and subsequent
inferred estimates translate to real-world settings continues to be debated.
While HB has been extensively studied in the broader context of contingent
valuation, it is much less understood in relation to choice experiments (CE).
This paper reviews the empirical evidence for HB in CE in various fields of
applied economics and presents an integrative framework for how HB relates to
external validity. Results suggest mixed evidence on the prevalence, extent and
direction of HB as well as considerable context and measurement dependency.
While HB is found to be an undeniable issue when conducting CEs, the empirical
evidence on HB does not render CEs unable to represent real-world preferences.
While health-related choice experiments often find negligible degrees of HB,
experiments in consumer behaviour and transport domains suggest that
significant degrees of HB are ubiquitous. Assessments of bias in environmental
valuation studies provide mixed evidence. Also, across these disciplines many
studies display HB in their total willingness to pay estimates and opt-in rates
but not in their hypothetical marginal rates of substitution (subject to scale
correction). Further, recent findings in psychology and brain imaging studies
suggest neurocognitive mechanisms underlying HB that may explain some of the
discrepancies and unexpected findings in the mainstream CE literature. The
review also observes how the variety of operational definitions of HB prohibits
consistent measurement of HB in CE. The paper further identifies major sources
of HB and possible moderating factors. Finally, it explains how HB represents
one component of the wider concept of external validity.
arXiv link: http://arxiv.org/abs/2102.02940v1
The Econometrics and Some Properties of Separable Matching Models
utility. We discuss identification and inference in these separable models, and
we show how their comparative statics are readily analyzed.
arXiv link: http://arxiv.org/abs/2102.02564v1
Discretizing Unobserved Heterogeneity
revealed in a first step, in environments where population heterogeneity is not
discrete. We focus on two-step grouped fixed-effects (GFE) estimators, where
individuals are first classified into groups using kmeans clustering, and the
model is then estimated allowing for group-specific heterogeneity. Our
framework relies on two key properties: heterogeneity is a function - possibly
nonlinear and time-varying - of a low-dimensional continuous latent type, and
informative moments are available for classification. We illustrate the method
in a model of wages and labor market participation, and in a probit model with
time-varying heterogeneity. We derive asymptotic expansions of two-step GFE
estimators as the number of groups grows with the two dimensions of the panel.
We propose a data-driven rule for the number of groups, and discuss bias
reduction and inference.
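A minimal two-step grouped fixed-effects sketch: individuals are first classified by kmeans on informative moments, and the model is then estimated with group-specific intercepts. The number of groups is fixed here rather than chosen by the paper's data-driven rule, and the linear model with continuous latent heterogeneity is only an illustrative data-generating process.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
N, T, G = 500, 20, 3
alpha = rng.normal(0, 1, N)                         # continuous latent heterogeneity
x = rng.standard_normal((N, T))
y = alpha[:, None] + 1.0 * x + rng.standard_normal((N, T))

# step 1: classify individuals into groups by kmeans on informative moments
# (here, the within-individual means of y and x)
moments = np.column_stack([y.mean(axis=1), x.mean(axis=1)])
groups = KMeans(n_clusters=G, n_init=10, random_state=0).fit_predict(moments)

# step 2: estimate the model allowing for group-specific intercepts
D = np.eye(G)[np.repeat(groups, T)]                 # group dummies, one row per (i,t)
X = np.column_stack([D, x.reshape(-1)])
beta = np.linalg.lstsq(X, y.reshape(-1), rcond=None)[0]
print("slope estimate:", round(beta[-1], 3), "| group intercepts:", np.round(beta[:G], 2))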
arXiv link: http://arxiv.org/abs/2102.02124v1
Teams: Heterogeneity, Sorting, and Complementarity
framework to quantify individual contributions when only the output of their
teams is observed. The identification strategy relies on following individuals
who work in different teams over time. I consider two production technologies.
For a production function that is additive in worker inputs, I propose a
regression estimator and show how to obtain unbiased estimates of variance
components that measure the contributions of heterogeneity and sorting. To
estimate nonlinear models with complementarity, I propose a mixture approach
under the assumption that individual types are discrete, and rely on a
mean-field variational approximation for estimation. To illustrate the methods,
I estimate the impact of economists on their research output, and the
contributions of inventors to the quality of their patents.
arXiv link: http://arxiv.org/abs/2102.01802v1
Adaptive Random Bandwidth for Inference in CAViaR Models
(Engle and Manganelli, 2004). We find that the usual estimation strategy on
test statistics yields inaccuracies. Indeed, we show that existing density
estimation methods cannot adapt to the time-variation in the conditional
probability densities of CAViaR models. Consequently, we develop a method
called adaptive random bandwidth which can approximate time-varying conditional
probability densities robustly for inference testing on CAViaR models based on
the asymptotic normality of the model parameter estimator. This proposed method
also avoids the problem of choosing an optimal bandwidth in estimating
probability densities, and can be extended straightforwardly to multivariate
quantile regressions.
arXiv link: http://arxiv.org/abs/2102.01636v1
Efficient Estimation for Staggered Rollout Designs
settings where there is staggered treatment adoption and the timing of
treatment is as-good-as randomly assigned. We derive the most efficient
estimator in a class of estimators that nests several popular generalized
difference-in-differences methods. A feasible plug-in version of the efficient
estimator is asymptotically unbiased with efficiency (weakly) dominating that
of existing approaches. We provide both $t$-based and permutation-test-based
methods for inference. In an application to a training program for police
officers, confidence intervals for the proposed estimator are as much as eight
times shorter than for existing approaches.
arXiv link: http://arxiv.org/abs/2102.01291v7
A first-stage representation for instrumental variables quantile regression
instrumental variables (IV) quantile regression (QR) model. The quantile
first-stage is analogous to the least squares case, i.e., a linear projection
of the endogenous variables on the instruments and other exogenous covariates,
with the difference that the QR case is a weighted projection. The weights are
given by the conditional density function of the innovation term in the QR
structural model, conditional on the endogenous and exogenous covariates, and
the instruments as well, at a given quantile. We also show that the required
Jacobian identification conditions for IVQR models are embedded in the quantile
first-stage. We then suggest inference procedures to evaluate the adequacy of
instruments by evaluating their statistical significance using the first-stage
result. The test is developed in an over-identification context, since
consistent estimation of the weights for implementation of the first-stage
requires at least one valid instrument to be available. Monte Carlo experiments
provide numerical evidence that the proposed tests work as expected in terms of
empirical size and power in finite samples. An empirical application
illustrates that checking for the statistical significance of the instruments
at different quantiles is important. The proposed procedures may be especially
useful in QR since the instruments may be relevant at some quantiles but not at
others.
arXiv link: http://arxiv.org/abs/2102.01212v4
Comparing hundreds of machine learning classifiers and discrete choice models in predicting travel behavior: an empirical benchmark
models (DCMs) in predicting travel demand. However, these studies often lack
generalizability as they compare models deterministically without considering
contextual variations. To address this limitation, our study develops an
empirical benchmark by designing a tournament model, thus efficiently
summarizing a large number of experiments, quantifying the randomness in model
comparisons, and using formal statistical tests to differentiate between the
model and contextual effects. This benchmark study compares two large-scale
data sources: a database compiled from literature review summarizing 136
experiments from 35 studies, and our own experiment data, encompassing a total
of 6,970 experiments from 105 models and 12 model families. This benchmark
study yields two key findings. Firstly, many ML models, particularly the
ensemble methods and deep learning, statistically outperform the DCM family
(i.e., multinomial, nested, and mixed logit models). However, this study also
highlights the crucial role of the contextual factors (i.e., data sources,
inputs and choice categories), which can explain models' predictive performance
more effectively than the differences in model types alone. Model performance
varies significantly with data sources, improving with larger sample sizes and
lower dimensional alternative sets. After controlling for all the model and
contextual factors, significant randomness still remains, implying inherent
uncertainty in such model comparisons. Overall, we suggest that future
researchers shift more focus from context-specific model comparisons towards
examining model transferability across contexts and characterizing the inherent
uncertainty in ML, thus creating more robust and generalizable next-generation
travel demand models.
arXiv link: http://arxiv.org/abs/2102.01130v3
CRPS Learning
accuracy. This also holds for probabilistic forecasting methods where
predictive distributions are combined. There are several time-varying and
adaptive weighting schemes such as Bayesian model averaging (BMA). However, the
quality of different forecasts may vary not only over time but also within the
distribution. For example, some distribution forecasts may be more accurate in
the center of the distributions, while others are better at predicting the
tails. Therefore, we introduce a new weighting method that considers the
differences in performance over time and within the distribution. We discuss
pointwise combination algorithms based on aggregation across quantiles that
optimize with respect to the continuous ranked probability score (CRPS). After
analyzing the theoretical properties of pointwise CRPS learning, we discuss B-
and P-Spline-based estimation techniques for batch and online learning, based
on quantile regression and prediction with expert advice. We prove that the
proposed fully adaptive Bernstein online aggregation (BOA) method for pointwise
CRPS online learning has optimal convergence properties. They are confirmed in
simulations and a probabilistic forecasting study for European emission
allowance (EUA) prices.
arXiv link: http://arxiv.org/abs/2102.00968v3
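A simplified sketch of the pointwise-combination idea described in the entry above: quantile forecasts from several experts are aggregated with weights that are updated separately at every quantile level using the pinball loss (whose average over quantile levels approximates the CRPS). This uses a plain exponential-weights update rather than the paper's fully adaptive Bernstein online aggregation (BOA); the learning rate `eta`, the two synthetic "experts", and the data are all placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

def pinball(y, q, tau):
    """Pinball (quantile) loss; its average over tau approximates the CRPS."""
    u = y - q
    return np.maximum(tau * u, (tau - 1.0) * u)

taus = np.linspace(0.05, 0.95, 19)
T, K, eta = 500, 2, 2.0                      # time steps, experts, learning rate

# Synthetic target and two "experts" issuing under- and over-dispersed
# quantile forecasts, so the best weights may differ across quantile levels.
y = rng.normal(size=T)
base = np.quantile(rng.normal(size=10000), taus)
experts = np.stack([np.tile(0.8 * base, (T, 1)),
                    np.tile(1.2 * base, (T, 1))])        # shape (K, T, n_taus)

w = np.full((len(taus), K), 1.0 / K)         # pointwise weights per quantile
avg_loss = 0.0
for t in range(T):
    combo = np.einsum("jk,kj->j", w, experts[:, t, :])   # combined quantiles
    avg_loss += pinball(y[t], combo, taus).mean() / T
    losses = np.stack([pinball(y[t], experts[k, t, :], taus) for k in range(K)])
    # Exponential-weights update, separately for every quantile level.
    w *= np.exp(-eta * losses.T)
    w /= w.sum(axis=1, keepdims=True)

print("average pinball loss of the combination:", round(avg_loss, 4))
print("final weights at tau = 0.05, 0.50, 0.95:\n", np.round(w[[0, 9, 18]], 3))
```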
Time Series (re)sampling using Generative Adversarial Networks
Adversarial networks (GANs). We show that the dynamics of common stationary
time series processes can be learned by GANs and demonstrate that GANs trained
on a single sample path can be used to generate additional samples from the
process. We find that temporal convolutional neural networks provide a suitable
design for the generator and discriminator, and that convincing samples can be
generated on the basis of a vector of iid normal noise. We demonstrate the
finite sample properties of GAN sampling and the suggested bootstrap using
simulations where we compare the performance to circular block bootstrapping in
the case of resampling an AR(1) time series process. We find that resampling
using the GAN can outperform circular block bootstrapping in terms of empirical
coverage.
arXiv link: http://arxiv.org/abs/2102.00208v1
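For reference, the circular block bootstrap used as the benchmark in the comparison above can be sketched as follows; the AR(1) coefficient, block length, and sample size are arbitrary illustrative choices, and this is the classical benchmark rather than the GAN-based sampler proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate one AR(1) sample path (the single observed series).
n, phi = 400, 0.7
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

def circular_block_bootstrap(series, block_len, rng):
    """Resample a series by concatenating blocks that wrap around circularly."""
    n = len(series)
    extended = np.concatenate([series, series[:block_len - 1]])
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n, size=n_blocks)
    blocks = [extended[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:n]

# Bootstrap distribution of the sample mean, e.g. for a confidence interval.
boot_means = np.array([
    circular_block_bootstrap(x, block_len=20, rng=rng).mean()
    for _ in range(999)
])
print("95% bootstrap CI for the mean:", np.quantile(boot_means, [0.025, 0.975]))
```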
Tree-based Node Aggregation in Sparse Graphical Models
that is aimed at reducing the number of edges in a network. In this work, we
show how even simpler networks can be produced by aggregating the nodes of the
graphical model. We develop a new convex regularized method, called the
tree-aggregated graphical lasso or tag-lasso, that estimates graphical models
that are both edge-sparse and node-aggregated. The aggregation is performed in
a data-driven fashion by leveraging side information in the form of a tree that
encodes node similarity and facilitates the interpretation of the resulting
aggregated nodes. We provide an efficient implementation of the tag-lasso by
using the locally adaptive alternating direction method of multipliers and
illustrate our proposal's practical advantages in simulation and in
applications in finance and biology.
arXiv link: http://arxiv.org/abs/2101.12503v1
The Bootstrap for Network Dependent Processes
conditional $\psi$-weak dependence. Such processes are distinct from other
forms of random fields studied in the statistics and econometrics literature so
that the existing bootstrap methods cannot be applied directly. We propose a
block-based approach and a modification of the dependent wild bootstrap for
constructing confidence sets for the mean of a network dependent process. In
addition, we establish the consistency of these methods for the smooth function
model and provide the bootstrap alternatives to the network
heteroskedasticity-autocorrelation consistent (HAC) variance estimator. We find
that the modified dependent wild bootstrap and the corresponding variance
estimator are consistent under weaker conditions relative to the block-based
method, which makes the former approach preferable for practical
implementation.
arXiv link: http://arxiv.org/abs/2101.12312v1
Simple Adaptive Estimation of Quadratic Functionals in Nonparametric IV Models
in a nonparametric instrumental variables (NPIV) model, which is an important
problem in optimal estimation of a nonlinear functional of an ill-posed inverse
regression with an unknown operator. We first show that a leave-one-out, sieve
NPIV estimator of the quadratic functional can attain a convergence rate that
coincides with the lower bound previously derived in Chen and Christensen
[2018]. The minimax rate is achieved by the optimal choice of the sieve
dimension (a key tuning parameter) that depends on the smoothness of the NPIV
function and the degree of ill-posedness, both of which are unknown in practice.
We next
propose a Lepski-type data-driven choice of the key sieve dimension adaptive to
the unknown NPIV model features. The adaptive estimator of the quadratic
functional is shown to attain the minimax optimal rate in the severely
ill-posed case and in the regular mildly ill-posed case, but up to a
multiplicative $\log n$ factor in the irregular mildly ill-posed case.
arXiv link: http://arxiv.org/abs/2101.12282v2
Gaussian Process Latent Class Choice Models
integrate a non-parametric class of probabilistic machine learning within
discrete choice models (DCMs). Gaussian Processes (GPs) are kernel-based
algorithms that incorporate expert knowledge by assuming priors over latent
functions rather than priors over parameters, which makes them more flexible in
addressing nonlinear problems. By integrating a Gaussian Process within a LCCM
structure, we aim at improving discrete representations of unobserved
heterogeneity. The proposed model assigns individuals probabilistically to
behaviorally homogeneous clusters (latent classes) using GPs and simultaneously
estimates class-specific choice models by relying on random utility models.
Furthermore, we derive and implement an Expectation-Maximization (EM) algorithm
to jointly estimate/infer the hyperparameters of the GP kernel function and the
class-specific choice parameters by relying on a Laplace approximation and
gradient-based numerical optimization methods, respectively. The model is
tested on two different mode choice applications and compared against different
LCCM benchmarks. Results show that GP-LCCM allows for a more complex and
flexible representation of heterogeneity and improves both in-sample fit and
out-of-sample predictive power. Moreover, behavioral and economic
interpretability is maintained at the class-specific choice model level while
local interpretation of the latent classes can still be achieved, although the
non-parametric characteristic of GPs lessens the transparency of the model.
arXiv link: http://arxiv.org/abs/2101.12252v1
Choice modelling in the age of machine learning -- discussion paper
theory-driven modelling approaches. Machine learning offers an alternative
data-driven approach for modelling choice behaviour and is increasingly drawing
interest in our field. Cross-pollination of machine learning models, techniques
and practices could help overcome problems and limitations encountered in the
current theory-driven modelling paradigm, such as subjective labour-intensive
search processes for model selection, and the inability to work with text and
image data. However, despite the potential benefits of using the advances of
machine learning to improve choice modelling practices, the choice modelling
field has been hesitant to embrace machine learning. This discussion paper aims
to consolidate knowledge on the use of machine learning models, techniques and
practices for choice modelling, and discuss their potential. Thereby, we hope
not only to make the case that further integration of machine learning in
choice modelling is beneficial, but also to further facilitate it. To this end,
we clarify the similarities and differences between the two modelling
paradigms; we review the use of machine learning for choice modelling; and we
explore areas of opportunities for embracing machine learning models and
techniques to improve our practices. To conclude this discussion paper, we put
forward a set of research questions which must be addressed to better
understand if and how machine learning can benefit choice modelling.
arXiv link: http://arxiv.org/abs/2101.11948v2
A Bayesian approach for estimation of weight matrices in spatial autoregressive models
autoregressive (or spatial lag) models. Datasets in regional economic
literature are typically characterized by a limited number of time periods T
relative to spatial units N. When the spatial weight matrix is subject to
estimation, severe problems of over-parametrization are likely. To make
estimation feasible, our approach focuses on spatial weight matrices which are
binary prior to row-standardization. We discuss the use of hierarchical priors
which impose sparsity in the spatial weight matrix. Monte Carlo simulations
show that these priors perform very well where the number of unknown parameters
is large relative to the observations. The virtues of our approach are
demonstrated using global data from the early phase of the COVID-19 pandemic.
arXiv link: http://arxiv.org/abs/2101.11938v2
The sooner the better: lives saved by the lockdown during the COVID-19 outbreak. The case of Italy
mainly, the lockdown - on the COVID-19 mortality rate for the case of Italy,
the first Western country to impose a national shelter-in-place order. We use a
new estimator, the Augmented Synthetic Control Method (ASCM), that overcomes
some limits of the standard Synthetic Control Method (SCM). The results are
twofold. From a methodological point of view, the ASCM outperforms the SCM in
that the latter cannot select a valid donor set, assigning all the weights to
only one country (Spain) while placing zero weight on all the remaining ones. From
an empirical point of view, we find strong evidence of the effectiveness of
non-pharmaceutical interventions in avoiding losses of human lives in Italy:
conservative estimates indicate that, for each human life actually lost, in the
absence of the lockdown there would have been on average another 1.15; in total,
the policy saved 20,400 human lives.
arXiv link: http://arxiv.org/abs/2101.11901v1
Predictive Quantile Regression with Mixed Roots and Increasing Dimensions: The ALQR Approach
regression (ALQR). Reflecting empirical findings, we allow predictors to have
various degrees of persistence and exhibit different signal strengths. The
number of predictors is allowed to grow with the sample size. We study
regularity conditions under which stationary, local unit root, and cointegrated
predictors are present simultaneously. We next show the convergence rates,
model selection consistency, and asymptotic distributions of ALQR. We apply the
proposed method to the out-of-sample quantile prediction problem of stock
returns and find that it outperforms the existing alternatives. We also provide
numerical evidence from additional Monte Carlo experiments, supporting the
theoretical results.
arXiv link: http://arxiv.org/abs/2101.11568v4
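A rough sketch of the adaptive-lasso idea for quantile regression in a simple stationary setting, using the standard rescaling trick (dividing each regressor by a pilot-based penalty weight so that a uniform L1 penalty acts adaptively); the pilot estimator, the penalty level, and the data-generating process are illustrative, and this does not reproduce the paper's treatment of persistent, local-unit-root, or cointegrated predictors.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(3)

# Illustrative stationary design: only the first two predictors matter.
n, p, tau = 500, 10, 0.5
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_t(df=5, size=n)

# Pilot (unpenalized) quantile regression to build adaptive weights.
pilot = QuantileRegressor(quantile=tau, alpha=0.0, solver="highs").fit(X, y)
w = 1.0 / (np.abs(pilot.coef_) + 1e-6)          # adaptive penalty weights

# Rescale columns so a uniform L1 penalty acts as an adaptive one.
Xw = X / w
fit = QuantileRegressor(quantile=tau, alpha=0.05, solver="highs").fit(Xw, y)
coef = fit.coef_ / w                            # map back to the original scale
print("adaptive-lasso QR coefficients:", np.round(coef, 2))
```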
Identifying and Estimating Perceived Returns to Binary Investments
that relies on cross-sectional data containing binary choices and prices, where
prices may be imperfectly known to agents. This method identifies the scale of
perceived returns by assuming agent knowledge of an identity that relates
profits, revenues, and costs rather than by eliciting or assuming agent beliefs
about structural parameters that are estimated by researchers. With this
assumption, modest adjustments to standard binary choice estimators enable
consistent estimation of perceived returns when using price instruments that
are uncorrelated with unobserved determinants of agents' price misperceptions
as well as other unobserved determinants of their perceived returns. I
demonstrate the method, and the importance of using price variation that is
known to agents, in a series of data simulations.
arXiv link: http://arxiv.org/abs/2101.10941v1
Robustness of the international oil trade network under targeted attacks to economies
extreme events may spread over the entire network along the trade links of the
central economies and even lead to the collapse of the whole system. In this
study, we focus on the concept of "too central to fail" and use traditional
centrality indicators as strategic indicators for simulating attacks on
economic nodes, and simulate various situations in which the structure and
function of the global oil trade network are lost when the economies suffer
extreme trade shocks. The simulation results show that the global oil trade
system has become more vulnerable in recent years. The regional aggregation of
oil trade is an essential source of iOTN's vulnerability. Maintaining global
oil trade stability and security requires a focus on economies with greater
influence within the network module of the iOTN. International organizations
such as OPEC and OECD established more trade links around the world, but their
influence on the iOTN is declining. We improve the framework of oil security
and trade risk assessment based on the topological index of iOTN, and provide a
reference for finding methods to maintain network robustness and trade
stability.
arXiv link: http://arxiv.org/abs/2101.10679v2
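A toy version of the targeted-attack simulation described in the entry above: nodes are removed in order of degree centrality from a random graph (a placeholder for the actual international oil trade network), and the relative size of the largest connected component that survives is tracked as a crude robustness measure.

```python
import networkx as nx

# Placeholder network standing in for the international oil trade network.
G = nx.gnp_random_graph(n=100, p=0.05, seed=5)

def attack_by_centrality(graph, fraction=0.2):
    """Remove the top `fraction` of nodes by degree centrality and report the
    relative size of the largest connected component that remains."""
    g = graph.copy()
    n0 = g.number_of_nodes()
    ranked = sorted(nx.degree_centrality(g).items(), key=lambda kv: -kv[1])
    targets = [node for node, _ in ranked[: int(fraction * n0)]]
    g.remove_nodes_from(targets)
    if g.number_of_nodes() == 0:
        return 0.0
    giant = max(nx.connected_components(g), key=len)
    return len(giant) / n0

for frac in (0.05, 0.10, 0.20, 0.30):
    print(f"remove top {frac:.0%} central nodes -> "
          f"largest component share: {attack_by_centrality(G, frac):.2f}")
```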
A nowcasting approach to generate timely estimates of Mexican economic activity: An application to the period of COVID-19
(DFMs) to perform nowcasts for the percentage annual variation of the Mexican
Global Economic Activity Indicator (IGAE in Spanish). The procedure consists of
the following steps: i) build a timely and correlated database by using
economic and financial time series and real-time variables such as social
mobility and significant topics extracted by Google Trends; ii) estimate the
common factors using the two-step methodology of Doz et al. (2011); iii) use
the common factors in univariate time-series models for test data; and iv)
according to the best results obtained in the previous step, combine the
best-performing nowcasts that are statistically indistinguishable
(Diebold-Mariano test) to generate the
current nowcasts. We obtain timely and accurate nowcasts for the IGAE,
including those for the current phase of drastic drops in the economy related
to COVID-19 sanitary measures. Additionally, the approach allows us to
disentangle the key variables in the DFM by estimating the confidence interval
for both the factor loadings and the factor estimates. This approach can be
used in official statistics to obtain preliminary estimates for IGAE up to 50
days before the official results.
arXiv link: http://arxiv.org/abs/2101.10383v1
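A bare-bones version of steps i)-iii) above, using principal components as a stand-in for the two-step factor estimation of Doz et al. (2011) and a simple bridge regression as the univariate model; the monthly indicator panel and the target series are synthetic placeholders rather than the IGAE database.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic monthly panel of indicators driven by one common factor.
T, n_series, r = 120, 15, 1
f = np.cumsum(rng.normal(scale=0.5, size=T))            # common factor
loadings = rng.normal(size=(n_series, r))
panel = f[:, None] @ loadings.T + rng.normal(scale=1.0, size=(T, n_series))

# Step ii) (stand-in): extract common factors by principal components.
Z = (panel - panel.mean(0)) / panel.std(0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
factors = Z @ Vt[:r].T                                   # T x r factor estimates

# Step iii): bridge regression of the target (a noisy function of the factor,
# standing in for the IGAE annual variation) on the estimated factors.
target = 0.8 * f + rng.normal(scale=0.3, size=T)
F = np.column_stack([np.ones(T), factors])
beta = np.linalg.lstsq(F[:-1], target[:-1], rcond=None)[0]  # fit excluding last month
nowcast = F[-1] @ beta
print("nowcast for latest month:", round(nowcast, 3),
      "vs. realized:", round(target[-1], 3))
```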
A Benchmark Model for Fixed-Target Arctic Sea Ice Forecasting
forecasting of Arctic sea ice extent, and we provide a case study of its
real-time performance for target date September 2020. We visually detail the
evolution of the statistically-optimal point, interval, and density forecasts
as time passes, new information arrives, and the end of September approaches.
Comparison to the BPM may prove useful for evaluating and selecting among
various more sophisticated dynamical sea ice models, which are widely used to
quantify the likely future evolution of Arctic conditions and their two-way
interaction with economic activity.
arXiv link: http://arxiv.org/abs/2101.10359v3
Consistent specification testing under spatial dependence
function when data are spatially dependent, the `space' being of a general
economic or social nature. Dependence can be parametric, parametric with
increasing dimension, semiparametric or any combination thereof, thus covering
a vast variety of settings. These include spatial error models of varying types
and levels of complexity. Under a new smooth spatial dependence condition, our
test statistic is asymptotically standard normal. To prove the latter property,
we establish a central limit theorem for quadratic forms in linear processes in
an increasing dimension setting. Finite sample performance is investigated in a
simulation study, with a bootstrap method also justified and illustrated, and
empirical examples illustrate the test with real-world data.
arXiv link: http://arxiv.org/abs/2101.10255v3
Kernel regression analysis of tie-breaker designs
(RCTs) and Regression Discontinuity Designs (RDDs) in which subjects with
moderate scores are placed in an RCT while subjects with extreme scores are
deterministically assigned to the treatment or control group. In settings where
it is unfair or uneconomical to deny the treatment to the more deserving
recipients, the tie-breaker design (TBD) trades off the practical advantages of
the RDD with the statistical advantages of the RCT. The practical costs of the
randomization in TBDs can be hard to quantify in generality, while the
statistical benefits conferred by randomization in TBDs have only been studied
under linear and quadratic models. In this paper, we discuss and quantify the
statistical benefits of TBDs without using parametric modelling assumptions. If
the goal is estimation of the average treatment effect or the treatment effect
at more than one score value, the statistical benefits of using a TBD over an
RDD are apparent. If the goal is nonparametric estimation of the mean treatment
effect at merely one score value, we prove that about 2.8 times more subjects
are needed for an RDD in order to achieve the same asymptotic mean squared
error. We further demonstrate, using both theoretical results and simulations
based on the Angrist and Lavy (1999) classroom size dataset, that larger
experimental radii for the TBD lead to greater statistical efficiency.
arXiv link: http://arxiv.org/abs/2101.09605v5
Inference on the New Keynesian Phillips Curve with Very Many Instrumental Variables
other single-equation macroeconomic relations is characterised by weak and
high-dimensional instrumental variables (IVs). Beyond the efficiency concerns
previously raised in the literature, I show by simulation that ad-hoc selection
procedures can lead to substantial biases in post-selection inference. I
propose a Sup Score test that remains valid under dependent data, arbitrarily
weak identification, and a number of IVs that increases exponentially with the
sample size. Conducting inference on a standard NKPC with 359 IVs and 179
observations, I find substantially wider confidence sets than those commonly
found.
arXiv link: http://arxiv.org/abs/2101.09543v2
A Design-Based Perspective on Synthetic Control Methods
(SC) methods have quickly become one of the leading methods for estimating
causal effects in observational studies in settings with panel data. Formal
discussions often motivate SC methods by the assumption that the potential
outcomes were generated by a factor model. Here we study SC methods from a
design-based perspective, assuming a model for the selection of the treated
unit(s) and period(s). We show that the standard SC estimator is generally
biased under random assignment. We propose a Modified Unbiased Synthetic
Control (MUSC) estimator that guarantees unbiasedness under random assignment
and derive its exact, randomization-based, finite-sample variance. We also
propose an unbiased estimator for this variance. We document in settings with
real data that under random assignment, SC-type estimators can have root
mean-squared errors that are substantially lower than those of other common
estimators. We show that such an improvement is weakly guaranteed if the
treated period is similar to the other periods, for example, if the treated
period was randomly selected. While our results only directly apply in settings
where treatment is assigned randomly, we believe that they can complement
model-based approaches even for observational studies.
arXiv link: http://arxiv.org/abs/2101.09398v4
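For context, the standard synthetic control weights discussed above solve a constrained least-squares problem over the pre-treatment period. The sketch below computes them for synthetic data with scipy; it does not implement the modified unbiased (MUSC) estimator or its randomization-based variance proposed in the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)

# Synthetic pre-treatment outcomes: 1 treated unit, J controls, T0 periods.
T0, J = 20, 8
Y0 = rng.normal(size=(T0, J)) + np.linspace(0, 1, T0)[:, None]        # controls
y1 = Y0 @ rng.dirichlet(np.ones(J)) + rng.normal(scale=0.1, size=T0)  # treated

def sc_weights(y1_pre, Y0_pre):
    """Weights minimizing pre-treatment fit, nonnegative and summing to one."""
    J = Y0_pre.shape[1]
    objective = lambda w: np.sum((y1_pre - Y0_pre @ w) ** 2)
    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * J
    res = minimize(objective, np.full(J, 1.0 / J), bounds=bounds,
                   constraints=cons, method="SLSQP")
    return res.x

w = sc_weights(y1, Y0)
print("synthetic control weights:", np.round(w, 3))
```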
Yield Spread Selection in Predicting Recession Probabilities: A Machine Learning Approach
10-year--three-month Treasury yield spread without verification on the pair
selection. This study investigates whether the predictive ability of the spread can
be improved by letting a machine learning algorithm identify the best maturity
pair and coefficients. Our comprehensive analysis shows that, despite the
likelihood gain, the machine learning approach does not significantly improve
prediction, owing to the estimation error. This is robust to the forecasting
horizon, control variable, sample period, and oversampling of the recession
observations. Our finding supports the use of the 10-year--three-month spread.
arXiv link: http://arxiv.org/abs/2101.09394v2
HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition
combination with AI based transcription models, are developing rapidly.
Probably the single most important identifier for linking is personal names.
However, personal names are prone to enumeration and transcription errors and
although modern linking methods are designed to handle such challenges, these
sources of errors are critical and should be minimized. For this purpose,
improved transcription methods and large-scale databases are crucial
components. This paper describes and provides documentation for HANA, a newly
constructed large-scale database which consists of more than 3.3 million names.
The database contains more than 105 thousand unique names with a total of more
than 1.1 million images of personal names, which proves useful for transfer
learning to other settings. We provide three examples hereof, obtaining
significantly improved transcription accuracy on both Danish and US census
data. In addition, we present benchmark results for deep learning models
automatically transcribing the personal names from the scanned documents.
Through making more challenging large-scale databases publicly available we
hope to foster more sophisticated, accurate, and robust models for handwritten
text recognition.
arXiv link: http://arxiv.org/abs/2101.10862v2
Discrete Choice Analysis with Machine Learning Capabilities
policy analysis settings and the limitations of direct applications of
off-the-shelf machine learning methodologies to such settings. Traditional
econometric methodologies for building discrete choice models for policy
analysis involve combining data with modeling assumptions guided by
subject-matter considerations. Such considerations are typically most useful in
specifying the systematic component of random utility discrete choice models
but are typically of limited aid in determining the form of the random
component. We identify an area where machine learning paradigms can be
leveraged, namely in specifying and systematically selecting the best
specification of the random component of the utility equations. We review two
recent novel applications where mixed-integer optimization and cross-validation
are used to algorithmically select optimal specifications for the random
utility components of nested logit and logit mixture models subject to
interpretability constraints.
arXiv link: http://arxiv.org/abs/2101.10261v1
Decomposition of Bilateral Trade Flows Using a Three-Dimensional Panel Data Model
panel data model. Under the scenario that all three dimensions diverge to
infinity, we propose an estimation approach to identify the number of global
shocks and country-specific shocks sequentially, and establish the asymptotic
theories accordingly. From the practical point of view, being able to separate
the pervasive and nonpervasive shocks in a multi-dimensional panel data is
crucial for a range of applications, such as international financial linkages
and migration flows. In the numerical studies, we first conduct intensive
simulations to examine the theoretical findings, and then use the proposed
approach to investigate the international trade flows from two major trading
groups (APEC and EU) over 1982-2019, and quantify the network of bilateral
trade.
arXiv link: http://arxiv.org/abs/2101.06805v1
GDP Forecasting using Payments Transaction Data
adjusted for prior periods. This paper contemplates breaking away from the
historic GDP measure to a more dynamic method using Bank Account, Cheque and
Credit Card payment transactions as possible predictors for a faster and
real-time measure of GDP value. Historic time series data available from various
public-domain sources for payment types, values, volumes, and nominal UK GDP were
used for this analysis. Low Value Payments were selected for a simple Ordinary
Least Squares linear regression, with mixed results regarding the explanatory
power of the model and its reliability, as measured through the distribution and
variance of the residuals. Future research could potentially expand this work
using datasets split by periods of economic shocks to further test the OLS
method, or explore a Generalized Least Squares method or an autoregression on the
GDP time series itself.
arXiv link: http://arxiv.org/abs/2101.06478v1
Causal Gradient Boosting: Boosted Instrumental Variable Regression
learning algorithms are ill-suited for problems with endogenous explanatory
variables. To correct for the endogeneity bias, many variants of nonparametric
instrumental variable regression methods have been developed. In this paper, we
propose an alternative algorithm called boostIV that builds on the traditional
gradient boosting algorithm and corrects for the endogeneity bias. The
algorithm is very intuitive and resembles an iterative version of the standard
2SLS estimator. Moreover, our approach is data driven, meaning that the
researcher does not have to take a stance on either the form of the target
function approximation or the choice of instruments. We demonstrate that our
estimator is consistent under mild conditions. We carry out extensive Monte
Carlo simulations to demonstrate the finite sample performance of our algorithm
compared to other recently developed methods. We show that boostIV is at worst
on par with the existing methods and on average significantly outperforms them.
arXiv link: http://arxiv.org/abs/2101.06078v1
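For context, the standard 2SLS estimator that boostIV is described as resembling (in an iterative, boosted form) can be written in a few lines; the simulated endogenous design below is purely illustrative and is not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated design with an endogenous regressor x and two instruments z.
n = 2000
z = rng.normal(size=(n, 2))
u = rng.normal(size=n)                       # unobserved confounder
x = z @ np.array([1.0, -0.5]) + u + rng.normal(size=n)
y = 2.0 * x + u + rng.normal(size=n)         # true coefficient on x is 2

def two_sls(y, X, Z):
    """Standard 2SLS: project X on Z, then regress y on the fitted values."""
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    Xhat = Pz @ X
    return np.linalg.solve(Xhat.T @ X, Xhat.T @ y)

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
print("2SLS estimate of (intercept, slope):", np.round(two_sls(y, X, Z), 3))
print("OLS estimate (biased):",
      np.round(np.linalg.lstsq(X, y, rcond=None)[0], 3))
```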
Using Monotonicity Restrictions to Identify Models with Partially Latent Covariates
partially latent covariates. Such data structures arise in industrial
organization and labor economics settings where data are collected using an
input-based sampling strategy, e.g., if the sampling unit is one of multiple
labor input factors. We show that the latent covariates can be
nonparametrically identified, if they are functions of a common shock
satisfying some plausible monotonicity assumptions. With the latent covariates
identified, semiparametric estimation of the outcome equation proceeds within a
standard IV framework that accounts for the endogeneity of the covariates. We
illustrate the usefulness of our method using a new application that focuses on
the production functions of pharmacies. We find that differences in technology
between chains and independent pharmacies may partially explain the observed
transformation of the industry structure.
arXiv link: http://arxiv.org/abs/2101.05847v5
Explicit non-asymptotic bounds for the distance to the first-order Edgeworth expansion
the cumulative distribution function of a standardized sum $S_n$ of $n$
independent centered random variables with moments of order four and its
first-order Edgeworth expansion. Those bounds are valid for any sample size
with $n^{-1/2}$ rate under moment conditions only and $n^{-1}$ rate under
additional regularity constraints on the tail behavior of the characteristic
function of $S_n$. In both cases, the bounds are further sharpened if the
variables involved in $S_n$ are unskewed. We also derive new Berry-Esseen-type
bounds from our results and discuss their links with existing ones. We finally
apply our results to illustrate the lack of finite-sample validity of one-sided
tests based on the normal approximation of the mean.
arXiv link: http://arxiv.org/abs/2101.05780v3
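As a reminder of the object being approximated, the first-order Edgeworth expansion compared against in the entry above takes, in the i.i.d. case and with generic notation that may differ from the paper's, the standard form

\[
F_{S_n}(x) \;\approx\; \Phi(x) \;-\; \frac{\lambda_3}{6\sqrt{n}}\,(x^2 - 1)\,\varphi(x),
\qquad
\lambda_3 = \frac{E\big[(X_1 - E X_1)^3\big]}{\big(\mathrm{Var}(X_1)\big)^{3/2}},
\]

where $\Phi$ and $\varphi$ denote the standard normal distribution and density functions. When the summands are unskewed ($\lambda_3 = 0$), the correction term vanishes, consistent with the sharper bounds mentioned above.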
Assessing the Impact: Does an Improvement to a Revenue Management System Lead to an Improved Revenue?
Management Systems to maximize revenue for decades. While improving the
different components of these systems has been the focus of numerous studies,
estimating the impact of such improvements on the revenue has been overlooked
in the literature despite its practical importance. Indeed, quantifying the
benefit of a change in a system serves as support for investment decisions.
This is a challenging problem as it corresponds to the difference between the
generated value and the value that would have been generated keeping the system
as before. The latter is not observable. Moreover, the expected impact can be
small in relative value. In this paper, we cast the problem as counterfactual
prediction of unobserved revenue. The impact on revenue is then the difference
between the observed and the estimated revenue. The originality of this work
lies in the innovative application of econometric methods proposed for
macroeconomic applications to a new problem setting. Broadly applicable, the
approach benefits from only requiring revenue data observed for
origin-destination pairs in the network of the airline at each day, before and
after a change in the system is applied. We report results using real
large-scale data from Air Canada. We compare a deep neural network
counterfactual predictions model with econometric models. They achieve 1% and
1.1% error, respectively, on the counterfactual revenue predictions, and allow
us to accurately estimate small impacts (on the order of 2%).
arXiv link: http://arxiv.org/abs/2101.10249v2
Full-Information Estimation of Heterogeneous Agent Models Using Macro and Micro Data
heterogeneous agent models, combining aggregate time series data and repeated
cross sections of micro data. To handle unobserved aggregate state variables
that affect cross-sectional distributions, we compute a numerically unbiased
estimate of the model-implied likelihood function. Employing the likelihood
estimate in a Markov Chain Monte Carlo algorithm, we obtain fully efficient and
valid Bayesian inference. Evaluation of the micro part of the likelihood lends
itself naturally to parallel computing. Numerical illustrations in models with
heterogeneous households or firms demonstrate that the proposed
full-information method substantially sharpens inference relative to using only
macro data, and for some parameters micro data is essential for identification.
arXiv link: http://arxiv.org/abs/2101.04771v2
Empirical Decomposition of the IV-OLS Gap with Heterogeneous and Nonlinear Effects
decompose the difference between IV and OLS estimates given by a linear
regression model when the true causal effects of the treatment are nonlinear in
treatment levels and heterogeneous across covariates. I show that the IV-OLS
coefficient gap consists of three estimable components: the difference in
weights on the covariates, the difference in weights on the treatment levels,
and the difference in identified marginal effects that arises from endogeneity
bias. Applications of this framework to return-to-schooling estimates
demonstrate the empirical relevance of this distinction in properly
interpreting the IV-OLS gap.
arXiv link: http://arxiv.org/abs/2101.04346v5
Dynamic Ordering Learning in Multivariate Forecasting
decision making problems, the good understanding of the contemporaneous
relations among different series is crucial for the estimation of the
covariance matrix. In recent years, the modified Cholesky decomposition
appeared as a popular approach to covariance matrix estimation. However, its
main drawback lies in the imposition of a series ordering structure. In
this work, we propose a highly flexible and fast method to deal with the
problem of ordering uncertainty in a dynamic fashion with the use of Dynamic
Order Probabilities. We apply the proposed method in two different forecasting
contexts. The first is a dynamic portfolio allocation problem, where the
investor is able to learn the contemporaneous relationships among different
currencies improving final decisions and economic performance. The second is a
macroeconomic application, where the econometrician can adapt sequentially to
new economic environments, switching the contemporaneous relations among
macroeconomic variables over time.
arXiv link: http://arxiv.org/abs/2101.04164v3
Bootstrapping Non-Stationary Stochastic Volatility
regressions when the volatility of the innovations is random and
non-stationary. The volatility of many economic and financial time series
displays persistent changes and possible non-stationarity. However, the theory
of the bootstrap for such models has focused on deterministic changes of the
unconditional variance and little is known about the performance and the
validity of the bootstrap when the volatility is driven by a non-stationary
stochastic process. This includes near-integrated volatility processes as well
as near-integrated GARCH processes. This paper develops conditions for
bootstrap validity in time series regressions with non-stationary, stochastic
volatility. We show that in such cases the distribution of bootstrap statistics
(conditional on the data) is random in the limit. Consequently, the
conventional approaches to proving bootstrap validity, involving weak
convergence in probability of the bootstrap statistic, fail to deliver the
required results. Instead, we use the concept of `weak convergence in
distribution' to develop and establish novel conditions for validity of the
wild bootstrap, conditional on the volatility process. We apply our results to
several testing problems in the presence of non-stationary stochastic
volatility, including testing in a location model, testing for structural
change and testing for an autoregressive unit root. Sufficient conditions for
bootstrap validity include the absence of statistical leverage effects, i.e.,
correlation between the error process and its future conditional variance. The
results are illustrated using Monte Carlo simulations, which indicate that the
wild bootstrap leads to size control even in small samples.
arXiv link: http://arxiv.org/abs/2101.03562v1
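The wild bootstrap whose validity is studied above can be sketched in its basic form for a location model: resampled series are built by multiplying the centered observations by independent external draws. The Rademacher multipliers, the simple t-statistic, and the stochastic-volatility simulation below are illustrative choices, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(8)

# Location model with persistent (non-stationary-like) stochastic volatility.
n = 300
sigma = np.exp(np.cumsum(rng.normal(scale=0.05, size=n)))   # persistent volatility
y = 0.0 + sigma * rng.normal(size=n)                        # true mean is zero

def t_stat(x):
    return np.sqrt(len(x)) * x.mean() / x.std(ddof=1)

# Wild bootstrap: multiply centered observations by external Rademacher draws.
t_obs = t_stat(y)
centered = y - y.mean()
B = 999
t_boot = np.empty(B)
for b in range(B):
    v = rng.choice([-1.0, 1.0], size=n)
    t_boot[b] = t_stat(centered * v)

p_value = np.mean(np.abs(t_boot) >= np.abs(t_obs))
print("wild bootstrap two-sided p-value for H0: mean = 0 ->", round(p_value, 3))
```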
Online Multivalid Learning: Means, Moments, and Prediction Intervals
predictions that are "multivalid" in various senses, against an online sequence
of adversarially chosen examples $(x,y)$. This means that the resulting
estimates correctly predict various statistics of the labels $y$ not just
marginally -- as averaged over the sequence of examples -- but also
conditionally on $x \in G$ for any $G$ belonging to an arbitrary intersecting
collection of groups $G$.
We provide three instantiations of this framework. The first is mean
prediction, which corresponds to an online algorithm satisfying the notion of
multicalibration from Hebert-Johnson et al. The second is variance and higher
moment prediction, which corresponds to an online algorithm satisfying the
notion of mean-conditioned moment multicalibration from Jung et al. Finally, we
define a new notion of prediction interval multivalidity, and give an algorithm
for finding prediction intervals which satisfy it. Because our algorithms
handle adversarially chosen examples, they can equally well be used to predict
statistics of the residuals of arbitrary point prediction methods, giving rise
to very general techniques for quantifying the uncertainty of predictions of
black box algorithms, even in an online adversarial setting. When instantiated
for prediction intervals, this solves a similar problem as conformal
prediction, but in an adversarial environment and with multivalidity guarantees
stronger than simple marginal coverage guarantees.
arXiv link: http://arxiv.org/abs/2101.01739v1
Partial Identification in Nonseparable Binary Response Models with Endogenous Regressors
parameters in binary response models with possibly endogenous regressors. Our
framework allows for nonseparable index functions with multi-dimensional latent
variables, and does not require parametric distributional assumptions. We
leverage results on hyperplane arrangements and cell enumeration from the
literature on computational geometry in order to provide a tractable means of
computing the identified set. We demonstrate how various functional form,
independence, and monotonicity assumptions can be imposed as constraints in our
optimization procedure to tighten the identified set. Finally, we apply our
method to study the effects of health insurance on the decision to seek medical
treatment.
arXiv link: http://arxiv.org/abs/2101.01254v5
Regression Discontinuity Design with Many Thresholds
multiple cutoffs and heterogeneous treatments. A common practice is to
normalize all the cutoffs to zero and estimate one effect. This procedure
identifies the average treatment effect (ATE) on the observed distribution of
individuals local to existing cutoffs. However, researchers often want to make
inferences on more meaningful ATEs, computed over general counterfactual
distributions of individuals, rather than simply the observed distribution of
individuals local to existing cutoffs. This paper proposes a consistent and
asymptotically normal estimator for such ATEs when heterogeneity follows a
non-parametric function of cutoff characteristics in the sharp case. The
proposed estimator converges at the minimax optimal rate of root-n for a
specific choice of tuning parameters. Identification in the fuzzy case, with
multiple cutoffs, is impossible unless heterogeneity follows a
finite-dimensional function of cutoff characteristics. Under parametric
heterogeneity, this paper proposes an ATE estimator for the fuzzy case that
optimally combines observations to maximize its precision.
arXiv link: http://arxiv.org/abs/2101.01245v1
Better Bunching, Nicer Notching
parameter that summarizes agents' responses to changes in slope (kink) or
intercept (notch) of a schedule of incentives. We show that current bunching
methods may be very sensitive to implicit assumptions in the literature about
unobserved individual heterogeneity. We overcome this sensitivity concern with
new non- and semi-parametric estimators. Our estimators allow researchers to
show how bunching elasticities depend on different identifying assumptions and
when elasticities are robust to them. We follow the literature and derive our
methods in the context of the iso-elastic utility model and an income tax
schedule that creates a piece-wise linear budget constraint. We demonstrate that
bunching behavior provides robust estimates for self-employed and unmarried
taxpayers in the context of the U.S. Earned Income Tax Credit. In contrast,
estimates for self-employed and married taxpayers depend on specific
identifying assumptions, which highlights the value of our approach. We provide
the Stata package "bunching" to implement our procedures.
arXiv link: http://arxiv.org/abs/2101.01170v3
Should Humans Lie to Machines: The Incentive Compatibility of Lasso and General Weighted Lasso
learning method that tries to predict her best option based on a random sample
of other users. The predictor is incentive-compatible if the user has no
incentive to misreport her covariates. Focusing on the popular Lasso estimation
technique, we borrow tools from high-dimensional statistics to characterize
sufficient conditions that ensure that Lasso is incentive compatible in large
samples. We extend our results to the Conservative Lasso estimator and provide
new moment bounds for this generalized weighted version of Lasso. Our results
show that incentive compatibility is achieved if the tuning parameter is kept
above some threshold. We present simulations that illustrate how this can be
done in practice.
arXiv link: http://arxiv.org/abs/2101.01144v2
Estimation of Tempered Stable Lévy Models of Infinite Variation
stable L\'{e}vy model. The estimation procedure combines iteratively an
approximate semiparametric method of moment estimator, Truncated Realized
Quadratic Variations (TRQV), and a newly found small-time high-order
approximation for the optimal threshold of the TRQV of tempered stable
processes. The method is tested via simulations to estimate the volatility and
the Blumenthal-Getoor index of the generalized CGMY model as well as the
integrated volatility of a Heston-type model with CGMY jumps. The method
outperforms other efficient alternatives proposed in the literature when
working with a L\'evy process (i.e., the volatility is constant), or when the
index of jump intensity $Y$ is larger than $3/2$ in the presence of stochastic
volatility.
arXiv link: http://arxiv.org/abs/2101.00565v2
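The truncated realized quadratic variation (TRQV) used as the building block above has a simple form: sum the squared high-frequency increments whose absolute value does not exceed a threshold. The simulated path below mixes Gaussian noise with a few large jumps as a crude stand-in for a tempered stable model, and the threshold is an ad hoc choice rather than the paper's optimal small-time approximation.

```python
import numpy as np

rng = np.random.default_rng(9)

# Crude jump-diffusion path on [0, 1]: Brownian part plus occasional jumps.
n, sigma = 5000, 0.2
dt = 1.0 / n
increments = sigma * np.sqrt(dt) * rng.normal(size=n)
jump_times = rng.random(size=n) < 5 * dt                   # ~5 jumps on average
increments += jump_times * rng.normal(scale=0.5, size=n)

def trqv(dx, threshold):
    """Truncated realized quadratic variation: sum of squared increments whose
    absolute value is at most `threshold` (large jump increments are filtered)."""
    keep = np.abs(dx) <= threshold
    return np.sum(dx[keep] ** 2)

threshold = 4 * sigma * np.sqrt(dt)                         # ad hoc choice
print("realized variation (all increments):", round(np.sum(increments ** 2), 4))
print("truncated realized variation       :", round(trqv(increments, threshold), 4))
print("true integrated variance           :", round(sigma ** 2, 4))
```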
COVID-19 spreading in financial networks: A semiparametric matrix regression model
financial relationships among heterogeneous firms in the system. In this paper,
we propose a new semiparametric model for temporal multilayer causal networks
with both intra- and inter-layer connectivity. A Bayesian model with a
hierarchical mixture prior distribution is assumed to capture heterogeneity in
the response of the network edges to a set of risk factors including the
European COVID-19 cases. We measure the financial connectedness arising from
the interactions between two layers defined by stock returns and volatilities.
In the empirical analysis, we study the topology of the network before and
after the spreading of the COVID-19 disease.
arXiv link: http://arxiv.org/abs/2101.00422v1
The Law of Large Numbers for Large Stable Matchings
college admissions problem), the researcher performs statistical inference
under the assumption that they observe a random sample from a large matching
market. In this paper, we consider a setting in which the researcher observes
either all or a nontrivial fraction of outcomes from a stable matching. We
establish a concentration inequality for empirical matching probabilities
assuming strong correlation among the colleges' preferences while allowing
students' preferences to be fully heterogeneous. Our concentration inequality
yields laws of large numbers for the empirical matching probabilities and other
statistics commonly used in empirical analyses of a large matching market. To
illustrate the usefulness of our concentration inequality, we prove consistency
for estimators of conditional matching probabilities and measures of positive
assortative matching.
arXiv link: http://arxiv.org/abs/2101.00399v8
Assessing Sensitivity to Unconfoundedness: Estimation and Inference
treatment effects estimated using the unconfoundedness assumption (also known
as selection on observables or conditional independence). Specifically, we
estimate and do inference on bounds on various treatment effect parameters,
like the average treatment effect (ATE) and the average effect of treatment on
the treated (ATT), under nonparametric relaxations of the unconfoundedness
assumption indexed by a scalar sensitivity parameter c. These relaxations allow
for limited selection on unobservables, depending on the value of c. For large
enough c, these bounds equal the no assumptions bounds. Using a non-standard
bootstrap method, we show how to construct confidence bands for these bound
functions which are uniform over all values of c. We illustrate these methods
with an empirical application to effects of the National Supported Work
Demonstration program. We implement these methods in a companion Stata module
for easy use in practice.
arXiv link: http://arxiv.org/abs/2012.15716v1
Breaking Ties: Regression Discontinuity Design Meets Market Design
Centralized school assignment algorithms ration seats at over-subscribed
schools using randomly assigned lottery numbers, non-lottery tie-breakers like
test scores, or both. The New York City public high school match illustrates
the latter, using test scores and other criteria to rank applicants at
“screened” schools, combined with lottery tie-breaking at unscreened
“lottery” schools. We show how to identify causal effects of school
attendance in such settings. Our approach generalizes regression discontinuity
methods to allow for multiple treatments and multiple running variables, some
of which are randomly assigned. The key to this generalization is a local
propensity score that quantifies the school assignment probabilities induced by
lottery and non-lottery tie-breakers. The local propensity score is applied in
an empirical assessment of the predictive value of New York City's school
report cards. Schools that receive a high grade indeed improve SAT math scores
and increase graduation rates, though by much less than OLS estimates suggest.
Selection bias in OLS estimates is egregious for screened schools.
arXiv link: http://arxiv.org/abs/2101.01093v1
Assessing the Sensitivity of Synthetic Control Treatment Effect Estimates to Misspecification Error
estimates to interrogate the assumption that the SC method is well-specified,
namely that choosing weights to minimize pre-treatment prediction error yields
accurate predictions of counterfactual post-treatment outcomes. Our data-driven
procedure recovers the set of treatment effects consistent with the assumption
that the misspecification error incurred by the SC method is at most the
observable misspecification error incurred when using the SC estimator to
predict the outcomes of some control unit. We show that under one definition of
misspecification error, our procedure provides a simple, geometric motivation
for comparing the estimated treatment effect to the distribution of placebo
residuals to assess estimate credibility. When we apply our procedure to
several canonical studies that report SC estimates, we broadly confirm the
conclusions drawn by the source papers.
arXiv link: http://arxiv.org/abs/2012.15367v3
Adversarial Estimation of Riesz Representers
The Riesz representer is a key component in the asymptotic variance of a
semiparametrically estimated linear functional. We propose an adversarial
framework to estimate the Riesz representer using general function spaces. We
prove a nonasymptotic mean square rate in terms of an abstract quantity called
the critical radius, then specialize it for neural networks, random forests,
and reproducing kernel Hilbert spaces as leading cases. Our estimators are
highly compatible with targeted and debiased machine learning with sample
splitting; our guarantees directly verify general conditions for inference that
allow mis-specification. We also use our guarantees to prove inference without
sample splitting, based on stability or complexity. Our estimators achieve
nominal coverage in highly nonlinear simulations where some previous methods
break down. They shed new light on the heterogeneous effects of matching
grants.
arXiv link: http://arxiv.org/abs/2101.00009v3
A Pairwise Strategic Network Formation Model with Group Heterogeneity: With an Application to International Travel
dyad of agents strategically determines the link status between them. Our model
allows the agents to have unobserved group heterogeneity in the propensity of
link formation. For the model estimation, we propose a three-step maximum
likelihood (ML) method. First, we obtain consistent estimates for the
heterogeneity parameters at individual level using the ML estimator. Second, we
estimate the latent group structure using the binary segmentation algorithm
based on the results obtained from the first step. Finally, based on the
estimated group membership, we re-execute the ML estimation. Under certain
regularity conditions, we show that the proposed estimator is asymptotically
unbiased and distributed as normal at the parametric rate. As an empirical
illustration, we focus on the network data of international visa-free travels.
The results indicate the presence of significant strategic complementarity and
a certain level of degree heterogeneity in the network formation behavior.
arXiv link: http://arxiv.org/abs/2012.14886v2
Bias-Aware Inference in Regularized Regression Models
on the magnitude of the control coefficients. A class of estimators based on a
regularized propensity score regression is shown to exactly solve a tradeoff
between worst-case bias and variance. We derive confidence intervals (CIs)
based on these estimators that are bias-aware: they account for the possible
bias of the estimator. Under homoskedastic Gaussian errors, these estimators
and CIs are near-optimal in finite samples for MSE and CI length. We also
provide conditions for asymptotic validity of the CI with unknown and possibly
heteroskedastic error distribution, and derive novel optimal rates of
convergence under high-dimensional asymptotics that allow the number of
regressors to increase more quickly than the number of observations. Extensive
simulations and an empirical application illustrate the performance of our
methods.
arXiv link: http://arxiv.org/abs/2012.14823v2
Bayesian analysis of seasonally cointegrated VAR model
quarterly data. We propose the prior structure, derive the set of full
conditional posterior distributions, and propose the sampling scheme. The
identification of cointegrating spaces is obtained via orthonormality
restrictions imposed on vectors spanning them. In the case of annual frequency,
the cointegrating vectors are complex, which should be taken into account when
identifying them. The point estimation of the cointegrating spaces is also
discussed. The presented methods are illustrated by a simulation experiment and
are employed in the analysis of money and prices in the Polish economy.
arXiv link: http://arxiv.org/abs/2012.14820v2
The impact of Climate on Economic and Financial Cycles: A Markov-switching Panel Approach
analysing jointly business and financial cycles, in different phases and
disentangling the effects for different sector channels. A Bayesian Panel
Markov-switching framework is proposed to jointly estimate the impact of
extreme weather events on the economies as well as the interaction between
business and financial cycles. Results from the empirical analysis suggest that
extreme weather events impact asymmetrically across the different phases of the
economy and heterogeneously across the EU countries. Moreover, we highlight how
the manufacturing output, a component of the industrial production index,
constitutes the main channel through which climate shocks impact the EU
economies.
arXiv link: http://arxiv.org/abs/2012.14693v1
Time-Transformed Test for the Explosive Bubbles under Non-stationary Volatility
non-stationary volatility. Because the limiting distribution of the seminal
Phillips et al. (2011) test depends on the variance function and usually
requires a bootstrap implementation under heteroskedasticity, we construct the
test based on a deformation of the time domain. The proposed test is
asymptotically pivotal under the null hypothesis and its limiting distribution
coincides with that of the standard test under homoskedasticity, so that the
test does not require computationally extensive methods for inference.
Appealing finite sample properties are demonstrated through Monte-Carlo
simulations. An empirical application demonstrates that the upsurge behavior of
cryptocurrency time series in the middle of the sample is partially explained
by the volatility change.
arXiv link: http://arxiv.org/abs/2012.13937v2
Weighting-Based Treatment Effect Estimation via Distribution Learning
upon the idea of propensity scores or covariate balance. They usually impose
strong assumptions on treatment assignment or outcome model to obtain unbiased
estimation, such as linearity or specific functional forms, which easily leads
to the major drawback of model mis-specification. In this paper, we aim to
alleviate these issues by developing a distribution learning-based weighting
method. We first learn the true underlying distribution of covariates
conditioned on treatment assignment, then leverage the ratio of covariates'
density in the treatment group to that of the control group as the weight for
estimating treatment effects. Specifically, we propose to approximate the
distribution of covariates in both treatment and control groups through
invertible transformations via change of variables. To demonstrate the
superiority, robustness, and generalizability of our method, we conduct
extensive experiments using synthetic and real data. From the experiment
results, we find that our method for estimating average treatment effect on
treated (ATT) with observational data outperforms several cutting-edge
weighting-only benchmarking methods, and it maintains its advantage under a
doubly-robust estimation framework that combines weighting with some advanced
outcome modeling methods.
arXiv link: http://arxiv.org/abs/2012.13805v4
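A simplified sketch of density-ratio weighting for the ATT: instead of the invertible-transformation (change-of-variables) density estimates used in the paper, the ratio of treated to control covariate densities is recovered here from a logistic-regression classifier via Bayes' rule. The data-generating process and the classifier are illustrative placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(10)

# Synthetic observational data with confounding and effect 1.0 on the treated.
n = 5000
X = rng.normal(size=(n, 3))
propensity = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
T = rng.binomial(1, propensity)
Y = X[:, 0] + 0.5 * X[:, 2] + 1.0 * T + rng.normal(size=n)

# Estimate e(x) with a classifier; convert to the treated/control density ratio
# via Bayes' rule: f(x|T=1)/f(x|T=0) = [e(x)/(1-e(x))] * [(1-pi)/pi].
clf = LogisticRegression(max_iter=1000).fit(X, T)
e = np.clip(clf.predict_proba(X)[:, 1], 1e-3, 1 - 1e-3)
pi = T.mean()
ratio = (e / (1 - e)) * ((1 - pi) / pi)

# ATT: compare treated outcomes with density-ratio-weighted control outcomes.
treated, control = T == 1, T == 0
att = Y[treated].mean() - np.average(Y[control], weights=ratio[control])
print("weighted ATT estimate:", round(att, 3), "(true effect on treated: 1.0)")
```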
Analysis of Randomized Experiments with Network Interference and Noncompliance
randomized experiments, the traditional approach has been based on the Stable
Unit Treatment Value Assumption (SUTVA; Rubin), which dictates that there
is no interference between individuals. However, the SUTVA assumption fails to
hold in many applications due to social interaction, general equilibrium,
and/or externality effects. While much progress has been made in relaxing the
SUTVA assumption, most of this literature has only considered a setting with
perfect compliance to treatment assignment. In practice, however, noncompliance
occurs frequently where the actual treatment receipt is different from the
assignment to the treatment. In this paper, we study causal effects in
randomized experiments with network interference and noncompliance. Spillovers
are allowed to occur at both treatment choice stage and outcome realization
stage. In particular, we explicitly model treatment choices of agents as a
binary game of incomplete information where resulting equilibrium treatment
choice probabilities affect outcomes of interest. Outcomes are further
characterized by a random coefficient model to allow for general unobserved
heterogeneity in the causal effects. After defining our causal parameters of
interest, we propose a simple control function estimator and derive its
asymptotic properties under large-network asymptotics. We apply our methods to
the randomized subsidy program of Dupas, where we find evidence of
spillover effects on both short-run and long-run adoption of
insecticide-treated bed nets. Finally, we illustrate the usefulness of our
methods by analyzing the impact of counterfactual subsidy policies.
arXiv link: http://arxiv.org/abs/2012.13710v1
Quantile regression with generated dependent variable and covariates
variable are not directly observed but estimated in an initial first step and
used in the second step quantile regression for estimating the quantile
parameters. This general class of generated quantile regression (GQR) covers
various statistical applications, for instance, estimation of endogenous
quantile regression models and triangular structural equation models, and some
new relevant applications are discussed. We study the asymptotic distribution
of the two-step estimator, which is challenging because of the presence of
generated covariates and/or dependent variable in the non-smooth quantile
regression estimator. We employ techniques from empirical process theory to
find a uniform Bahadur expansion for the two-step estimator, which is used to
establish the asymptotic results. We illustrate the performance of the GQR
estimator through simulations and an empirical application based on auctions.
arXiv link: http://arxiv.org/abs/2012.13614v1
Filtering the intensity of public concern from social media count data with jumps
have drawn increasing interest among academics and market analysts over the
past decade. Transforming Web activity records into counts yields time series
with peculiar features, including the coexistence of smooth paths and sudden
jumps, as well as cross-sectional and temporal dependence. Using Twitter posts
about country risks for the United Kingdom and the United States, this paper
proposes an innovative state space model for multivariate count data with
jumps. We use the proposed model to assess the impact of public concerns in
these countries on market systems. To do so, public concerns inferred from
Twitter data are unpacked into country-specific persistent terms, risk social
amplification events, and co-movements of the country series. The identified
components are then used to investigate the existence and magnitude of
country-risk spillovers and social amplification effects on the volatility of
financial markets.
arXiv link: http://arxiv.org/abs/2012.13267v1
Machine Learning Advances for Time Series Forecasting
learning and high-dimensional models for time series forecasting. We consider
both linear and nonlinear alternatives. Among the linear methods we pay special
attention to penalized regressions and ensemble of models. The nonlinear
methods considered in the paper include shallow and deep neural networks, in
their feed-forward and recurrent versions, and tree-based methods, such as
random forests and boosted trees. We also consider ensemble and hybrid models
by combining ingredients from different alternatives. Tests for superior
predictive ability are briefly reviewed. Finally, we discuss the application of
machine learning in economics and finance and provide an illustration with
high-frequency financial data.
arXiv link: http://arxiv.org/abs/2012.12802v3
Invidious Comparisons: Ranking and Selection as Compound Decisions
mentality," to construct rankings. Schools, hospitals, sports teams, movies,
and myriad other objects are ranked even though their inherent
multi-dimensionality would suggest that -- at best -- only partial orderings
were possible. We consider a large class of elementary ranking problems in
which we observe noisy, scalar measurements of merit for $n$ objects of
potentially heterogeneous precision and are asked to select a group of the
objects that are "most meritorious." The problem is naturally formulated in the
compound decision framework of Robbins's (1956) empirical Bayes theory, but it
also exhibits close connections to the recent literature on multiple testing.
The nonparametric maximum likelihood estimator for mixture models (Kiefer and
Wolfowitz (1956)) is employed to construct optimal ranking and selection rules.
Performance of the rules is evaluated in simulations and an application to
ranking U.S. kidney dialysis centers.
arXiv link: http://arxiv.org/abs/2012.12550v3
Split-then-Combine simplex combination and selection of forecasters
Juan, 2014) to combine forecasts inside the simplex space, the sample space of
positive weights adding up to one. As it turns out, the simplicial statistic
given by the center of the simplex compares favorably against the fixed-weight,
average forecast. Besides, we also develop a Combine-After-Selection (CAS)
method to get rid of redundant forecasters. We apply these two approaches to
make out-of-sample one-step ahead combinations and subcombinations of forecasts
for several economic variables. This methodology is particularly useful when
the sample size is smaller than the number of forecasts, a case where other
methods (e.g., Least Squares (LS) or Principal Component Analysis (PCA)) are
not applicable.
arXiv link: http://arxiv.org/abs/2012.11935v1
Discordant Relaxations of Misspecified Models
characterization of the identified set. Therefore, researchers often rely on
non-sharp identification conditions, and empirical results are often based on
an outer set of the identified set. This practice is often viewed as
conservative yet valid because an outer set is always a superset of the
identified set. However, this paper shows that when the model is refuted by the
data, two sets of non-sharp identification conditions derived from the same
model could lead to disjoint outer sets and conflicting empirical results. We
provide a sufficient condition for the existence of such discordancy, which
covers models characterized by conditional moment inequalities and the Artstein
(1983) inequalities. We also derive sufficient conditions for the non-existence
of discordant submodels, therefore providing a class of models for which
constructing outer sets cannot lead to misleading interpretations. In the case
of discordancy, we follow Masten and Poirier (2021) by developing a method to
salvage misspecified models, but unlike them, we focus on discrete relaxations.
We consider all minimum relaxations of a refuted model that restore
data-consistency. We find that the union of the identified sets of these
minimum relaxations is robust to detectable misspecifications and has an
intuitive empirical interpretation.
arXiv link: http://arxiv.org/abs/2012.11679v5
On the Aggregation of Probability Assessments: Regularized Mixtures of Predictive Densities for Eurozone Inflation and Real Interest Rates
forecasts. We explore a variety of objectives and regularization penalties, and
we use them in a substantive exploration of Eurozone inflation and real
interest rate density forecasts. All individual inflation forecasters (even the
ex post best forecaster) are outperformed by our regularized mixtures. From the
Great Recession onward, the optimal regularization tends to move density
forecasts' probability mass from the centers to the tails, correcting for
overconfidence.
arXiv link: http://arxiv.org/abs/2012.11649v3
Uncertainty on the Reproduction Ratio in the SIR Model
estimated reproduction ratio $R_0$ observed in practice. For expository purposes,
we consider a discrete time stochastic version of the
Susceptible-Infected-Recovered (SIR) model, and introduce different approximate
maximum likelihood (AML) estimators of $R_0$. We carefully discuss the
properties of these estimators and illustrate by a Monte-Carlo study the width
of confidence intervals on $R_0$.
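For concreteness, here is a minimal sketch of a discrete-time stochastic SIR simulation with a crude plug-in estimate of $R_0 = \beta/\gamma$. This is not one of the paper's AML estimators; the parameter values and the estimator form are assumptions chosen for illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    # Discrete-time stochastic SIR: infections ~ Binomial(S_t, 1 - exp(-beta*I_t/N)),
    # recoveries ~ Binomial(I_t, gamma); true R0 = beta / gamma = 3.
    N, beta, gamma, T = 100_000, 0.30, 0.10, 120
    S, I, R = N - 10, 10, 0
    records = []
    for t in range(T):
        new_inf = rng.binomial(S, 1 - np.exp(-beta * I / N))
        new_rec = rng.binomial(I, gamma)
        records.append((S, I, new_inf, new_rec))    # state before the transition
        S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec

    S_t, I_t, inf_t, rec_t = map(np.array, zip(*records))
    ok = (I_t > 0) & (inf_t < S_t)

    # Crude moment-based plug-ins for beta and gamma, then R0 = beta / gamma.
    beta_hat = np.mean(-np.log(1 - inf_t[ok] / S_t[ok]) * N / I_t[ok])
    gamma_hat = np.mean(rec_t[ok] / I_t[ok])
    print("R0 estimate:", beta_hat / gamma_hat)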
arXiv link: http://arxiv.org/abs/2012.11542v1
A Nearly Similar Powerful Test for Mediation
Testing for mediation is empirically very important in psychology, sociology,
medicine, economics and business, generating over 100,000 citations to a single
key paper. The no-mediation hypothesis $H_{0}:\theta_{1}\theta _{2}=0$ also
poses a theoretically interesting statistical problem since it defines a
manifold that is non-regular at the origin, where rejection probabilities of
standard tests are extremely low. We prove that a similar test for mediation
only exists if the size is the reciprocal of an integer. It is unique, but has
objectionable properties. We propose a new test that is nearly similar with
power close to the envelope without these abject properties and is easy to use
in practice. Construction uses the general varying $g$-method that we propose.
We illustrate the results in an educational setting with gender role beliefs
and in a trade union sentiment application.
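For context only, the classical Sobel (delta-method) statistic for $H_{0}:\theta_{1}\theta_{2}=0$ is sketched below on simulated data. It is one of the standard tests whose rejection probabilities collapse near the origin, not the nearly similar test proposed in the paper, and the simulated design is an assumption.

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    n = 500
    X = rng.normal(size=n)
    M = 0.4 * X + rng.normal(size=n)             # mediator equation, theta_1 = 0.4
    Y = 0.3 * M + 0.2 * X + rng.normal(size=n)   # outcome equation, theta_2 = 0.3

    fit_m = sm.OLS(M, sm.add_constant(X)).fit()
    fit_y = sm.OLS(Y, sm.add_constant(np.column_stack([X, M]))).fit()
    a, se_a = fit_m.params[1], fit_m.bse[1]      # estimate of theta_1
    b, se_b = fit_y.params[2], fit_y.bse[2]      # estimate of theta_2

    # Sobel (delta-method) statistic for H0: theta_1 * theta_2 = 0.
    z = (a * b) / np.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    print("Sobel z:", z, "p-value:", 2 * norm.sf(abs(z)))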
arXiv link: http://arxiv.org/abs/2012.11342v2
Weak Identification with Bounds in a Class of Minimum Distance Models
valuable source of information. Existing weak identification estimation and
inference results are unable to combine weak identification with bounds. Within
a class of minimum distance models, this paper proposes identification-robust
inference that incorporates information from bounds when parameters are weakly
identified. This paper demonstrates the value of the bounds and
identification-robust inference in a simple latent factor model and a simple
GARCH model. This paper also demonstrates the identification-robust inference
in an empirical application, a factor model for parental investments in
children.
arXiv link: http://arxiv.org/abs/2012.11222v5
Binary Classification Tests, Imperfect Standards, and Ambiguous Information
pre-established test. For example, rapid Antigen tests for the detection of
SARS-CoV-2 are assessed relative to more established PCR tests. In this paper,
I argue that the new test can be described as producing ambiguous information
when the pre-established test is imperfect. This allows for a phenomenon called
dilation -- an extreme form of non-informativeness. As an example, I present
hypothetical test data satisfying the WHO's minimum quality requirement for
rapid Antigen tests which leads to dilation. The ambiguity in the information
arises from a missing data problem due to imperfection of the established test:
the joint distribution of true infection and test results is not observed.
Using results from Copula theory, I construct the (usually non-singleton) set
of all these possible joint distributions, which allows me to assess the new
test's informativeness. This analysis leads to a simple sufficient condition to
make sure that a new test is not a dilation. I illustrate my approach with
applications to data from three COVID-19 related tests. Two rapid Antigen tests
satisfy my sufficient condition easily and are therefore informative. However,
less accurate procedures, like chest CT scans, may exhibit dilation.
arXiv link: http://arxiv.org/abs/2012.11215v3
Policy Transforms and Learning Optimal Policies
environments using models that may be incomplete and/or partially identified.
We consider a policymaker who wishes to choose a policy to maximize a
particular counterfactual quantity called a policy transform. We characterize
learnability of a set of policy options by the existence of a decision rule
that closely approximates the maximin optimal value of the policy transform
with high probability. Sufficient conditions are provided for the existence of
such a rule. However, learnability of an optimal policy is an ex-ante notion
(i.e. before observing a sample), and so ex-post (i.e. after observing a
sample) theoretical guarantees for certain policy rules are also provided. Our
entire approach is applicable when the distribution of unobservables is not
parametrically specified, although we discuss how semiparametric restrictions
can be used. Finally, we show possible applications of the procedure to a
simultaneous discrete choice example and a program evaluation example.
arXiv link: http://arxiv.org/abs/2012.11046v1
Achieving Reliable Causal Inference with Data-Mined Variables: A Random Forest Approach to the Measurement Error Problem
prevalent in both research and practice. A common empirical strategy involves
the application of predictive modeling techniques to 'mine' variables of
interest from available data, followed by the inclusion of those variables into
an econometric framework, with the objective of estimating causal effects.
Recent work highlights that, because the predictions from machine learning
models are inevitably imperfect, econometric analyses based on the predicted
variables are likely to suffer from bias due to measurement error. We propose a
novel approach to mitigate these biases, leveraging the ensemble learning
technique known as the random forest. We propose employing random forest not
just for prediction, but also for generating instrumental variables to address
the measurement error embedded in the prediction. The random forest algorithm
performs best when comprised of a set of trees that are individually accurate
in their predictions, yet which also make 'different' mistakes, i.e., have
weakly correlated prediction errors. A key observation is that these properties
are closely related to the relevance and exclusion requirements of valid
instrumental variables. We design a data-driven procedure to select tuples of
individual trees from a random forest, in which one tree serves as the
endogenous covariate and the other trees serve as its instruments. Simulation
experiments demonstrate the efficacy of the proposed approach in mitigating
estimation biases and its superior performance over three alternative methods
for bias correction.
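A stylized sketch of the tree-as-instrument idea follows: one tree's prediction serves as the error-ridden mined covariate and another tree's prediction as its instrument in a just-identified IV regression. The synthetic data, the arbitrary choice of trees, and the omission of the paper's data-driven tuple-selection step are all simplifying assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(3)
    n = 3000
    X = rng.normal(size=(n, 5))
    construct = X[:, 0] + 0.5 * X[:, 1] ** 2        # latent variable to be 'mined'
    Y = 2.0 * construct + rng.normal(size=n)        # outcome, true effect = 2

    # Mine the construct with a random forest trained on a noisy auxiliary label.
    label = construct + rng.normal(scale=0.5, size=n)
    rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, label)

    # One tree's prediction is the endogenous covariate, another tree's the instrument.
    # In this toy setup the two trees' errors may still be correlated; the paper's
    # selection procedure screens for tuples with weakly correlated errors.
    t_hat = rf.estimators_[0].predict(X)
    z_hat = rf.estimators_[1].predict(X)

    # Naive OLS slope (attenuated by measurement error) versus just-identified IV.
    beta_ols = np.cov(t_hat, Y)[0, 1] / np.var(t_hat, ddof=1)
    beta_iv = np.cov(z_hat, Y)[0, 1] / np.cov(z_hat, t_hat)[0, 1]
    print("OLS:", beta_ols, "IV:", beta_iv)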
arXiv link: http://arxiv.org/abs/2012.10790v1
Kernel Methods for Unobserved Confounding: Negative Controls, Proxies, and Instruments
treatment and outcome in the presence of unmeasured confounding. The treatment
effect can nonetheless be identified if two auxiliary variables are available:
a negative control treatment (which has no effect on the actual outcome), and a
negative control outcome (which is not affected by the actual treatment). These
auxiliary variables can also be viewed as proxies for a traditional set of
control variables, and they bear resemblance to instrumental variables. I
propose a family of algorithms based on kernel ridge regression for learning
nonparametric treatment effects with negative controls. Examples include dose
response curves, dose response curves with distribution shift, and
heterogeneous treatment effects. Data may be discrete or continuous, and low,
high, or infinite dimensional. I prove uniform consistency and provide finite
sample rates of convergence. I estimate the dose response curve of cigarette
smoking on infant birth weight adjusting for unobserved confounding due to
household income, using a data set of singleton births in the state of
Pennsylvania between 1989 and 1991.
arXiv link: http://arxiv.org/abs/2012.10315v5
Two-way Fixed Effects and Differences-in-Differences Estimators with Several Treatments
variables. Under a parallel trends assumption, we show that the coefficient on
each treatment identifies a weighted sum of that treatment's effect, with
possibly negative weights, plus a weighted sum of the effects of the other
treatments. Thus, those estimators are not robust to heterogeneous effects and
may be contaminated by other treatments' effects. We further show that omitting
a treatment from the regression can actually reduce the estimator's bias,
unlike what would happen under constant treatment effects. We propose an
alternative difference-in-differences estimator, robust to heterogeneous
effects and immune to the contamination problem. In the application we
consider, the TWFE regression identifies a highly non-convex combination of
effects, with large contamination weights, and one of its coefficients
significantly differs from our heterogeneity-robust estimator.
arXiv link: http://arxiv.org/abs/2012.10077v8
The Variational Method of Moments
structural causal parameters in terms of observables, a prominent example being
instrumental variable regression. A standard approach reduces the problem to a
finite set of marginal moment conditions and applies the optimally weighted
generalized method of moments (OWGMM), but this requires that we know a finite set
of identifying moments, can still be inefficient even if it is identifying, or can be
theoretically efficient but practically unwieldy if we use a growing sieve of
moment conditions. Motivated by a variational minimax reformulation of OWGMM,
we define a very general class of estimators for the conditional moment
problem, which we term the variational method of moments (VMM) and which
naturally enables controlling infinitely-many moments. We provide a detailed
theoretical analysis of multiple VMM estimators, including ones based on kernel
methods and neural nets, and provide conditions under which these are
consistent, asymptotically normal, and semiparametrically efficient in the full
conditional moment model. We additionally provide algorithms for valid
statistical inference based on the same kind of variational reformulations,
both for kernel- and neural-net-based varieties. Finally, we demonstrate the
strong performance of our proposed estimation and inference algorithms in a
detailed series of synthetic experiments.
arXiv link: http://arxiv.org/abs/2012.09422v4
Exact Trend Control in Estimating Treatment Effects Using Panel Data with Heterogenous Trends
outcomes constructed by Abadie et al. (2010), Hsiao et al. (2012), and Doudchenko and
Imbens (2017) may all be confounded by uncontrolled heterogenous trends. Based
on exact-matching on the trend predictors, I propose new methods of estimating
the model-specific treatment effects, which are free from heterogenous trends.
When applied to Abadie et al.'s (2010) model and data, the new estimators
suggest considerably smaller effects of California's tobacco control program.
arXiv link: http://arxiv.org/abs/2012.08988v1
United States FDA drug approvals are persistent and polycyclic: Insights into economic cycles, innovation dynamics, and national policy
(such as economic or policy) on the rate of US drug approvals. Here, a novel
approach, termed the Chronological Hurst Exponent (CHE), is proposed, which
hypothesizes that changes in the long-range memory latent within the dynamics
of time series data may be temporally associated with changes in such
influences. Using the monthly number of FDA Center for Drug Evaluation and
Research (CDER) approvals from 1939 to 2019 as the data source, it is
demonstrated that the CHE has a distinct S-shaped structure demarcated by an
8-year (1939-1947) Stagnation Period, a 27-year (1947-1974) Emergent
(time-varying) Period, and a 45-year (1974-2019) Saturation Period. Further,
dominant periodicities (resolved via wavelet analyses) are identified during
the most recent 45-year CHE Saturation Period at 17, 8 and 4 years; thus, US
drug approvals have been following a Juglar-Kuznets mid-term cycle with
Kitchin-like bursts. As discussed, this work suggests that (1) changes in
extrinsic factors (e.g., of economic and/or policy origin) during the Emergent
Period may have led to persistent growth in US drug approvals enjoyed since
1974, (2) the CHE may be a valued method to explore influences on time series
data, and (3) innovation-related economic cycles exist (as viewed via the proxy
metric of US drug approvals).
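As a hedged illustration of the general idea of tracking long-range memory over time (not necessarily the exact CHE construction), the sketch below computes a rescaled-range (R/S) Hurst estimate on expanding chronological windows of a synthetic monthly series. The window sizes and the placeholder series are assumptions.

    import numpy as np

    def hurst_rs(x):
        """Crude rescaled-range (R/S) estimate of the Hurst exponent."""
        x = np.asarray(x, dtype=float)
        sizes = [s for s in (8, 16, 32, 64) if s <= len(x) // 2]
        log_rs, log_n = [], []
        for s in sizes:
            chunks = [x[i:i + s] for i in range(0, len(x) - s + 1, s)]
            rs = []
            for c in chunks:
                dev = np.cumsum(c - c.mean())
                r, sd = dev.max() - dev.min(), c.std()
                if sd > 0:
                    rs.append(r / sd)
            if rs:
                log_rs.append(np.log(np.mean(rs)))
                log_n.append(np.log(s))
        # Slope of log(R/S) against log(window size) estimates H.
        return np.polyfit(log_n, log_rs, 1)[0]

    # "Chronological" profile: H computed on expanding windows of a monthly series.
    rng = np.random.default_rng(4)
    series = np.cumsum(rng.normal(size=960))   # placeholder for monthly approval counts
    che = [hurst_rs(series[:t]) for t in range(120, len(series) + 1, 60)]
    print(np.round(che, 2))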
arXiv link: http://arxiv.org/abs/2012.09627v3
Minimax Risk and Uniform Convergence Rates for Nonparametric Dyadic Regression
large population. For each unit we observe the vector of regressors $X_{i}$
and, for each of the $N\left(N-1\right)$ ordered pairs of units, an outcome
$Y_{ij}$. The outcomes $Y_{ij}$ and $Y_{kl}$ are independent if their indices
are disjoint, but dependent otherwise (i.e., "dyadically dependent"). Let
$W_{ij}=\left(X_{i}',X_{j}'\right)'$; using the sampled data we seek to
construct a nonparametric estimate of the mean regression function
$g\left(W_{ij}\right) \equiv E\left[\left.Y_{ij}\right|X_{i},X_{j}\right].$
We present two sets of results. First, we calculate lower bounds on the
minimax risk for estimating the regression function at (i) a point and (ii)
under the infinity norm. Second, we calculate (i) pointwise and (ii) uniform
convergence rates for the dyadic analog of the familiar Nadaraya-Watson (NW)
kernel regression estimator. We show that the NW kernel regression estimator
achieves the optimal rates suggested by our risk bounds when an appropriate
bandwidth sequence is chosen. This optimal rate differs from the one available
under iid data: the effective sample size is smaller and
$d_W=\dim(W_{ij})$ influences the rate differently.
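A minimal sketch of the dyadic Nadaraya-Watson estimator described above: form $W_{ij}=(X_i, X_j)$ for all ordered pairs and smooth $Y_{ij}$ with a product Gaussian kernel. The bandwidth, the synthetic regression function, and the neglect of a dyadic-dependence-aware bandwidth choice are assumptions.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 80
    X = rng.normal(size=n)

    # Build all ordered dyads (i, j), i != j, with W_ij = (X_i, X_j).
    idx_i, idx_j = np.where(~np.eye(n, dtype=bool))
    W = np.column_stack([X[idx_i], X[idx_j]])
    g = lambda w: w[:, 0] * w[:, 1]              # true regression function (toy choice)
    Y = g(W) + rng.normal(size=len(W))           # dyadic outcomes

    def nw_dyadic(w0, W, Y, h=0.4):
        """Nadaraya-Watson estimate of E[Y_ij | W_ij = w0] with a product Gaussian kernel."""
        K = np.exp(-0.5 * ((W - w0) / h) ** 2).prod(axis=1)
        return (K * Y).sum() / K.sum()

    w0 = np.array([0.5, -0.5])
    print("estimate:", nw_dyadic(w0, W, Y), "truth:", w0[0] * w0[1])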
arXiv link: http://arxiv.org/abs/2012.08444v2
Long-term prediction intervals with many covariates
econometric time-series. Often practitioners and policy makers want to predict
outcomes of an entire time horizon in the future instead of just a single
$k$-step ahead prediction. These series, apart from their own possible
non-linear dependence, are often also influenced by many external predictors.
In this paper, we construct prediction intervals of time-aggregated forecasts
in a high-dimensional regression setting. Our approach is based on quantiles of
residuals obtained by the popular LASSO routine. We allow for general
heavy-tailed, long-memory, and nonlinear stationary error processes and
stochastic predictors. Through a series of systematically arranged consistency
results we provide theoretical guarantees of our proposed quantile-based method
in all of these scenarios. After validating our approach using simulations we
also propose a novel bootstrap based method that can boost the coverage of the
theoretical intervals. Finally, analyzing the EPEX Spot data, we construct
prediction intervals for hourly electricity prices over horizons spanning 17
weeks and contrast them to selected Bayesian and bootstrap interval forecasts.
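A simplified sketch of the quantile-of-residuals idea for a time-aggregated LASSO forecast follows. The serial-dependence-aware theory and the bootstrap refinement discussed in the paper are omitted; the synthetic design, tuning values, and crude i.i.d. resampling of residuals are assumptions.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(6)
    T, p, horizon = 400, 50, 24
    X = rng.normal(size=(T, p))
    beta = np.zeros(p); beta[:3] = [1.0, -0.5, 0.8]
    y = X @ beta + rng.standard_t(df=5, size=T)      # heavy-tailed errors

    # Fit LASSO on a training window; keep held-out residuals.
    T_train = 300
    model = Lasso(alpha=0.1).fit(X[:T_train], y[:T_train])
    resid = y[T_train:] - model.predict(X[T_train:])

    # Aggregated (sum over the horizon) point forecast for the next `horizon` steps.
    X_future = rng.normal(size=(horizon, p))         # stand-in for future predictors
    point = model.predict(X_future).sum()

    # Quantile-based interval for the aggregated error: crude i.i.d. resampling of
    # residuals, which ignores serial dependence.
    agg_draws = [rng.choice(resid, size=horizon, replace=True).sum() for _ in range(2000)]
    lo, hi = np.quantile(agg_draws, [0.05, 0.95])
    print(f"90% interval for the {horizon}-step aggregate: [{point + lo:.2f}, {point + hi:.2f}]")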
arXiv link: http://arxiv.org/abs/2012.08223v2
Real-time Inflation Forecasting Using Non-linear Dimension Reduction Techniques
techniques pays off for forecasting inflation in real-time. Several recent
methods from the machine learning literature are adopted to map a large
dimensional dataset into a lower dimensional set of latent factors. We model
the relationship between inflation and the latent factors using constant and
time-varying parameter (TVP) regressions with shrinkage priors. Our models are
then used to forecast monthly US inflation in real-time. The results suggest
that sophisticated dimension reduction methods yield inflation forecasts that
are highly competitive with linear approaches based on principal components.
Among the techniques considered, the Autoencoder and squared principal
components yield factors that have high predictive power for one-month- and
one-quarter-ahead inflation. Zooming into model performance over time reveals
that controlling for non-linear relations in the data is of particular
importance during recessionary episodes of the business cycle or the current
COVID-19 pandemic.
arXiv link: http://arxiv.org/abs/2012.08155v3
Identification of inferential parameters in the covariate-normalized linear conditional logit model
customers' product feature preferences using choice data. Using these models at
scale, however, can result in numerical imprecision and optimization failure
due to a combination of large-valued covariates and the softmax probability
function. Standard machine learning approaches alleviate these concerns by
applying a normalization scheme to the matrix of covariates, scaling all values
to sit within some interval (such as the unit simplex). While this type of
normalization is innocuous when using models for prediction, it has the side
effect of perturbing the estimated coefficients, which are necessary for
researchers interested in inference. This paper shows that, for two common
classes of normalizers, designated scaling and centered scaling, the
data-generating non-scaled model parameters can be analytically recovered along
with their asymptotic distributions. The paper also shows the numerical
performance of the analytical results using an example of a scaling normalizer.
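The simplest instance of the recovery logic can be sketched with a plain logistic regression as a stand-in for the conditional logit: under "designated scaling" (each covariate column divided by a fixed constant), the linear index satisfies X beta = X_scaled (s * beta), so the non-scaled coefficients are the scaled-fit coefficients divided elementwise by the scales. The synthetic data and the essentially unpenalized fit are assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(7)
    n = 5000
    X = np.column_stack([rng.normal(scale=100.0, size=n),   # large-valued covariate
                         rng.normal(scale=3.0, size=n)])
    beta_true = np.array([0.02, -0.5])
    y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta_true))))

    # "Designated scaling": divide each column by a fixed constant s_k.
    s = np.abs(X).max(axis=0)
    X_scaled = X / s

    # Essentially unpenalized logistic fit on the scaled covariates.
    fit = LogisticRegression(C=1e6, max_iter=2000).fit(X_scaled, y)
    beta_scaled = fit.coef_.ravel()

    # X @ beta = X_scaled @ (s * beta), hence beta = beta_scaled / s.
    print("recovered:", beta_scaled / s)
    print("true:     ", beta_true)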
arXiv link: http://arxiv.org/abs/2012.08022v1
Trademark filings and patent application count time series are structurally near-identical and cointegrated: Implications for studies in innovation
extends the trademark/patent inter-relationship as proposed in the normative
intellectual-property (IP)-oriented Innovation Agenda view of the science and
technology (S&T) firm. Beyond simple correlation, it is shown that
trademark-filing (Trademarks) and patent-application counts (Patents) have
similar (if not, identical) structural attributes (including similar
distribution characteristics and seasonal variation, cross-wavelet
synchronicity/coherency (short-term cross-periodicity) and structural breaks)
and are cointegrated (integration order of 1) over a period of approximately 40
years (given the monthly observations). The existence of cointegration strongly
suggests a "long-run" equilibrium between the two indices; that is, there is
(are) exogenous force(s) restraining the two indices from diverging from one
another. Structural breakpoints in the chrono-dynamics of the indices support
the existence of potentially similar exogenous force(s), as the break dates
are simultaneous/near-simultaneous (Trademarks: 1987, 1993, 1999, 2005, 2011;
Patents: 1988, 1994, 2000, and 2011). A discussion of potential triggers
(affecting both time series) causing these breaks, and the concept of
equilibrium in the context of these proxy measures are presented. The
cointegration order and structural co-movements resemble other macro-economic
variables, suggesting the opportunity of using econometric approaches to further
analyze these data. As a corollary, this work further supports the inclusion of
trademark analysis in innovation studies. Lastly, the data and corresponding
analysis tools (R program) are presented as Supplementary Materials for
reproducibility and convenience to conduct future work for interested readers.
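A compact sketch of the kind of cointegration check described above, using statsmodels' Engle-Granger test on two synthetic I(1) series that share a common stochastic trend. The series are stand-ins for the monthly trademark and patent counts; the simulated design and the specific test choice are assumptions, not the paper's exact procedure.

    import numpy as np
    from statsmodels.tsa.stattools import coint, adfuller

    rng = np.random.default_rng(8)
    T = 480                                    # roughly 40 years of monthly observations
    common = np.cumsum(rng.normal(size=T))     # shared stochastic trend (I(1))
    trademarks = 1.0 * common + rng.normal(scale=2.0, size=T)
    patents = 0.8 * common + rng.normal(scale=2.0, size=T)

    # Each series is individually non-stationary ...
    print("ADF p-values:", adfuller(trademarks)[1], adfuller(patents)[1])

    # ... but the Engle-Granger test rejects the null of no cointegration.
    t_stat, p_value, _ = coint(trademarks, patents)
    print("Engle-Granger t-stat:", t_stat, "p-value:", p_value)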
arXiv link: http://arxiv.org/abs/2012.10400v1
Welfare Analysis via Marginal Treatment Effects
confoundedness) in empirical data, where an instrumental variable is available.
In this setting, we show that the mean social welfare function can be
identified and represented via the marginal treatment effect (MTE, Bjorklund
and Moffitt, 1987) as the operator kernel. This representation result can be
applied to a variety of statistical decision rules for treatment choice,
including plug-in rules, Bayes rules, and empirical welfare maximization (EWM)
rules as in Hirano and Porter (2020, Section 2.3). Focusing on the application
to the EWM framework of Kitagawa and Tetenov (2018), we provide convergence
rates of the worst case average welfare loss (regret) in the spirit of Manski
(2004).
arXiv link: http://arxiv.org/abs/2012.07624v1
Occupational segregation in a Roy model with composition preferences
comparative advantage, as in the Roy model, and sector composition preference.
Two groups choose between two sectors based on heterogeneous potential incomes
and group compositions in each sector. Potential incomes incorporate group
specific human capital accumulation and wage discrimination. Composition
preferences are interpreted as reflecting group specific amenity preferences as
well as homophily and aversion to minority status. We show that occupational
segregation is amplified by the composition preferences and we highlight a
resulting tension between redistribution and diversity. The model also exhibits
tipping from extreme compositions to more balanced ones. Tipping occurs when a
small nudge, associated with affirmative action, pushes the system to a very
different equilibrium, and when the set of equilibria changes abruptly when a
parameter governing the relative importance of pecuniary and composition
preferences crosses a threshold.
arXiv link: http://arxiv.org/abs/2012.04485v3
Forecasting the Olympic medal distribution during a pandemic: a socio-economic machine learning model
for different stakeholders: Ex ante, sports betting companies can determine the
odds while sponsors and media companies can allocate their resources to
promising teams. Ex post, sports politicians and managers can benchmark the
performance of their teams and evaluate the drivers of success. To
significantly increase the Olympic medal forecasting accuracy, we apply machine
learning, more specifically a two-stage Random Forest, thus outperforming more
traditional na\"ive forecasts for three previous Olympics held between 2008 and
2016 for the first time. Regarding the Tokyo 2020 Games in 2021, our model
suggests that the United States will lead the Olympic medal table, winning 120
medals, followed by China (87) and Great Britain (74). Intriguingly, we predict
that the current COVID-19 pandemic will not significantly alter the medal count
as all countries suffer from the pandemic to some extent (data inherent) and
because only limited historical data points on comparable diseases are available
(model inherent).
arXiv link: http://arxiv.org/abs/2012.04378v2
Who Should Get Vaccinated? Individualized Allocation of Vaccines Over SIR Network
important policy decisions in pandemic times. This paper develops a procedure
to estimate an individualized vaccine allocation policy under limited supply,
exploiting social network data containing individual demographic
characteristics and health status. We model spillover effects of the vaccines
based on a Heterogeneous-Interacted-SIR network model and estimate an
individualized vaccine allocation policy by maximizing an estimated social
welfare (public health) criterion incorporating the spillovers. While this
optimization problem is generally an NP-hard integer optimization problem, we
show that the SIR structure leads to a submodular objective function, and
provide a computationally attractive greedy algorithm for approximating a
solution that has a theoretical performance guarantee. Moreover, we characterise
a finite-sample welfare regret bound and examine how its uniform convergence
rate depends on the complexity and riskiness of the social network. In the
simulation, we illustrate the importance of considering spillovers by comparing
our method with targeting without network information.
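A generic sketch of greedy maximization of a monotone submodular welfare objective under a vaccine-supply (cardinality) constraint, which is the structure behind the (1 - 1/e)-type guarantee mentioned above. The risk-weighted network-coverage objective below is only a placeholder for the paper's estimated HI-SIR welfare criterion, and the network and parameters are simulated assumptions.

    import numpy as np

    rng = np.random.default_rng(9)
    n, budget = 60, 5
    adj = rng.random((n, n)) < 0.08
    adj = np.triu(adj, 1); adj = adj | adj.T          # undirected contact network
    risk = rng.uniform(0.5, 2.0, size=n)              # individual risk weights

    def welfare(selected):
        """Placeholder welfare: risk-weighted coverage of vaccinated units and neighbors."""
        covered = np.zeros(n, dtype=bool)
        for i in selected:
            covered[i] = True
            covered |= adj[i]
        return float(risk[covered].sum())

    # Greedy maximization: repeatedly add the unit with the largest marginal gain.
    chosen = []
    for _ in range(budget):
        gains = [(welfare(chosen + [i]) - welfare(chosen), i)
                 for i in range(n) if i not in chosen]
        best_gain, best_i = max(gains)
        chosen.append(best_i)
    print("allocation:", chosen, "welfare:", welfare(chosen))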
arXiv link: http://arxiv.org/abs/2012.04055v4
Asymptotic Normality for Multivariate Random Forest Estimators
estimators in practical applications. A recent paper by Athey and Wager shows
that the random forest estimate at any point is asymptotically Gaussian; in
this paper, we extend this result to the multivariate case and show that the
vector of estimates at multiple points is jointly normal. Specifically, the
covariance matrix of the limiting normal distribution is diagonal, so that the
estimates at any two points are independent in sufficiently deep trees.
Moreover, the off-diagonal terms are bounded by quantities capturing how likely
two points are to belong to the same partition of the resulting tree. Our results
rely on a certain stability property when constructing splits, and we
give examples of splitting rules for which this assumption is and is not
satisfied. We test our proposed covariance bound and the associated coverage
rates of confidence intervals in numerical simulations.
arXiv link: http://arxiv.org/abs/2012.03486v3
Binary Response Models for Heterogeneous Panel Data with Interactive Fixed Effects
data with interactive fixed effects by allowing both the cross-sectional
dimension and the temporal dimension to diverge. From a practical point of
view, the proposed framework can be applied to predict the probability of
corporate failure, conduct credit rating analysis, etc. Theoretically and
methodologically, we establish a link between maximum likelihood estimation
and a least squares approach, provide a simple information criterion to detect
the number of factors, and derive the corresponding asymptotic distributions. In
addition, we conduct intensive simulations to examine the theoretical findings.
In the empirical study, we focus on the sign prediction of stock returns, and
then use the results of sign forecast to conduct portfolio analysis.
arXiv link: http://arxiv.org/abs/2012.03182v2
Forecasting: theory and practice
The uncertainty that surrounds the future is both exciting and challenging,
with individuals and organisations seeking to minimise risks and maximise
utilities. The large number of forecasting applications calls for a diverse set
of forecasting methods to tackle real-life challenges. This article provides a
non-systematic review of the theory and the practice of forecasting. We provide
an overview of a wide range of theoretical, state-of-the-art models, methods,
principles, and approaches to prepare, produce, organise, and evaluate
forecasts. We then demonstrate how such theoretical concepts are applied in a
variety of real-life contexts.
We do not claim that this review is an exhaustive list of methods and
applications. However, we hope that our encyclopedic presentation will offer a
point of reference for the rich work that has been undertaken over the last
decades, with some key insights for the future of forecasting theory and
practice. Given its encyclopedic nature, the intended mode of reading is
non-linear. We offer cross-references to allow the readers to navigate through
the various topics. We complement the theoretical concepts and applications
covered by large lists of free or open-source software implementations and
publicly-available databases.
arXiv link: http://arxiv.org/abs/2012.03854v4
A Multivariate Realized GARCH Model
realized measures of volatility and correlations. The key innovation is an
unconstrained vector parametrization of the conditional correlation matrix,
which enables the use of factor models for correlations. This approach
elegantly addresses the main challenge faced by multivariate GARCH models in
high-dimensional settings. As an illustration, we explore block correlation
matrices that naturally simplify to linear factor models for the conditional
correlations. The model is applied to the returns of nine assets, and its
in-sample and out-of-sample performance compares favorably against several
popular benchmarks.
arXiv link: http://arxiv.org/abs/2012.02708v3
A Canonical Representation of Block Matrices with Applications to Covariance and Correlation Matrices
facilitates simple computation of the determinant, the matrix inverse, and
other powers of a block matrix, as well as the matrix logarithm and the matrix
exponential. These results are particularly useful for block covariance and
block correlation matrices, where evaluation of the Gaussian log-likelihood and
estimation are greatly simplified. We illustrate this with an empirical
application using a large panel of daily asset returns. Moreover, the
representation paves new ways to regularize large covariance/correlation
matrices, test block structures in matrices, and estimate regressions with many
variables.
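Not the paper's canonical representation itself, but a reminder of the standard Schur-complement identities for a 2x2 block matrix that this kind of block-structured simplification builds on; the dimensions and random blocks are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(10)
    p, q = 4, 3
    A = rng.normal(size=(p, p)); A = A @ A.T + p * np.eye(p)   # invertible top-left block
    B = rng.normal(size=(p, q))
    C = rng.normal(size=(q, p))
    D = rng.normal(size=(q, q)); D = D @ D.T + q * np.eye(q)

    M = np.block([[A, B], [C, D]])

    # Determinant via the Schur complement: det(M) = det(A) det(D - C A^{-1} B).
    schur = D - C @ np.linalg.solve(A, B)
    det_block = np.linalg.det(A) * np.linalg.det(schur)
    print(np.allclose(det_block, np.linalg.det(M)))            # True

    # Inverse via the same decomposition (top-left block of M^{-1} shown).
    A_inv = np.linalg.inv(A)
    S_inv = np.linalg.inv(schur)
    top_left = A_inv + A_inv @ B @ S_inv @ C @ A_inv
    print(np.allclose(top_left, np.linalg.inv(M)[:p, :p]))     # True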
arXiv link: http://arxiv.org/abs/2012.02698v2
Asymmetric uncertainty : Nowcasting using skewness in real-time data
producing density nowcasts of GDP growth. The approach relies on modelling
location, scale and shape common factors in real-time macroeconomic data. While
movements in the location generate shifts in the central part of the predictive
density, the scale controls its dispersion (akin to general uncertainty) and
the shape its asymmetry, or skewness (akin to downside and upside risks). The
empirical application is centred on US GDP growth and the real-time data come
from Fred-MD. The results show that there is more to real-time data than their
levels or means: their dispersion and asymmetry provide valuable information
for nowcasting economic activity. Scale and shape common factors (i) yield more
reliable measures of uncertainty and (ii) improve precision when macroeconomic
uncertainty is at its peak.
arXiv link: http://arxiv.org/abs/2012.02601v4
A New Parametrization of Correlation Matrices
reparametrization facilitates modeling of correlation and covariance matrices
by an unrestricted vector, where positive definiteness is an innate property.
This parametrization can be viewed as a generalization of Fisher's
Z-transformation to higher dimensions and has a wide range of potential
applications. An algorithm for reconstructing the unique $n \times n$ correlation
matrix from any $d$-dimensional vector (with $d = n(n-1)/2$) is provided, and we
derive its numerical complexity.
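A hedged sketch of one natural reading of this construction: take the off-diagonal elements of the matrix logarithm of a correlation matrix as the unrestricted $d$-dimensional vector (in dimension 2 this reduces exactly to Fisher's Z, arctanh of the correlation), and recover the matrix by solving for the log-matrix diagonal so that the matrix exponential has a unit diagonal. The simple fixed-point iteration and the example matrix are assumptions and may differ from the paper's algorithm.

    import numpy as np
    from scipy.linalg import logm, expm

    # Forward map: correlation matrix to an unrestricted vector, taken here as the
    # off-diagonal elements of the matrix logarithm.
    C = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, -0.3],
                  [0.2, -0.3, 1.0]])
    L = logm(C).real
    iu = np.triu_indices_from(C, k=1)
    v = L[iu]                        # d = n(n-1)/2 unrestricted parameters
    print("vector:", np.round(v, 4))

    # Sanity check in dimension 2: the map reduces to Fisher's Z, arctanh(rho).
    r = 0.7
    print(np.isclose(logm(np.array([[1.0, r], [r, 1.0]]))[0, 1].real, np.arctanh(r)))

    # Inverse map (sketch): choose the diagonal of the log-matrix so that the matrix
    # exponential has a unit diagonal, via a simple fixed-point iteration.
    A = np.zeros_like(C); A[iu] = v; A += A.T
    x = np.zeros(len(C))
    for _ in range(50):
        np.fill_diagonal(A, x)
        x = x - np.log(np.diag(expm(A)))
    np.fill_diagonal(A, x)
    C_rec = expm(A)
    print(np.allclose(C_rec, C, atol=1e-6))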
arXiv link: http://arxiv.org/abs/2012.02395v1
Sharp Bounds in the Latent Index Selection Model
is: what can we learn about parameters that are relevant for policy but not
necessarily point-identified by the exogenous variation we observe? This paper
provides an answer in terms of sharp, analytic characterizations and bounds for
an important class of policy-relevant treatment effects, consisting of marginal
treatment effects and linear functionals thereof, in the latent index selection
model as formalized in Vytlacil (2002). The sharp bounds use the full content
of identified marginal distributions, and analytic derivations rely on the
theory of stochastic orders. The proposed methods also make it possible to
sharply incorporate new auxiliary assumptions on distributions into the latent
index selection framework. Empirically, I apply the methods to study the
effects of Medicaid on emergency room utilization in the Oregon Health
Insurance Experiment, showing that the predictions from extrapolations based on
a distribution assumption (rank similarity) differ substantively and
consistently from existing extrapolations based on a parametric mean assumption
(linearity). This underscores the value of utilizing the model's full empirical
content in combination with auxiliary assumptions.
arXiv link: http://arxiv.org/abs/2012.02390v2
Inference in mixed causal and noncausal models with generalized Student's t-distributions
models with a generalized Student's t error process are reviewed. Several
existing methods are typically not applicable in the heavy-tailed framework. To
this end, a new approach to inference on causal and noncausal parameters
in finite samples is proposed. It exploits the empirical variance of the
generalized Student's t, which is computable even when the population variance
does not exist. Monte Carlo simulations show good performance of the new
variance construction for
fat tail series. Finally, different existing approaches are compared using
three empirical applications: the variation of daily COVID-19 deaths in
Belgium, the monthly wheat prices, and the monthly inflation rate in Brazil.
arXiv link: http://arxiv.org/abs/2012.01888v2
Competition analysis on the over-the-counter credit default swap market
data collected as part of the EMIR regulation.
First, we study the competition between central counterparties through
collateral requirements. We present models that successfully estimate the
initial margin requirements. However, our estimates are not precise enough to
serve as input to a predictive model for CCP choice by counterparties in the
OTC market.
Second, we model counterpart choice on the interdealer market using a novel
semi-supervised predictive task. We present our methodology as part of the
literature on model interpretability before arguing for the use of conditional
entropy as the metric of interest to derive knowledge from data through a
model-agnostic approach. In particular, we justify the use of deep neural
networks to measure conditional entropy on real-world datasets. We create the
"Razor entropy" using the framework of algorithmic information theory
and derive an explicit formula that is identical to our semi-supervised
training objective. Finally, we borrow concepts from game theory to define
"top-k Shapley values". This novel method of payoff distribution
satisfies most of the properties of Shapley values, and is of particular
interest when the value function is monotone submodular. Unlike classical
Shapley values, top-k Shapley values can be computed in time quadratic, rather
than exponential, in the number of features. We implement our methodology and
report the results on our particular task of counterpart choice.
Finally, we present an improvement to the node2vec algorithm that
could for example be used to further study intermediation. We show that the
neighbor sampling used in the generation of biased walks can be performed in
logarithmic time with a quasilinear time pre-computation, unlike the current
implementations that do not scale well.
arXiv link: http://arxiv.org/abs/2012.01883v1
Bull and Bear Markets During the COVID-19 Pandemic
activity worldwide. We assess what happened to the aggregate U.S. stock market
during this period, including implications for both short and long-horizon
investors. Using the model of Maheu, McCurdy and Song (2012), we provide
smoothed estimates and out-of-sample forecasts associated with stock market
dynamics during the pandemic. We identify bull and bear market regimes
including their bull correction and bear rally components, demonstrate the
model's performance in capturing periods of significant regime change, and
provide forecasts that improve risk management and investment decisions. The
paper concludes with out-of-sample forecasts of market states one year ahead.
arXiv link: http://arxiv.org/abs/2012.01623v1
Testable Implications of Multiple Equilibria in Discrete Games with Correlated Types
incomplete information. Unlike de Paula and Tang (2012), we allow the players'
private signals to be correlated. In static games, we leverage independence of
private types across games whose equilibrium selection is correlated. In
dynamic games with serially correlated discrete unobserved heterogeneity, our
testable implication builds on the fact that the distribution of a sequence of
choices and states is a mixture over equilibria and unobserved heterogeneity.
The number of mixture components is a known function of the length of the
sequence as well as the cardinality of equilibria and unobserved heterogeneity
support. In both static and dynamic cases, these testable implications are
implementable using existing statistical tools.
arXiv link: http://arxiv.org/abs/2012.00787v1
Evaluating (weighted) dynamic treatment effects by double machine learning
multiple treatment sequences in various periods, based on double machine
learning to control for observed, time-varying covariates in a data-driven way
under a selection-on-observables assumption. To this end, we make use of
so-called Neyman-orthogonal score functions, which imply the robustness of
treatment effect estimation to moderate (local) misspecifications of the
dynamic outcome and treatment models. This robustness property permits
approximating outcome and treatment models by double machine learning even
under high dimensional covariates and is combined with data splitting to
prevent overfitting. In addition to effect estimation for the total population,
we consider weighted estimation that permits assessing dynamic treatment
effects in specific subgroups, e.g. among those treated in the first treatment
period. We demonstrate that the estimators are asymptotically normal and
root-$n$-consistent under specific regularity conditions and investigate
their finite sample properties in a simulation study. Finally, we apply the
methods to the Job Corps study in order to assess different sequences of
training programs under a large set of covariates.
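As background for the double machine learning machinery invoked above, here is a minimal static (one-period, binary treatment) cross-fitted AIPW sketch combining a Neyman-orthogonal score with sample splitting. The dynamic, sequential-treatment weighting of the paper is not implemented, and the learners, number of folds, and synthetic design are assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(11)
    n, p = 4000, 10
    X = rng.normal(size=(n, p))
    D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
    Y = 1.0 * D + X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=n)   # true ATE = 1

    scores = np.zeros(n)
    for train, test in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
        # Nuisance models fit on the training fold only (cross-fitting).
        m1 = RandomForestRegressor(n_estimators=100, random_state=0)
        m1.fit(X[train][D[train] == 1], Y[train][D[train] == 1])
        m0 = RandomForestRegressor(n_estimators=100, random_state=0)
        m0.fit(X[train][D[train] == 0], Y[train][D[train] == 0])
        e = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[train], D[train])

        e_hat = np.clip(e.predict_proba(X[test])[:, 1], 0.01, 0.99)
        mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
        # Neyman-orthogonal (AIPW) score evaluated on the held-out fold.
        scores[test] = (mu1 - mu0
                        + D[test] * (Y[test] - mu1) / e_hat
                        - (1 - D[test]) * (Y[test] - mu0) / (1 - e_hat))

    ate, se = scores.mean(), scores.std(ddof=1) / np.sqrt(n)
    print(f"ATE estimate: {ate:.3f} (se {se:.3f})")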
arXiv link: http://arxiv.org/abs/2012.00370v5
Double machine learning for sample selection models
outcomes are only observed for a subpopulation due to sample selection or
outcome attrition. For identification, we combine a selection-on-observables
assumption for treatment assignment with either selection-on-observables or
instrumental variable assumptions concerning the outcome attrition/sample
selection process. We also consider dynamic confounding, meaning that
covariates that jointly affect sample selection and the outcome may (at least
partly) be influenced by the treatment. To control in a data-driven way for a
potentially high dimensional set of pre- and/or post-treatment covariates, we
adapt the double machine learning framework for treatment evaluation to sample
selection problems. We make use of (a) Neyman-orthogonal, doubly robust, and
efficient score functions, which imply the robustness of treatment effect
estimation to moderate regularization biases in the machine learning-based
estimation of the outcome, treatment, or sample selection models and (b) sample
splitting (or cross-fitting) to prevent overfitting bias. We demonstrate that
the proposed estimators are asymptotically normal and root-n consistent under
specific regularity conditions concerning the machine learners and investigate
their finite sample properties in a simulation study. We also apply our
proposed methodology to the Job Corps data for evaluating the effect of
training on hourly wages which are only observed conditional on employment. The
estimator is available in the causalweight package for the statistical software
R.
arXiv link: http://arxiv.org/abs/2012.00745v5
An Automatic Finite-Sample Robustness Metric: When Can Dropping a Little Data Make a Big Difference?
policy decisions in non-random ways. Researchers typically believe that such
departures from random sampling -- due to changes in the population over time
and space, or difficulties in sampling truly randomly -- are small, and their
corresponding impact on the inference should be small as well. We might
therefore be concerned if the conclusions of our studies are excessively
sensitive to a very small proportion of our sample data. We propose a method to
assess the sensitivity of applied econometric conclusions to the removal of a
small fraction of the sample. Manually checking the influence of all possible
small subsets is computationally infeasible, so we use an approximation to find
the most influential subset. Our metric, the "Approximate Maximum Influence
Perturbation," is based on the classical influence function, and is
automatically computable for common methods including (but not limited to) OLS,
IV, MLE, GMM, and variational Bayes. We provide finite-sample error bounds on
approximation performance. At minimal extra cost, we provide an exact
finite-sample lower bound on sensitivity. We find that sensitivity is driven by
a signal-to-noise ratio in the inference problem, is not reflected in standard
errors, does not disappear asymptotically, and is not due to misspecification.
While some empirical applications are robust, results of several influential
economics papers can be overturned by removing less than 1% of the sample.
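The OLS special case of the idea can be sketched directly: approximate each observation's influence on a coefficient by the empirical influence function $(X'X)^{-1}x_i e_i$, sum the largest influences to approximate the effect of dropping that subset, and refit to compare. The toy data-generating process and the 1% threshold are assumptions, and this is not the authors' packaged implementation.

    import numpy as np

    rng = np.random.default_rng(12)
    n = 1000
    x = rng.normal(size=n)
    y = 0.1 * x + rng.normal(scale=3.0, size=n)      # small signal, lots of noise

    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Empirical-influence approximation to the change in beta from dropping obs i.
    infl = (XtX_inv @ (X * resid[:, None]).T).T      # n x 2, one row per observation
    infl_slope = infl[:, 1]

    # Approximate the most damaging small subset: dropping the observations with the
    # largest positive influence lowers the slope the most; use up to 1% of the sample.
    k = int(0.01 * n)
    drop = np.argsort(infl_slope)[-k:]
    approx_change = -infl_slope[drop].sum()
    print("slope:", beta[1], "approx. change from dropping 1%:", approx_change)

    # Exact refit without those observations, for comparison.
    keep = np.setdiff1d(np.arange(n), drop)
    beta_refit = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    print("refit slope:", beta_refit[1])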
arXiv link: http://arxiv.org/abs/2011.14999v5
Adaptive Inference in Multivariate Nonparametric Regression Models Under Monotonicity
point under a multivariate nonparametric regression setting. The regression
function belongs to a H\"older class and is assumed to be monotone with respect
to some or all of the arguments. We derive the minimax rate of convergence for
confidence intervals (CIs) that adapt to the underlying smoothness, and provide
an adaptive inference procedure that obtains this minimax rate. The procedure
differs from that of Cai and Low (2004) and is intended to yield shorter CIs under
practically relevant specifications. The proposed method applies to general
linear functionals of the regression function, and is shown to have favorable
performance compared to existing inference procedures.
arXiv link: http://arxiv.org/abs/2011.14219v1
Inference in Regression Discontinuity Designs under Monotonicity
design (RDD) under monotonicity, with possibly multiple running variables.
Specifically, we consider the case where the true regression function is
monotone with respect to (all or some of) the running variables and is assumed to
lie in a Lipschitz smoothness class. Such a monotonicity condition is natural
in many empirical contexts, and the Lipschitz constant has an intuitive
interpretation. We propose a minimax two-sided confidence interval (CI) and an
adaptive one-sided CI. For the two-sided CI, the researcher is required to
choose a Lipschitz constant defining the smoothness class in which she believes
the true regression function lies. This is the only tuning parameter, and the
resulting CI has uniform
coverage and obtains the minimax optimal length. The one-sided CI can be
constructed to maintain coverage over all monotone functions, providing maximum
credibility in terms of the choice of the Lipschitz constant. Moreover, the
monotonicity makes it possible for the (excess) length of the CI to adapt to
the true Lipschitz constant of the unknown regression function. Overall, the
proposed procedures make it easy to see under what conditions on the underlying
regression function the given estimates are significant, which can add more
transparency to research using RDD methods.
arXiv link: http://arxiv.org/abs/2011.14216v1
A Comparison of Statistical and Machine Learning Algorithms for Predicting Rents in the San Francisco Bay Area
modeling methods to develop model systems that are useful in planning
applications. Machine learning methods have been considered too 'black box',
lacking interpretability, and their use has been limited within the land use
and transportation modeling literature. We present a use case in which
predictive accuracy is of primary importance, and compare the use of random
forest regression to multiple regression using ordinary least squares, to
predict rents per square foot in the San Francisco Bay Area using a large
volume of rental listings scraped from the Craigslist website. We find that we
are able to obtain useful predictions from both models using almost exclusively
local accessibility variables, though the predictive accuracy of the random
forest model is substantially higher.
arXiv link: http://arxiv.org/abs/2011.14924v1
Simultaneous inference for time-varying models
paper. We estimate the regression coefficients by using local linear
M-estimation. For these estimators, weak Bahadur representations are obtained
and are used to construct simultaneous confidence bands. For practical
implementation, we propose a bootstrap-based method to circumvent the slow
logarithmic convergence of the theoretical simultaneous bands. Our results
substantially generalize and unify the treatments for several time-varying
regression and auto-regression models. The performance for ARCH and GARCH
models is studied in simulations and a few real-life applications of our study
are presented through analysis of some popular financial datasets.
arXiv link: http://arxiv.org/abs/2011.13157v2
Implementation of a cost-benefit analysis of Demand-Responsive Transport with a Multi-Agent Transport Simulation
of a Demand Responsive Transport (DRT) service with the traffic simulation
software MATSim are elaborated in order to achieve the long-term goal of
assessing the introduction of a DRT service in G\"ottingen and the surrounding
area. The aim was to determine if the software is suitable for a cost-benefit
analysis while providing a user manual for building a basic simulation that can
be extended with public transport and DRT. The main result is that the software
is suitable for a cost-benefit analysis of a DRT service. In particular, the
most important internal and external costs, such as usage costs of the various
modes of transport and emissions, can be integrated into the simulation
scenarios. Thus, the scenarios presented in this paper can be extended by data
from a mobility study of G\"ottingen and its surroundings in order to achieve
the long-term goal. This paper is aimed at transport economists and researchers
who are not familiar with MATSim, to provide them with a guide for the first
steps in working with a traffic simulation software.
arXiv link: http://arxiv.org/abs/2011.12869v2
Functional Principal Component Analysis for Cointegrated Functional Time Series
in the development of functional time series analysis. This note investigates
how FPCA can be used to analyze cointegrated functional time series and
proposes a modification of FPCA as a novel statistical tool. Our modified FPCA
not only provides an asymptotically more efficient estimator of the
cointegrating vectors, but also leads to novel FPCA-based tests for examining
essential properties of cointegrated functional time series.
arXiv link: http://arxiv.org/abs/2011.12781v8
Doubly weighted M-estimation for nonrandom assignment and missing outcomes
twin problems of nonrandom treatment assignment and missing outcomes, both of
which are common issues in the treatment effects literature. The proposed class
is characterized by a `robustness' property, which makes it resilient to
parametric misspecification in either a conditional model of interest (for
example, mean or quantile function) or the two weighting functions. As leading
applications, the paper discusses estimation of two specific causal parameters;
average and quantile treatment effects (ATE, QTEs), which can be expressed as
functions of the doubly weighted estimator, under misspecification of the
framework's parametric components. With respect to the ATE, this paper shows
that the proposed estimator is doubly robust even in the presence of missing
outcomes. Finally, to demonstrate the estimator's viability in empirical
settings, it is applied to Calonico and Smith (2017)'s reconstructed sample
from the National Supported Work training program.
arXiv link: http://arxiv.org/abs/2011.11485v1
Non-Identifiability in Network Autoregressions
network. Most identification conditions that are available for these models
either rely on the network being observed repeatedly, are only sufficient, or
require strong distributional assumptions. This paper derives conditions that
apply even when the individuals composing the network are observed only once,
are necessary and sufficient for identification, and require weak
distributional assumptions. We find that the model parameters are generically,
in the measure theoretic sense, identified even without repeated observations,
and analyze the combinations of the interaction matrix and the regressor matrix
causing identification failures. This is done both in the original model and
after certain transformations in the sample space, the latter case being
relevant, for example, in some fixed effects specifications.
arXiv link: http://arxiv.org/abs/2011.11084v2
Exploiting network information to disentangle spillover effects in a field experiment on teens' museum attendance
and artistic heritage. We analyze a field experiment conducted in Florence
(Italy) to assess how appropriate incentives assigned to high-school classes
may induce teens to visit museums in their free time. Non-compliance and
spillover effects make the impact evaluation of this clustered encouragement
design challenging. We propose to blend principal stratification and causal
mediation, by defining sub-populations of units according to their compliance
behavior and using the information on their friendship networks as mediator. We
formally define principal natural direct and indirect effects and principal
controlled direct and spillover effects, and use them to disentangle spillovers
from other causal channels. We adopt a Bayesian approach for inference.
arXiv link: http://arxiv.org/abs/2011.11023v2
Nonparametric instrumental regression with right censored duration outcomes
treatment is not randomly assigned. The confounding issue is treated using a
discrete instrumental variable explaining the treatment and independent of the
error term of the model. Our framework is nonparametric and allows for random
right censoring. This specification generates a nonlinear inverse problem and
the average treatment effect is derived from its solution. We provide local and
global identification properties that rely on a nonlinear system of equations.
We propose an estimation procedure to solve this system and derive rates of
convergence and conditions under which the estimator is asymptotically normal.
When censoring makes identification fail, we develop partial identification
results. Our estimators exhibit good finite sample properties in simulations.
We also apply our methodology to the Illinois Reemployment Bonus Experiment.
arXiv link: http://arxiv.org/abs/2011.10423v1
A Semi-Parametric Bayesian Generalized Least Squares Estimator
estimator. In a generic setting where each error is a vector, the parametric
Generalized Least Squares estimator maintains the assumption that each error
vector has the same distributional parameters. In reality, however, errors are
likely to be heterogeneous regarding their distributions. To cope with such
heterogeneity, a Dirichlet process prior is introduced for the distributional
parameters of the errors, leading to the error distribution being a mixture of
a variable number of normal distributions. Our method lets the number of normal
components be data driven. Semi-parametric Bayesian estimators for two specific
cases are then presented: the Seemingly Unrelated Regression for equation
systems and the Random Effects Model for panel data. We design a series of
simulation experiments to explore the performance of our estimators. The
results demonstrate that our estimators obtain smaller posterior standard
deviations and mean squared errors than the Bayesian estimators using a
parametric mixture of normal distributions or a normal distribution. We then
apply our semi-parametric Bayesian estimators for equation systems and panel
data models to empirical data.
arXiv link: http://arxiv.org/abs/2011.10252v2
Visual Time Series Forecasting: An Image-driven Approach
Traditional approaches rely on statistical methods to forecast given past
numeric values. In practice, end-users often rely on visualizations such as
charts and plots to reason about their forecasts. Inspired by practitioners, we
re-imagine the topic by creating a novel framework to produce visual forecasts,
similar to the way humans intuitively do. In this work, we leverage advances in
deep learning to extend the field of time series forecasting to a visual
setting. We capture input data as an image and train a model to produce the
subsequent image. This approach results in predicting distributions as opposed
to pointwise values. We examine various synthetic and real datasets with
diverse degrees of complexity. Our experiments show that visual forecasting is
effective for cyclic data but somewhat less so for irregular data such as stock
prices. Importantly, when using image-based evaluation metrics, we find the
proposed visual forecasting method to outperform various numerical baselines,
including ARIMA and a numerical variation of our method. We demonstrate the
benefits of incorporating vision-based approaches in forecasting tasks -- both
for the quality of the forecasts produced, as well as the metrics that can be
used to evaluate them.
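As a concrete illustration of the image-based setup, the toy sketch below rasterizes a numeric window into a binary image of the kind a vision model could consume. The resolution and min-max scaling are arbitrary choices of ours, not the paper's pipeline.

```python
# Toy rasterization step for image-based forecasting: a numeric window is
# mapped to a binary image that a vision model would then take as input.
import numpy as np

def series_to_image(window, height=32):
    w = np.asarray(window, dtype=float)
    scaled = (w - w.min()) / (w.max() - w.min() + 1e-12)   # map values to [0, 1]
    rows = (scaled * (height - 1)).round().astype(int)
    img = np.zeros((height, len(w)))
    img[height - 1 - rows, np.arange(len(w))] = 1.0        # one pixel per time step
    return img

x = np.sin(np.linspace(0, 4 * np.pi, 64))
print(series_to_image(x).shape)   # (32, 64) image handed to the forecaster
```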
arXiv link: http://arxiv.org/abs/2011.09052v3
A Two-Way Transformed Factor Model for Matrix-Variate Time Series
series by a two-way transformation, where the transformed data consist of a
matrix-variate factor process, which is dynamically dependent, and three other
blocks of white noises. Specifically, for a given $p_1\times p_2$
matrix-variate time series, we seek common nonsingular transformations to
project the rows and columns onto another $p_1$ and $p_2$ directions according
to the strength of the dynamic dependence of the series on the past values.
Consequently, we treat the data as nonsingular linear row and column
transformations of dynamically dependent common factors and white noise
idiosyncratic components. We propose a common orthonormal projection method to
estimate the front and back loading matrices of the matrix-variate factors.
Under the setting that the largest eigenvalues of the covariance of the
vectorized idiosyncratic term diverge for large $p_1$ and $p_2$, we introduce a
two-way projected Principal Component Analysis (PCA) to estimate the associated
loading matrices of the idiosyncratic terms to mitigate such diverging noise
effects. A diagonal-path white noise testing procedure is proposed to estimate
the order of the factor matrix. Asymptotic properties of the
proposed method are established for both fixed and diverging dimensions as the
sample size increases to infinity. We use simulated and real examples to assess
the performance of the proposed method. We also compare our method with some
existing ones in the literature and find that the proposed approach not only
provides interpretable results but also performs well in out-of-sample
forecasting.
arXiv link: http://arxiv.org/abs/2011.09029v1
Policy design in experiments with unknown interference
policies with spillover effects. Units are organized into a finite number of
large clusters and interact in unknown ways within each cluster. First, we
introduce a single-wave experiment that, by varying the randomization across
cluster pairs, estimates the marginal effect of a change in treatment
probabilities, taking spillover effects into account. Using the marginal
effect, we propose a test for policy optimality. Second, we design a
multiple-wave experiment to estimate welfare-maximizing treatment rules. We
provide strong theoretical guarantees and an implementation in a large-scale
field experiment.
arXiv link: http://arxiv.org/abs/2011.08174v9
Causal motifs and existence of endogenous cascades in directed networks with application to company defaults
detection framework for endogenous spreading based on the causal motifs we
define in this paper. We assume that the change of state of a vertex can be
triggered by an endogenous or an exogenous event, that the underlying network
is directed and that times when vertices changed their states are available. In
addition to the data of company defaults, we also simulate cascades driven by
different stochastic processes on different synthetic networks. We show that
some of the smallest motifs can robustly detect endogenous spreading events.
Finally, we apply the method to the data of defaults of Croatian companies and
observe the time window in which an endogenous cascade was likely happening.
arXiv link: http://arxiv.org/abs/2011.08148v2
A Framework for Eliciting, Incorporating, and Disciplining Identification Beliefs in Linear Models
must impose beliefs. The instrumental variables exclusion restriction, for
example, represents the belief that the instrument has no direct effect on the
outcome of interest. Yet beliefs about instrument validity do not exist in
isolation. Applied researchers often discuss the likely direction of selection
and the potential for measurement error in their articles but lack formal tools
for incorporating this information into their analyses. Failing to use all
relevant information not only leaves money on the table; it runs the risk of
leading to a contradiction in which one holds mutually incompatible beliefs
about the problem at hand. To address these issues, we first characterize the
joint restrictions relating instrument invalidity, treatment endogeneity, and
non-differential measurement error in a workhorse linear model, showing how
beliefs over these three dimensions are mutually constrained by each other and
the data. Using this information, we propose a Bayesian framework to help
researchers elicit their beliefs, incorporate them into estimation, and ensure
their mutual coherence. We conclude by illustrating our framework in a number
of examples drawn from the empirical microeconomics literature.
arXiv link: http://arxiv.org/abs/2011.07276v1
Identifying the effect of a mis-classified, binary, endogenous regressor
endogenous regressor when a discrete-valued instrumental variable is available.
We begin by showing that the only existing point identification result for this
model is incorrect. We go on to derive the sharp identified set under mean
independence assumptions for the instrument and measurement error. The
resulting bounds are novel and informative, but fail to point identify the
effect of interest. This motivates us to consider alternative and slightly
stronger assumptions: we show that adding second and third moment independence
assumptions suffices to identify the model.
arXiv link: http://arxiv.org/abs/2011.07272v1
Rank Determination in Tensor Factor Model
time series, with a wide range of applications in economics, finance and
statistics. This paper develops two criteria for the determination of the
number of factors for tensor factor models where the signal part of an observed
tensor time series assumes a Tucker decomposition with the core tensor as the
factor tensor. The task is to determine the dimensions of the core tensor. One
of the proposed criteria is similar to information based criteria of model
selection, and the other is an extension of the approaches based on the ratios
of consecutive eigenvalues often used in factor analysis for panel time series.
Theoretical results, including sufficient conditions and convergence rates,
are established. The results include the vector factor models as special cases,
with additional convergence rates. Simulation studies provide promising
finite sample performance for the two criteria.
arXiv link: http://arxiv.org/abs/2011.07131v3
A Generalized Focused Information Criterion for GMM
selection: the generalized focused information criterion (GFIC). Rather than
attempting to identify the "true" specification, the GFIC chooses from a set of
potentially mis-specified moment conditions and parameter restrictions to
minimize the mean-squared error (MSE) of a user-specified target parameter. The
intent of the GFIC is to formalize a situation common in applied practice. An
applied researcher begins with a set of fairly weak "baseline" assumptions,
assumed to be correct, and must decide whether to impose any of a number of
stronger, more controversial "suspect" assumptions that yield parameter
restrictions, additional moment conditions, or both. Provided that the baseline
assumptions identify the model, we show how to construct an asymptotically
unbiased estimator of the asymptotic MSE to select over these suspect
assumptions: the GFIC. We go on to provide results for post-selection inference
and model averaging that can be applied both to the GFIC and various
alternative selection criteria. To illustrate how our criterion can be used in
practice, we specialize the GFIC to the problem of selecting over exogeneity
assumptions and lag lengths in a dynamic panel model, and show that it performs
well in simulations. We conclude by applying the GFIC to a dynamic panel data
model for the price elasticity of cigarette demand.
arXiv link: http://arxiv.org/abs/2011.07085v1
Identifying Causal Effects in Experiments with Spillovers and Non-compliance
identify and estimate causal effects in the presence of spillovers--one
person's treatment may affect another's outcome--and one-sided
non-compliance--subjects can only be offered treatment, not compelled to take
it up. Two distinct causal effects are of interest in this setting: direct
effects quantify how a person's own treatment changes her outcome, while
indirect effects quantify how her peers' treatments change her outcome. We
consider the case in which spillovers occur within known groups, and take-up
decisions are invariant to peers' realized offers. In this setting we point
identify the effects of treatment-on-the-treated, both direct and indirect, in
a flexible random coefficients model that allows for heterogeneous treatment
effects and endogenous selection into treatment. We go on to propose a feasible
estimator that is consistent and asymptotically normal as the number and size
of groups increases. We apply our estimator to data from a large-scale job
placement services experiment, and find negative indirect treatment effects on
the likelihood of employment for those willing to take up the program. These
negative spillovers are offset by positive direct treatment effects from own
take-up.
arXiv link: http://arxiv.org/abs/2011.07051v3
Dynamic factor, leverage and realized covariances in multivariate stochastic volatility
has been found that the estimates of parameters become unstable as the
dimension of returns increases. To solve this problem, we focus on the factor
structure of multiple returns and consider two additional sources of
information: first, the realized stock index associated with the market factor,
and second, the realized covariance matrix calculated from high frequency data.
The proposed dynamic factor model with the leverage effect and realized
measures is applied to ten of the top stocks composing the exchange traded fund
linked with the investment return of the S&P500 index, and the model is shown to
have a stable advantage in portfolio performance.
arXiv link: http://arxiv.org/abs/2011.06909v2
Population synthesis for urban resident modeling using deep generative models
population distribution (types and compositions of households, incomes, social
demographics) conditioned on aspects such as dwelling typology, price,
location, and floor level. This paper presents a Machine Learning based method
to model the population distribution of upcoming developments of new buildings
within larger neighborhood/condo settings.
We use a real data set from Ecopark Township, a real estate development
project in Hanoi, Vietnam, where we study two machine learning algorithms from
the deep generative models literature to create a population of synthetic
agents: Conditional Variational Auto-Encoder (CVAE) and Conditional Generative
Adversarial Networks (CGAN). A large experimental study was performed, showing
that the CVAE outperforms both the empirical distribution, a non-trivial
baseline model, and the CGAN in estimating the population distribution of new
real estate development projects.
arXiv link: http://arxiv.org/abs/2011.06851v1
Weak Identification in Discrete Choice Models
provide insights into the determinants of identification strength in these
models. Using these insights, we propose a novel test that can consistently
detect weak identification in commonly applied discrete choice models, such as
probit, logit, and many of their extensions. Furthermore, we demonstrate that
when the null hypothesis of weak identification is rejected, Wald-based
inference can be carried out using standard formulas and critical values. A
Monte Carlo study compares our proposed testing approach against commonly
applied weak identification tests. The results simultaneously demonstrate the
good performance of our approach and the fundamental failure of using
conventional weak identification tests for linear models in the discrete choice
model context. Furthermore, we compare our approach against those commonly
applied in the literature in two empirical examples: married women's labor force
participation, and US food aid and civil conflicts.
arXiv link: http://arxiv.org/abs/2011.06753v2
When Should We (Not) Interpret Linear IV Estimands as LATE?
variables (IV) estimand as a weighted average of conditional local average
treatment effects (LATEs). I focus on a situation in which additional
covariates are required for identification while the reduced-form and
first-stage regressions may be misspecified due to an implicit homogeneity
restriction on the effects of the instrument. I show that the weights on some
conditional LATEs are negative and the IV estimand is no longer interpretable
as a causal effect under a weaker version of monotonicity, i.e. when there are
compliers but no defiers at some covariate values and defiers but no compliers
elsewhere. The problem of negative weights disappears in the interacted
specification of Angrist and Imbens (1995), which avoids misspecification and
seems to be underused in applied work. I illustrate my findings in an
application to the causal effects of pretrial detention on case outcomes. In
this setting, I reject the stronger version of monotonicity, demonstrate that
the interacted instruments are sufficiently strong for consistent estimation
using the jackknife methodology, and present several estimates that are
economically and statistically different, depending on whether the interacted
instruments are used.
arXiv link: http://arxiv.org/abs/2011.06695v7
Treatment Allocation with Strategic Agents
individual characteristics: examples include targeted marketing, individualized
credit offers, and heterogeneous pricing. Treatment personalization introduces
incentives for individuals to modify their behavior to obtain a better
treatment. Strategic behavior shifts the joint distribution of covariates and
potential outcomes. The optimal rule without strategic behavior allocates
treatments only to those with a positive Conditional Average Treatment Effect.
With strategic behavior, we show that the optimal rule can involve
randomization, allocating treatments with less than 100% probability even to
those who respond positively on average to the treatment. We propose a
sequential experiment based on Bayesian Optimization that converges to the
optimal treatment rule without parametric assumptions on individual strategic
behavior.
arXiv link: http://arxiv.org/abs/2011.06528v5
Gaussian Transforms Modeling and the Estimation of Distributional Regression Functions
distribution functions and give a concave likelihood criterion for their
estimation. Optimal representations satisfy the monotonicity property of
conditional cumulative distribution functions, including in finite samples and
under general misspecification. We use these representations to provide a
unified framework for the flexible Maximum Likelihood estimation of conditional
density, cumulative distribution, and quantile functions at parametric rate.
Our formulation yields substantial simplifications and finite sample
improvements over related methods. An empirical application to the gender wage
gap in the United States illustrates our framework.
arXiv link: http://arxiv.org/abs/2011.06416v2
Mostly Harmless Machine Learning: Learning Optimal Instruments in Linear IV Models
machine learning in the standard linear instrumental variable setting. The key
idea is to use machine learning, combined with sample-splitting, to predict the
treatment variable from the instrument and any exogenous covariates, and then
use this predicted treatment and the covariates as technical instruments to
recover the coefficients in the second stage. This allows the researcher to
extract non-linear covariation between the treatment and instrument that may
dramatically improve estimation precision and robustness by boosting instrument
strength. Importantly, we constrain the machine-learned predictions to be
linear in the exogenous covariates, thus avoiding spurious identification
arising from non-linear relationships between the treatment and the covariates.
We show that this approach delivers consistent and asymptotically normal
estimates under weak conditions and that it may be adapted to be
semiparametrically efficient (Chamberlain, 1992). Our method preserves standard
intuitions and interpretations of linear instrumental variable methods,
including under weak identification, and provides a simple, user-friendly
upgrade to the applied economics toolbox. We illustrate our method with an
example in law and criminal justice, examining the causal effect of appellate
court reversals on district court sentencing decisions.
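The simplified sketch below conveys the core idea: cross-fit an ML prediction of the treatment from the instrument and covariates, then use that prediction as a technical instrument in 2SLS. It deliberately omits the paper's refinement of keeping the prediction linear in the exogenous covariates, and the data-generating process is invented for illustration.

```python
# Sketch: ML first stage with sample splitting, then 2SLS with the cross-fitted
# prediction as instrument. Simplified relative to the paper's procedure.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 2000
Z = rng.normal(size=n)                                   # instrument
X = rng.normal(size=n)                                   # exogenous covariate
U = rng.normal(size=n)                                   # unobserved confounder
D = (np.sin(Z) + 0.5 * X + U + rng.normal(size=n) > 0).astype(float)
Y = 1.0 * D + X + U + rng.normal(size=n)                 # true effect of D is 1.0

Dhat = np.zeros(n)
for train, test in KFold(5, shuffle=True, random_state=0).split(Z):
    rf = RandomForestRegressor(n_estimators=200, random_state=0)
    rf.fit(np.c_[Z[train], X[train]], D[train])
    Dhat[test] = rf.predict(np.c_[Z[test], X[test]])     # cross-fitted instrument

W = np.c_[np.ones(n), D, X]                              # second-stage regressors
Zmat = np.c_[np.ones(n), Dhat, X]                        # technical instruments
beta = np.linalg.solve(Zmat.T @ W, Zmat.T @ Y)
print(beta[1])                                           # IV estimate of the effect of D
```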
arXiv link: http://arxiv.org/abs/2011.06158v3
Testing and Dating Structural Changes in Copula-based Dependence Measures
dependence structure of multivariate time series. We consider a cumulative sum
(CUSUM) type test for constant copula-based dependence measures, such as
Spearman's rank correlation and quantile dependencies. The asymptotic null
distribution is not known in closed form and critical values are estimated by
an i.i.d. bootstrap procedure. We analyze size and power properties in a
simulation study under different dependence measure settings, such as skewed
and fat-tailed distributions. To date break points and to decide whether two
estimated break locations belong to the same break event, we propose a pivot
confidence interval procedure. Finally, we apply the test to the historical
data of ten large financial firms during the last financial crisis from 2002 to
mid-2013.
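A rough numerical sketch of the testing idea for Spearman's rank correlation appears below: a CUSUM-type comparison of subsample and full-sample estimates, with critical values from an i.i.d. bootstrap of (x, y) pairs that breaks the time ordering. The trimming and weighting choices are simplifications of ours, not the paper's exact statistic.

```python
# Sketch: CUSUM-type statistic for a constant Spearman correlation with an
# i.i.d. pair bootstrap; weighting and trimming are illustrative only.
import numpy as np
from scipy.stats import spearmanr

def cusum_stat(x, y, trim=0.1):
    n = len(x)
    rho_full = spearmanr(x, y)[0]
    ks = range(int(trim * n), int((1 - trim) * n))
    return max(np.sqrt(n) * (k / n) * abs(spearmanr(x[:k], y[:k])[0] - rho_full)
               for k in ks)

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)
y[n // 2:] += 0.8 * x[n // 2:]                 # break in dependence at mid-sample
stat = cusum_stat(x, y)
boot = []
for _ in range(200):                           # i.i.d. pair bootstrap (destroys ordering)
    idx = rng.integers(0, n, n)
    boot.append(cusum_stat(x[idx], y[idx]))
print(stat, np.quantile(boot, 0.95))           # reject constancy if stat exceeds the quantile
```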
arXiv link: http://arxiv.org/abs/2011.05036v1
Optimal Policy Learning: From Theory to Practice
maximization, this paper contributes by stressing the policymaker's
perspective via a practical illustration of an optimal policy assignment
problem. More specifically, by focusing on the class of threshold-based
policies, we first set out the theoretical underpinnings of the policymaker's
selection problem and then offer a practical solution to this problem via an
empirical illustration using the popular LaLonde (1986) training program
dataset. The paper proposes an implementation protocol for the optimal solution
that is straightforward to apply and easy to program with standard statistical
software.
arXiv link: http://arxiv.org/abs/2011.04993v1
Reducing bias in difference-in-differences models using entropy balancing
difference-in-differences analyses when pre-intervention outcome trends suggest
a possible violation of the parallel trends assumption. We describe a set of
assumptions under which weighting to balance intervention and comparison groups
on pre-intervention outcome trends leads to consistent
difference-in-differences estimates even when pre-intervention outcome trends
are not parallel. Simulation results verify that entropy balancing of
pre-intervention outcome trends can remove bias when the parallel trends
assumption is not directly satisfied, and thus may enable researchers to use
difference-in-differences designs in a wider range of observational settings
than previously acknowledged.
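A minimal sketch of the reweighting step under these assumptions: entropy balancing chooses comparison-group weights that match the treated group's pre-intervention outcome means, here via the standard convex dual. The simulated pre-trend data and the choice of moments are ours for illustration.

```python
# Sketch: entropy balancing on pre-period outcome trends; arrays Y_pre_comp and
# Y_pre_treat are hypothetical (comparison/treated units x pre-periods).
import numpy as np
from scipy.optimize import minimize

def entropy_balance(X_comp, target):
    """Weights on comparison units matching the treated-group moments `target`."""
    Z = X_comp - target                      # centered balance moments
    def dual(lmbda):                         # convex dual of the entropy-balancing problem
        return np.log(np.exp(-Z @ lmbda).mean())
    res = minimize(dual, np.zeros(X_comp.shape[1]), method="BFGS")
    w = np.exp(-Z @ res.x)
    return w / w.sum()

rng = np.random.default_rng(0)
Y_pre_comp = rng.normal(size=(500, 3)) + np.arange(3) * 0.1   # comparison pre-trends
Y_pre_treat = rng.normal(size=(100, 3)) + np.arange(3) * 0.3  # steeper treated pre-trend
w = entropy_balance(Y_pre_comp, Y_pre_treat.mean(axis=0))
print(Y_pre_comp.T @ w, Y_pre_treat.mean(axis=0))  # reweighted moments match treated moments
```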
arXiv link: http://arxiv.org/abs/2011.04826v1
Sparse time-varying parameter VECMs with an application to modeling electricity prices
correction model (VECM) with heteroskedastic disturbances. We propose tools to
carry out dynamic model specification in an automatic fashion. This involves
using global-local priors, and postprocessing the parameters to achieve truly
sparse solutions. Depending on the respective set of coefficients, we achieve
this via minimizing auxiliary loss functions. Our two-step approach limits
overfitting and reduces parameter estimation uncertainty. We apply this
framework to modeling European electricity prices. When considering daily
electricity prices for different markets jointly, our model highlights the
importance of explicitly addressing cointegration and nonlinearities. In a
forecast exercise focusing on hourly prices for Germany, our approach yields
competitive metrics of predictive accuracy.
arXiv link: http://arxiv.org/abs/2011.04577v2
DoWhy: An End-to-End Library for Causal Inference
successful application of causal inference requires specifying assumptions
about the mechanisms underlying observed data and testing whether they are
valid, and to what extent. However, most libraries for causal inference focus
only on the task of providing powerful statistical estimators. We describe
DoWhy, an open-source Python library that is built with causal assumptions as
its first-class citizens, based on the formal framework of causal graphs to
specify and test causal assumptions. DoWhy presents an API for the four steps
common to any causal analysis---1) modeling the data using a causal graph and
structural assumptions, 2) identifying whether the desired effect is estimable
under the causal model, 3) estimating the effect using statistical estimators,
and finally 4) refuting the obtained estimate through robustness checks and
sensitivity analyses. In particular, DoWhy implements a number of robustness
checks including placebo tests, bootstrap tests, and tests for unobserved
confounding. DoWhy is an extensible library that supports interoperability with
other implementations, such as EconML and CausalML, for the estimation step.
The library is available at https://github.com/microsoft/dowhy
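A small usage sketch of that four-step flow with DoWhy's CausalModel API on simulated data is given below; the method names reflect the public API but may differ slightly across library versions, and the data and chosen estimators are illustrative.

```python
# Hedged usage sketch of DoWhy's model-identify-estimate-refute workflow.
import numpy as np
import pandas as pd
from dowhy import CausalModel

rng = np.random.default_rng(0)
n = 5000
w = rng.normal(size=n)                        # common cause of treatment and outcome
treat = (w + rng.normal(size=n) > 0).astype(int)
y = 2.0 * treat + w + rng.normal(size=n)      # true effect of treat on y is 2.0
df = pd.DataFrame({"w": w, "treat": treat, "y": y})

# 1) Model: state causal assumptions (a full graph string can be passed instead).
model = CausalModel(data=df, treatment="treat", outcome="y", common_causes=["w"])
# 2) Identify the target estimand under those assumptions.
estimand = model.identify_effect()
# 3) Estimate it with a statistical estimator.
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
# 4) Refute: probe robustness, e.g. with a placebo treatment.
refutation = model.refute_estimate(estimand, estimate,
                                   method_name="placebo_treatment_refuter")
print(estimate.value)
print(refutation)
```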
arXiv link: http://arxiv.org/abs/2011.04216v1
Inference under Superspreading: Determinants of SARS-CoV-2 Transmission in Germany
model for aggregated case data that accounts for superspreading and improves
statistical inference. In a Bayesian framework, the model is estimated on
German data featuring over 60,000 cases with date of symptom onset and age
group. Several factors were associated with a strong reduction in transmission:
rising public awareness, testing and tracing, information on local incidence,
and high temperature. Immunity after infection, school and restaurant closures,
stay-at-home orders, and mandatory face covering were associated with a smaller
reduction in transmission. The data suggests that public distancing rules
increased transmission in young adults. Information on local incidence was
associated with a reduction in transmission of up to 44% (95%-CI: [40%, 48%]),
which suggests a prominent role of behavioral adaptations to local risk of
infection. Testing and tracing reduced transmission by 15% (95%-CI: [9%,20%]),
where the effect was strongest among the elderly. Extrapolating weather
effects, I estimate that transmission increases by 53% (95%-CI: [43%, 64%]) in
colder seasons.
arXiv link: http://arxiv.org/abs/2011.04002v1
Do We Exploit all Information for Counterfactual Analysis? Benefits of Factor Models and Idiosyncratic Correction
revenue of a given product, is a vital task for the retail industry. To select
such a quantity, one needs first to estimate the price elasticity from the
product demand. Regression methods usually fail to recover such elasticities
due to confounding effects and price endogeneity. Therefore, randomized
experiments are typically required. However, elasticities can be highly
heterogeneous depending on the location of stores, for example. As the
randomization frequently occurs at the municipal level, standard
difference-in-differences methods may also fail. Possible solutions are based
on methodologies to measure the effects of treatments on a single (or just a
few) treated unit(s) based on counterfactuals constructed from artificial
controls. For example, for each city in the treatment group, a counterfactual
may be constructed from the untreated locations. In this paper, we apply a
novel high-dimensional statistical method to measure the effects of price
changes on daily sales from a major retailer in Brazil. The proposed
methodology combines principal components (factors) and sparse regressions,
resulting in a method called Factor-Adjusted Regularized Method for Treatment
evaluation (FarmTreat). The data consist of daily sales and prices of
five different products over more than 400 municipalities. The products
considered belong to the sweets and candies category, and the experiments were
conducted over the years 2016 and 2017. Our results confirm the
hypothesis of a high degree of heterogeneity yielding very different pricing
strategies over distinct municipalities.
arXiv link: http://arxiv.org/abs/2011.03996v3
Robust Forecasting
discrete outcomes when the forecaster is unable to discriminate among a set of
plausible forecast distributions because of partial identification or concerns
about model misspecification or structural breaks. We derive "robust" forecasts
which minimize maximum risk or regret over the set of forecast distributions.
We show that for a large class of models including semiparametric panel data
models for dynamic discrete choice, the robust forecasts depend in a natural
way on a small number of convex optimization problems which can be simplified
using duality methods. Finally, we derive "efficient robust" forecasts to deal
with the problem of first having to estimate the set of forecast distributions
and develop a suitable asymptotic efficiency theory. Forecasts obtained by
replacing nuisance parameters that characterize the set of forecast
distributions with efficient first-stage estimators can be strictly dominated
by our efficient robust forecasts.
arXiv link: http://arxiv.org/abs/2011.03153v4
Bias correction for quantile regression estimators
quantile regression estimators. While being asymptotically first-order
unbiased, these estimators can have non-negligible second-order biases. We
derive a higher-order stochastic expansion of these estimators using empirical
process theory. Based on this expansion, we derive an explicit formula for the
second-order bias and propose a feasible bias correction procedure that uses
finite-difference estimators of the bias components. The proposed bias
correction method performs well in simulations. We provide an empirical
illustration using Engel's classical data on household food expenditure.
arXiv link: http://arxiv.org/abs/2011.03073v8
A Basket Half Full: Sparse Portfolios
low-dimensional setup when the number of assets is less than the sample size;
(2) lack theoretical analysis of sparse wealth allocations and their impact on
portfolio exposure; (3) are suboptimal due to the bias induced by an
$\ell_1$-penalty. We address these shortcomings and develop an approach to
construct sparse portfolios in high dimensions. Our contribution is twofold:
from the theoretical perspective, we establish the oracle bounds of sparse
weight estimators and provide guidance regarding their distribution. From the
empirical perspective, we examine the merit of sparse portfolios during
different market scenarios. We find that in contrast to non-sparse
counterparts, our strategy is robust to recessions and can be used as a hedging
vehicle during such times.
arXiv link: http://arxiv.org/abs/2011.04278v2
Debiasing classifiers: is reality at variance with expectation?
that debiasers often fail in practice to generalize out-of-sample, and can in
fact make fairness worse rather than better. A rigorous evaluation of the
debiasing treatment effect requires extensive cross-validation beyond what is
usually done. We demonstrate that this phenomenon can be explained as a
consequence of bias-variance trade-off, with an increase in variance
necessitated by imposing a fairness constraint. Follow-up experiments validate
the theoretical prediction that the estimation variance depends strongly on the
base rates of the protected class. Considering fairness--performance trade-offs
justifies the counterintuitive notion that partial debiasing can actually yield
better results in practice on out-of-sample data.
arXiv link: http://arxiv.org/abs/2011.02407v2
Adaptive Combinatorial Allocation
are unknown but can be learned, and decisions are subject to constraints. Our
model covers two-sided and one-sided matching, even with complex constraints.
We propose an approach based on Thompson sampling. Our main result is a
prior-independent finite-sample bound on the expected regret for this
algorithm. Although the number of allocations grows exponentially in the number
of participants, the bound does not depend on this number. We illustrate the
performance of our algorithm using data on refugee resettlement in the United
States.
arXiv link: http://arxiv.org/abs/2011.02330v1
Learning from Forecast Errors: A New Approach to Forecast Combinations
propose a new approach, Factor Graphical Model (FGM), to forecast combinations
that separates idiosyncratic forecast errors from the common errors. FGM
exploits the factor structure of forecast errors and the sparsity of the
precision matrix of the idiosyncratic errors. We prove the consistency of
forecast combination weights and mean squared forecast error estimated using
FGM, supporting the results with extensive simulations. Empirical applications
to forecasting macroeconomic series show that forecast combination using FGM
outperforms combined forecasts using equal weights and graphical models without
incorporating factor structure of forecast errors.
arXiv link: http://arxiv.org/abs/2011.02077v2
Instrumental Variable Identification of Dynamic Variance Decompositions
causal inference. However, unless such external instruments (proxies) capture
the underlying shock without measurement error, existing methods are silent on
the importance of that shock for macroeconomic fluctuations. We show that, in a
general moving average model with external instruments, variance decompositions
for the instrumented shock are interval-identified, with informative bounds.
Various additional restrictions guarantee point identification of both variance
and historical decompositions. Unlike SVAR analysis, our methods do not require
invertibility. Applied to U.S. data, they give a tight upper bound on the
importance of monetary shocks for inflation dynamics.
arXiv link: http://arxiv.org/abs/2011.01380v2
Coresets for Regressions with Panel Data
panel data settings. We first define coresets for several variants of
regression problems with panel data and then present efficient algorithms to
construct coresets whose size depends polynomially on 1/$\varepsilon$ (where
$\varepsilon$ is the error parameter) and the number of regression parameters -
independent of the number of individuals in the panel data or the time units
each individual is observed for. Our approach is based on the Feldman-Langberg
framework in which a key step is to upper bound the "total sensitivity" that is
roughly the sum of maximum influences of all individual-time pairs taken over
all possible choices of regression parameters. Empirically, we assess our
approach with synthetic and real-world datasets; the coreset sizes constructed
using our approach are much smaller than the full dataset and coresets indeed
accelerate the running time of computing the regression objective.
arXiv link: http://arxiv.org/abs/2011.00981v2
Nowcasting Growth using Google Trends Data: A Bayesian Structural Time Series Model
Google Trends for nowcasting real U.S. GDP growth in real time through the lens
of mixed frequency Bayesian Structural Time Series (BSTS) models. We augment
and enhance both model and methodology to make these better amenable to
nowcasting with a large number of potential covariates. Specifically, we allow
shrinking state variances towards zero to avoid overfitting, extend the SSVS
(spike and slab variable selection) prior to the more flexible
normal-inverse-gamma prior which stays agnostic about the underlying model
size, as well as adapt the horseshoe prior to the BSTS. The application to
nowcasting GDP growth as well as a simulation study demonstrate that the
horseshoe prior BSTS improves markedly upon the SSVS and the original BSTS
model with the largest gains in dense data-generating-processes. Our
application also shows that a large dimensional set of search terms is able to
improve nowcasts early in a specific quarter before other macroeconomic data
become available. Search terms with high inclusion probability have good
economic interpretation, reflecting leading signals of economic anxiety and
wealth effects.
arXiv link: http://arxiv.org/abs/2011.00938v2
Optimal Portfolio Using Factor Graphical Lasso
covariance (precision) matrix, which has been applied for a portfolio
allocation problem. The assumption made by these models is a sparsity of the
precision matrix. However, when stock returns are driven by common factors,
such assumption does not hold. We address this limitation and develop a
framework, Factor Graphical Lasso (FGL), which integrates graphical models with
the factor structure in the context of portfolio allocation by decomposing a
precision matrix into low-rank and sparse components. Our theoretical results
and simulations show that FGL consistently estimates the portfolio weights and
risk exposure and also that FGL is robust to heavy-tailed distributions which
makes our method suitable for financial applications. FGL-based portfolios are
shown to exhibit superior performance over several prominent competitors
including equal-weighted and Index portfolios in the empirical application for
the S&P500 constituents.
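A stylized stand-in for the low-rank-plus-sparse decomposition behind FGL: estimate the factor component by PCA, fit a graphical lasso to the idiosyncratic residuals, and form global-minimum-variance weights from the combined covariance. The simulated returns, number of factors, and penalty level below are illustrative assumptions, not the paper's estimator.

```python
# Sketch: factor component via PCA plus a sparse idiosyncratic precision via
# graphical lasso, combined into minimum-variance portfolio weights.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
T, p, k = 500, 30, 3
F = rng.normal(size=(T, k))                          # common factors
B = rng.normal(size=(p, k))                          # loadings
R = F @ B.T + rng.normal(scale=0.5, size=(T, p))     # simulated returns

pca = PCA(n_components=k).fit(R)
common = pca.inverse_transform(pca.transform(R))     # low-rank (factor) part
resid = R - common                                   # idiosyncratic part
Theta_u = GraphicalLasso(alpha=0.05).fit(resid).precision_  # sparse precision

Sigma = np.cov(common, rowvar=False) + np.linalg.inv(Theta_u)  # low-rank + sparse
w = np.linalg.solve(Sigma, np.ones(p))
w /= w.sum()                                         # global-minimum-variance weights
print(w.round(3))
```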
arXiv link: http://arxiv.org/abs/2011.00435v5
Causal Inference for Spatial Treatments
with researchers interested in their effects on nearby units of interest. I
approach the spatial treatment setting from an experimental perspective: What
ideal experiment would we design to estimate the causal effects of spatial
treatments? This perspective motivates a comparison between individuals near
realized treatment locations and individuals near counterfactual (unrealized)
candidate locations, which differs from current empirical practice. I derive
design-based standard errors that are straightforward to compute irrespective
of spatial correlations in outcomes. Furthermore, I propose machine learning
methods to find counterfactual candidate locations using observational data
under unconfounded assignment of the treatment to locations. I apply the
proposed methods to study the causal effects of grocery stores on foot traffic
to nearby businesses during COVID-19 shelter-in-place policies, finding a
substantial positive effect at a very short distance, with no effect at larger
distances.
arXiv link: http://arxiv.org/abs/2011.00373v2
Estimating County-Level COVID-19 Exponential Growth Rates Using Generalized Random Forests
the threat of resurgent waves of COVID-19. A practical challenge in outbreak
detection is balancing accuracy vs. speed. In particular, while estimation
accuracy improves with longer fitting windows, speed degrades. This paper
presents a machine learning framework to balance this tradeoff using
generalized random forests (GRF), and applies it to detect county level
COVID-19 outbreaks. This algorithm chooses an adaptive fitting window size for
each county based on relevant features affecting the disease spread, such as
changes in social distancing policies. Experiment results show that our method
outperforms any non-adaptive window size choice in 7-day-ahead COVID-19
outbreak case number predictions.
arXiv link: http://arxiv.org/abs/2011.01219v4
Nonparametric Identification of Production Function, Total Factor Productivity, and Markup from Revenue Data
that a firm's output quantity can be observed as data, but typical datasets
contain only revenue, not output quantity. We examine the nonparametric
identification of production function and markup from revenue data when a firm
faces a general nonparametric demand function under imperfect competition. Under
standard assumptions, we provide the constructive nonparametric identification
of various firm-level objects: gross production function, total factor
productivity, price markups over marginal costs, output prices, output
quantities, a demand system, and a representative consumer's utility function.
arXiv link: http://arxiv.org/abs/2011.00143v1
Machine Learning for Experimental Design: Methods for Improved Blocking
blocking/stratification, pair-wise matching, or rerandomization) can improve
the treatment-control balance on important covariates and therefore improve the
estimation of the treatment effect, particularly for small- and medium-sized
experiments. Existing guidance on how to identify these variables and implement
the restrictions is incomplete and conflicting. We identify that differences
are mainly due to the fact that what is important in the pre-treatment data may
not translate to the post-treatment data. We highlight settings where there is
sufficient data to provide clear guidance and outline improved methods to
mostly automate the process using modern machine learning (ML) techniques. We
show, in simulations using real-world data, that these methods reduce both the
mean squared error of the estimate (14%-34%) and the size of the standard error
(6%-16%).
arXiv link: http://arxiv.org/abs/2010.15966v1
Identification and Estimation of Unconditional Policy Effects of an Endogenous Binary Treatment: An Unconditional MTE Approach
treatment status is binary and endogenous. We introduce a new class of marginal
treatment effects (MTEs) based on the influence function of the functional
underlying the policy target. We show that an unconditional policy effect can
be represented as a weighted average of the newly defined MTEs over the
individuals who are indifferent about their treatment status. We provide
conditions for point identification of the unconditional policy effects. When a
quantile is the functional of interest, we introduce the UNconditional
Instrumental Quantile Estimator (UNIQUE) and establish its consistency and
asymptotic distribution. In the empirical application, we estimate the effect
of changing college enrollment status, induced by higher tuition subsidy, on
the quantiles of the wage distribution.
arXiv link: http://arxiv.org/abs/2010.15864v6
Multiscale characteristics of the emerging global cryptocurrency market
of the blockchain technology behind them. Differences between cryptocurrencies
and the exchanges on which they are traded have been shown. The central part
surveys the analysis of cryptocurrency price changes on various platforms. The
statistical properties of the fluctuations in the cryptocurrency market have
been compared to the traditional markets. With the help of the latest
statistical physics methods, the non-linear correlations and multiscale
characteristics of the cryptocurrency market are analyzed. In the last part, the
co-evolution of the correlation structure among the 100 cryptocurrencies with
the largest capitalization is retraced. The detailed topology of the
cryptocurrency network on the Binance platform from the bitcoin perspective is
also considered.
Finally, an interesting observation on the Covid-19 pandemic impact on the
cryptocurrency market is presented and discussed: recently we have witnessed a
"phase transition" of the cryptocurrencies from being a hedge opportunity for
the investors fleeing the traditional markets to become a part of the global
market that is substantially coupled to the traditional financial instruments
like the currencies, stocks, and commodities.
The main contribution is an extensive demonstration that structural
self-organization in the cryptocurrency markets has caused them to attain
complexity characteristics that are nearly indistinguishable from the Forex
market at the level of individual time-series. However, the cross-correlations
between the exchange rates on cryptocurrency platforms differ from it. The
cryptocurrency market is less synchronized and the information flows more
slowly, which results in more frequent arbitrage opportunities. The methodology
used in the review allows the latter to be detected, and lead-lag relationships
to be discovered.
arXiv link: http://arxiv.org/abs/2010.15403v2
Modeling European regional FDI flows using a Bayesian spatial Poisson interaction model
effects of European regional FDI dyads. Recent regional studies primarily focus
on locational determinants, but ignore bilateral origin- and intervening
factors, as well as associated spatial dependence. This paper fills this gap by
using observations on interregional FDI flows within a spatially augmented
Poisson interaction model. We explicitly distinguish FDI activities between
three different stages of the value chain. Our results provide important
insights on drivers of regional FDI activities, both from origin and
destination perspectives. We moreover show that spatial dependence plays a key
role in both dimensions.
arXiv link: http://arxiv.org/abs/2010.14856v1
Deep Learning for Individual Heterogeneity
models to increase flexibility and capture rich heterogeneity while preserving
interpretability. Economic structure and machine learning are complements in
empirical modeling, not substitutes: DNNs provide the capacity to learn
complex, non-linear heterogeneity patterns, while the structural model ensures
the estimates remain interpretable and suitable for decision making and policy
analysis. We start with a standard parametric structural model and then enrich
its parameters into fully flexible functions of observables, which are
estimated using a particular DNN architecture whose structure reflects the
economic model. We illustrate our framework by studying demand estimation in
consumer choice. We show that by enriching a standard demand model we can
capture rich heterogeneity, and further, exploit this heterogeneity to create a
personalized pricing strategy. This type of optimization is not possible
without economic structure, but cannot be heterogeneous without machine
learning. Finally, we provide theoretical justification of each step in our
proposed methodology. We first establish non-asymptotic bounds and convergence
rates of our structural deep learning approach. Next, a novel and quite general
influence function calculation allows for feasible inference via double machine
learning in a wide variety of contexts. These results may be of interest in
many other contexts, as they generalize prior work.
arXiv link: http://arxiv.org/abs/2010.14694v3
E-Commerce Delivery Demand Modeling Framework for An Agent-Based Simulation Platform
such trend has accelerated tremendously due to the ongoing coronavirus
pandemic. Given the situation, the need for predicting e-commerce delivery
demand and evaluating relevant logistics solutions is increasing. However, the
existing simulation models for e-commerce delivery demand are still limited and
do not consider the delivery options and their attributes that shoppers face on
e-commerce order placements. We propose a novel modeling framework which
jointly predicts the average total value of e-commerce purchase, the purchase
amount per transaction, and delivery option choices. The proposed framework can
simulate the changes in e-commerce delivery demand attributable to the changes
in delivery options. We assume the model parameters based on various sources of
relevant information and conduct a demonstrative sensitivity analysis.
Furthermore, we have applied the model to the simulation for the
Auto-Innovative Prototype city. While the calibration of the model using
real-world survey data is required, the result of the analysis highlights the
applicability of the proposed framework.
arXiv link: http://arxiv.org/abs/2010.14375v1
The Efficiency Gap
semiparametric models for one-dimensional functionals due to a one-to-one
relation between corresponding loss and identification functions via
integration and differentiation. For multivariate functionals such as multiple
moments, quantiles, or the pair (Value at Risk, Expected Shortfall), this
one-to-one relation fails and not every identification function possesses an
antiderivative. The most important implication is an efficiency gap: The most
efficient Z-estimator often outperforms the most efficient M-estimator. We
theoretically establish this phenomenon for multiple quantiles at different
levels and for the pair (Value at Risk, Expected Shortfall), and illustrate the
gap numerically. Our results further give guidance for pseudo-efficient
M-estimation for semiparametric models of the Value at Risk and Expected
Shortfall.
arXiv link: http://arxiv.org/abs/2010.14146v3
Consumer Theory with Non-Parametric Taste Uncertainty and Individual Heterogeneity
the stochastic absolute risk aversion (SARA) model, and the stochastic
safety-first (SSF) model. In each model, individual-level heterogeneity is
characterized by a distribution $\pi\in\Pi$ of taste parameters, and
heterogeneity across consumers is introduced using a distribution $F$ over the
distributions in $\Pi$. Demand is non-separable and heterogeneity is
infinite-dimensional. Both models admit corner solutions. We consider two
frameworks for estimation: a Bayesian framework in which $F$ is known, and a
hyperparametric (or empirical Bayesian) framework in which $F$ is a member of a
known parametric family. Our methods are illustrated by an application to a
large U.S. panel of scanner data on alcohol consumption.
arXiv link: http://arxiv.org/abs/2010.13937v4
Modeling Long Cycles
financial history. Cycles found in the data are stochastic, often highly
persistent, and span substantial fractions of the sample size. We refer to such
cycles as "long". In this paper, we develop a novel approach to modeling
cyclical behavior specifically designed to capture long cycles. We show that
existing inferential procedures may produce misleading results in the presence
of long cycles, and propose a new econometric procedure for the inference on
the cycle length. Our procedure is asymptotically valid regardless of the cycle
length. We apply our methodology to a set of macroeconomic and financial
variables for the U.S. We find evidence of long stochastic cycles in the
standard business cycle variables, as well as in credit and house prices.
However, we rule out the presence of stochastic cycles in asset market data.
Moreover, according to our result, financial cycles as characterized by credit
and house prices tend to be twice as long as business cycles.
arXiv link: http://arxiv.org/abs/2010.13877v4
What can be learned from satisfaction assessments?
the company and its services. The received responses are crucial as they allow
companies to assess their respective performances and find ways to make needed
improvements. This study focuses on the non-systematic bias that arises when
customers assign numerical values in ordinal surveys. Using real customer
satisfaction survey data of a large retail bank, we show that the common
practice of segmenting ordinal survey responses into uneven segments limits the
value that can be extracted from the data. We then show that it is possible to
assess the magnitude of the irreducible error under simple assumptions, even in
real surveys, and place the achievable modeling goal in perspective. We finish
the study by suggesting that a thoughtful survey design, which uses either a
careful binning strategy or proper calibration, can reduce the compounding
non-systematic error even in elaborated ordinal surveys. A possible application
of the calibration method we propose is efficiently conducting targeted surveys
using active learning.
arXiv link: http://arxiv.org/abs/2010.13340v1
A Systematic Comparison of Forecasting for Gross Domestic Product in an Emergent Economy
aggregates useful information to assist economic agents and policymakers in
their decision-making process. In this context, GDP forecasting becomes a
powerful decision optimization tool in several areas. In order to contribute in
this direction, we investigated the efficiency of classical time series models,
the state-space models, and the neural network models, applied to Brazilian
gross domestic product. The models used were: a Seasonal Autoregressive
Integrated Moving Average (SARIMA) and a Holt-Winters method, which are
classical time series models; the dynamic linear model, a state-space model;
and neural network autoregression and the multilayer perceptron, artificial
neural network models. Based on statistical metrics of model comparison, the
multilayer perceptron presented the best in-sample and out-of-sample forecasting
performance for the analyzed period, also incorporating the growth rate
structure significantly.
arXiv link: http://arxiv.org/abs/2010.13259v2
Recurrent Conditional Heteroskedasticity
Conditional Heteroskedastic (RECH) models, to improve both in-sample analysis
and out-of-sample forecasting of the traditional conditional heteroskedastic
models. In particular, we incorporate auxiliary deterministic processes,
governed by recurrent neural networks, into the conditional variance of the
traditional conditional heteroskedastic models, e.g. GARCH-type models, to
flexibly capture the dynamics of the underlying volatility. RECH models can
detect interesting effects in financial volatility overlooked by the existing
conditional heteroskedastic models such as the GARCH, GJR and EGARCH. The new
models often have good out-of-sample forecasts while still explaining well the
stylized facts of financial volatility by retaining the well-established
features of econometric GARCH-type models. These properties are illustrated
through simulation studies and applications to thirty-one stock indices and
exchange rate data. A user-friendly software package together with the
examples reported in the paper are available at https://github.com/vbayeslab.
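To make the construction concrete, the toy simulation below embeds a one-unit recurrent state into the intercept of a GARCH(1,1) variance recursion; the recurrent specification and parameter values are ours and purely illustrative, not the authors' package.

```python
# Stylized simulation of a RECH-type recursion: a GARCH(1,1) variance equation
# whose intercept omega_t is driven by a simple recurrent unit.
import numpy as np

rng = np.random.default_rng(0)
T = 1000
alpha, beta = 0.08, 0.85
Wh, Wx, b = 0.7, 0.3, 0.0                 # toy recurrent-unit parameters
eps = np.zeros(T); sig2 = np.ones(T); h = 0.0
for t in range(1, T):
    h = np.tanh(Wh * h + Wx * eps[t - 1] + b)              # recurrent state
    omega_t = 0.05 + 0.05 / (1 + np.exp(-h))               # positive, RNN-driven intercept
    sig2[t] = omega_t + alpha * eps[t - 1] ** 2 + beta * sig2[t - 1]
    eps[t] = np.sqrt(sig2[t]) * rng.standard_normal()
print(sig2[-5:])                                           # simulated conditional variances
```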
arXiv link: http://arxiv.org/abs/2010.13061v2
Off-Policy Evaluation of Bandit Algorithm from Dependent Samples under Batch Update Policy
historical data obtained via a behavior policy. However, because the contextual
bandit algorithm updates the policy based on past observations, the samples are
not independent and identically distributed (i.i.d.). This paper tackles this
problem by constructing an estimator from a martingale difference sequence
(MDS) for the dependent samples. In the data-generating process, we do not
assume the convergence of the policy, but the policy uses the same conditional
probability of choosing an action during a certain period. Then, we derive an
asymptotically normal estimator of the value of an evaluation policy. As
another advantage of our method, the batch-based approach simultaneously solves
the deficient support problem. Using benchmark and real-world datasets, we
experimentally confirm the effectiveness of the proposed method.
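A bare-bones numerical sketch of the batch-update setting follows: within each batch the behavior policy is fixed, so batch-wise importance-weighted terms can be averaged as a martingale-difference-style estimate of the evaluation policy's value. The bandit environment and the softmax policy update are invented for illustration.

```python
# Sketch: batch-wise importance weighting for off-policy evaluation when the
# behavior policy is updated only between batches.
import numpy as np

rng = np.random.default_rng(0)
n_batches, batch_size, n_actions = 20, 200, 3
target = np.array([0.1, 0.1, 0.8])                 # evaluation policy (context-free here)
behavior = np.ones(n_actions) / n_actions          # initial behavior policy
value_terms = []
for _ in range(n_batches):
    a = rng.choice(n_actions, size=batch_size, p=behavior)
    r = rng.normal(loc=a * 0.5)                    # action 2 has the highest mean reward
    value_terms.append(np.mean(target[a] / behavior[a] * r))   # batch IPW term
    # behavior policy is updated only between batches, from the observed rewards
    means = np.array([r[a == k].mean() if np.any(a == k) else 0.0
                      for k in range(n_actions)])
    behavior = np.exp(means) / np.exp(means).sum()
print(np.mean(value_terms))                        # estimate of the target policy's value
```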
arXiv link: http://arxiv.org/abs/2010.13554v1
A Practical Guide of Off-Policy Evaluation for Bandit Problems
target policy from samples obtained via different policies. Recently, applying
OPE methods for bandit problems has garnered attention. For the theoretical
guarantees of an estimator of the policy value, the OPE methods require various
conditions on the target policy and policy used for generating the samples.
However, existing studies did not carefully discuss the practical situation
where such conditions hold, and the gap between them remains. This paper aims
to show new results for bridging the gap. Based on the properties of the
evaluation policy, we categorize OPE situations. Then, among practical
applications, we mainly discuss best policy selection. For this situation,
we propose a meta-algorithm based on existing OPE estimators. We investigate
the proposed concepts using synthetic and open real-world datasets in
experiments.
arXiv link: http://arxiv.org/abs/2010.12470v1
Low-Rank Approximations of Nonseparable Panel Models
factor structure approximations. The factor structures are estimated by
matrix-completion methods to deal with the computational challenges of
principal component analysis in the presence of missing data. We show that the
resulting estimators are consistent in large panels, but suffer from
approximation and shrinkage biases. We correct these biases using matching and
difference-in-differences approaches. Numerical examples and an empirical
application to the effect of election day registration on voter turnout in the
U.S. illustrate the properties and usefulness of our methods.
arXiv link: http://arxiv.org/abs/2010.12439v2
Forecasting With Factor-Augmented Quantile Autoregressions: A Model Averaging Approach
the United Kingdom with factor-augmented quantile autoregressions under a model
averaging framework. We investigate combinations across models using
weights that minimise the Akaike Information Criterion (AIC), the Bayesian
Information Criterion (BIC), the Quantile Regression Information Criterion
(QRIC) as well as the leave-one-out cross validation criterion. The unobserved
factors are estimated by principal components of a large panel with N
predictors over T periods under a recursive estimation scheme. We apply the
aforementioned methods to the UK GDP growth and CPI inflation rate. We find
that, on average, for GDP growth, in terms of coverage and final prediction
error, the equal weights or the weights obtained by the AIC and BIC perform
equally well but are outperformed by the QRIC and the Jackknife approach on the
majority of the quantiles of interest. In contrast, the naive QAR(1) model of
inflation outperforms all model averaging methodologies.
arXiv link: http://arxiv.org/abs/2010.12263v1
Theory-based residual neural networks: A synergy of discrete choice models and deep neural networks
or even conflicting methods in travel behavior analysis. However, the two
methods are highly complementary because data-driven methods are more
predictive but less interpretable and robust, while theory-driven methods are
more interpretable and robust but less predictive. Using their complementary
nature, this study designs a theory-based residual neural network (TB-ResNet)
framework, which synergizes discrete choice models (DCMs) and deep neural
networks (DNNs) based on their shared utility interpretation. The TB-ResNet
framework is simple, as it uses a ($\delta$, 1-$\delta$) weighting to take
advantage of DCMs' simplicity and DNNs' richness, and to prevent underfitting
from the DCMs and overfitting from the DNNs. This framework is also flexible:
three instances of TB-ResNets are designed based on multinomial logit model
(MNL-ResNets), prospect theory (PT-ResNets), and hyperbolic discounting
(HD-ResNets), which are tested on three data sets. Compared to pure DCMs, the
TB-ResNets provide greater prediction accuracy and reveal a richer set of
behavioral mechanisms owing to the utility function augmented by the DNN
component in the TB-ResNets. Compared to pure DNNs, the TB-ResNets can modestly
improve prediction and significantly improve interpretation and robustness,
because the DCM component in the TB-ResNets stabilizes the utility functions
and input gradients. Overall, this study demonstrates that it is both feasible
and desirable to synergize DCMs and DNNs by combining their utility
specifications under a TB-ResNet framework. Although some limitations remain,
this TB-ResNet framework is an important first step to create mutual benefits
between DCMs and DNNs for travel behavior modeling, with joint improvement in
prediction, interpretation, and robustness.
arXiv link: http://arxiv.org/abs/2010.11644v1
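The core of the TB-ResNet idea described above is the ($\delta$, 1-$\delta$) weighting of a theory-driven utility and a DNN utility. The following is a minimal sketch of that weighting in PyTorch; the layer sizes, the MNL-style linear utility, and the value of $\delta$ are illustrative assumptions, not the authors' exact specification.

```python
# Sketch of a (delta, 1-delta) weighted utility combining a DCM and a DNN.
import torch
import torch.nn as nn

class TBResNetSketch(nn.Module):
    def __init__(self, n_features, n_alternatives, delta=0.9):
        super().__init__()
        self.delta = delta
        # Theory-driven part: linear-in-parameters utility, as in a multinomial logit
        self.dcm = nn.Linear(n_features, n_alternatives)
        # Data-driven residual part: a small feed-forward network
        self.dnn = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, n_alternatives),
        )

    def forward(self, x):
        # Weighted sum of the two utility specifications
        utility = self.delta * self.dcm(x) + (1.0 - self.delta) * self.dnn(x)
        return torch.log_softmax(utility, dim=-1)   # choice log-probabilities

# Toy usage: 5 features, 3 alternatives
model = TBResNetSketch(n_features=5, n_alternatives=3, delta=0.9)
x = torch.randn(8, 5)
print(model(x).exp().sum(dim=-1))                   # probabilities sum to one
```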
Approximation-Robust Inference in Dynamic Discrete Choice
approximation to lower the computational burden of dynamic programming.
Unfortunately, the use of approximation can impart substantial bias in
estimation and results in invalid confidence sets. We present a method for set
estimation and inference that explicitly accounts for the use of approximation
and is thus valid regardless of the approximation error. We show how one can
account for the error from approximation at low computational cost. Our
methodology allows researchers to assess the estimation error due to the use of
approximation and thus more effectively manage the trade-off between bias and
computational expedience. We provide simulation evidence to demonstrate the
practicality of our approach.
arXiv link: http://arxiv.org/abs/2010.11482v1
A Test for Kronecker Product Structure Covariance Matrix
(KPS). KPS implies a reduced rank restriction on a certain transformation of
the covariance matrix and the new procedure is an adaptation of the Kleibergen
and Paap (2006) reduced rank test. Deriving the limiting distribution of the
Wald-type test statistic proves challenging, partly because of the singularity
of the covariance matrix estimator that appears in the weighting matrix. We
show that the test statistic has a chi-square limiting null distribution with
degrees of freedom equal to the number of restrictions tested. Local asymptotic
power results are derived. Monte Carlo simulations reveal good size and power
properties of the test. Re-examining fifteen highly cited papers conducting
instrumental variable regressions, we find that KPS is not rejected in 56 out
of 118 specifications at the 5% nominal size.
arXiv link: http://arxiv.org/abs/2010.10961v4
Worst-case sensitivity
rate of increase in the expected cost of a Distributionally Robust Optimization
(DRO) model when the size of the uncertainty set vanishes. We show that
worst-case sensitivity is a Generalized Measure of Deviation and that a large
class of DRO models are essentially mean-(worst-case) sensitivity problems when
uncertainty sets are small, unifying recent results on the relationship between
DRO and regularized empirical optimization with worst-case sensitivity playing
the role of the regularizer. More generally, DRO solutions can be sensitive to
the family and size of the uncertainty set, and reflect the properties of its
worst-case sensitivity. We derive closed-form expressions of worst-case
sensitivity for well known uncertainty sets including smooth $\phi$-divergence,
total variation, "budgeted" uncertainty sets, uncertainty sets corresponding to
a convex combination of expected value and CVaR, and the Wasserstein metric.
These can be used to select the uncertainty set and its size for a given
application.
arXiv link: http://arxiv.org/abs/2010.10794v1
A Simple, Short, but Never-Empty Confidence Interval for Partially Identified Parameters
on a real-valued parameter that is partially identified through upper and lower
bounds with asymptotically normal estimators. A simple confidence interval is
proposed and is shown to have the following properties:
- It is never empty or awkwardly short, including when the sample analog of
the identified set is empty.
- It is valid for a well-defined pseudotrue parameter whether or not the
model is well-specified.
- It involves no tuning parameters and minimal computation.
Computing the interval requires concentrating out one scalar nuisance
parameter. In most cases, the practical result will be simple: To achieve 95%
coverage, report the union of a simple 90% (!) confidence interval for the
identified set and a standard 95% confidence interval for the pseudotrue
parameter.
For uncorrelated estimators -- notably if bounds are estimated from distinct
subsamples -- and conventional coverage levels, validity of this simple
procedure can be shown analytically. This case obtains in the motivating
empirical application (de Quidt, Haushofer, and Roth, 2018), in which an
improvement over existing inference methods is demonstrated. More generally,
simulations suggest that the novel confidence interval has excellent length and
size control. This is partly because, in anticipation of never being empty, the
interval can be made shorter than conventional ones in relevant regions of
sample space.
arXiv link: http://arxiv.org/abs/2010.10484v3
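A minimal sketch of the simple recipe quoted above: report the union of a 90% confidence interval for the identified set and a standard 95% confidence interval for the pseudotrue parameter. How the pseudotrue point estimate and its standard error are formed is part of the paper and is treated here as an input; the particular 90% set-coverage construction and the normal critical values are assumptions for illustration.

```python
# Union-of-intervals confidence set for a partially identified scalar (sketch).
def union_confidence_interval(lo_hat, se_lo, hi_hat, se_hi, point_hat, se_point):
    # A simple (conservative) 90% confidence interval for the identified set [lo, hi]
    ci_set = (lo_hat - 1.645 * se_lo, hi_hat + 1.645 * se_hi)
    # Standard 95% confidence interval for the pseudotrue parameter
    ci_point = (point_hat - 1.96 * se_point, point_hat + 1.96 * se_point)
    # Union of the two intervals: never empty by construction
    return min(ci_set[0], ci_point[0]), max(ci_set[1], ci_point[1])

# Toy usage with bound estimates from distinct subsamples (hypothetical numbers)
print(union_confidence_interval(0.2, 0.05, 0.5, 0.07, 0.35, 0.06))
```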
Time-varying Forecast Combination for High-Dimensional Data
forecast combination weights. When the number of individual forecasts is small,
we study the asymptotic properties of the local linear estimator. When the
number of candidate forecasts exceeds or diverges with the sample size, we
consider penalized local linear estimation with the group SCAD penalty. We show
that the estimator exhibits the oracle property and correctly selects relevant
forecasts with probability approaching one. Simulations indicate that the
proposed estimators outperform existing combination schemes when structural
changes exist. Two empirical studies on inflation forecasting and equity
premium prediction highlight the merits of our approach relative to other
popular methods.
arXiv link: http://arxiv.org/abs/2010.10435v1
L2-Relaxation: With Applications to Forecast Combination and Portfolio Analysis
variance portfolio selection with many assets. A novel convex problem called
L2-relaxation is proposed. In contrast to standard formulations, L2-relaxation
minimizes the squared Euclidean norm of the weight vector subject to a set of
relaxed linear inequality constraints. The magnitude of relaxation, controlled
by a tuning parameter, balances the bias and variance. When the
variance-covariance (VC) matrix of the individual forecast errors or financial
assets exhibits latent group structures -- a block equicorrelation matrix plus
a VC matrix for idiosyncratic noises -- the solution to L2-relaxation delivers roughly
equal within-group weights. Optimality of the new method is established under
the asymptotic framework when the number of the cross-sectional units $N$
potentially grows much faster than the time dimension $T$. Excellent finite
sample performance of our method is demonstrated in Monte Carlo simulations.
Its wide applicability is highlighted in three real data examples concerning
empirical applications in microeconomics, macroeconomics, and finance.
arXiv link: http://arxiv.org/abs/2010.09477v2
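A minimal sketch of an L2-relaxation-type problem: minimise the squared Euclidean norm of the combination weights subject to relaxed linear inequality constraints. The exact constraint set used in the paper is not reproduced; the matrix A, vector b, the relaxation parameter tau, and the use of the cvxpy modelling library are illustrative assumptions.

```python
# L2-relaxation-style quadratic program (illustrative sketch).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
N = 20                                    # number of forecasts / assets
A = rng.standard_normal((N, N))
A = A @ A.T / N                           # stand-in for an estimated VC matrix
b = np.ones(N) / N
tau = 0.1                                 # relaxation magnitude (tuning parameter)

w = cp.Variable(N)
objective = cp.Minimize(cp.sum_squares(w))        # squared Euclidean norm of weights
constraints = [cp.sum(w) == 1,                    # weights sum to one
               cp.abs(A @ w - b) <= tau]          # relaxed linear restrictions
cp.Problem(objective, constraints).solve()
print(w.value.round(3))
```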
A Decomposition Approach to Counterfactual Analysis in Game-Theoretic Models
in non-strategic settings. When the outcome of interest arises from a
game-theoretic setting where agents are better off by deviating from their
strategies after a new policy, such predictions, despite their practical
simplicity, are hard to justify. We present conditions in Bayesian games under
which the decomposition-based predictions coincide with the equilibrium-based
ones. In many games, such coincidence follows from an invariance condition for
equilibrium selection rules. To illustrate our message, we revisit an empirical
analysis in Ciliberto and Tamer (2009) on firms' entry decisions in the airline
industry.
arXiv link: http://arxiv.org/abs/2010.08868v7
Empirical likelihood and uniform convergence rates for dyadic kernel density estimation
methods for kernel density estimation (KDE) for dyadic data. We first establish
uniform convergence rates for dyadic KDE. Secondly, we propose a modified
jackknife empirical likelihood procedure for inference. The proposed test
statistic is asymptotically pivotal regardless of the presence of dyadic
clustering. The results are further extended to cover the practically relevant
case of incomplete dyadic data. Simulations show that this modified jackknife
empirical likelihood-based inference procedure delivers precise coverage
probabilities even with modest sample sizes and with incomplete dyadic data.
Finally, we illustrate the method by studying airport congestion in the United
States.
arXiv link: http://arxiv.org/abs/2010.08838v5
Synchronization analysis between exchange rates on the basis of purchasing power parity using the Hilbert transform
rhythms when interacting with each other. We measure the degree of
synchronization between the U.S. dollar (USD) and euro exchange rates and
between the USD and Japanese yen exchange rates on the basis of purchasing
power parity (PPP) over time. We employ a method of synchronization analysis
using the Hilbert transform, which is common in the field of nonlinear science.
We find that the degree of synchronization is high most of the time, suggesting
the establishment of PPP. The degree of synchronization does not remain high
during periods marked by economic events with asymmetric effects, such as the U.S.
real estate bubble.
arXiv link: http://arxiv.org/abs/2010.08825v2
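A minimal sketch of phase-synchronization measurement with the Hilbert transform, in the spirit of the analysis described above. The synthetic series and the mean-phase-coherence summary are illustrative assumptions, not the authors' data or exact metric.

```python
# Hilbert-transform phase synchronization between two series (illustrative sketch).
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(0)
t = np.linspace(0, 20 * np.pi, 2000)
x = np.sin(t) + 0.3 * rng.standard_normal(t.size)        # stand-in for series 1
y = np.sin(t + 0.4) + 0.3 * rng.standard_normal(t.size)  # stand-in for series 2

phase_x = np.angle(hilbert(x))            # instantaneous phase of each series
phase_y = np.angle(hilbert(y))
dphi = phase_x - phase_y

# Mean phase coherence: 1 = perfect synchronization, 0 = none
coherence = np.abs(np.mean(np.exp(1j * dphi)))
print(f"mean phase coherence: {coherence:.3f}")
```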
Binary Choice with Asymmetric Loss in a Data-Rich Environment: Theory and an Application to Racial Justice
loss functions. The econometrics literature covers nonparametric binary choice
problems but does not offer computationally attractive solutions in data-rich
environments. The machine learning literature has many algorithms but is
focused mostly on loss functions that are independent of covariates. We show
that theoretically valid decisions on binary outcomes with general loss
functions can be achieved via a very simple loss-based reweighting of the
logistic regression or state-of-the-art machine learning techniques. We apply
our analysis to racial justice in pretrial detention.
arXiv link: http://arxiv.org/abs/2010.08463v5
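A minimal sketch of the loss-based reweighting idea described above: fit a logistic regression with sample weights reflecting covariate-dependent costs of the two error types. The particular cost functions and synthetic data are hypothetical; the paper's theory justifies reweighting in general, not this specific choice.

```python
# Loss-based reweighting of a logistic regression (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2_000
X = rng.standard_normal((n, 3))
p_true = 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
y = rng.binomial(1, p_true)

# Covariate-dependent losses of the two error types (hypothetical example)
loss_fp = 1.0 + np.abs(X[:, 2])           # loss from deciding 1 when y = 0
loss_fn = 2.0 * np.ones(n)                # loss from deciding 0 when y = 1
weights = np.where(y == 1, loss_fn, loss_fp)

clf = LogisticRegression().fit(X, y, sample_weight=weights)
decisions = clf.predict(X)                # reweighted fit shifts the decision boundary
print(decisions[:10])
```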
Measures of Model Risk in Continuous-time Finance Models
markets. We separate model risk into parameter estimation risk and model
specification risk, and we propose expected shortfall type model risk measures
applied to Levy jump models and affine jump-diffusion models. We investigate
the impact of parameter estimation risk and model specification risk on the
models' ability to capture the joint dynamics of stock and option prices. We
estimate the parameters using Markov chain Monte Carlo techniques, under the
risk-neutral probability measure and the real-world probability measure
jointly. We find strong evidence supporting modeling of price jumps.
arXiv link: http://arxiv.org/abs/2010.08113v2
Estimating Sleep & Work Hours from Alternative Data by Segmented Functional Classification Analysis (SFCA)
behaviour. This paper introduces a new type of alternative data by
re-conceptualising the internet as a data-driven insights platform at global
scale. Using data from a unique internet activity and location dataset drawn
from over 1.5 trillion observations of end-user internet connections, we
construct a functional dataset covering over 1,600 cities during a 7-year
period with a temporal resolution of just 15 minutes. To predict accurate temporal
patterns of sleep and work activity from this dataset, we develop a new
technique, Segmented Functional Classification Analysis (SFCA), and compare its
performance to a wide array of linear, functional, and classification methods.
To confirm the wider applicability of SFCA, in a second application we predict
sleep and work activity using SFCA from US city-wide electricity demand
functional data. Across both problems, SFCA is shown to outperform current
methods.
arXiv link: http://arxiv.org/abs/2010.08102v1
Heteroscedasticity test of high-frequency data with jumps and microstructure noise
constant or not during a given time span by using high-frequency data with the
presence of jumps and microstructure noise. Based on estimators of integrated
volatility and spot volatility, we propose a nonparametric way to depict the
discrepancy between local variation and global variation. We show that our
proposed test estimator converges to a standard normal distribution if the
volatility is constant, otherwise it diverges to infinity. Simulation studies
verify the theoretical results and show a good finite sample performance of the
test procedure. We also apply our test procedure to real high-frequency
financial data. We observe that in almost half of the days tested, the
assumption of constant volatility within a day is violated. This is largely
because stock prices during the opening and closing periods are highly volatile
and account for a relatively large proportion of intraday variation.
arXiv link: http://arxiv.org/abs/2010.07659v1
Comment: Individualized Treatment Rules Under Endogeneity
treatment rules using instrumental variables---Cui and Tchetgen Tchetgen (2020)
and Qiu et al. (2020). It also proposes identifying assumptions that are
alternatives to those used in both studies.
arXiv link: http://arxiv.org/abs/2010.07656v1
Interpretable Neural Networks for Panel Data Analysis in Economics
using advanced tools like neural networks in their empirical research. In this
paper, we propose a class of interpretable neural network models that can
achieve both high prediction accuracy and interpretability. The model can be
written as a simple function of a regularized number of interpretable features,
which are outcomes of interpretable functions encoded in the neural network.
Researchers can design different forms of interpretable functions based on the
nature of their tasks. In particular, we encode a class of interpretable
functions named persistent change filters in the neural network to study time
series cross-sectional data. We apply the model to predicting individuals'
monthly employment status using high-dimensional administrative data. We
achieve an accuracy of 94.5% on the test set, which is comparable to the
best-performing conventional machine learning methods. Furthermore, the
interpretability of the model allows us to understand the mechanism that
underlies the prediction: an individual's employment status is closely related
to whether she pays for different types of insurance. Our work is a useful step
towards overcoming the black-box problem of neural networks, and provides a new
tool for economists to study administrative and proprietary big data.
arXiv link: http://arxiv.org/abs/2010.05311v3
Identifying causal channels of policy reforms with multiple treatments and different types of selection
treatments and different types of selection for each treatment. We disentangle
reform effects into policy effects, selection effects, and time effects under
the assumption of conditional independence, common trends, and an additional
exclusion restriction on the non-treated. Furthermore, we show the
identification of direct and indirect policy effects after imposing additional
sequential conditional independence assumptions on mediating variables. We
illustrate the approach using the German reform of the allocation system of
vocational training for unemployed persons. The reform changed the allocation
of training from a mandatory system to a voluntary voucher system.
Simultaneously, the selection criteria for participants changed, and the reform
altered the composition of course types. We consider the course composition as
a mediator of the policy reform. We show that the empirical evidence from
previous studies reverses when considering the course composition. This has
important implications for policy conclusions.
arXiv link: http://arxiv.org/abs/2010.05221v1
Combining Observational and Experimental Data to Improve Efficiency Using Imperfect Instruments
credibly identify causal effects, but often suffer from limited scale, while
observational datasets are large, but often violate desired identification
assumptions. To improve estimation efficiency, I propose a method that
leverages imperfect instruments - pretreatment covariates that satisfy the
relevance condition but may violate the exclusion restriction. I show that
these imperfect instruments can be used to derive moment restrictions that, in
combination with the experimental data, improve estimation efficiency. I
outline estimators for implementing this strategy, and show that my methods can
reduce variance by up to 50%; therefore, only half of the experimental sample
is required to attain the same statistical precision. I apply my method to a
search listing dataset from Expedia that studies the causal effect of search
rankings on clicks, and show that the method can substantially improve the
precision.
arXiv link: http://arxiv.org/abs/2010.05117v5
Valid t-ratio Inference for IV
exceeding some threshold (e.g., 10) as a criterion for trusting t-ratio
inferences, even though this yields an anti-conservative test. We show that a
true 5 percent test instead requires an F greater than 104.7. Maintaining 10 as
a threshold requires replacing the critical value 1.96 with 3.43. We re-examine
57 AER papers and find that corrected inference causes half of the initially
presumed statistically significant results to be insignificant. We introduce a
more powerful test, the tF procedure, which provides F-dependent adjusted
t-ratio critical values.
arXiv link: http://arxiv.org/abs/2010.05058v1
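A minimal sketch of the two adjustments quoted in the abstract: either require a first-stage F above 104.7 to keep the usual 1.96 critical value, or keep the F > 10 screen but compare |t| with 3.43. Combining the two into a single decision rule is my own illustration; the full tF procedure supplies F-dependent critical values and is not reproduced here.

```python
# Decision rules implied by the thresholds quoted in the abstract (sketch).
def naive_5pct_test(t_stat, first_stage_F):
    """Conventional practice: trust |t| > 1.96 whenever F > 10 (anti-conservative)."""
    return first_stage_F > 10 and abs(t_stat) > 1.96

def adjusted_5pct_test(t_stat, first_stage_F):
    """Corrections implied by the abstract's numbers, combined for illustration."""
    if first_stage_F > 104.7:        # F large enough to justify the 1.96 cutoff
        return abs(t_stat) > 1.96
    if first_stage_F > 10:           # keep the F > 10 screen, raise the t cutoff
        return abs(t_stat) > 3.43
    return False                     # otherwise, do not report significance

print(naive_5pct_test(2.5, 15), adjusted_5pct_test(2.5, 15))   # True False
```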
Asymptotic Properties of the Maximum Likelihood Estimator in Regime-Switching Models with Time-Varying Transition Probabilities
in time-varying transition probability (TVTP) regime-switching models. This
class of models extends the constant regime transition probability in
Markov-switching models to a time-varying probability by including information
from observations. An important feature in this proof is the mixing rate of the
regime process conditional on the observations, which is time varying owing to
the time-varying transition probabilities. Consistency and asymptotic normality
follow from the almost deterministic geometrically decaying bound of the mixing
rate. The assumptions are verified in regime-switching autoregressive models
with widely-applied TVTP specifications. A simulation study examines the
finite-sample distributions of the MLE and compares the estimates of the
asymptotic variance constructed from the Hessian matrix and the outer product
of the score. The simulation results favour the latter. As an empirical
example, we compare three leading economic indicators in terms of describing
U.S. industrial production.
arXiv link: http://arxiv.org/abs/2010.04930v3
Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves
causal functions such as dose, heterogeneous, and incremental response curves.
Treatment and covariates may be discrete or continuous in general spaces. Due
to a decomposition property specific to the RKHS, our estimators have simple
closed form solutions. We prove uniform consistency with finite sample rates
via original analysis of generalized kernel ridge regression. We extend our
main results to counterfactual distributions and to causal functions identified
by front and back door criteria. We achieve state-of-the-art performance in
nonlinear simulations with many covariates, and conduct a policy evaluation of
the US Job Corps training program for disadvantaged youths.
arXiv link: http://arxiv.org/abs/2010.04855v7
When Is Parallel Trends Sensitive to Functional Form?
functional form. We provide a novel characterization: the parallel trends
assumption holds under all strictly monotonic transformations of the outcome if
and only if a stronger “parallel trends”-type condition holds for the
cumulative distribution function of untreated potential outcomes. This
condition for parallel trends to be insensitive to functional form is satisfied
if and essentially only if the population can be partitioned into a subgroup
for which treatment is effectively randomly assigned and a remaining subgroup
for which the distribution of untreated potential outcomes is stable over time.
These conditions have testable implications, and we introduce falsification
tests for the null that parallel trends is insensitive to functional form.
arXiv link: http://arxiv.org/abs/2010.04814v5
Sparse network asymptotics for logistic regression
$M$ different products. This paper considers the properties of the logistic
regression of the $N\times M$ array of i-buys-j purchase decisions,
$\left[Y_{ij}\right]_{1\leq i\leq N,1\leq j\leq M}$, onto known functions of
consumer and product attributes under asymptotic sequences where (i) both $N$
and $M$ grow large and (ii) the average number of products purchased per
consumer is finite in the limit. This latter assumption implies that the
network of purchases is sparse: only a (very) small fraction of all possible
purchases are actually made (concordant with many real-world settings). Under
sparse network asymptotics, the first and last terms in an extended
Hoeffding-type variance decomposition of the score of the logit composite
log-likelihood are of equal order. In contrast, under dense network
asymptotics, the last term is asymptotically negligible. Asymptotic normality
of the logistic regression coefficients is shown using a martingale central
limit theorem (CLT) for triangular arrays. Unlike in the dense case, the
normality result derived here also holds under degeneracy of the network
graphon. Relatedly, when there happens to be no dyadic dependence in the
dataset at hand, it specializes to recently derived results on the behavior of
logistic regression with rare events and iid data. Sparse network asymptotics
may lead to better inference in practice since they suggest variance estimators
which (i) incorporate additional sources of sampling variation and (ii) are
valid under varying degrees of dyadic dependence.
arXiv link: http://arxiv.org/abs/2010.04703v1
Identification of multi-valued treatment effects with unobserved heterogeneity
effects on continuous outcomes in endogenous and multi-valued discrete
treatment settings with unobserved heterogeneity. We employ the monotonicity
assumption for multi-valued discrete treatments and instruments, and our
identification condition has a clear economic interpretation. In addition, we
identify the local treatment effects in multi-valued treatment settings and
derive closed-form expressions of the identified treatment effects. We provide
examples to illustrate the usefulness of our result.
arXiv link: http://arxiv.org/abs/2010.04385v5
Inference with a single treated cluster
research designs with a finite number of heterogeneous clusters where only a
single cluster received treatment. This situation is commonplace in
difference-in-differences estimation but the test developed here applies more
generally. I show that the test controls size and has power under asymptotics
where the number of observations within each cluster is large but the number of
clusters is fixed. The test combines weighted, approximately Gaussian parameter
estimates with a rearrangement procedure to obtain its critical values. The
weights needed for most empirically relevant situations are tabulated in the
paper. Calculation of the critical values is computationally simple and does
not require simulation or resampling. The rearrangement test is highly robust
to situations where some clusters are much more variable than others. Examples
and an empirical application are provided.
arXiv link: http://arxiv.org/abs/2010.04076v1
Prediction intervals for Deep Neural Networks
prediction intervals for the output of neural network models. To do this, we
adapt the extremely randomized trees method originally developed for random
forests to construct ensembles of neural networks. The extra-randomness
introduced in the ensemble reduces the variance of the predictions and yields
gains in out-of-sample accuracy. An extensive Monte Carlo simulation exercise
shows the good performance of this novel method for constructing prediction
intervals in terms of coverage probability and mean square prediction error.
This approach is superior to state-of-the-art methods extant in the literature
such as the widely used MC dropout and bootstrap procedures. The out-of-sample
accuracy of the novel algorithm is further evaluated using experimental
settings already adopted in the literature.
arXiv link: http://arxiv.org/abs/2010.04044v2
Consistent Specification Test of the Quantile Autoregression
specification and no omitted latent factors for the Quantile Autoregression. If
the composite null is rejected we proceed to disentangle the cause of
rejection, i.e., dynamic misspecification or an omitted variable. We establish
the asymptotic distribution of the test statistics under fairly weak conditions
and show that factor estimation error is negligible. A Monte Carlo study shows
that the suggested tests have good finite sample properties. Finally, we
undertake an empirical illustration of modelling GDP growth and CPI inflation
in the United Kingdom, where we find evidence that factor augmented models are
correctly specified in contrast with their non-augmented counterparts when it
comes to GDP growth, while also exploring the asymmetric behaviour of the
growth and inflation distributions.
arXiv link: http://arxiv.org/abs/2010.03898v2
The Adaptive Doubly Robust Estimator for Policy Evaluation in Adaptive Experiments and a Paradox Concerning Logging Policy
the conditional mean outcome and the logging policy (the probability of
choosing an action), is crucial in causal inference. This paper proposes a DR
estimator for dependent samples obtained from adaptive experiments. To obtain
an asymptotically normal semiparametric estimator from dependent samples with
non-Donsker nuisance estimators, we propose adaptive-fitting as a variant of
sample-splitting. We also report an empirical paradox that our proposed DR
estimator tends to show better performance than other estimators
utilizing the true logging policy. While a similar phenomenon is known for
estimators with i.i.d. samples, traditional explanations based on asymptotic
efficiency cannot elucidate our case with dependent samples. We confirm this
hypothesis through simulation studies.
arXiv link: http://arxiv.org/abs/2010.03792v5
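A minimal sketch of the standard doubly robust (DR) value estimator built from an outcome model and a logging policy, which is the building block the paper adapts to dependent samples via adaptive fitting. The adaptive-fitting scheme itself is not reproduced; the plug-in estimates mu_hat and pi_hat and the toy data below are illustrative assumptions.

```python
# Standard doubly robust policy-value estimator (illustrative sketch).
import numpy as np

def dr_policy_value(actions, rewards, pi_e, pi_hat, mu_hat):
    """DR estimate of the value of evaluation policy pi_e.

    pi_e, pi_hat: arrays of shape (n, n_actions) with action probabilities
    mu_hat:       array of shape (n, n_actions) with estimated mean outcomes
    """
    n = len(rewards)
    rows = np.arange(n)
    # Model-based term: E_{a ~ pi_e}[mu_hat(a, x)]
    direct = np.sum(pi_e * mu_hat, axis=1)
    # Correction term: importance-weighted residual at the logged action
    w = pi_e[rows, actions] / pi_hat[rows, actions]
    correction = w * (rewards - mu_hat[rows, actions])
    return np.mean(direct + correction)

# Toy usage with synthetic logged data
rng = np.random.default_rng(0)
n, K = 1_000, 3
actions = rng.integers(0, K, size=n)
rewards = rng.normal(size=n)
pi_hat = np.full((n, K), 1 / K)                       # estimated logging policy
pi_e = np.tile(np.array([0.6, 0.3, 0.1]), (n, 1))     # evaluation policy
mu_hat = np.zeros((n, K))                             # crude outcome model
print(dr_policy_value(actions, rewards, pi_e, pi_hat, mu_hat))
```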
Interpreting Unconditional Quantile Regression with Conditional Independence
distribution and corresponding unconditional quantile "effects" defined and
estimated by Firpo, Fortin, and Lemieux (2009) and Chernozhukov,
Fern\'andez-Val, and Melly (2013). With conditional independence of the policy
variable of interest, these methods estimate the policy effect for certain
types of policies, but not others. In particular, they estimate the effect of a
policy change that itself satisfies conditional independence.
arXiv link: http://arxiv.org/abs/2010.03606v2
Further results on the estimation of dynamic panel logit models with fixed effects
AR(1) model with strictly exogenous covariates and fixed effects are estimable
at the root-n rate using the Generalized Method of Moments. Honor\'e and
Weidner (2020) extended his results in various directions: they found
additional moment conditions for the logit AR(1) model and also considered
estimation of logit AR(p) models with $p>1$. In this note we prove a conjecture
in their paper and show that, for given values of the initial condition, the
covariates and the common parameters, $2^{T}-2T$ of their moment functions for the
logit AR(1) model are linearly independent and span the set of valid moment
functions, which is a $2^{T}-2T$-dimensional linear subspace of the
$2^{T}$-dimensional vector space of real-valued functions over the outcomes
$y \in \{0,1\}^{T}$. We also prove that when $p=2$ and $T \in \{3,4,5\}$,
there are, respectively, $2^{T}-4(T-1)$ and $2^{T}-(3T-2)$ linearly independent
moment functions for the panel logit AR(2) models with and without covariates.
arXiv link: http://arxiv.org/abs/2010.03382v5
Comment on Gouriéroux, Monfort, Renne (2019): Identification and Estimation in Non-Fundamental Structural VARMA Models
Renne (2019): Identification and Estimation in Non-Fundamental Structural VARMA
Models" with regard to mirroring complex-valued roots with Blaschke polynomial
matrices. Moreover, the (non-) feasibility of the proposed method (if the
handling of Blaschke transformation were not prohibitive) for cross-sectional
dimensions greater than two and vector moving average (VMA) polynomial matrices
of degree greater than one is discussed.
arXiv link: http://arxiv.org/abs/2010.02711v1
A Recursive Logit Model with Choice Aversion and Its Application to Transportation Networks
aversion by imposing a penalty term that accounts for the dimension of the
choice set at each node of the transportation network. We make three
contributions. First, we show that our model overcomes the correlation problem
between routes, a common pitfall of traditional logit models, and that the
choice aversion model can be seen as an alternative to these models. Second, we
show how our model can generate violations of regularity in the path choice
probabilities. In particular, we show that removing edges in the network may
decrease the probability for existing paths. Finally, we show that under the
presence of choice aversion, adding edges to the network can make users worse
off. In other words, a type of Braess's paradox can emerge outside of
congestion and can be characterized in terms of a parameter that measures
users' degree of choice aversion. We validate these contributions by estimating
this parameter over GPS traffic data captured on a real-world transportation
network.
arXiv link: http://arxiv.org/abs/2010.02398v4
Testing homogeneity in dynamic discrete games in finite samples
choice probabilities and the state transition probabilities are homogeneous
across markets and over time. We refer to this as the "homogeneity assumption"
in dynamic discrete games. This assumption enables empirical studies to
estimate the game's structural parameters by pooling data from multiple markets
and from many time periods. In this paper, we propose a hypothesis test to
evaluate whether the homogeneity assumption holds in the data. Our hypothesis
test is the result of an approximate randomization test, implemented via a
Markov chain Monte Carlo (MCMC) algorithm. We show that our hypothesis test
becomes valid as the (user-defined) number of MCMC draws diverges, for any
fixed number of markets, time periods, and players. We apply our test to the
empirical study of the U.S. Portland cement industry in Ryan (2012).
arXiv link: http://arxiv.org/abs/2010.02297v3
Deep Distributional Time Series Models and the Probabilistic Forecasting of Intraday Electricity Prices
provide accurate point forecasts for series that exhibit complex serial
dependence. We propose two approaches to constructing deep time series
probabilistic models based on a variant of RNN called an echo state network
(ESN). The first is where the output layer of the ESN has stochastic
disturbances and a shrinkage prior for additional regularization. The second
approach employs the implicit copula of an ESN with Gaussian disturbances,
which is a deep copula process on the feature space. Combining this copula with
a non-parametrically estimated marginal distribution produces a deep
distributional time series model. The resulting probabilistic forecasts are
deep functions of the feature vector and also marginally calibrated. In both
approaches, Bayesian Markov chain Monte Carlo methods are used to estimate the
models and compute forecasts. The proposed models are suitable for the complex
task of forecasting intraday electricity prices. Using data from the Australian
National Electricity Market, we show that our deep time series models provide
accurate short term probabilistic price forecasts, with the copula model
dominating. Moreover, the models provide a flexible framework for incorporating
probabilistic forecasts of electricity demand as additional features, which
significantly increases the upper-tail forecast accuracy of the copula model.
arXiv link: http://arxiv.org/abs/2010.01844v2
Robust and Efficient Estimation of Potential Outcome Means under Random Assignment
vector of potential outcome means using regression adjustment (RA) when there
are more than two treatment levels. We show that linear RA which estimates
separate slopes for each assignment level is never worse, asymptotically, than
using the subsample averages. We also show that separate RA improves over
pooled RA except in the obvious case where slope parameters in the linear
projections are identical across the different assignment levels. We further
characterize the class of nonlinear RA methods that preserve consistency of the
potential outcome means despite arbitrary misspecification of the conditional
mean functions. Finally, we apply these regression adjustment techniques to
efficiently estimate the lower bound mean willingness to pay for an oil spill
prevention program in California.
arXiv link: http://arxiv.org/abs/2010.01800v2
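A minimal sketch of the "separate slopes" regression adjustment (RA) discussed above: fit a separate linear regression within each assignment level, then average the fitted values over the full sample to estimate each potential outcome mean. The data-generating process and function names are illustrative assumptions.

```python
# Separate-slopes regression adjustment for potential outcome means (sketch).
import numpy as np

def separate_ra_means(y, d, X):
    """Return RA estimates of E[Y(a)] for each assignment level a."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])            # add intercept
    means = []
    for a in np.unique(d):
        mask = d == a
        beta = np.linalg.lstsq(X1[mask], y[mask], rcond=None)[0]
        means.append(np.mean(X1 @ beta))             # average fitted value over all units
    return np.array(means)

# Toy usage with three treatment levels
rng = np.random.default_rng(0)
n = 3_000
X = rng.standard_normal((n, 2))
d = rng.integers(0, 3, size=n)
y = 1.0 * d + X @ np.array([0.5, -0.2]) + rng.standard_normal(n)
print(separate_ra_means(y, d, X).round(2))
```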
A Class of Time-Varying Vector Moving Average Models: Nonparametric Kernel Estimation and Application
studies, e.g., modelling policy transmission mechanism and measuring
connectedness between economic agents. To better capture the dynamics, this
paper proposes a wide class of multivariate dynamic models with time-varying
coefficients, which have a general time-varying vector moving average (VMA)
representation, and nest, for instance, time-varying vector autoregression
(VAR), time-varying vector autoregression moving-average (VARMA), and so forth
as special cases. The paper then develops a unified estimation method for the
unknown quantities before an asymptotic theory for the proposed estimators is
established. In the empirical study, we investigate the transmission mechanism
of monetary policy using U.S. data, and uncover a fall in the volatilities of
exogenous shocks. In addition, we find that (i) monetary policy shocks have
less influence on inflation before and during the so-called Great Moderation,
(ii) inflation is more anchored recently, and (iii) the long-run level of
inflation is below, but quite close to the Federal Reserve's target of two
percent after the beginning of the Great Moderation period.
arXiv link: http://arxiv.org/abs/2010.01492v1
On Statistical Discrimination as a Failure of Social Learning: A Multi-Armed Bandit Approach
bandit model. Myopic firms face workers arriving with heterogeneous observable
characteristics. The association between the worker's skill and characteristics
is unknown ex ante; thus, firms need to learn it. Laissez-faire causes
perpetual underestimation: minority workers are rarely hired, and therefore,
the underestimation tends to persist. Even a marginal imbalance in the
population ratio frequently results in perpetual underestimation. We propose
two policy solutions: a novel subsidy rule (the hybrid mechanism) and the
Rooney Rule. Our results indicate that temporary affirmative actions
effectively alleviate discrimination stemming from insufficient data.
arXiv link: http://arxiv.org/abs/2010.01079v6
Local Regression Distribution Estimators
distribution estimators, which include a class of boundary adaptive density
estimators as a prime example. First, we establish a pointwise Gaussian large
sample distributional approximation in a unified way, allowing for both
boundary and interior evaluation points simultaneously. Using this result, we
study the asymptotic efficiency of the estimators, and show that a carefully
crafted minimum distance implementation based on "redundant" regressors can
lead to efficiency gains. Second, we establish uniform linearizations and
strong approximations for the estimators, and employ these results to construct
valid confidence bands. Third, we develop extensions to weighted distributions
with estimated weights and to local $L^{2}$ least squares estimation. Finally,
we illustrate our methods with two applications in program evaluation:
counterfactual density testing, and IV specification and heterogeneity density
analysis. Companion software packages in Stata and R are available.
arXiv link: http://arxiv.org/abs/2009.14367v2
Online Action Learning in High Dimensions: A Conservative Perspective
practical applications. Examples include dynamic pricing and assortment, design
of auctions and incentives and permeate a large number of sequential treatment
experiments. In this paper, we extend one of the most popular learning
solutions, the $\epsilon_t$-greedy heuristics, to high-dimensional contexts
considering a conservative directive. We do this by allocating part of the time
the original rule uses to adopt completely new actions to a more focused search
in a restricted set of promising actions. The resulting rule might be useful
for practical applications that still value surprises, although at a
decreasing rate, while also having restrictions on the adoption of unusual
actions. With high probability, we find reasonable bounds for the cumulative
regret of a conservative high-dimensional decaying $\epsilon_t$-greedy rule.
We also provide a lower bound for the cardinality of the set of viable actions
that implies an improved regret bound for the conservative version when
compared to its non-conservative counterpart. Additionally, we show that
end-users have sufficient flexibility when establishing how much safety they
want, since it can be tuned without impacting theoretical properties. We
illustrate our proposal both in a simulation exercise and using a real dataset.
arXiv link: http://arxiv.org/abs/2009.13961v4
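A minimal sketch of a decaying $\epsilon_t$-greedy rule with a "conservative" twist: part of the exploration budget is spent inside a restricted set of currently promising actions rather than over all actions. The split rule, the decay schedule, and the way the promising set is formed below are illustrative assumptions, not the paper's exact algorithm.

```python
# Conservative decaying epsilon_t-greedy action choice (illustrative sketch).
import numpy as np

def choose_action(t, means, n_promising=3, share_conservative=0.5, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    eps_t = min(1.0, 1.0 / np.sqrt(t + 1))          # decaying exploration rate
    if rng.random() < eps_t:
        # Exploration step: sometimes search only among the current top actions
        if rng.random() < share_conservative:
            promising = np.argsort(means)[-n_promising:]
            return int(rng.choice(promising))
        return int(rng.integers(len(means)))        # unrestricted exploration
    return int(np.argmax(means))                    # exploitation

# Toy usage with current reward estimates for five actions
means = np.array([0.1, 0.5, 0.3, 0.7, 0.2])
print(choose_action(t=100, means=means))
```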
A Computational Approach to Identification of Treatment Effects for Policy Evaluation
treatment parameters are relevant to policies in question. This is especially
challenging under unobserved heterogeneity, as is well featured in the
definition of the local average treatment effect (LATE). Being intrinsically
local, the LATE is known to lack external validity in counterfactual
environments. This paper investigates the possibility of extrapolating local
treatment effects to different counterfactual settings when instrumental
variables are only binary. We propose a novel framework to systematically
calculate sharp nonparametric bounds on various policy-relevant treatment
parameters that are defined as weighted averages of the marginal treatment
effect (MTE). Our framework is flexible enough to fully incorporate statistical
independence (rather than mean independence) of instruments and a large menu of
identifying assumptions beyond the shape restrictions on the MTE that have been
considered in prior studies. We apply our method to understand the effects of
medical insurance policies on the use of medical services.
arXiv link: http://arxiv.org/abs/2009.13861v4
Lockdown effects in US states: an artificial counterfactual approach
lockdowns on the short-run evolution of the number of cases and deaths in some
US states. To do so, we explore the different timing in which US states adopted
lockdown policies, and divide them among treated and control groups. For each
treated state, we construct an artificial counterfactual. On average, and in
the very short-run, the counterfactual accumulated number of cases would be two
times larger if lockdown policies were not implemented.
arXiv link: http://arxiv.org/abs/2009.13484v2
Difference-in-Differences for Ordinal Outcomes: Application to the Effect of Mass Shootings on Attitudes toward Gun Control
studies to estimate the causal effect of a treatment when repeated observations
over time are available. Yet, almost all existing methods assume linearity in
the potential outcome (parallel trends assumption) and target the additive
effect. In social science research, however, many outcomes of interest are
measured on an ordinal scale. This makes the linearity assumption inappropriate
because the difference between two ordinal potential outcomes is not well
defined. In this paper, I propose a method to draw causal inferences for
ordinal outcomes under the DID design. Unlike existing methods, the proposed
method utilizes the latent variable framework to handle the non-numeric nature
of the outcome, enabling identification and estimation of causal effects based
on the assumption on the quantile of the latent continuous variable. The paper
also proposes an equivalence-based test to assess the plausibility of the key
identification assumption when additional pre-treatment periods are available.
The proposed method is applied to a study estimating the causal effect of mass
shootings on the public's support for gun control. I find little evidence for a
uniform shift toward pro-gun control policies as found in the previous study,
but find that the effect is concentrated on left-leaning respondents who
experienced the shooting for the first time in more than a decade.
arXiv link: http://arxiv.org/abs/2009.13404v1
Learning Classifiers under Delayed Feedback with a Time Window Assumption
learning). For example, in conversion prediction for online ads, we initially
receive negative samples corresponding to users who clicked the ads but did not
buy an item; subsequently, some of these samples buy an item and change to
positive. In the setting of DF learning, we observe samples over time and then
learn a classifier at some point: samples initially received as negative may
later change to positive. This problem is
conceivable in various real-world applications such as online advertisements,
where the user action takes place long after the first click. Owing to the
delayed feedback, naive classification of the positive and negative samples
returns a biased classifier. One solution is to use samples that have been
observed for more than a certain time window assuming these samples are
correctly labeled. However, existing studies reported that simply using a
subset of all samples based on the time window assumption does not perform
well, and that using all samples along with the time window assumption improves
empirical performance. We extend these existing studies and propose a method
with the unbiased and convex empirical risk that is constructed from all
samples under the time window assumption. To demonstrate the soundness of the
proposed method, we provide experimental results on synthetic data and on an
open dataset of real traffic logs from online advertising.
arXiv link: http://arxiv.org/abs/2009.13092v2
Nonclassical Measurement Error in the Outcome Variable
nonclassical measurement error in the outcome variable. We show equivalence of
this model to a generalized regression model. Our main identifying assumptions
are a special regressor type restriction and monotonicity in the nonlinear
relationship between the observed and unobserved true outcome. Nonparametric
identification is then obtained under a normalization of the unknown link
function, which is a natural extension of the classical measurement error case.
We propose a novel sieve rank estimator for the regression function and
establish its rate of convergence.
In Monte Carlo simulations, we find that our estimator corrects for biases
induced by nonclassical measurement error and provides numerically stable
results. We apply our method to analyze belief formation of stock market
expectations with survey data from the German Socio-Economic Panel (SOEP) and
find evidence for nonclassical measurement error in subjective belief data.
arXiv link: http://arxiv.org/abs/2009.12665v2
A step-by-step guide to design, implement, and analyze a discrete choice experiment
environmental valuation, and other disciplines. However, there is a lack of
resources disclosing the whole procedure of carrying out a DCE. This document
aims to assist anyone wishing to use the power of DCEs to understand people's
behavior by providing a comprehensive guide to the procedure. This guide
contains all the code needed to design, implement, and analyze a DCE using only
free software.
arXiv link: http://arxiv.org/abs/2009.11235v1
Recent Developments on Factor Models and its Applications in Econometric Learning
model and its applications in statistical learning. We focus on the perspective
of the low-rank structure of factor models, and particularly draw attention
to estimating the model from the low-rank recovery point of view. The survey
mainly consists of three parts: the first part is a review on new factor
estimations based on modern techniques on recovering low-rank structures of
high-dimensional models. The second part discusses statistical inferences of
several factor-augmented models and applications in econometric learning
models. The final part summarizes new developments dealing with unbalanced
panels from the matrix completion perspective.
arXiv link: http://arxiv.org/abs/2009.10103v1
On the Existence of Conditional Maximum Likelihood Estimates of the Binary Logit Model with Fixed Effects
show that there exists a one-to-one mapping between existence and uniqueness of
conditional maximum likelihood estimates of the binary logit model with fixed
effects and the configuration of data points. Our results extend those in
Albert and Anderson (1984) for the cross-sectional case and can be used to
build a simple algorithm that detects spurious estimates in finite samples. As
an illustration, we exhibit an artificial dataset for which Stata's clogit
command returns spurious estimates.
arXiv link: http://arxiv.org/abs/2009.09998v3
Spillovers of Program Benefits with Missing Network Links
frequently neglected in empirical studies. This paper addresses this issue when
investigating the spillovers of program benefits in the presence of network
interactions. Our method is flexible enough to account for non-i.i.d. missing
links. It relies on two network measures that can be easily constructed based
on the incoming and outgoing links of the same observed network. The treatment
and spillover effects can be point identified and consistently estimated if
network degrees are bounded for all units. We also demonstrate the bias
reduction property of our method if network degrees of some units are
unbounded. Monte Carlo experiments and a naturalistic simulation on real-world
network data are implemented to verify the finite-sample performance of our
method. We also re-examine the spillover effects of home computer use on
children's self-empowered learning.
arXiv link: http://arxiv.org/abs/2009.09614v3
Optimal probabilistic forecasts: When do they work?
probabilistic forecasts, with different scoring rules rewarding distinct
aspects of forecast performance. Herein, we re-investigate the practice of
using proper scoring rules to produce probabilistic forecasts that are
`optimal' according to a given score, and assess when their out-of-sample
accuracy is superior to alternative forecasts, according to that score.
Particular attention is paid to relative predictive performance under
misspecification of the predictive model. Using numerical illustrations, we
document several novel findings within this paradigm that highlight the
important interplay between the true data generating process, the assumed
predictive model and the scoring rule. Notably, we show that only when a
predictive model is sufficiently compatible with the true process to allow a
particular score criterion to reward what it is designed to reward, will this
approach to forecasting reap benefits. Subject to this compatibility however,
the superiority of the optimal forecast will be greater, the greater is the
degree of misspecification. We explore these issues under a range of different
scenarios, and using both artificially simulated and empirical data.
arXiv link: http://arxiv.org/abs/2009.09592v1
Inference for Large-Scale Linear Systems with Known Coefficients
non-negative solution to a possibly under-determined system of linear equations
with known coefficients. This hypothesis testing problem arises naturally in a
number of settings, including random coefficient, treatment effect, and
discrete choice models, as well as a class of linear programming problems. As a
first contribution, we obtain a novel geometric characterization of the null
hypothesis in terms of identified parameters satisfying an infinite set of
inequality restrictions. Using this characterization, we devise a test that
requires solving only linear programs for its implementation, and thus remains
computationally feasible in the high-dimensional applications that motivate our
analysis. The asymptotic size of the proposed test is shown to equal at most
the nominal level uniformly over a large class of distributions that permits
the number of linear equations to grow with the sample size.
arXiv link: http://arxiv.org/abs/2009.08568v2
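A minimal sketch of the deterministic core of the problem: checking whether a possibly under-determined linear system Ax = b admits a non-negative solution, via a single linear program. The paper's statistical test accounts for sampling error and is far more than this feasibility check; the small system below is an illustrative assumption.

```python
# LP feasibility check for a non-negative solution to Ax = b (illustrative sketch).
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])          # under-determined: 2 equations, 3 unknowns
b = np.array([1.0, 0.5])

res = linprog(c=np.zeros(A.shape[1]),    # any feasible point will do
              A_eq=A, b_eq=b,
              bounds=[(0, None)] * A.shape[1],
              method="highs")
print("non-negative solution exists:", res.status == 0)
```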
Semiparametric Testing with Highly Persistent Predictors
problem with a highly persistent predictor, where the joint distribution of the
innovations is regarded as an infinite-dimensional nuisance parameter. Using a
structural representation of the limit experiment and exploiting invariance
relationships therein, we construct invariant point-optimal tests for the
regression coefficient of interest. This approach naturally leads to a family
of feasible tests based on the component-wise ranks of the innovations that can
gain considerable power relative to existing tests under non-Gaussian
innovation distributions, while behaving equivalently under Gaussianity. When
an i.i.d. assumption on the innovations is appropriate for the data at hand,
our tests exploit the possible efficiency gains. Moreover, we show by
simulation that our test remains well behaved under some forms of conditional
heteroskedasticity.
arXiv link: http://arxiv.org/abs/2009.08291v1
Fixed Effects Binary Choice Models with Three or More Periods
$T$ and regressors without a large support. If the time-varying unobserved
terms are i.i.d. with known distribution $F$, Chamberlain (2010) shows that
the common slope parameter is point identified if and only if $F$ is logistic.
However, his proof only considers $T=2$. We show that the result does not
generalize to $T\geq 3$: the common slope parameter can be identified when $F$
belongs to a family including the logit distribution. Identification is based
on a conditional moment restriction. Under restrictions on the covariates,
these moment conditions lead to point identification of relative effects. If
$T=3$ and mild conditions hold, GMM estimators based on these conditional
moment restrictions reach the semiparametric efficiency bound. Finally, we
illustrate our method by revisiting Brender and Drazen (2008).
arXiv link: http://arxiv.org/abs/2009.08108v4
Identification and Estimation of A Rational Inattention Discrete Choice Model with Bayesian Persuasion
rational inattention model with Bayesian persuasion. The identification
requires the observation of a cross-section of market-level outcomes. The
empirical content of the model can be characterized by three moment conditions.
A two-step estimation procedure is proposed to avoid computation complexity in
the structural model. In the empirical application, I study the persuasion
effect of Fox News in the 2000 presidential election. Welfare analysis shows
that persuasion will not influence voters with high school education but will
generate higher dispersion in the welfare of voters with a partial college
education and decrease the dispersion in the welfare of voters with a bachelor's
degree.
arXiv link: http://arxiv.org/abs/2009.08045v1
Manipulation-Robust Regression Discontinuity Designs
discontinuity designs using a potential outcome framework for the manipulation
of the running variable. Using this framework, we replace the existing
identification statement with two restrictions on manipulation. Our framework
highlights the critical role of the continuous density of the running variable
in identification. In particular, we establish the low-level auxiliary
assumption of the diagnostic density test under which the design may detect
manipulation against identification and hence is manipulation-robust.
arXiv link: http://arxiv.org/abs/2009.07551v7
Encompassing Tests for Value at Risk and Expected Shortfall Multi-Step Forecasts based on Inference on the Boundary
jointly with the Value at Risk (VaR) based on flexible link (or combination)
functions. Our setup allows testing encompassing for convex forecast
combinations and for link functions which preclude crossings of the combined
VaR and ES forecasts. As the tests based on these link functions involve
parameters which are on the boundary of the parameter space under the null
hypothesis, we derive and base our tests on nonstandard asymptotic theory on
the boundary. Our simulation study shows that the encompassing tests based on
our new link functions outperform tests based on unrestricted linear link
functions for one-step and multi-step forecasts. We further illustrate the
potential of the proposed tests in a real data analysis for forecasting VaR and
ES of the S&P 500 index.
arXiv link: http://arxiv.org/abs/2009.07341v1
The Frisch--Waugh--Lovell Theorem for Standard Errors
from the full and partial regressions. I further show the equivalence between
various standard errors. Applying the new result to stratified experiments
reveals the discrepancy between model-based and design-based standard errors.
arXiv link: http://arxiv.org/abs/2009.06621v1
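Independent of the paper's standard-error results, the point-estimate part of the Frisch-Waugh-Lovell theorem is easy to verify numerically. The sketch below, using only NumPy on simulated data, shows that the coefficient from the full regression equals the coefficient from the partialled-out regression; the standard-error equivalence established in the paper involves adjustments that this sketch does not reproduce.

```python
# Numerical check of the FWL point-estimate equivalence (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
n = 500
W = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])   # intercept + controls
x = W @ np.array([1.0, 0.5, -0.3]) + rng.standard_normal(n)
y = 2.0 * x + W @ np.array([0.2, 1.0, 0.4]) + rng.standard_normal(n)

# Full regression of y on (x, W): coefficient on x
X_full = np.column_stack([x, W])
beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0][0]

# Partial (FWL) regression: residualize x and y on W, regress residual on residual
M = np.eye(n) - W @ np.linalg.solve(W.T @ W, W.T)                 # annihilator of W
x_tilde, y_tilde = M @ x, M @ y
beta_partial = (x_tilde @ y_tilde) / (x_tilde @ x_tilde)

print(np.isclose(beta_full, beta_partial))                        # True
```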
Spatial Differencing for Sample Selection Models with Unobserved Heterogeneity
spatial differencing in sample selection models with unobserved heterogeneity.
We show that under the assumption of smooth changes across space of the
unobserved sub-location specific heterogeneities and inverse Mills ratio, key
parameters of a sample selection model are identified. The smoothness of the
sub-location specific heterogeneities implies a correlation in the outcomes. We
assume that the correlation is restricted within a location or cluster and
derive asymptotic results showing that as the number of independent clusters
increases, the estimators are consistent and asymptotically normal. We also
propose a formula for standard error estimation. A Monte-Carlo experiment
illustrates the small sample properties of our estimator. The application of
our procedure to estimate the determinants of the municipality tax rate in
Finland shows the importance of accounting for unobserved heterogeneity.
arXiv link: http://arxiv.org/abs/2009.06570v1
Vector copulas
distributions with given multivariate marginals, based on the theory of measure
transportation, and establishes a vector version of Sklar's theorem. The latter
provides a theoretical justification for the use of vector copulas to
characterize nonlinear or rank dependence between a finite number of random
vectors (robust to within vector dependence), and to construct multivariate
distributions with any given non overlapping multivariate marginals. We
construct Elliptical and Kendall families of vector copulas, derive their
densities, and present algorithms to generate data from them. The use of vector
copulas is illustrated with a stylized analysis of international financial
contagion.
arXiv link: http://arxiv.org/abs/2009.06558v2
Robust discrete choice models with t-distributed kernel errors
and misreporting of the response variable and from choice behaviour that is
inconsistent with modelling assumptions (e.g. random utility maximisation). In
the presence of outliers, standard discrete choice models produce biased
estimates and suffer from compromised predictive accuracy. Robust statistical
models are less sensitive to outliers than standard non-robust models. This
paper analyses two robust alternatives to the multinomial probit (MNP) model.
The two models are robit models whose kernel error distributions are
heavy-tailed t-distributions to moderate the influence of outliers. The first
model is the multinomial robit (MNR) model, in which a generic degrees of
freedom parameter controls the heavy-tailedness of the kernel error
distribution. The second model, the generalised multinomial robit (Gen-MNR)
model, is more flexible than MNR, as it allows for distinct heavy-tailedness in
each dimension of the kernel error distribution. For both models, we derive
Gibbs samplers for posterior inference. In a simulation study, we illustrate
the excellent finite sample properties of the proposed Bayes estimators and
show that MNR and Gen-MNR produce more accurate estimates when the choice data
contain outliers from the perspective of the non-robust MNP model. In a case study
on transport mode choice behaviour, MNR and Gen-MNR outperform MNP by
substantial margins in terms of in-sample fit and out-of-sample predictive
accuracy. The case study also highlights differences in elasticity estimates
across models.
arXiv link: http://arxiv.org/abs/2009.06383v3
Bayesian modelling of time-varying conditional heteroscedasticity
financial datasets. The classical models such as ARCH-GARCH with time-invariant
coefficients are often inadequate to describe frequent changes over time due to
market variability. However we can achieve significantly better insight by
considering the time-varying analogues of these models. In this paper, we
propose a Bayesian approach to the estimation of such models and develop a
computationally efficient MCMC algorithm based on Hamiltonian Monte Carlo (HMC)
sampling. We also establish posterior contraction rates with increasing
sample size in terms of the average Hellinger metric. The performance of our
method is compared with frequentist estimates and estimates from the time
constant analogues. To conclude the paper we obtain time-varying parameter
estimates for some popular Forex (currency conversion rate) and stock market
datasets.
arXiv link: http://arxiv.org/abs/2009.06007v2
Regularized Solutions to Linear Rational Expectations Models
linear rational expectations models. The algorithm allows for regularization
cross-sectionally as well as across frequencies. A variety of numerical
examples illustrate the advantage of regularization.
arXiv link: http://arxiv.org/abs/2009.05875v3
Inferring hidden potentials in analytical regions: uncovering crime suspect communities in Medellín
size of hidden populations at the analytical-region level using reported statistics. To
do so, we propose a specification taking into account one-sided error
components and spatial effects within a panel data structure. Our simulation
exercises suggest good finite sample performance. We analyze rates of crime
suspects living per neighborhood in Medell\'in (Colombia) associated with four
crime activities. Our proposal seems to identify hot spots or "crime
communities", potential neighborhoods where under-reporting is more severe, and
also drivers of crime schools. Statistical evidence suggests a high level of
interaction between homicides and drug dealing on the one hand, and motorcycle
and car thefts on the other.
arXiv link: http://arxiv.org/abs/2009.05360v1
Inference for high-dimensional exchangeable arrays
exchangeable arrays where the dimensions may be much larger than the sample
sizes. For both exchangeable arrays, we first derive high-dimensional central
limit theorems over the rectangles and subsequently develop novel multiplier
bootstraps with theoretical guarantees. These theoretical results rely on new
technical tools such as Hoeffding-type decomposition and maximal inequalities
for the degenerate components in the Hoeffding-type decomposition for the
exchangeable arrays. We exhibit applications of our methods to uniform
confidence bands for density estimation under joint exchangeability and penalty
choice for $\ell_1$-penalized regression under separate exchangeability.
Extensive simulations demonstrate precise uniform coverage rates. We illustrate
by constructing uniform confidence bands for international trade network
densities.
arXiv link: http://arxiv.org/abs/2009.05150v4
Capital Flows and the Stabilizing Role of Macroprudential Policies in CESEE
measures to respond to cross-border risks arising from capital flows, this
paper tries to quantify to what extent macroprudential policies (MPPs) have
been able to stabilize capital flows in Central, Eastern and Southeastern
Europe (CESEE) -- a region that experienced a substantial boom-bust cycle in
capital flows amid the global financial crisis and where policymakers had been
quite active in adopting MPPs already before that crisis. To study the dynamic
responses of capital flows to MPP shocks, we propose a novel regime-switching
factor-augmented vector autoregressive (FAVAR) model. It allows us to capture
potential structural breaks in the policy regime and to control -- besides
domestic macroeconomic quantities -- for the impact of global factors such as
the global financial cycle. Feeding into this model a novel intensity-adjusted
macroprudential policy index, we find that tighter MPPs may be effective in
containing domestic private sector credit growth and the volumes of gross
capital inflows in a majority of the countries analyzed. However, they do not
seem to generally shield CESEE countries from capital flow volatility.
arXiv link: http://arxiv.org/abs/2009.06391v1
A Framework for Crop Price Forecasting in Emerging Economies by Analyzing the Quality of Time-series Data
the supply chain planners and government bodies to take appropriate actions by
estimating market factors such as demand and supply. In emerging economies such
as India, the crop prices at marketplaces are manually entered every day, which
can be prone to human-induced errors like the entry of incorrect data or entry
of no data for many days. In addition to such human-induced errors, the
fluctuations in the prices themselves make the creation of a stable and robust
forecasting solution a challenging task. Considering such complexities in crop
price forecasting, in this paper, we present techniques to build robust crop
price prediction models considering various features such as (i) historical
price and market arrival quantity of crops, (ii) historical weather data that
influence crop production and transportation, (iii) data quality-related
features obtained by performing statistical analysis. We additionally propose a
framework for context-based model selection and retraining considering factors
such as model stability, data quality metrics, and trend analysis of crop
prices. To show the efficacy of the proposed approach, we show experimental
results on two crops - Tomato and Maize for 14 marketplaces in India and
demonstrate that the proposed approach not only improves accuracy metrics
significantly when compared against the standard forecasting techniques but
also provides robust models.
arXiv link: http://arxiv.org/abs/2009.04171v1
Exact Computation of Maximum Rank Correlation Estimator
the maximum rank correlation estimator using the mixed integer programming
(MIP) approach. We construct a new constrained optimization problem by
transforming all indicator functions into binary parameters to be estimated and
show that it is equivalent to the original problem. We also consider an
application of the best subset rank prediction and show that the original
optimization problem can be reformulated as MIP. We derive the non-asymptotic
bound for the tail probability of the predictive performance measure. We
investigate the performance of the MIP algorithm by an empirical example and
Monte Carlo simulations.
arXiv link: http://arxiv.org/abs/2009.03844v2
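The object being maximized above is Han's rank correlation between the ordering of the outcomes and the ordering of a linear index. The sketch below evaluates that objective and maximizes it by brute-force grid search under the usual scale normalization (first coefficient fixed at one) in a simulated two-regressor design; the grid search stands in for the exact mixed integer program described in the abstract, and the design and names are illustrative.

```python
import numpy as np

def mrc_objective(beta, X, y):
    """Han's maximum rank correlation objective:
    number of pairs (i, j) with y_i > y_j and x_i'beta > x_j'beta."""
    idx = X @ beta
    return np.sum((y[:, None] > y[None, :]) & (idx[:, None] > idx[None, :]))

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
beta_true = np.array([1.0, -0.5])                   # scale normalization: beta_1 = 1
y = (X @ beta_true + rng.logistic(size=n) > 0).astype(float)  # binary outcome

# Brute-force grid search over the free coefficient; the paper's MIP approach
# replaces this step with an exact mixed-integer program.
grid = np.linspace(-2.0, 2.0, 401)
scores = [mrc_objective(np.array([1.0, b]), X, y) for b in grid]
print("estimated beta_2:", grid[int(np.argmax(scores))])
```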
Local Composite Quantile Regression for Regression Discontinuity
inference in regression discontinuity (RD) designs. Kai et al. (2010) study the
efficiency property of LCQR, while we show that its nice boundary performance
translates to accurate estimation of treatment effects in RD under a variety of
data generating processes. Moreover, we propose a bias-corrected and standard
error-adjusted t-test for inference, which leads to confidence intervals with
good coverage probabilities. A bandwidth selector is also discussed. For
illustration, we conduct a simulation study and revisit a classic example from
Lee (2008). A companion R package rdcqr is developed.
arXiv link: http://arxiv.org/abs/2009.03716v3
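A minimal sketch of the underlying idea, simplified relative to the LCQR estimator in the abstract above: within a hand-picked bandwidth on each side of the cutoff, fit local linear quantile regressions at a few quantile levels with a uniform kernel and average the boundary intercepts, instead of the joint composite fit with a shared slope and the bias correction used in the paper. The data-generating process and names are illustrative.

```python
import numpy as np
import statsmodels.api as sm

def cqr_boundary(x, y, cutoff, h, side, taus=(0.25, 0.5, 0.75)):
    """Average of local linear quantile-regression intercepts at the cutoff,
    using a uniform kernel of bandwidth h on one side of the cutoff."""
    near = np.abs(x - cutoff) <= h
    mask = near & ((x >= cutoff) if side == "right" else (x < cutoff))
    Xd = sm.add_constant(x[mask] - cutoff)
    fits = [sm.QuantReg(y[mask], Xd).fit(q=t).params[0] for t in taus]
    return np.mean(fits)

rng = np.random.default_rng(1)
n, cutoff, h = 2000, 0.0, 0.25
x = rng.uniform(-1, 1, n)
tau_effect = 0.5                                  # true jump at the cutoff
y = 1 + x + tau_effect * (x >= cutoff) + 0.3 * rng.standard_t(df=3, size=n)

rd_est = (cqr_boundary(x, y, cutoff, h, "right")
          - cqr_boundary(x, y, cutoff, h, "left"))
print("RD effect estimate:", round(rd_est, 3))
```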
Counterfactual and Welfare Analysis with an Approximate Model
approximate models. Our key assumption is that model approximation error is the
same magnitude at new choices as the observed data. Applying the framework to
quasilinear utility, we obtain bounds on quantities at new prices using an
approximate law of demand. We then bound utility differences between bundles
and welfare differences between prices. All bounds are computable as linear
programs. We provide detailed analytical results describing how the data map to
the bounds including shape restrictions that provide a foundation for plug-in
estimation. An application to gasoline demand illustrates the methodology.
arXiv link: http://arxiv.org/abs/2009.03379v1
Dimension Reduction for High Dimensional Vector Autoregressive Models
model into two components, the first one being generated by a small-scale VAR
and the second one being a white noise sequence. Hence, a reduced number of
common components generates the entire dynamics of the large system through a
VAR structure. This modelling, which we label as the dimension-reducible VAR,
extends the common feature approach to high dimensional systems, and it differs
from the dynamic factor model in which the idiosyncratic component can also
embed a dynamic pattern. We show the conditions under which this decomposition
exists. We provide statistical tools to detect its presence in the data and to
estimate the parameters of the underlying small-scale VAR model. Based on our
methodology, we propose a novel approach to identify the shock that is
responsible for most of the common variability at the business cycle
frequencies. We evaluate the practical value of the proposed methods by
simulations as well as by an empirical application to a large set of US
economic variables.
arXiv link: http://arxiv.org/abs/2009.03361v3
Doubly Robust Semiparametric Difference-in-Differences Estimators with High-Dimensional Data
difference-in-differences estimator for estimating heterogeneous treatment
effects with high-dimensional data. Our new estimator is robust to model
misspecification and allows for, but does not require, many more regressors
than observations. The first stage allows a general set of machine learning
methods to be used to estimate the propensity score. In the second stage, we
derive the rates of convergence for both the parametric parameter and the
unknown function under a partially linear specification for the outcome
equation. We also provide bias correction procedures to allow for valid
inference for the heterogeneous treatment effects. We evaluate the finite
sample performance with extensive simulation studies. Additionally, a real data
analysis on the effect of the Fair Minimum Wage Act on the unemployment rate is
performed as an illustration of our method. An R package for implementing the
proposed method is available on Github.
arXiv link: http://arxiv.org/abs/2009.03151v1
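A sketch of a generic doubly robust difference-in-differences ATT estimator with a machine-learning first stage, in the spirit of the estimator described above but not the authors' partially linear, heterogeneous-effects procedure: the lasso-based first stages, the standard panel doubly robust formula, and the simulated design (true ATT equal to one) are stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV, LassoCV

def dr_did_att(y0, y1, d, X):
    """Doubly robust DiD estimate of the ATT with panel data:
    ML propensity score plus an outcome-change regression for controls."""
    dy = y1 - y0
    # first stage: high-dimensional propensity score (logistic lasso)
    ps = LogisticRegressionCV(penalty="l1", solver="saga", Cs=10, max_iter=5000)
    e = np.clip(ps.fit(X, d).predict_proba(X)[:, 1], 0.01, 0.99)
    # outcome-change regression fitted on the untreated group only
    mu0 = LassoCV(cv=5).fit(X[d == 0], dy[d == 0]).predict(X)
    w1 = d / d.mean()
    w0 = (1 - d) * e / (1 - e)
    w0 = w0 / w0.mean()
    return np.mean((w1 - w0) * (dy - mu0))

rng = np.random.default_rng(2)
n, p = 1000, 50
X = rng.normal(size=(n, p))
d = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))        # selection on X_1
y0 = X[:, 0] + rng.normal(size=n)                        # pre-period outcome
y1 = y0 + 0.5 * X[:, 1] + 1.0 * d + rng.normal(size=n)   # true ATT = 1
print("DR DiD ATT estimate:", round(dr_did_att(y0, y1, d, X), 3))
```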
Two-Stage Maximum Score Estimator
that is generally applicable to models that satisfy a monotonicity condition in
one or several parametric indexes. We call the estimator two-stage maximum
score (TSMS) estimator since our estimator involves a first-stage nonparametric
regression when applied to the binary choice model of Manski (1975, 1985). We
characterize the asymptotic distribution of the TSMS estimator, which features
phase transitions depending on the dimension and thus the convergence rate of
the first-stage estimation. Effectively, the first-stage nonparametric
estimator serves as an imperfect smoothing function on a non-smooth criterion
function, leading to the pivotality of the first-stage estimation error with
respect to the second-stage convergence rate and asymptotic distribution.
arXiv link: http://arxiv.org/abs/2009.02854v4
Decomposing Identification Gains and Evaluating Instrument Identification Power for Partially Identified Average Treatment Effects
for average treatment effect (ATE) in partially identified models. We decompose
the ATE identification gains into components of contributions driven by IV
relevancy, IV strength, direction and degree of treatment endogeneity, and
matching via exogenous covariates. Our decomposition is demonstrated with
graphical illustrations, simulation studies and an empirical example of
childbearing and women's labour supply. Our analysis offers insights for
understanding the complex role of IVs in ATE identification and for selecting
IVs in practical policy designs. Simulations also suggest potential uses of our
analysis for detecting irrelevant instruments.
arXiv link: http://arxiv.org/abs/2009.02642v3
COVID-19: Tail Risk and Predictive Regressions
of the COVID-19 pandemic on financial markets in different countries across the
World. It provides the results of robust estimation and inference on predictive
regressions for returns on major stock indexes in 23 countries in North and
South America, Europe, and Asia incorporating the time series of reported
infections and deaths from COVID-19. We also present a detailed study of
persistence, heavy-tailedness and tail risk properties of the time series of
the COVID-19 infections and death rates that motivate the use of robust
inference methods in the analysis. Econometrically
justified analysis is based on heteroskedasticity and autocorrelation
consistent (HAC) inference methods, recently developed robust $t$-statistic
inference approaches and robust tail index estimation.
arXiv link: http://arxiv.org/abs/2009.02486v3
Heterogeneous Coefficients, Control Variables, and Identification of Multiple Treatment Effects
models with multiple treatments. We consider a heterogeneous coefficients model
where the outcome is a linear combination of dummy treatment variables, with
each variable representing a different kind of treatment. We use control
variables to give necessary and sufficient conditions for identification of
average treatment effects. With mutually exclusive treatments we find that,
provided the heterogeneous coefficients are mean independent from treatments
given the controls, a simple identification condition is that the generalized
propensity scores (Imbens, 2000) be bounded away from zero and that their sum
be bounded away from one, with probability one. Our analysis extends to
distributional and quantile treatment effects, as well as corresponding
treatment effects on the treated. These results generalize the classical
identification result of Rosenbaum and Rubin (1983) for binary treatments.
arXiv link: http://arxiv.org/abs/2009.02314v3
Cointegrating Polynomial Regressions with Power Law Trends: Environmental Kuznets Curve or Omitted Time Effects?
between environmental pollution and economic growth. Current analyses
frequently employ models which restrict nonlinearities in the data to be
explained by the economic growth variable only. We propose a Generalized
Cointegrating Polynomial Regression (GCPR) to allow for an alternative source
of nonlinearity. More specifically, the GCPR is a seemingly unrelated
regression with (1) integer powers of deterministic and stochastic trends for
the individual units, and (2) a common flexible global trend. We estimate this
GCPR by nonlinear least squares and derive its asymptotic distribution.
Endogeneity of the regressors will introduce nuisance parameters into the
limiting distribution but a simulation-based approach nevertheless enables us
to conduct valid inference. A multivariate subsampling KPSS test is proposed to
verify the correct specification of the cointegrating relation. Our simulation
study shows good performance of the simulated inference approach and
subsampling KPSS test. We illustrate the GCPR approach using data for Austria,
Belgium, Finland, the Netherlands, Switzerland, and the UK. A single global
trend accurately captures all nonlinearities leading to a linear cointegrating
relation between GDP and CO2 for all countries. This suggests that the
environmental improvement of recent years is due to economic factors
different from GDP.
arXiv link: http://arxiv.org/abs/2009.02262v2
Unlucky Number 13? Manipulating Evidence Subject to Snooping
considerable recent interest throughout and beyond the scientific community. We
subsume such practices involving secret data snooping that influences
subsequent statistical inference under the term MESSing (manipulating evidence
subject to snooping) and discuss, illustrate and quantify the possibly dramatic
effects of several forms of MESSing using an empirical and a simple theoretical
example. The empirical example uses numbers from the most popular German
lottery, which seem to suggest that 13 is an unlucky number.
arXiv link: http://arxiv.org/abs/2009.02198v1
Instrument Validity for Heterogeneous Causal Effects
heterogeneous causal effect models. The generalization includes the cases where
the treatment can be multivalued ordered or unordered. Based on a series of
testable implications, we propose a nonparametric test which is proved to be
asymptotically size controlled and consistent. Compared to the tests in the
literature, our test can be applied in more general settings and may achieve
power improvement. Refutation of instrument validity by the test helps detect
invalid instruments that may yield implausible results on causal effects.
Evidence that the test performs well on finite samples is provided via
simulations. We revisit the empirical study on return to schooling to
demonstrate application of the proposed test in practice. An extended
continuous mapping theorem and an extended delta method, which may be of
independent interest, are provided to establish the asymptotic distribution of
the test statistic under the null.
arXiv link: http://arxiv.org/abs/2009.01995v6
The role of parallel trends in event study settings: An application to environmental economics
treatment timing such that, after making an appropriate parallel trends
assumption, one can identify, estimate, and make inference about causal
effects. In practice, however, different DID procedures rely on different
parallel trends assumptions (PTA), and recover different causal parameters. In
this paper, we focus on staggered DID (also referred to as event studies) and
discuss the role played by the PTA in terms of identification and estimation of
causal parameters. We document a “robustness” vs. “efficiency” trade-off in
terms of the strength of the underlying PTA, and argue that practitioners
should be explicit about these trade-offs whenever using DID procedures. We
propose new DID estimators that reflect these trade-offs and derive their
large sample properties. We illustrate the practical relevance of these results
by assessing whether the transition from federal to state management of the
Clean Water Act affects compliance rates.
arXiv link: http://arxiv.org/abs/2009.01963v1
Deep Learning in Science
on by impressive achievements within a broader family of machine learning
methods, commonly referred to as Deep Learning (DL). This paper provides
insights on the diffusion and impact of DL in science. Through a Natural
Language Processing (NLP) approach on the arXiv.org publication corpus, we
delineate the emerging DL technology and identify a list of relevant search
terms. These search terms allow us to retrieve DL-related publications from Web
of Science across all sciences. Based on that sample, we document the DL
diffusion process in the scientific system. We find i) an exponential growth in
the adoption of DL as a research tool across all sciences and all over the
world, ii) regional differentiation in DL application domains, and iii) a
transition from interdisciplinary DL applications to disciplinary research
within application domains. In a second step, we investigate how the adoption
of DL methods affects scientific development. Therefore, we empirically assess
how DL adoption relates to re-combinatorial novelty and scientific impact in
the health sciences. We find that DL adoption is negatively correlated with
re-combinatorial novelty, but positively correlated with expectation as well as
variance of citation performance. Our findings suggest that DL does not (yet?)
work as an autopilot to navigate complex knowledge landscapes and overthrow
their structure. However, owing to its versatility, the 'DL principle' qualifies
as the nucleus of a general scientific method that advances science in a
measurable way.
arXiv link: http://arxiv.org/abs/2009.01575v2
A Robust Score-Driven Filter for Multivariate Time Series
vector processes. By assuming that the conditional location vector from a
multivariate Student's t distribution changes over time, we construct a robust
filter which is able to overcome several issues that naturally arise when
modeling heavy-tailed phenomena and, more in general, vectors of dependent
non-Gaussian time series. We derive conditions for stationarity and
invertibility and estimate the unknown parameters by maximum likelihood (ML).
Strong consistency and asymptotic normality of the estimator are proved and the
finite sample properties are illustrated by a Monte-Carlo study. From a
computational point of view, analytical formulae are derived, which allow us to
develop estimation procedures based on the Fisher scoring method. The theory is
supported by a novel empirical illustration that shows how the model can be
effectively applied to estimate consumer prices from home scanner data.
arXiv link: http://arxiv.org/abs/2009.01517v3
Hidden Group Time Profiles: Heterogeneous Drawdown Behaviours in Retirement
Grouped Fixed-Effects (GFE) estimator applied to Australian panel data on
drawdowns from phased withdrawal retirement income products. Behaviours
exhibited by the distinct latent groups identified suggest that retirees may
adopt simple heuristics determining how they draw down their accumulated
wealth. Two extensions to the original GFE methodology are proposed: a latent
group label-matching procedure which broadens bootstrap inference to include
the time profile estimates, and a modified estimation procedure for models with
time-invariant additive fixed effects estimated using unbalanced data.
arXiv link: http://arxiv.org/abs/2009.01505v3
A Vector Monotonicity Assumption for Multiple Instruments
binary treatment, the monotonicity assumption of the local average treatment
effects (LATE) framework can become restrictive: it requires that all units
share a common direction of response even when separate instruments are shifted
in opposing directions. What I call vector monotonicity, by contrast, simply
assumes treatment uptake to be monotonic in all instruments. I characterize the
class of causal parameters that are point identified under vector monotonicity,
when the instruments are binary. This class includes, for example, the average
treatment effect among units that are in any way responsive to the collection
of instruments, or those that are responsive to a given subset of them. The
identification results are constructive and yield a simple estimator for the
identified treatment effect parameters. An empirical application revisits the
labor market returns to college.
arXiv link: http://arxiv.org/abs/2009.00553v6
Time-Varying Parameters as Ridge Regressions
capture structural change. I highlight a rather underutilized fact -- that
these are actually ridge regressions. Instantly, this makes computations,
tuning, and implementation much easier than in the state-space paradigm. Among
other things, solving the equivalent dual ridge problem is computationally very
fast even in high dimensions, and the crucial "amount of time variation" is
tuned by cross-validation. Evolving volatility is dealt with using a two-step
ridge regression. I consider extensions that incorporate sparsity (the
algorithm selects which parameters vary and which do not) and reduced-rank
restrictions (variation is tied to a factor model). To demonstrate the
usefulness of the approach, I use it to study the evolution of monetary policy
in Canada using large time-varying local projections. The application requires
the estimation of about 4600 TVPs, a task well within the reach of the new
method.
arXiv link: http://arxiv.org/abs/2009.00401v4
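A minimal numpy sketch of the equivalence for a single regressor with a random-walk coefficient: writing the coefficient path as an initial level plus accumulated increments turns the TVP regression into a ridge problem in the increments, with the penalty playing the role of the amount of time variation. The penalty is fixed by hand here rather than cross-validated, and the dual trick and two-step volatility adjustment mentioned in the abstract are omitted; the simulated design is illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 300
x = rng.normal(size=T)
beta_path = 1.0 + np.cumsum(0.05 * rng.normal(size=T))   # drifting coefficient
y = x * beta_path + 0.5 * rng.normal(size=T)

# Random-walk TVP regression as a ridge problem:
#   beta_t = beta_0 + sum_{s <= t} u_s, penalizing only the increments u_s.
# Design: first column carries beta_0, column s carries x_t * 1{t >= s}.
Z = np.tril(np.ones((T, T))) * x[:, None]      # cumulative design for increments
W = np.column_stack([x, Z])
lam = 50.0                                     # "amount of time variation"; tune by CV
D = np.eye(T + 1)
D[0, 0] = 0.0                                  # leave the initial level unpenalized
theta = np.linalg.solve(W.T @ W + lam * D, W.T @ y)
beta_hat = theta[0] + np.cumsum(theta[1:])     # recover the time-varying path

print("mean abs. error of the fitted path:",
      round(float(np.mean(np.abs(beta_hat - beta_path))), 3))
```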
An optimal test for strategic interaction in social and economic network formation between heterogeneous agents
form a directed network. Agents' preferences over the form of the network
consist of an arbitrary network benefit function (e.g., agents may have
preferences over their network centrality) and a private component which is
additively separable in own links. This latter component allows for unobserved
heterogeneity in the costs of sending and receiving links across agents
(respectively out- and in- degree heterogeneity) as well as
homophily/heterophily across the $K$ types of agents. In contrast, the network
benefit function allows agents' preferences over links to vary with the
presence or absence of links elsewhere in the network (and hence with the link
formation behavior of their peers). In the null model which excludes the
network benefit function, links form independently across dyads in the manner
described by Charbonneau (2017). Under the alternative there is
interdependence across linking decisions (i.e., strategic interaction). We show
how to test the null with power optimized in specific directions. These
alternative directions include many common models of strategic network
formation (e.g., "connections" models, "structural hole" models etc.). Our
random utility specification induces an exponential family structure under the
null which we exploit to construct a similar test which exactly controls size
(despite the null being a composite one with many nuisance parameters). We
further show how to construct locally best tests for specific alternatives
without making any assumptions about equilibrium selection. To make our tests
feasible we introduce a new MCMC algorithm for simulating the null
distributions of our test statistics.
arXiv link: http://arxiv.org/abs/2009.00212v2
InClass Nets: Independent Classifier Networks for Nonparametric Estimation of Conditional Independence Mixture Models and Unsupervised Classification
Independent Classifier networks (InClass nets) technique, for the
nonparametric estimation of conditional independence mixture models (CIMMs).
We approach the estimation of a CIMM as a multi-class classification problem,
since dividing the dataset into different categories naturally leads to the
estimation of the mixture model. InClass nets consist of multiple independent
classifier neural networks (NNs), each of which handles one of the variates of
the CIMM. Fitting the CIMM to the data is performed by simultaneously training
the individual NNs using suitable cost functions. The ability of NNs to
approximate arbitrary functions makes our technique nonparametric. Further
leveraging the power of NNs, we allow the conditionally independent variates of
the model to be individually high-dimensional, which is the main advantage of
our technique over existing non-machine-learning-based approaches. We derive
some new results on the nonparametric identifiability of bivariate CIMMs, in
the form of a necessary and a (different) sufficient condition for a bivariate
CIMM to be identifiable. We provide a public implementation of InClass nets as
a Python package called RainDancesVI and validate our InClass nets technique
with several worked out examples. Our method also has applications in
unsupervised and semi-supervised classification problems.
arXiv link: http://arxiv.org/abs/2009.00131v1
Identification of Semiparametric Panel Multinomial Choice Models with Infinite-Dimensional Fixed Effects
estimation in panel multinomial choice models, where we allow for
infinite-dimensional fixed effects that enter into consumer utilities in an
additively nonseparable way, thus incorporating rich forms of unobserved
heterogeneity. Our identification strategy exploits multivariate monotonicity
in parametric indexes, and uses the logical contraposition of an intertemporal
inequality on choice probabilities to obtain identifying restrictions. We
provide a consistent estimation procedure, and demonstrate the practical
advantages of our method with Monte Carlo simulations and an empirical
illustration on popcorn sales with the Nielsen data.
arXiv link: http://arxiv.org/abs/2009.00085v2
Causal Inference in Possibly Nonlinear Factor Models
models with noisily measured confounders. The key feature is that a large set
of noisy measurements are linked with the underlying latent confounders through
an unknown, possibly nonlinear factor structure. The main building block is a
local principal subspace approximation procedure that combines $K$-nearest
neighbors matching and principal component analysis. Estimators of many causal
parameters, including average treatment effects and counterfactual
distributions, are constructed based on doubly-robust score functions.
Large-sample properties of these estimators are established, which only require
relatively mild conditions on the principal subspace approximation. The results
are illustrated with an empirical application studying the effect of political
connections on stock returns of financial firms, and a Monte Carlo experiment.
The main technical and methodological results regarding the general local
principal subspace approximation method may be of independent interest.
arXiv link: http://arxiv.org/abs/2008.13651v3
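A stripped-down sketch of the local principal subspace approximation step only (K-nearest-neighbor matching in the space of noisy measurements followed by neighborhood-level PCA), with an illustrative one-factor design; the doubly robust estimation of causal parameters built on top of this step is not reproduced.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
n, p = 500, 20
alpha = rng.normal(size=n)                       # scalar latent confounder
loadings = rng.normal(size=p)
# noisy measurements, nonlinearly linked to the latent confounder
X = np.tanh(np.outer(alpha, loadings)) + 0.3 * rng.normal(size=(n, p))

K = 50
_, idx = NearestNeighbors(n_neighbors=K).fit(X).kneighbors(X)

# Local principal subspace step: within each unit's K-nearest-neighbor cloud,
# the leading principal component score should track the latent confounder.
corrs = []
for i in range(n):
    local = X[idx[i]]                            # neighborhood of unit i
    score = PCA(n_components=1).fit_transform(local)[:, 0]
    corrs.append(abs(np.corrcoef(score, alpha[idx[i]])[0, 1]))

print("average within-neighborhood |corr| with the latent confounder:",
      round(float(np.mean(corrs)), 2))
```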
Efficiency Loss of Asymptotically Efficient Tests in an Instrumental Variables Regression
alternative in parts of the parameter space. These regions involve a constraint
on the first-stage regression coefficients and the reduced-form covariance
matrix. Consequently, the Lagrange Multiplier test can have power close to
size, despite being efficient under standard asymptotics. This information loss
limits the power of conditional tests which use only the Anderson-Rubin and the
score statistic. The conditional quasi-likelihood ratio test also suffers
severe losses because it can be bounded for any alternative.
A necessary condition for drastic power loss to occur is that the Hermitian
of the reduced-form covariance matrix has eigenvalues of opposite signs. These
cases are denoted impossibility designs (ID). We show this happens in practice,
by applying our theory to the problem of inference on the intertemporal
elasticity of substitution (IES). Of eleven countries studied by Yogo (2004)
and Andrews (2016), nine are consistent with ID at the 95% level.
arXiv link: http://arxiv.org/abs/2008.13042v2
The Identity Fragmentation Bias
machines; these interactions are often recorded with different identifiers for
the same consumer. The failure to correctly match different identities leads to
a fragmented view of exposures and behaviors. This paper studies the identity
fragmentation bias, referring to the estimation bias resulting from using
fragmented data. Using a formal framework, we decompose the contributing
factors of the estimation bias caused by data fragmentation and discuss the
direction of bias. Contrary to conventional wisdom, this bias cannot be signed
or bounded under standard assumptions. Instead, upward biases and sign
reversals can occur even in experimental settings. We then compare several
corrective measures, and discuss their respective advantages and caveats.
arXiv link: http://arxiv.org/abs/2008.12849v2
Instrumental Variable Quantile Regression
Chernozhukov and Hansen (2005). We discuss the key conditions used for
identification of structural quantile effects within this model which include
the availability of instruments and a restriction on the ranks of structural
disturbances. We outline several approaches to obtaining point estimates and
performing statistical inference for model parameters. Finally, we point to
possible directions for future research.
arXiv link: http://arxiv.org/abs/2009.00436v1
Generalized Lee Bounds
presence of selection bias, assuming the treatment effect on selection has the
same sign for all subjects. This paper generalizes Lee bounds to allow the sign
of this effect to be identified by pretreatment covariates, relaxing the
standard (unconditional) monotonicity to its conditional analog. Asymptotic
theory for generalized Lee bounds is proposed in low-dimensional smooth and
high-dimensional sparse designs. The paper also generalizes Lee bounds to
accommodate multiple outcomes. Focusing on JobCorps job training program, I
first show that unconditional monotonicity is unlikely to hold, and then
demonstrate the use of covariates to tighten the bounds.
arXiv link: http://arxiv.org/abs/2008.12720v4
Nowcasting in a Pandemic using Non-Parametric Mixed Frequency VARs
non-parametric mixed frequency VARs using additive regression trees. We argue
that regression tree models are ideally suited for macroeconomic nowcasting in
the face of extreme observations, for instance those produced by the COVID-19
pandemic of 2020. This is due to their flexibility and ability to model
outliers. In an application involving four major euro area countries, we find
substantial improvements in nowcasting performance relative to a linear mixed
frequency VAR.
arXiv link: http://arxiv.org/abs/2008.12706v3
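A toy sketch of the tree-based nowcasting idea, using scikit-learn's gradient boosting as a stand-in additive tree ensemble (the paper works with Bayesian additive regression trees inside a mixed-frequency VAR); the monthly-indicator design and the linear benchmark are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
T = 240                                       # quarters of simulated data
Xm = rng.normal(size=(T, 3))                  # three monthly indicators per quarter
# nonlinear relation between the quarterly target and the monthly indicators
y = np.tanh(2 * Xm[:, 0]) + 0.5 * Xm[:, 1] * (Xm[:, 2] > 0) + 0.3 * rng.normal(size=T)

train, test = np.arange(T - 40), np.arange(T - 40, T)
models = {
    "additive trees": GradientBoostingRegressor(n_estimators=300, max_depth=3,
                                                learning_rate=0.05),
    "linear benchmark": LinearRegression(),
}
for name, m in models.items():
    m.fit(Xm[train], y[train])
    mse = np.mean((y[test] - m.predict(Xm[test])) ** 2)
    print(f"{name:>16} nowcast MSE: {mse:.3f}")
```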
How is Machine Learning Useful for Macroeconomic Forecasting?
adding the "how". The current forecasting literature has focused on matching
specific variables and horizons with a particularly successful algorithm. In
contrast, we study the usefulness of the underlying features driving ML gains
over standard macroeconometric methods. We distinguish four so-called features
(nonlinearities, regularization, cross-validation and alternative loss
function) and study their behavior in both the data-rich and data-poor
environments. To do so, we design experiments that allow us to identify the
"treatment" effects of interest. We conclude that (i) nonlinearity is the true
game changer for macroeconomic prediction, (ii) the standard factor model
remains the best regularization, (iii) K-fold cross-validation is the best
practice and (iv) the $L_2$ is preferred to the $\bar \epsilon$-insensitive
in-sample loss. The forecasting gains of nonlinear techniques are associated
with high macroeconomic uncertainty, financial stress and housing bubble
bursts. This suggests that Machine Learning is useful for macroeconomic
forecasting by mostly capturing important nonlinearities that arise in the
context of uncertainty and financial frictions.
arXiv link: http://arxiv.org/abs/2008.12477v1
Efficient closed-form estimation of large spatial autoregressions
autoregressive models with a large number of parameters are examined, in the
sense that the parameter space grows slowly as a function of sample size. These
have the same asymptotic efficiency properties as maximum likelihood under
Gaussianity but are of closed form. Hence they are computationally simple and
free from compactness assumptions, thereby avoiding two notorious pitfalls of
implicitly defined estimates of large spatial autoregressions. For an initial
least squares estimate, the Newton step can also lead to weaker regularity
conditions for a central limit theorem than those extant in the literature. A
simulation study demonstrates excellent finite sample gains from Newton
iterations, especially in large multiparameter models for which grid search is
costly. A small empirical illustration shows improvements in estimation
precision with real data.
arXiv link: http://arxiv.org/abs/2008.12395v4
Inference for parameters identified by conditional moment restrictions using a generalized Bierens maximum statistic
equations, imply that the parameters of interest are identified by conditional
moment restrictions. We introduce a novel inference method without any prior
information about which conditioning instruments are weak or irrelevant.
Building on Bierens (1990), we propose penalized maximum statistics and combine
bootstrap inference with model selection. Our method optimizes asymptotic power
by solving a data-dependent max-min problem for tuning parameter selection.
Extensive Monte Carlo experiments, based on an empirical example, demonstrate
the extent to which our inference procedure is superior to those available in
the literature.
arXiv link: http://arxiv.org/abs/2008.11140v7
On the equivalence between the Kinetic Ising Model and discrete autoregressive processes
variety of systems, from magnetic spins to financial time series and neuron
activity. In Statistical Physics the Kinetic Ising Model has been introduced to
describe the dynamics of the magnetic moments of a spin lattice, while in time
series analysis discrete autoregressive processes have been designed to capture
the multivariate dependence structure across binary time series. In this
article we provide a rigorous proof of the equivalence between the two models
through a unique and invertible map that unambiguously links one model's
parameter set to the other's. Our result finds further justification
acknowledging that both models provide maximum entropy distributions of binary
time series with given means, auto-correlations, and lagged cross-correlations
of order one. We further show that the equivalence between the two models
makes it possible to exploit the inference methods originally developed for one model in
the inference of the other.
arXiv link: http://arxiv.org/abs/2008.10666v3
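A numerical check of the mapping in the simplest case a short script can handle: a single spin with positive coupling and unit inverse temperature, matched to a binary DAR(1) process whose parameters are chosen so that the one-step transition probabilities coincide. The paper's result covers the full multivariate case with lagged cross-correlations; the values of the field and coupling below are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(6)
T = 100_000
h, J = 0.2, 0.6             # kinetic Ising field and (positive) coupling, beta = 1

# kinetic Ising spin s_t in {-1,+1}: P(s_t = +1 | s_{t-1}) = sigmoid(2(h + J s_{t-1}))
s = np.empty(T, dtype=int)
s[0] = 1
for t in range(1, T):
    s[t] = 1 if rng.random() < sigmoid(2 * (h + J * s[t - 1])) else -1
x_ising = (s + 1) // 2      # map to {0,1}

# matched DAR(1): X_t = V_t X_{t-1} + (1 - V_t) Z_t, V ~ Bern(nu), Z ~ Bern(chi)
p_up, p_dn = sigmoid(2 * (h + J)), sigmoid(2 * (h - J))
nu = p_up - p_dn
chi = p_dn / (1 - nu)
x_dar = np.empty(T, dtype=int)
x_dar[0] = 1
V, Z = rng.random(T) < nu, rng.random(T) < chi
for t in range(1, T):
    x_dar[t] = x_dar[t - 1] if V[t] else int(Z[t])

for name, x in [("kinetic Ising", x_ising), ("DAR(1)", x_dar)]:
    p11 = np.mean(x[1:][x[:-1] == 1])
    p01 = np.mean(x[1:][x[:-1] == 0])
    print(f"{name:>13}: P(1|1) = {p11:.3f}, P(1|0) = {p01:.3f}")
```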
Finite-Sample Average Bid Auction
auctioneer accesses the knowledge of the valuation distribution only through
statistical samples. A new framework is established that combines the
statistical decision theory with mechanism design. Two optimality criteria,
maxmin, and equivariance, are studied along with their implications on the form
of auctions. The simplest form of the equivariant auction is the average bid
auction, which sets individual reservation prices proportional to the average of
other bids and historical samples. This form of auction can be motivated by the
Gamma distribution, and it sheds new light on the estimation of the optimal
price, an irregular parameter. Theoretical results show that it is often
possible to use the regular parameter population mean to approximate the
optimal price. An adaptive average bid estimator is developed under this idea,
and it has the same asymptotic properties as the empirical Myerson estimator.
The newly proposed estimator has significantly better performance in terms of
value at risk and expected shortfall when the sample size is small.
arXiv link: http://arxiv.org/abs/2008.10217v2
Empirical Likelihood Covariate Adjustment for Regression Discontinuity Designs
incorporates covariate balance in regression discontinuity (RD) designs. The
new empirical entropy balancing method reweights the standard local polynomial
RD estimator by using the entropy balancing weights that minimize the
Kullback--Leibler divergence from the uniform weights while satisfying the
covariate balance constraints. Our estimator can be formulated as an empirical
likelihood estimator that efficiently incorporates the information from the
covariate balance condition as correctly specified over-identifying moment
restrictions, and thus has an asymptotic variance no larger than that of the
standard estimator without covariates. We demystify the asymptotic efficiency
gain of Calonico, Cattaneo, Farrell, and Titiunik (2019)'s regression-based
covariate-adjusted estimator, as their estimator has the same asymptotic
variance as ours. Further efficiency improvement from balancing over sieve
spaces is possible if our entropy balancing weights are computed using stronger
covariate balance constraints that are imposed on functions of covariates. We
then show that our method enjoys favorable second-order properties from
empirical likelihood estimation and inference: the estimator has a small
(bounded) nonlinearity bias, and the likelihood ratio based confidence set
admits a simple analytical correction that can be used to improve coverage
accuracy. The coverage accuracy of our confidence set is robust against slight
perturbation to the covariate balance condition, which may happen in cases such
as data contamination and misspecified "unaffected" outcomes used as
covariates. The proposed entropy balancing approach for covariate adjustment is
applicable to other RD-related settings.
arXiv link: http://arxiv.org/abs/2008.09263v3
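The entropy balancing step has a convenient exponential-tilting dual: the weights closest to uniform in Kullback-Leibler divergence subject to covariate balance are proportional to exp(lambda'c_i), with lambda minimizing a smooth convex function. A minimal sketch on simulated covariates from the two sides of a cutoff follows; it ignores the local polynomial RD estimation that the reweighting feeds into, and the target means and design are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def entropy_balance(C, target):
    """Weights closest to uniform in KL divergence whose weighted covariate
    means equal `target`, obtained from the exponential-tilting dual."""
    def dual(lam):
        return np.log(np.sum(np.exp((C - target) @ lam)))
    lam = minimize(dual, np.zeros(C.shape[1]), method="BFGS").x
    w = np.exp((C - target) @ lam)
    return w / w.sum()

rng = np.random.default_rng(7)
# covariates on the two sides of the cutoff, deliberately imbalanced
C_left = rng.normal(loc=0.0, size=(400, 3))
C_right = rng.normal(loc=0.3, size=(300, 3))

w = entropy_balance(C_left, C_right.mean(axis=0))
print("right-side means:    ", np.round(C_right.mean(axis=0), 3))
print("reweighted left side:", np.round(w @ C_left, 3))
```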
Inference for Moment Inequalities: A Constrained Moment Selection Procedure
of interest in many areas of economics. This paper develops a new method for
improving the performance of generalized moment selection (GMS) testing
procedures in finite-samples. The method modifies GMS tests by tilting the
empirical distribution in its moment selection step by an amount that maximizes
the empirical likelihood subject to the restrictions of the null hypothesis. We
characterize sets of population distributions on which a modified GMS test is
(i) asymptotically equivalent to its non-modified version to first-order, and
(ii) superior to its non-modified version according to local power when the
sample size is large enough. An important feature of the proposed modification
is that it remains computationally feasible even when the number of moment
inequalities is large. We report simulation results that show the modified
tests control size well, and have markedly improved local power over their
non-modified counterparts.
arXiv link: http://arxiv.org/abs/2008.09021v2
A Novel Approach to Predictive Accuracy Testing in Nested Environments
nested models that bypasses the difficulties caused by the degeneracy of the
asymptotic variance of forecast error loss differentials used in the
construction of commonly used predictive comparison statistics. Our approach
continues to rely on the out of sample MSE loss differentials between the two
competing models, leads to nuisance parameter free Gaussian asymptotics and is
shown to remain valid under flexible assumptions that can accommodate
heteroskedasticity and the presence of mixed predictors (e.g. stationary and
local to unit root). A local power analysis also establishes its ability to
detect departures from the null in both stationary and persistent settings.
Simulations calibrated to common economic and financial applications indicate
that our methods have strong power with good size control across commonly
encountered sample sizes.
arXiv link: http://arxiv.org/abs/2008.08387v3
Bounds on Distributional Treatment Effect Parameters using Panel Data with an Application on Job Displacement
parameters that depend on the joint distribution of potential outcomes -- an
object not identified by standard identifying assumptions such as selection on
observables or even when treatment is randomly assigned. I show that panel data
and an additional assumption on the dependence between untreated potential
outcomes for the treated group over time (i) provide more identifying power for
distributional treatment effect parameters than existing bounds and (ii)
provide a more plausible set of conditions than existing methods that obtain
point identification. I apply these bounds to study heterogeneity in the effect
of job displacement during the Great Recession. Using standard techniques, I
find that workers who were displaced during the Great Recession lost on average
34% of their earnings relative to their counterfactual earnings had they not
been displaced. Using the methods developed in the current paper, I also show
that the average effect masks substantial heterogeneity across workers.
arXiv link: http://arxiv.org/abs/2008.08117v1
Learning Structure in Nested Logit Models
structure discovery. Nested logit models allow the modeling of positive
correlations between the error terms of the utility specifications of the
different alternatives in a discrete choice scenario through the specification
of a nesting structure. Current nested logit model estimation practices require
an a priori specification of a nesting structure by the modeler. In this work,
we optimize over all possible specifications of the nested logit model
that are consistent with rational utility maximization. We formulate the
problem of learning an optimal nesting structure from the data as a mixed
integer nonlinear programming (MINLP) optimization problem and solve it using a
variant of the linear outer approximation algorithm. We exploit the tree
structure of the problem and utilize the latest advances in integer
optimization to bring practical tractability to the optimization problem we
introduce. We demonstrate the ability of our algorithm to correctly recover the
true nesting structure from synthetic data in a Monte Carlo experiment. In an
empirical illustration using a stated preference survey on modes of
transportation in the U.S. state of Massachusetts, we use our algorithm to
obtain an optimal nesting tree representing the correlations between the
unobserved effects of the different travel mode choices. We provide our
implementation as a customizable and open-source code base written in the Julia
programming language.
arXiv link: http://arxiv.org/abs/2008.08048v1
Peer effects and endogenous social interactions
linear-in-means model. Contrary to existing proposals, we do not need to
specify a model for how the selection of peers comes about. Rather, we exploit
two restrictions that are inherent to many such specifications to construct
intuitive instrumental variables. These restrictions are that link decisions
that involve a given individual are not all independent of one another, but
that they are independent of the link behavior between other pairs of
individuals. A two-stage least-squares estimator of the linear-in-means model
is then readily obtained.
arXiv link: http://arxiv.org/abs/2008.07886v1
A Relation Analysis of Markov Decision Process Frameworks
frameworks in the machine learning and econometrics literatures, including the
standard MDP, the entropy and general regularized MDP, and stochastic MDP,
where the latter is based on the assumption that the reward function is
stochastic and follows a given distribution. We show that the
entropy-regularized MDP is equivalent to a stochastic MDP model, and is
strictly subsumed by the general regularized MDP. Moreover, we propose a
distributional stochastic MDP framework by assuming that the distribution of
the reward function is ambiguous. We further show that the distributional
stochastic MDP is equivalent to the regularized MDP, in the sense that they
always yield the same optimal policies. We also provide a connection between
stochastic/regularized MDP and constrained MDP. Our work gives a unified view
of several important MDP frameworks, which suggests new ways to interpret the
(entropy/general) regularized MDP frameworks through the lens of stochastic
rewards and vice versa. Given the recent popularity of regularized MDPs in
(deep) reinforcement learning, our work brings new understanding of how such
algorithmic schemes work and suggests ideas for developing new ones.
arXiv link: http://arxiv.org/abs/2008.07820v1
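To make the entropy-regularized side of these equivalences concrete, the sketch below runs soft (log-sum-exp) value iteration on an arbitrary toy MDP; the resulting optimal policy is the softmax of the Q-values, the Boltzmann form that the stochastic-reward reinterpretation also produces. The transition matrix, rewards, and regularization weight are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(8)
S, A, gamma, tau = 5, 3, 0.95, 0.5           # states, actions, discount, entropy weight

P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition probabilities
R = rng.normal(size=(S, A))                  # rewards

V = np.zeros(S)
for _ in range(2000):                        # soft value iteration
    Q = R + gamma * P @ V                    # Q[s, a]
    V_new = tau * np.log(np.exp(Q / tau).sum(axis=1))   # log-sum-exp backup
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = np.exp((Q - V[:, None]) / tau)      # soft-optimal policy: softmax over Q
policy /= policy.sum(axis=1, keepdims=True)
print("soft-optimal policy (rows = states):")
print(np.round(policy, 3))
```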
Analysing a built-in advantage in asymmetric darts contests using causal machine learning
contestants enjoys a technical advantage. Using methods from the causal machine
learning literature, we analyse the built-in advantage, which is that the
first-mover has potentially more, but never fewer, moves. Our empirical
findings suggest that the first-mover has an 8.6 percentage point higher
probability of winning the match due to the technical advantage. Contestants
with low performance measures and little experience have the highest built-in
advantage. With regard to the fairness principle that contestants with equal
abilities should have equal winning probabilities, this contest is ex-ante fair
in the case of equal built-in advantages for both competitors and a randomized
starting right. Nevertheless, the contest design produces unequal probabilities
of winning for equally skilled contestants because of asymmetries in the
built-in advantage associated with social pressure for contestants competing at
home and away.
arXiv link: http://arxiv.org/abs/2008.07165v1
To Bag is to Prune
RF blatantly overfits in-sample without any apparent consequence out-of-sample.
Standard arguments, like the classic bias-variance trade-off or double descent,
cannot rationalize this paradox. I propose a new explanation: bootstrap
aggregation and model perturbation as implemented by RF automatically prune a
latent "true" tree. More generally, randomized ensembles of greedily optimized
learners implicitly perform optimal early stopping out-of-sample. So there is
no need to tune the stopping point. By construction, novel variants of Boosting
and MARS are also eligible for automatic tuning. I empirically demonstrate the
property, with simulated and real data, by reporting that these new completely
overfitting ensembles perform similarly to their tuned counterparts -- or
better.
arXiv link: http://arxiv.org/abs/2008.07063v5
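The stylized fact in the abstract above is easy to reproduce: a fully grown random forest drives the in-sample error far below the noise level yet remains competitive out of sample with a depth-restricted forest. A toy check on a simulated nonlinear regression (design arbitrary) follows.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(9)
n, p = 1000, 10
X = rng.normal(size=(n, p))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.5 * rng.normal(size=n)
Xte = rng.normal(size=(n, p))
yte = Xte[:, 0] ** 2 + np.sin(Xte[:, 1]) + 0.5 * rng.normal(size=n)

forests = {
    "fully grown RF": RandomForestRegressor(n_estimators=300, max_depth=None,
                                            random_state=0),
    "depth-tuned RF": RandomForestRegressor(n_estimators=300, max_depth=4,
                                            random_state=0),
}
for name, rf in forests.items():
    rf.fit(X, y)
    mse_in = np.mean((y - rf.predict(X)) ** 2)
    mse_out = np.mean((yte - rf.predict(Xte)) ** 2)
    print(f"{name}: in-sample MSE = {mse_in:.3f}, out-of-sample MSE = {mse_out:.3f}")
```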
Optimal selection of the number of control units in kNN algorithm to estimate average treatment effects
in the k nearest neighbors (kNN) algorithm, focusing on minimizing the mean
squared error of the average treatment effects. Our approach is non-parametric;
confidence intervals for the treatment effects are calculated using asymptotic
results with bias correction. Simulation exercises show that our approach
achieves relatively small mean squared errors and a balance between confidence
interval length and type I error. We analyze the average treatment effect on
the treated (ATET) of participation in 401(k) plans on accumulated net
financial assets, confirming significant effects on the amount and on the
probability of positive net assets. Our optimal k selection produces
significantly narrower ATET confidence intervals compared with the common
practice of using k=1.
arXiv link: http://arxiv.org/abs/2008.06564v1
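A toy illustration of the quantity being tuned: a matching estimator of the ATET that averages the k nearest control outcomes for each treated unit, evaluated for several values of k on a simulated design with a known effect of one. The selection rule proposed in the paper and its bias-corrected confidence intervals are not reproduced here.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def att_knn(X, y, d, k):
    """ATET by matching each treated unit to its k nearest control units."""
    Xc, yc = X[d == 0], y[d == 0]
    _, idx = NearestNeighbors(n_neighbors=k).fit(Xc).kneighbors(X[d == 1])
    return np.mean(y[d == 1] - yc[idx].mean(axis=1))

rng = np.random.default_rng(10)
n = 2000
X = rng.normal(size=(n, 2))
d = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 1.0 * d + rng.normal(size=n)   # true ATET = 1

for k in (1, 5, 20, 100):
    print(f"k = {k:>3}: ATET estimate = {att_knn(X, y, d, k):.3f}")
```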
Bounding Infection Prevalence by Bounding Selectivity and Accuracy of Tests: With Application to Early COVID-19
information on test rate and test yield. The approach utilizes user-specified
bounds on (i) test accuracy and (ii) the extent to which tests are targeted,
formalized as a restriction on the effect of true infection status on the odds
ratio of getting tested and thereby embeddable in logit specifications. The
motivating application is to the COVID-19 pandemic but the strategy may also be
useful elsewhere.
Evaluated on data from the pandemic's early stage, even the weakest of the
novel bounds are reasonably informative. Notably, and in contrast to
speculations that were widely reported at the time, they place the infection
fatality rate for Italy well above that of influenza by mid-April.
arXiv link: http://arxiv.org/abs/2008.06178v2
"Big Data" and its Origins
variety, I investigate the origins of the term "Big Data". Its origins are a
bit murky and hence intriguing, involving both academics and industry,
statistics and computer science, ultimately winding back to lunch-table
conversations at Silicon Graphics Inc. (SGI) in the mid 1990s. The Big Data
phenomenon continues unabated, and the ongoing development of statistical
machine learning tools continues to help us confront it.
arXiv link: http://arxiv.org/abs/2008.05835v6
A dynamic ordered logit model with fixed effects
accommodates fixed effects and state dependence. We provide identification
results for the autoregressive parameter, regression coefficients, and the
threshold parameters in this model. Our results require only four observations
on the outcome variable. We provide conditions under which a composite
conditional maximum likelihood estimator is consistent and asymptotically
normal. We use our estimator to explore the determinants of self-reported
health in a panel of European countries over the period 2003-2016. We find
that: (i) the autoregressive parameter is positive and analogous to a linear
AR(1) coefficient of about 0.25, indicating persistence in health status; (ii)
the association between income and health becomes insignificant once we control
for unobserved heterogeneity and persistence.
arXiv link: http://arxiv.org/abs/2008.05517v1
Identification of Time-Varying Transformation Models with Fixed Effects, with an Application to Unobserved Heterogeneity in Resource Shares
panel models, where the response variable is an unknown, weakly monotone,
time-varying transformation of a latent linear index of fixed effects,
regressors, and an error term drawn from an unknown stationary distribution.
Our results identify the transformation, the coefficient on regressors, and
features of the distribution of the fixed effects. We then develop a
full-commitment intertemporal collective household model, where the implied
quantity demand equations are time-varying functions of a linear index. The
fixed effects in this index equal logged resource shares, defined as the
fractions of household expenditure enjoyed by each household member. Using
Bangladeshi data, we show that women's resource shares decline with household
budgets and that half of the variation in women's resource shares is due to
unobserved household-level heterogeneity.
arXiv link: http://arxiv.org/abs/2008.05507v2
Convergence rate of estimators of clustered panel models with misclassification
group structure and $N$ units and $T$ time periods under long panel
asymptotics. We show that the group-specific coefficients can be estimated at
the parametric root-$NT$ rate even if error variances diverge as $T \to \infty$
and some units are asymptotically misclassified. This limit case approximates
empirically relevant settings and is not covered by existing asymptotic
results.
arXiv link: http://arxiv.org/abs/2008.04708v1
Nonparametric prediction with spatial data
a canonical factorization of the spectral density function. We provide
theoretical results showing that the predictor has desirable asymptotic
properties. Finite sample performance is assessed in a Monte Carlo study that
also compares our algorithm to a rival nonparametric method based on the
infinite AR representation of the dynamics of the data. Finally, we apply our
methodology to predict house prices in Los Angeles.
arXiv link: http://arxiv.org/abs/2008.04269v2
Decision Conflict and Deferral in A Class of Logit Models with a Context-Dependent Outside Option
difficult to make an active choice. Contrary to existing logit models with an
outside option where the latter is assigned a fixed value exogenously, this
paper introduces and analyzes a class of logit models where that option's value
is menu-dependent, may be determined endogenously, and could be interpreted as
proxying the varying degree of decision difficulty at different menus. We focus
on the "power logit" special class of these models. We show that these predict
some observed choice-deferral effects that are caused by hard decisions,
including non-monotonic "roller-coaster" choice-overload phenomena that are
regulated by the presence or absence of a clearly dominant feasible
alternative. We illustrate the usability, novel insights and explanatory gains
of the proposed framework for empirical discrete choice analysis and
theoretical modelling of imperfectly competitive markets in the presence of
potentially indecisive consumers.
arXiv link: http://arxiv.org/abs/2008.04229v10
Machine Learning Panel Data Regressions with Heavy-tailed Dependent Data: Theory and Application
dependent panel data potentially sampled at different frequencies. We focus on
the sparse-group LASSO regularization. This type of regularization can take
advantage of the mixed frequency time series panel data structures and improve
the quality of the estimates. We obtain oracle inequalities for the pooled and
fixed effects sparse-group LASSO panel data estimators recognizing that
financial and economic data can have fat tails. To that end, we leverage on a
new Fuk-Nagaev concentration inequality for panel data consisting of
heavy-tailed $\tau$-mixing processes.
arXiv link: http://arxiv.org/abs/2008.03600v2
An Upper Bound for Functions of Estimators in High Dimensions
estimators in high dimensions. This upper bound may help establish the rate of
convergence of functions in high dimensions. The upper bound random variable
may converge faster, slower, or at the same rate as estimators depending on the
behavior of the partial derivative of the function. We illustrate this via
three examples. The first two examples use the upper bound for testing in high
dimensions, and the third example derives the estimated out-of-sample variance of
large portfolios. All our results allow for a larger number of parameters, p,
than the sample size, n.
arXiv link: http://arxiv.org/abs/2008.02636v1
On the Size Control of the Hybrid Test for Predictive Ability
predictability. We demonstrate with a simple example that the test may not be
pointwise asymptotically of level $\alpha$ at commonly used significance levels
and may lead to rejection rates over 11% when the significance level
$\alpha$ is 5%. Generalizing this observation, we provide a formal result
that pointwise asymptotic invalidity of the hybrid test persists in a setting
under reasonable conditions. As an easy alternative, we propose a modified
hybrid test based on the generalized moment selection method and show that the
modified test enjoys pointwise asymptotic validity. Monte Carlo simulations
support the theoretical findings.
arXiv link: http://arxiv.org/abs/2008.02318v2
Macroeconomic Data Transformations Matter
transformations/combinations of predictors does not alter predictions. However,
when the forecasting technology either uses shrinkage or is nonlinear, it does.
This is precisely the fabric of the machine learning (ML) macroeconomic
forecasting environment. Pre-processing of the data translates to an alteration
of the regularization -- explicit or implicit -- embedded in ML algorithms. We
review old transformations and propose new ones, then empirically evaluate
their merits in a substantial pseudo-out-of-sample exercise. It is found that
traditional factors should almost always be included as predictors and moving
average rotations of the data can provide important gains for various
forecasting targets. Also, we note that while predicting directly the average
growth rate is equivalent to averaging separate horizon forecasts when using
OLS-based techniques, the latter can substantially improve on the former when
regularization and/or nonparametric nonlinearities are involved.
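A hedged sketch of one transformation discussed above: turning a block of raw lags into moving-average "rotations" before handing them to a shrinkage-based learner. The predictor, window lengths, and block layout are illustrative choices, not the paper's exact construction.

```python
# Minimal sketch: turning raw lags of a predictor into moving-average "rotations"
# before feeding them to a shrinkage-based learner. Window lengths are illustrative.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
x = pd.Series(rng.standard_normal(300), name="x")   # a stationary monthly predictor

# Raw lag matrix: x_{t-1}, ..., x_{t-P}
P = 12
lags = pd.concat({f"lag{j}": x.shift(j) for j in range(1, P + 1)}, axis=1)

# Moving-average rotation: averages of blocks of lags, e.g. lags 1-3, 4-6, 7-9, 10-12
blocks = [range(1, 4), range(4, 7), range(7, 10), range(10, 13)]
ma_rotation = pd.concat(
    {f"ma_{b.start}_{b.stop - 1}": lags[[f"lag{j}" for j in b]].mean(axis=1)
     for b in blocks},
    axis=1,
)

# The two designs span related information but imply different implicit
# penalties once a ridge/lasso-type learner is applied to them.
print(lags.tail(2))
print(ma_rotation.tail(2))
```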
arXiv link: http://arxiv.org/abs/2008.01714v2
Testing error distribution by kernelized Stein discrepancy in multivariate time series models
applications. To alleviate the risk of error distribution mis-specification,
testing methodologies are needed to detect whether the chosen error
distribution is correct. However, the majority of the existing tests only deal
with the multivariate normal distribution for some special multivariate time
series models, and thus cannot be used to test for the often observed
heavy-tailed and skewed error distributions in applications. In this paper, we
construct a new consistent test for general multivariate time series models,
based on the kernelized Stein discrepancy. To account for the estimation
uncertainty and unobserved initial values, a bootstrap method is provided to
calculate the critical values. Our new test is easy-to-implement for a large
scope of multivariate error distributions, and its importance is illustrated by
simulated and real data.
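For intuition, the sketch below computes a kernelized Stein discrepancy of residuals against a standard multivariate normal using an RBF kernel with a median-heuristic bandwidth. The target distribution, kernel, and bandwidth rule are illustrative assumptions, and the paper's estimation step and bootstrap critical values are not reproduced.

```python
# Minimal sketch: kernelized Stein discrepancy (KSD) of residuals against a
# standard multivariate normal, with an RBF kernel and median-heuristic bandwidth.
import numpy as np

def ksd_gaussian_target(e):
    """U-statistic estimate of KSD^2 for target N(0, I); e is an (n, d) residual array."""
    n, d = e.shape
    score = -e                                    # grad log density of N(0, I)
    diff = e[:, None, :] - e[None, :, :]          # (n, n, d) pairwise differences
    sq = (diff ** 2).sum(-1)                      # squared distances
    h2 = np.median(sq[sq > 0])                    # median-heuristic bandwidth^2
    k = np.exp(-sq / (2 * h2))                    # RBF kernel matrix
    ss = score @ score.T                          # s(x_i)' s(x_j)
    sx_diff = (score[:, None, :] * diff).sum(-1)  # s(x_i)'(x_i - x_j)
    sy_diff = (score[None, :, :] * diff).sum(-1)  # s(x_j)'(x_i - x_j)
    u = k * (ss + sx_diff / h2 - sy_diff / h2 + d / h2 - sq / h2 ** 2)
    np.fill_diagonal(u, 0.0)                      # U-statistic: drop i = j terms
    return u.sum() / (n * (n - 1))

rng = np.random.default_rng(2)
print(ksd_gaussian_target(rng.standard_normal((300, 2))))        # close to 0 under H0
print(ksd_gaussian_target(rng.standard_t(df=3, size=(300, 2))))  # larger under H1
```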
arXiv link: http://arxiv.org/abs/2008.00747v1
Estimating TVP-VAR models with time invariant long-run multipliers
varying parameter vector auto-regression (TVP-VAR) models with a time-invariant
long-run relationship between endogenous variables and changes in exogenous
variables. We propose a Gibbs sampling scheme for estimation of model
parameters as well as time-invariant long-run multiplier parameters. Further we
demonstrate the applicability of the proposed method by analyzing examples of
the Norwegian and Russian economies based on the data on real GDP, real
exchange rate and real oil prices. Our results show that incorporating the
time-invariance constraint on the long-run multipliers in the TVP-VAR model helps to
significantly improve the forecasting performance.
arXiv link: http://arxiv.org/abs/2008.00718v1
A spatial multinomial logit model for analysing urban expansion
patterns of urban expansion. The specification assumes that the log-odds of
each class follow a spatial autoregressive process. Using recent advances in
Bayesian computing, our model allows for a computationally efficient treatment
of the spatial multinomial logit model. This allows us to assess spillovers
between regions and across land use classes. In a series of Monte Carlo
studies, we benchmark our model against other competing specifications. The
paper also showcases the performance of the proposed specification using
European regional data. Our results indicate that spatial dependence plays a
key role in the land-sealing process of cropland and grassland. Moreover, we
uncover land sealing spillovers across multiple classes of arable land.
arXiv link: http://arxiv.org/abs/2008.00673v1
Design-Based Uncertainty for Quasi-Experiments
the treatment is (conditionally) randomly assigned. This paper develops a
design-based framework suitable for analyzing quasi-experimental settings in
the social sciences, in which the treatment assignment can be viewed as the
realization of some stochastic process but there is concern about unobserved
selection into treatment. In our framework, treatments are stochastic, but
units may differ in their probabilities of receiving treatment, thereby
allowing for rich forms of selection. We provide conditions under which the
estimands of popular quasi-experimental estimators correspond to interpretable
finite-population causal parameters. We characterize the biases and distortions
to inference that arise when these conditions are violated. These results can
be used to conduct sensitivity analyses when there are concerns about selection
into treatment. Taken together, our results establish a rigorous foundation for
quasi-experimental analyses that more closely aligns with the way empirical
researchers discuss the variation in the data.
arXiv link: http://arxiv.org/abs/2008.00602v8
What can we learn about SARS-CoV-2 prevalence from testing and hospital data?
population is difficult because tests are conducted on a small and non-random
segment of the population. However, people admitted to the hospital for
non-COVID reasons are tested at very high rates, even though they do not appear
to be at elevated risk of infection. This sub-population may provide valuable
evidence on prevalence in the general population. We estimate upper and lower
bounds on the prevalence of the virus in the general population and the
population of non-COVID hospital patients under weak assumptions on who gets
tested, using Indiana data on hospital inpatient records linked to SARS-CoV-2
virological tests. The non-COVID hospital population is tested fifty times as
often as the general population, yielding much tighter bounds on prevalence. We
provide and test conditions under which this non-COVID hospitalization bound is
valid for the general population. The combination of clinical testing data and
hospital records may contain much more information about the state of the
epidemic than has been previously appreciated. The bounds we calculate for
Indiana could be constructed at relatively low cost in many other states.
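A minimal sketch of the bounding logic, assuming only worst-case behavior for the untested; all shares and positivity rates below are hypothetical, not the Indiana figures.

```python
# Minimal sketch: worst-case bounds on prevalence when only a non-random subset
# of the population is tested. All numbers are hypothetical, not Indiana data.
tested_share = 0.02        # fraction of the population that was tested
pos_rate_tested = 0.10     # positivity rate among the tested

# With no assumption on the untested, their prevalence lies in [0, 1]:
lower = pos_rate_tested * tested_share
upper = pos_rate_tested * tested_share + (1.0 - tested_share)
print(f"general-population bounds: [{lower:.3f}, {upper:.3f}]")

# A heavily tested subpopulation (e.g. non-COVID hospital patients) with
# tested_share near one delivers much tighter bounds for that group.
tested_share_hosp, pos_rate_hosp = 0.90, 0.08
lower_h = pos_rate_hosp * tested_share_hosp
upper_h = pos_rate_hosp * tested_share_hosp + (1.0 - tested_share_hosp)
print(f"hospital-subpopulation bounds: [{lower_h:.3f}, {upper_h:.3f}]")
```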
arXiv link: http://arxiv.org/abs/2008.00298v2
Simpler Proofs for Approximate Factor Models of Large Dimensions
work. Their theoretical properties, studied some twenty years ago, also laid
the ground work for analysis on large dimensional panel data models with
cross-section dependence. This paper presents simplified proofs for the
estimates by using alternative rotation matrices, exploiting properties of low
rank matrices, as well as the singular value decomposition of the data in
addition to its covariance structure. These simplifications facilitate
interpretation of results and provide a more friendly introduction to
researchers new to the field. New results are provided to allow linear
restrictions to be imposed on factor models.
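A short sketch of the object discussed above: principal-components estimation of an approximate factor model read directly off the SVD of the data matrix. The normalization shown is one common convention, used purely for illustration.

```python
# Minimal sketch: principal-components estimation of an approximate factor model
# X = F L' + e directly from the SVD of the T x N data matrix.
import numpy as np

rng = np.random.default_rng(3)
T, N, r = 200, 50, 2
F0 = rng.standard_normal((T, r))
L0 = rng.standard_normal((N, r))
X = F0 @ L0.T + rng.standard_normal((T, N))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
F_hat = np.sqrt(T) * U[:, :r]            # estimated factors, normalized so F'F/T = I_r
L_hat = Vt[:r].T * s[:r] / np.sqrt(T)    # estimated loadings
common = F_hat @ L_hat.T                 # rank-r common component

# The common component is identified; factors and loadings only up to rotation.
print(np.linalg.norm(X - common) / np.linalg.norm(X))
```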
arXiv link: http://arxiv.org/abs/2008.00254v1
Measuring the Effectiveness of US Monetary Policy during the COVID-19 Recession
decline in economic activity across the globe. To fight this recession, policy
makers in central banks engaged in expansionary monetary policy. This paper
asks whether the measures adopted by the US Federal Reserve (Fed) have been
effective in boosting real activity and calming financial markets. To measure
these effects at high frequencies, we propose a novel mixed frequency vector
autoregressive (MF-VAR) model. This model allows us to combine weekly and
monthly information within a unified framework. Our model combines a set of
macroeconomic aggregates such as industrial production, unemployment rates and
inflation with high frequency information from financial markets such as stock
prices, interest rate spreads and weekly information on the Fed's balance sheet
size. The latter set of high frequency time series is used to dynamically
interpolate the monthly time series to obtain weekly macroeconomic measures. We
use this setup to simulate counterfactuals in absence of monetary stimulus. The
results show that the monetary expansion caused higher output growth and stock
market returns, more favorable long-term financing conditions and a
depreciation of the US dollar compared to a no-policy benchmark scenario.
arXiv link: http://arxiv.org/abs/2007.15419v1
Local Projection Inference is Simpler and More Robust Than You Think
responses using local projections, i.e., direct linear regressions of future
outcomes on current covariates. This paper proves that local projection
inference robustly handles two issues that commonly arise in applications:
highly persistent data and the estimation of impulse responses at long
horizons. We consider local projections that control for lags of the variables
in the regression. We show that lag-augmented local projections with normal
critical values are asymptotically valid uniformly over (i) both stationary and
non-stationary data, and also over (ii) a wide range of response horizons.
Moreover, lag augmentation obviates the need to correct standard errors for
serial correlation in the regression residuals. Hence, local projection
inference is arguably both simpler than previously thought and more robust than
standard autoregressive inference, whose validity is known to depend
sensitively on the persistence of the data and on the length of the horizon.
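A minimal sketch of a lag-augmented local projection at one horizon, using plain heteroskedasticity-robust (non-HAC) standard errors in line with the abstract's message. The data-generating process, horizon, and lag length are illustrative.

```python
# Minimal sketch: local projection of y_{t+h} on x_t, controlling for lags of
# (y, x), with plain heteroskedasticity-robust (non-HAC) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
T, h, p = 400, 8, 4                      # sample size, horizon, control lags
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):                    # persistent outcome driven by x
    y[t] = 0.9 * y[t - 1] + 0.5 * x[t] + rng.standard_normal()

rows = range(p, T - h)
Y_lead = np.array([y[t + h] for t in rows])
X = np.column_stack(
    [x[list(rows)]]                                            # impulse variable x_t
    + [y[[t - j for t in rows]] for j in range(1, p + 1)]      # lags of y
    + [x[[t - j for t in rows]] for j in range(1, p + 1)]      # lags of x
)
res = sm.OLS(Y_lead, sm.add_constant(X)).fit(cov_type="HC1")   # no HAC correction used
print(f"IRF at horizon {h}: {res.params[1]:.3f} (se {res.bse[1]:.3f})")
```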
arXiv link: http://arxiv.org/abs/2007.13888v3
The Spectral Approach to Linear Rational Expectations Models
domain. The paper characterizes existence and uniqueness of solutions to
particular as well as generic systems. The set of all solutions to a given
system is shown to be a finite dimensional affine space in the frequency
domain. It is demonstrated that solutions can be discontinuous with respect to
the parameters of the models in the context of non-uniqueness, invalidating
mainstream frequentist and Bayesian methods. The ill-posedness of the problem
motivates regularized solutions with theoretically guaranteed uniqueness,
continuity, and even differentiability properties.
arXiv link: http://arxiv.org/abs/2007.13804v6
Unconditional Quantile Regression with High Dimensional Data
counterfactual effects with high-dimensional data. We propose a novel robust
score for debiased estimation of the unconditional quantile regression (Firpo,
Fortin, and Lemieux, 2009) as a measure of heterogeneous counterfactual
marginal effects. We propose a multiplier bootstrap inference and develop
asymptotic theories to guarantee the size control in large sample. Simulation
studies support our theories. Applying the proposed method to Job Corps survey
data, we find that a policy which counterfactually extends the duration of
exposures to the Job Corps training program will be effective especially for
the targeted subpopulations of lower potential wage earners.
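For orientation, the sketch below computes the recentered influence function (RIF) of an unconditional quantile and runs the linear RIF regression of Firpo, Fortin and Lemieux (2009) on simulated data; the paper's debiased score and multiplier bootstrap for high-dimensional controls are not reproduced.

```python
# Minimal sketch: RIF regression for an unconditional quantile (Firpo, Fortin
# and Lemieux, 2009) on simulated data.
import numpy as np
import statsmodels.api as sm
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
n, tau = 2000, 0.9
X = rng.standard_normal((n, 3))
y = (1.0 + X @ np.array([0.5, -0.3, 0.2])
     + rng.standard_normal(n) * (1 + 0.5 * (X[:, 0] > 0)))

q = np.quantile(y, tau)
f_q = gaussian_kde(y)(q)[0]                    # density of y at the tau-quantile
rif = q + (tau - (y <= q)) / f_q               # recentered influence function

res = sm.OLS(rif, sm.add_constant(X)).fit(cov_type="HC1")
print(res.params)   # unconditional (marginal) quantile partial effects
```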
arXiv link: http://arxiv.org/abs/2007.13659v4
Total Error and Variability Measures for the Quarterly Workforce Indicators and LEHD Origin-Destination Employment Statistics in OnTheMap
five major indicators in the U.S. Census Bureau's Longitudinal
Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators
(QWI): total flow-employment, beginning-of-quarter employment, full-quarter
employment, average monthly earnings of full-quarter employees, and total
quarterly payroll. Beginning-of-quarter employment is also the main tabulation
variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace
reports as displayed in OnTheMap (OTM), including OnTheMap for Emergency
Management. We account for errors due to coverage; record-level non-response;
edit and imputation of item missing data; and statistical disclosure
limitation. The analysis reveals that the five publication variables under
study are estimated very accurately for tabulations involving at least 10 jobs.
Tabulations involving three to nine jobs are a transition zone, where cells may
be fit for use with caution. Tabulations involving one or two jobs, which are
generally suppressed on fitness-for-use criteria in the QWI and synthesized in
LODES, have substantial total variability but can still be used to estimate
statistics for untabulated aggregates as long as the job count in the aggregate
is more than 10.
arXiv link: http://arxiv.org/abs/2007.13275v1
Scalable Bayesian estimation in the multinomial probit model
as it allows for correlation between choice alternatives. Because current model
specifications employ a full covariance matrix of the latent utilities for the
choice alternatives, they are not scalable to a large number of choice
alternatives. This paper proposes a factor structure on the covariance matrix,
which makes the model scalable to large choice sets. The main challenge in
estimating this structure is that the model parameters require identifying
restrictions. We identify the parameters by a trace-restriction on the
covariance matrix, which is imposed through a reparametrization of the factor
structure. We specify interpretable prior distributions on the model parameters
and develop an MCMC sampler for parameter estimation. The proposed approach
significantly improves performance in large choice sets relative to existing
multinomial probit specifications. Applications to purchase data show the
economic importance of including a large number of choice alternatives in
consumer choice analysis.
arXiv link: http://arxiv.org/abs/2007.13247v2
The role of global economic policy uncertainty in predicting crude oil futures volatility: Evidence from a two-factor GARCH-MIDAS model
(GEPU) and uncertainty changes have different impacts on crude oil futures
volatility. We establish single-factor and two-factor models under the
GARCH-MIDAS framework to investigate the predictive power of GEPU and GEPU
changes excluding and including realized volatility. The findings show that the
models with rolling-window specification perform better than those with
fixed-span specification. For single-factor models, the GEPU index and its
changes, as well as realized volatility, are consistent effective factors in
predicting the volatility of crude oil futures. In particular, GEPU changes have
stronger predictive power than the GEPU index. For two-factor models, GEPU is
not an effective forecast factor for the volatility of WTI crude oil futures or
Brent crude oil futures. The two-factor model with GEPU changes contains more
information and exhibits stronger forecasting ability for crude oil futures
market volatility than the single-factor models. The GEPU changes are indeed
the main source of long-term volatility of the crude oil futures.
arXiv link: http://arxiv.org/abs/2007.12838v1
Applying Data Synthesis for Longitudinal Business Data across Three Countries
protect. Many businesses have unique characteristics, and distributions of
employment, sales, and profits are highly skewed. Attackers wishing to conduct
identification attacks often have access to much more information than for any
individual. As a consequence, most disclosure avoidance mechanisms fail to
strike an acceptable balance between usefulness and confidentiality protection.
Detailed aggregate statistics by geography or detailed industry classes are
rare, public-use microdata on businesses are virtually nonexistent, and access
to confidential microdata can be burdensome. Synthetic microdata have been
proposed as a secure mechanism to publish microdata, as part of a broader
discussion of how to provide researchers with greater access to such data sets.
In this article, we document an experiment to create analytically valid
synthetic data, using the exact same model and methods previously employed for
the United States, for data from two different countries: Canada (LEAP) and
Germany (BHP). We assess utility and protection, and provide an assessment of
the feasibility of extending such an approach in a cost-effective way to other
data.
arXiv link: http://arxiv.org/abs/2008.02246v1
Are low frequency macroeconomic variables important for high frequency electricity prices?
In many applications, it might be interesting to predict daily electricity
prices by using their own lags or renewable energy sources. However, the recent
turmoil in energy prices and the Russian-Ukrainian war have increased interest in
evaluating the relevance of industrial production and the Purchasing Managers'
Index output survey for forecasting daily electricity prices. We develop a
Bayesian reverse unrestricted MIDAS model which accounts for the mismatch in
frequency between the daily prices and the monthly macro variables in Germany
and Italy. We find that the inclusion of macroeconomic low frequency variables
is more important for short than medium term horizons by means of point and
density measures. In particular, accuracy increases by combining hard and soft
information, while using only surveys gives less accurate forecasts than using
only industrial production data.
arXiv link: http://arxiv.org/abs/2007.13566v2
bootUR: An R Package for Bootstrap Unit Root Tests
provide practitioners with a single, unified framework for comprehensive and
reliable unit root testing in the R package bootUR. The package's backbone is
the popular augmented Dickey-Fuller test paired with a union of rejections
principle, which can be performed directly on single time series or multiple
(including panel) time series. Accurate inference is ensured through the use of
bootstrap methods. The package addresses the needs of both novice users, by
providing user-friendly and easy-to-implement functions with sensible default
options, as well as expert users, by giving full user-control to adjust the
tests to one's desired settings. Our parallelized C++ implementation ensures
that all unit root tests are scalable to datasets containing many time series.
arXiv link: http://arxiv.org/abs/2007.12249v5
Deep Dynamic Factor Models
Factor Model (D$^2$FM) -- is able to encode the information available from
hundreds of macroeconomic and financial time series into a handful of
unobserved latent states. While similar in spirit to traditional dynamic factor
models (DFMs), unlike those, this new class of models allows for
nonlinearities between factors and observables due to the autoencoder neural
network structure. However, by design, the latent states of the model can still
be interpreted as in a standard factor model. Both in a fully real-time
out-of-sample nowcasting and forecasting exercise with US data and in a Monte
Carlo experiment, the D$^2$FM improves over the performances of a
state-of-the-art DFM.
arXiv link: http://arxiv.org/abs/2007.11887v2
The Mode Treatment Effect
probability distributions. In program evaluation, the average treatment effect
(mean) and the quantile treatment effect (median) have been intensively studied
in the past decades. The mode treatment effect, however, has long been
neglected in program evaluation. This paper fills the gap by discussing both
the estimation and inference of the mode treatment effect. I propose both
traditional kernel and machine learning methods to estimate the mode treatment
effect. I also derive the asymptotic properties of the proposed estimators and
find that both estimators follow the asymptotic normality but with the rate of
convergence slower than the regular $\sqrt{N}$ rate, which is different from
the rates of the classical average and quantile treatment effect estimators.
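A hedged sketch of the idea in a purely randomized setting: estimate the mode of each arm's outcome density by a kernel density grid search and difference the two. Covariate adjustment, the machine learning variant, and the paper's asymptotics are omitted.

```python
# Minimal sketch: a kernel estimate of the mode treatment effect in a randomized
# setting, i.e. the mode of the treated outcome density minus the mode of the
# control density, each located by a KDE grid search.
import numpy as np
from scipy.stats import gaussian_kde

def kde_mode(y, grid_size=2000):
    grid = np.linspace(y.min(), y.max(), grid_size)
    dens = gaussian_kde(y)(grid)
    return grid[np.argmax(dens)]

rng = np.random.default_rng(6)
n = 4000
d = rng.integers(0, 2, n)                        # randomized binary treatment
y0 = rng.lognormal(mean=0.0, sigma=0.6, size=n)  # right-skewed control outcome
y1 = y0 + 0.5 + 0.2 * rng.standard_normal(n)     # treatment shifts the mode
y = np.where(d == 1, y1, y0)

mte = kde_mode(y[d == 1]) - kde_mode(y[d == 0])
print(f"mode treatment effect estimate: {mte:.3f}")
```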
arXiv link: http://arxiv.org/abs/2007.11606v1
Lasso Inference for High-Dimensional Time Series
extend the desparsified lasso to a time series setting under Near-Epoch
Dependence (NED) assumptions allowing for non-Gaussian, serially correlated and
heteroskedastic processes, where the number of regressors can possibly grow
faster than the time dimension. We first derive an error bound under weak
sparsity, which, coupled with the NED assumption, means this inequality can
also be applied to the (inherently misspecified) nodewise regressions performed
in the desparsified lasso. This allows us to establish the uniform asymptotic
normality of the desparsified lasso under general conditions, including for
inference on parameters of increasing dimensions. Additionally, we show
consistency of a long-run variance estimator, thus providing a complete set of
tools for performing inference in high-dimensional linear time series models.
Finally, we perform a simulation exercise to demonstrate the small sample
properties of the desparsified lasso in common time series settings.
arXiv link: http://arxiv.org/abs/2007.10952v6
The impact of economic policy uncertainties on the volatility of European carbon market
trading system designed by Europe to achieve emission reduction targets. The
amount of carbon emission caused by production activities is closely related to
the socio-economic environment. Therefore, from the perspective of economic
policy uncertainty, this article constructs the GARCH-MIDAS-EUEPU and
GARCH-MIDAS-GEPU models for investigating the impact of European and global
economic policy uncertainty on carbon price fluctuations. The results show that
both European and global economic policy uncertainty will exacerbate the
long-term volatility of European carbon spot return, with the latter having a
stronger impact for a change of the same magnitude. Moreover, the volatility of the
European carbon spot return can be forecasted better by the predictor, global
economic policy uncertainty. This research can provide some implications for
market managers in grasping carbon market trends and helping participants
control the risk of fluctuations in carbon allowances.
arXiv link: http://arxiv.org/abs/2007.10564v2
Treatment Effects with Targeting Instruments
discrete-valued instruments to control for selection bias in this setting. Our
discussion revolves around the concept of targeting: which instruments target
which treatments. It allows us to establish conditions under which
counterfactual averages and treatment effects are point- or
partially-identified for composite complier groups. We illustrate the
usefulness of our framework by applying it to data from the Head Start Impact
Study. Under a plausible positive selection assumption, we derive informative
bounds that suggest less beneficial effects of Head Start expansions than the
parametric estimates of Kline and Walters (2016).
arXiv link: http://arxiv.org/abs/2007.10432v5
Variable Selection in Macroeconomic Forecasting with Many Predictors
few key variables has become a new trend in econometrics. The commonly used
approach is the factor augmentation (FA) approach. In this paper, we pursue another
direction, variable selection (VS) approach, to handle high-dimensional
predictors. VS is an active topic in statistics and computer science. However,
it has not received as much attention as FA in economics. This paper introduces
several cutting-edge VS methods to economic forecasting, including: (1)
classical greedy procedures; (2) l1 regularization; (3) gradient descent with
sparsification and (4) meta-heuristic algorithms. Comprehensive simulation
studies are conducted to compare their variable selection accuracy and
prediction performance under different scenarios. Among the reviewed methods, a
meta-heuristic algorithm called sequential Monte Carlo algorithm performs the
best. Surprisingly, classical forward selection is comparable to it and
outperforms other, more sophisticated algorithms. In addition, we apply these VS
methods to economic forecasting and compare them with the popular FA approach. It
turns out that for the employment rate and CPI inflation, some VS methods can achieve
considerable improvement over FA, and the selected predictors can be well
explained by economic theories.
arXiv link: http://arxiv.org/abs/2007.10160v1
Permutation-based tests for discontinuities in event studies
underlying economic model at a known cutoff point. Relative to the existing
literature, we show that this test is well suited for event studies based on
time-series data. The test statistic measures the distance between the
empirical distribution functions of observed data in two local subsamples on
the two sides of the cutoff. Critical values are computed via a standard
permutation algorithm. Under a high-level condition that the observed data can
be coupled by a collection of conditionally independent variables, we establish
the asymptotic validity of the permutation test, allowing the sizes of the
local subsamples either to be fixed or to grow to infinity. In the latter case,
we also establish that the permutation test is consistent. We demonstrate that
our high-level condition can be verified in a broad range of problems in the
infill asymptotic time-series setting, which justifies using the permutation
test to detect jumps in economic variables such as volatility, trading
activity, and liquidity. These potential applications are illustrated in an
empirical case study for selected FOMC announcements during the ongoing
COVID-19 pandemic.
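A minimal sketch of the permutation logic: compare the empirical distribution functions of two local windows around the cutoff with a KS-type distance and permute window membership. The window size and the exact distance are illustrative choices, not necessarily those in the paper.

```python
# Minimal sketch: permutation test for a distributional discontinuity at a known
# cutoff, comparing empirical CDFs in two local windows via a KS-type distance.
import numpy as np

def ks_distance(a, b):
    grid = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(Fa - Fb))

def permutation_test(left, right, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    stat = ks_distance(left, right)
    pooled = np.concatenate([left, right])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # permute window membership
        if ks_distance(pooled[: len(left)], pooled[len(left):]) >= stat:
            count += 1
    return stat, (1 + count) / (1 + n_perm)       # statistic and permutation p-value

rng = np.random.default_rng(7)
left = rng.standard_normal(60)                    # local window just before the event
right = 0.8 * rng.standard_normal(60) + 0.7       # jump in level and scale after it
print(permutation_test(left, right))
```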
arXiv link: http://arxiv.org/abs/2007.09837v4
How Flexible is that Functional Form? Quantifying the Restrictiveness of Theories
they fit synthetic data from a pre-defined class. This measure, together with a
measure for how well the model fits real data, outlines a Pareto frontier,
where models that rule out more regularities, yet capture the regularities that
are present in real data, are preferred. To illustrate our approach, we
evaluate the restrictiveness of popular models in two laboratory settings --
certainty equivalents and initial play -- and in one field setting -- takeup of
microfinance in Indian villages. The restrictiveness measure reveals new
insights about each of the models, including that some economic models with
only a few parameters are very flexible.
arXiv link: http://arxiv.org/abs/2007.09213v4
Tractable Profit Maximization over Multiple Attributes under Discrete Choice Models
attributes of products, such that the total profit or revenue or market share
is maximized. Usually, these attributes can affect both a product's market
share (probability to be chosen) and its profit margin. For example, if a smart
phone has a better battery, then it is more costly to be produced, but is more
likely to be purchased by a customer. The decision maker then needs to choose
an optimal vector of attributes for each product that balances this trade-off.
In spite of the importance of such problems, there is not yet a method to solve
them efficiently in general. Past literature in revenue management and discrete
choice models focuses on pricing problems, where price is the only attribute to
be chosen for each product. Existing approaches that solve pricing problems
tractably cannot be generalized to optimization problems with multiple
product attributes as decision variables. On the other hand, papers studying
product line design with multiple attributes all result in intractable
optimization problems. We show how to reformulate the static
multi-attribute optimization problem, as well as the multi-stage fluid
optimization problem with both resource constraints and upper and lower bounds
on attributes, as a tractable convex conic optimization problem. Our result
applies to optimization problems under the multinomial logit (MNL) model, the
Markov chain (MC) choice model, and with certain conditions, the nested logit
(NL) model.
arXiv link: http://arxiv.org/abs/2007.09193v3
Government spending and multi-category treatment effects: The modified conditional independence assumption
the short run with multi-category treatment effects and inverse probability
weighting based on the potential outcome framework. This study's main
contribution to the literature is the proposed modified conditional
independence assumption to improve the evaluation of fiscal policy. Using this
approach, I analyze the effects of government spending on the US economy from
1992 to 2019. The empirical study indicates that large fiscal contraction
generates a negative effect on the economic growth rate, and small and large
fiscal expansions realize a positive effect. However, these effects are not
significant in the traditional multiple regression approach. I conclude that
this new approach significantly improves the evaluation of fiscal policy.
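As a stand-in illustration, the sketch below applies inverse probability weighting with multinomial-logit propensity scores to a simulated multi-category treatment; it shows the generic estimator only, not the paper's modified conditional independence assumption or fiscal data.

```python
# Minimal sketch: inverse probability weighting for a multi-category treatment
# using multinomial-logit propensity scores. Data are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
n = 5000
x = rng.standard_normal((n, 2))
logits = np.column_stack([np.zeros(n), 0.8 * x[:, 0], -0.8 * x[:, 1]])
probs = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
d = np.array([rng.choice(3, p=pr) for pr in probs])          # treatment categories 0, 1, 2
y = 0.5 * x[:, 0] + np.array([0.0, 0.3, 0.6])[d] + rng.standard_normal(n)

ps = LogisticRegression(max_iter=1000).fit(x, d).predict_proba(x)
means = {k: np.sum((d == k) * y / ps[:, k]) / np.sum((d == k) / ps[:, k])
         for k in range(3)}
print({k: round(v, 3) for k, v in means.items()})            # weighted potential-outcome means
print("effect of category 2 vs 0:", round(means[2] - means[0], 3))
```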
arXiv link: http://arxiv.org/abs/2007.08396v3
Global Representation of the Conditional LATE Model: A Separability Result
model, making explicit the role of covariates in treatment selection. We find
that if the directions of the monotonicity condition are the same across all
values of the conditioning covariate, which is often assumed in the literature,
then the treatment choice equation has to satisfy a separability condition
between the instrument and the covariate. This global representation result
establishes testable restrictions imposed on the way covariates enter the
treatment choice equation. We later extend the representation theorem to
incorporate multiple ordered levels of treatment.
arXiv link: http://arxiv.org/abs/2007.08106v3
Least Squares Estimation Using Sketched Data with Heteroskedastic Errors
instead of the full sample of size $n$ for a variety of reasons. This paper
considers the case when the regression errors do not have constant variance and
heteroskedasticity robust standard errors would normally be needed for test
statistics to provide accurate inference. We show that estimates using data
sketched by random projections will behave `as if' the errors were
homoskedastic. Estimation by random sampling would not have this property. The
result arises because the sketched estimates in the case of random projections
can be expressed as degenerate $U$-statistics, and under certain conditions,
these statistics are asymptotically normal with homoskedastic variance. We
verify that the conditions hold not only in the case of least squares
regression when the covariates are exogenous, but also in instrumental
variables estimation when the covariates are endogenous. The result implies
that inference, including first-stage F tests for instrument relevance, can be
simpler than the full sample case if the sketching scheme is appropriately
chosen.
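A small numerical sketch of the setting: ordinary least squares on data compressed by a Gaussian random projection when the full-sample errors are heteroskedastic. Dimensions and the data-generating process are illustrative.

```python
# Minimal sketch: least squares on data compressed by a Gaussian random
# projection, with heteroskedastic errors in the full sample.
import numpy as np

rng = np.random.default_rng(9)
n, p, m = 5000, 3, 500               # full sample, regressors, sketch size
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = np.array([1.0, 0.5, -0.25])
e = rng.standard_normal(n) * (0.5 + np.abs(X[:, 1]))     # heteroskedastic errors
y = X @ beta + e

# Random projection sketch: S has iid N(0, 1/m) entries
S = rng.standard_normal((m, n)) / np.sqrt(m)
Xs, ys = S @ X, S @ y

b_full = np.linalg.lstsq(X, y, rcond=None)[0]
b_sketch = np.linalg.lstsq(Xs, ys, rcond=None)[0]
print("full-sample OLS :", np.round(b_full, 3))
print("sketched OLS    :", np.round(b_sketch, 3))
# Per the paper, inference on the sketched estimates can proceed 'as if' the
# errors were homoskedastic, unlike sketching by random subsampling.
```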
arXiv link: http://arxiv.org/abs/2007.07781v3
Understanding fluctuations through Multivariate Circulant Singular Spectrum Analysis
provide a comprehensive framework to analyze fluctuations, extracting the
underlying components of a set of time series, disentangling their sources of
variation and assessing their relative phase or cyclical position at each
frequency. Our novel method is non-parametric and can be applied to series out
of phase, highly nonlinear and modulated both in frequency and amplitude. We
prove a uniqueness theorem that in the case of common information and without
the need of fitting a factor model, allows us to identify common sources of
variation. This technique can be quite useful in several fields such as
climatology, biometrics, engineering or economics among others. We show the
performance of M-CiSSA through a synthetic example of latent signals modulated
both in amplitude and frequency and through the real data analysis of energy
prices to understand the main drivers and co-movements of primary energy
commodity prices at various frequencies that are key to assessing energy policy at
different time horizons.
arXiv link: http://arxiv.org/abs/2007.07561v5
Persistence in Financial Connectedness and Systemic Risk
heterogeneous degrees of persistence. Using frequency domain techniques, we
introduce measures that identify smoothly varying links of a transitory and
persistent nature. Our approach allows us to test for statistical differences
in such dynamic links. We document substantial differences in transitory and
persistent linkages among US financial industry volatilities, argue that they
track heterogeneously persistent sources of systemic risk, and thus may serve
as a useful tool for market participants.
arXiv link: http://arxiv.org/abs/2007.07842v4
A More Robust t-Test
applying a t-test to a particular set of observations. If the number of
observations is not very large, then moderately heavy tails can lead to poor
behavior of the t-test. This is a particular problem under clustering, since
the number of observations then corresponds to the number of clusters, and
heterogeneity in cluster sizes induces a form of heavy tails. This paper
combines extreme value theory for the smallest and largest observations with a
normal approximation for the average of the remaining observations to construct
a more robust alternative to the t-test. The new test is found to control size
much more successfully in small samples compared to existing methods.
Analytical results in the canonical inference for the mean problem demonstrate
that the new test provides a refinement over the full sample t-test under more
than two but less than three moments, while the bootstrapped t-test does not.
arXiv link: http://arxiv.org/abs/2007.07065v1
An Adversarial Approach to Structural Estimation
for structural models. The estimator is formulated as the solution to a minimax
problem between a generator (which generates simulated observations using the
structural model) and a discriminator (which classifies whether an observation
is simulated). The discriminator maximizes the accuracy of its classification
while the generator minimizes it. We show that, with a sufficiently rich
discriminator, the adversarial estimator attains parametric efficiency under
correct specification and the parametric rate under misspecification. We
advocate the use of a neural network as a discriminator that can exploit
adaptivity properties and attain fast rates of convergence. We apply our method
to the elderly's saving decision model and show that our estimator uncovers the
bequest motive as an important source of saving across the wealth distribution,
not only for the rich.
arXiv link: http://arxiv.org/abs/2007.06169v3
A Semiparametric Network Formation Model with Unobserved Linear Heterogeneity
presence of unobserved agent-specific heterogeneity. The objective is to
identify and estimate the preference parameters associated with homophily on
observed attributes when the distributions of the unobserved factors are not
parametrically specified. This paper offers two main contributions to the
literature on network formation. First, it establishes a new point
identification result for the vector of parameters that relies on the existence
of a special regressor. The identification proof is constructive and
characterizes a closed-form expression for the parameter of interest. Second, it
introduces a simple two-step semiparametric estimator for the vector of
parameters with a first-step kernel estimator. The estimator is computationally
tractable and can be applied to both dense and sparse networks. Moreover, I
show that the estimator is consistent and has a limiting normal distribution as
the number of individuals in the network increases. Monte Carlo experiments
demonstrate that the estimator performs well in finite samples and in networks
with different levels of sparsity.
arXiv link: http://arxiv.org/abs/2007.05403v2
Intelligent Credit Limit Management in Consumer Loans Based on Causal Inference
growth, and credit cards are the most popular consumer loan. One of the most
essential parts in credit cards is the credit limit management. Traditionally,
credit limits are adjusted based on limited heuristic strategies, which are
developed by experienced professionals. In this paper, we present a data-driven
approach to manage credit limits intelligently. Firstly, conditional
independence testing is conducted to acquire the data for building models.
Based on these testing data, a response model is then built to measure the
heterogeneous treatment effect of increasing credit limits (i.e. treatments)
for different customers, who are depicted by several control variables (i.e.
features). In order to incorporate the diminishing marginal effect, a carefully
selected log transformation is introduced to the treatment variable. Moreover,
the model's capability can be further enhanced by applying a non-linear
transformation on features via GBDT encoding. Finally, a well-designed metric
is proposed to properly measure the performances of compared methods. The
experimental results demonstrate the effectiveness of the proposed approach.
arXiv link: http://arxiv.org/abs/2007.05188v1
Structural Gaussian mixture vector autoregressive model with application to the asymmetric effects of monetary policy shocks
shocks are identified by combining simultaneous diagonalization of the reduced
form error covariance matrices with constraints on the time-varying impact
matrix. This leads to flexible identification conditions, and some of the
constraints are also testable. The empirical application studies asymmetries in
the effects of the U.S. monetary policy shock and finds strong asymmetries with
respect to the sign and size of the shock and to the initial state of the
economy. The accompanying CRAN distributed R package gmvarkit provides a
comprehensive set of tools for numerical analysis.
arXiv link: http://arxiv.org/abs/2007.04713v7
Time Series Analysis of COVID-19 Infection Curve: A Change-Point Perspective
deaths of COVID-19 (in log scale) via a piecewise linear trend model. The model
naturally captures the phase transitions of the epidemic growth rate via
change-points and further enjoys great interpretability due to its
semiparametric nature. On the methodological front, we advance the nascent
self-normalization (SN) technique (Shao, 2010) to testing and estimation of a
single change-point in the linear trend of a nonstationary time series. We
further combine the SN-based change-point test with the NOT algorithm
(Baranowski et al., 2019) to achieve multiple change-point estimation. Using
the proposed method, we analyze the trajectory of the cumulative COVID-19 cases
and deaths for 30 major countries and discover interesting patterns with
potentially relevant implications for effectiveness of the pandemic responses
by different countries. Furthermore, based on the change-point detection
algorithm and a flexible extrapolation function, we design a simple two-stage
forecasting scheme for COVID-19 and demonstrate its promising performance in
predicting cumulative deaths in the U.S.
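For intuition, the sketch below locates a single break in a linear trend by grid search over break dates with a continuous piecewise-linear fit; the paper's self-normalized test and the NOT multiple-change-point algorithm are not reproduced.

```python
# Minimal sketch: locate a single break in the linear trend of a series (e.g.
# log cumulative cases) by grid search over break dates, fitting a continuous
# piecewise-linear trend.
import numpy as np

def single_trend_break(y, trim=10):
    t = np.arange(len(y))
    best = None
    for k in range(trim, len(y) - trim):
        X = np.column_stack([np.ones_like(t), t, np.maximum(t - k, 0)])  # kink at k
        resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        ssr = resid @ resid
        if best is None or ssr < best[1]:
            best = (k, ssr)
    return best[0]

rng = np.random.default_rng(10)
t = np.arange(120)
y = 0.20 * t - 0.12 * np.maximum(t - 60, 0) + rng.normal(0, 0.3, 120)  # growth slows at t = 60
print("estimated change point:", single_trend_break(y))
```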
arXiv link: http://arxiv.org/abs/2007.04553v1
Efficient Covariate Balancing for the Local Average Treatment Effect
treatment effects under two-sided noncompliance using a binary conditionally
independent instrumental variable. The method weighs both treatment and outcome
information with inverse probabilities to produce exact finite sample balance
across instrument level groups. It is free of functional form assumptions on
the outcome or the treatment selection step. By tailoring the loss function for
the instrument propensity scores, the resulting treatment effect estimates
exhibit both low bias and a reduced variance in finite samples compared to
conventional inverse probability weighting methods. The estimator is
automatically weight normalized and has similar bias properties compared to
conventional two-stage least squares estimation under constant causal effects
for the compliers. We provide conditions for asymptotic normality and
semiparametric efficiency and demonstrate how to utilize additional information
about the treatment selection step for bias reduction in finite samples. The
method can be easily combined with regularization or other statistical learning
approaches to deal with a high-dimensional number of observed confounding
variables. Monte Carlo simulations suggest that the theoretical advantages
translate well to finite samples. The method is illustrated in an empirical
example.
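As a point of reference, the sketch below computes the conventional IPW (Wald-type) LATE estimator with an estimated instrument propensity score, the benchmark that the proposed balancing weights are compared against; the data and propensity model are simulated stand-ins.

```python
# Minimal sketch: the conventional IPW (Wald-type) LATE estimator with an
# estimated instrument propensity score. Data are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(11)
n = 8000
x = rng.standard_normal((n, 2))
pz = 1 / (1 + np.exp(-(0.5 * x[:, 0] - 0.5 * x[:, 1])))      # instrument propensity
z = rng.binomial(1, pz)
u = rng.standard_normal(n)
d = ((0.5 + z + 0.3 * x[:, 0] + u) > 1).astype(int)          # two-sided noncompliance
y = 1.0 + 1.5 * d + 0.5 * x[:, 0] + u + rng.standard_normal(n)

e = LogisticRegression(max_iter=1000).fit(x, z).predict_proba(x)[:, 1]
w1, w0 = z / e, (1 - z) / (1 - e)
late = (np.mean(w1 * y) - np.mean(w0 * y)) / (np.mean(w1 * d) - np.mean(w0 * d))
print(f"IPW LATE estimate: {late:.3f}")
```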
arXiv link: http://arxiv.org/abs/2007.04346v1
Difference-in-Differences Estimators of Intertemporal Treatment Effects
non-binary, non-absorbing, and the outcome may be affected by treatment lags.
We make a parallel-trends assumption, and propose event-study estimators of the
effect of being exposed to a weakly higher treatment dose for $\ell$ periods.
We also propose normalized estimators, that estimate a weighted average of the
effects of the current treatment and its lags. We also analyze commonly-used
two-way-fixed-effects regressions. Unlike our estimators, they can be biased in
the presence of heterogeneous treatment effects. A local-projection version of
those regressions is biased even with homogeneous effects.
arXiv link: http://arxiv.org/abs/2007.04267v13
Talents from Abroad. Foreign Managers and Productivity in the United Kingdom
competitiveness. We use a novel dataset on the careers of 165,084 managers
employed by 13,106 companies in the United Kingdom in the period 2009-2017. We
find that domestic manufacturing firms become, on average, between 7% and 12%
more productive after hiring the first foreign managers, whereas foreign-owned
firms register no significant improvement. In particular, we find that previous
industry-specific experience is the primary driver of productivity gains in
domestic firms (15.6%), in a way that allows the latter to catch up with
foreign-owned firms. Managers from the European Union are highly valuable, as
they represent about half of the recruits in our data. Our identification
strategy combines matching techniques, difference-in-difference, and
pre-recruitment trends to challenge reverse causality. Results are robust to
placebo tests and to different estimators of Total Factor Productivity.
Finally, we argue that upcoming limits to the mobility of foreign talents
after the Brexit event can hamper the allocation of productive managerial
resources.
arXiv link: http://arxiv.org/abs/2007.04055v1
Optimal Decision Rules for Weak GMM
for weakly identified GMM models. We derive the limit experiment for weakly
identified GMM, and propose a theoretically-motivated class of priors which
give rise to quasi-Bayes decision rules as a limiting case. Together with
results in the previous literature, this establishes desirable properties for
the quasi-Bayes approach regardless of model identification status, and we
recommend quasi-Bayes for settings where identification is a concern. We
further propose weighted average power-optimal identification-robust
frequentist tests and confidence sets, and prove a Bernstein-von Mises-type
result for the quasi-Bayes posterior under weak identification.
arXiv link: http://arxiv.org/abs/2007.04050v7
Max-sum tests for cross-sectional dependence of high-dimensional panel data
high-dimensional panel data, where the number of cross-sectional units is
potentially much larger than the number of observations. The cross-sectional
dependence is described through a linear regression model. We study three tests
named the sum test, the max test and the max-sum test, where the latter two are
new. The sum test is initially proposed by Breusch and Pagan (1980). We design
the max and sum tests for sparse and non-sparse residuals in the linear
regressions, respectively, and the max-sum test is devised to accommodate both
situations. Indeed, our simulations show that the max-sum test outperforms the
previous two tests. This makes the max-sum test very useful in practice, where
it is often unclear whether the residuals are sparse. Towards the
theoretical analysis of the three tests, we have settled two conjectures
regarding the sum of squares of sample correlation coefficients asked by
Pesaran (2004 and 2008). In addition, we establish the asymptotic theory for
maxima of sample correlation coefficients appearing in the linear regression
model for panel data, which, to our knowledge, is the first such result.
To study the max-sum test, we create a novel method to show
asymptotic independence between maxima and sums of dependent random variables.
We expect the method itself is useful for other problems of this nature.
Finally, an extensive simulation study as well as a case study are carried out.
They demonstrate advantages of our proposed methods in terms of both empirical
power and robustness to whether or not the residuals are sparse.
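For concreteness, the sketch below builds the sum-type and max-type statistics from pairwise correlations of unit-level regression residuals. The exact centering, scaling, and the max-sum combination's critical values follow the paper and are not reproduced here.

```python
# Minimal sketch: sum-type and max-type statistics built from pairwise
# correlations of regression residuals across units.
import numpy as np

rng = np.random.default_rng(12)
N, T, k = 40, 100, 2
resid = []
for i in range(N):
    X = np.column_stack([np.ones(T), rng.standard_normal((T, k))])
    y = X @ rng.standard_normal(k + 1) + rng.standard_normal(T)
    resid.append(y - X @ np.linalg.lstsq(X, y, rcond=None)[0])
resid = np.array(resid)                                   # N x T residual matrix

R = np.corrcoef(resid)                                    # pairwise correlations across units
iu = np.triu_indices(N, k=1)
sum_stat = T * np.sum(R[iu] ** 2)                         # Breusch-Pagan LM-type sum
max_stat = T * np.max(R[iu] ** 2)                         # max-type statistic
print(f"sum statistic: {sum_stat:.1f}, max statistic: {max_stat:.1f}")
```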
arXiv link: http://arxiv.org/abs/2007.03911v1
A Dynamic Choice Model with Heterogeneous Decision Rules: Application in Estimating the User Cost of Rail Crowding
supply-side decisions of transit operators. The crowding cost perceived by a
transit rider is generally estimated by capturing the trade-off that the rider
makes between crowding and travel time while choosing a route. However,
existing studies rely on static compensatory choice models and fail to account
for inertia and the learning behaviour of riders. To address these challenges,
we propose a new dynamic latent class model (DLCM) which (i) assigns riders to
latent compensatory and inertia/habit classes based on different decision
rules, (ii) enables transitions between these classes over time, and (iii)
adopts instance-based learning theory to account for the learning behaviour of
riders. We use the expectation-maximisation algorithm to estimate DLCM, and the
most probable sequence of latent classes for each rider is retrieved using the
Viterbi algorithm. The proposed DLCM can be applied in any choice context to
capture the dynamics of decision rules used by a decision-maker. We demonstrate
its practical advantages in estimating the crowding valuation of an Asian
metro's riders. To calibrate the model, we recover the daily route preferences
and in-vehicle crowding experiences of regular metro riders using a
two-month-long smart card and vehicle location data. The results indicate that
the average rider follows the compensatory rule on only 25.5% of route choice
occasions. DLCM estimates also show an increase of 47% in metro riders'
valuation of travel time under extremely crowded conditions relative to that
under uncrowded conditions.
arXiv link: http://arxiv.org/abs/2007.03682v1
Semi-nonparametric Latent Class Choice Model with a Flexible Class Membership Component: A Mixture Model Approach
with a flexible class membership component. The proposed model formulates the
latent classes using mixture models as an alternative approach to the
traditional random utility specification with the aim of comparing the two
approaches on various measures including prediction accuracy and representation
of heterogeneity in the choice process. Mixture models are parametric
model-based clustering techniques that have been widely used in areas such as
machine learning, data mining and pattern recognition for clustering and
classification problems. An Expectation-Maximization (EM) algorithm is derived
for the estimation of the proposed model. Using two different case studies on
travel mode choice behavior, the proposed model is compared to traditional
discrete choice models on the basis of parameter estimates' signs, value of
time, statistical goodness-of-fit measures, and cross-validation tests. Results
show that mixture models improve the overall performance of latent class choice
models by providing better out-of-sample prediction accuracy in addition to
better representations of heterogeneity without weakening the behavioral and
economic interpretability of the choice models.
arXiv link: http://arxiv.org/abs/2007.02739v1
Teacher-to-classroom assignment and student achievement
average student achievement in elementary and middle schools in the US. We use
the Measures of Effective Teaching (MET) experiment to semiparametrically
identify the average reallocation effects (AREs) of such assignments. Our
findings suggest that changes in within-district teacher assignments could have
appreciable effects on student achievement. Unlike policies which require
hiring additional teachers (e.g., class-size reduction measures), or those
aimed at changing the stock of teachers (e.g., VAM-guided teacher tenure
policies), alternative teacher-to-classroom assignments are resource neutral;
they raise student achievement through a more efficient deployment of existing
teachers.
arXiv link: http://arxiv.org/abs/2007.02653v2
Spectral Targeting Estimation of $λ$-GARCH models
combines (eigenvalue and -vector) targeting estimation with stepwise
(univariate) estimation. We denote this the spectral targeting estimator. This
two-step estimator is consistent under finite second order moments, while
asymptotic normality holds under finite fourth order moments. The estimator is
especially well suited for modelling larger portfolios: we compare the
empirical performance of the spectral targeting estimator to that of the quasi
maximum likelihood estimator for five portfolios of 25 assets. The spectral
targeting estimator dominates in terms of computational complexity, being up to
57 times faster in estimation, while both estimators produce similar
out-of-sample forecasts, indicating that the spectral targeting estimator is
well suited for high-dimensional empirical applications.
arXiv link: http://arxiv.org/abs/2007.02588v1
Forecasting with Bayesian Grouped Random Effects in Panel Data
generate the point, set, and density forecasts for short dynamic panel data. We
implement a nonparametric Bayesian approach to simultaneously identify
coefficients and group membership in the random effects which are heterogeneous
across groups but fixed within a group. This method allows us to flexibly
incorporate subjective prior knowledge on the group structure that potentially
improves the predictive accuracy. In Monte Carlo experiments, we demonstrate
that our Bayesian grouped random effects (BGRE) estimators produce accurate
estimates and score predictive gains over standard panel data estimators. With
a data-driven group structure, the BGRE estimators exhibit comparable accuracy
of clustering with the Kmeans algorithm and outperform a two-step Bayesian
grouped estimator whose group structure relies on Kmeans. In the empirical
analysis, we apply our method to forecast the investment rate across a broad
range of firms and illustrate that the estimated latent group structure
improves forecasts relative to standard panel data estimators.
arXiv link: http://arxiv.org/abs/2007.02435v8
Assessing External Validity Over Worst-case Subpopulations
time, and marginalized groups are underrepresented. To assess the external
validity of randomized and observational studies, we propose and evaluate the
worst-case treatment effect (WTE) across all subpopulations of a given size,
which guarantees positive findings remain valid over subpopulations. We develop
a semiparametrically efficient estimator for the WTE that analyzes the external
validity of the augmented inverse propensity weighted estimator for the average
treatment effect. Our cross-fitting procedure leverages flexible nonparametric
and machine learning-based estimates of nuisance parameters and is a regular
root-$n$ estimator even when nuisance estimates converge more slowly. On real
examples where external validity is of core concern, our proposed framework
guards against brittle findings that are invalidated by unanticipated
population shifts.
arXiv link: http://arxiv.org/abs/2007.02411v3
Off-Policy Exploitability-Evaluation in Two-Player Zero-Sum Markov Games
historical data obtained from a different policy. In the recent OPE context,
most studies have focused on single-player cases, and not on multi-player
cases. In this study, we propose OPE estimators constructed by the doubly
robust and double reinforcement learning estimators in two-player zero-sum
Markov games. The proposed estimators project exploitability that is often used
as a metric for determining how close a policy profile (i.e., a tuple of
policies) is to a Nash equilibrium in two-player zero-sum games. We prove the
exploitability estimation error bounds for the proposed estimators. We then
propose the methods to find the best candidate policy profile by selecting the
policy profile that minimizes the estimated exploitability from a given policy
profile class. We prove the regret bounds of the policy profiles selected by
our methods. Finally, we demonstrate the effectiveness and performance of the
proposed estimators through experiments.
arXiv link: http://arxiv.org/abs/2007.02141v2
Bridging the COVID-19 Data and the Epidemiological Model using Time Varying Parameter SIRD Model
for time varying parameters for real-time measurement of the stance of the
COVID-19 pandemic. Time variation in model parameters is captured using the
generalized autoregressive score modelling structure designed for the typically
daily count data related to the pandemic. The resulting specification permits a
flexible yet parsimonious model structure with a very low computational cost.
This is especially crucial at the onset of the pandemic when the data is scarce
and the uncertainty is abundant. Full sample results show that countries
including US, Brazil and Russia are still not able to contain the pandemic with
the US having the worst performance. Furthermore, Iran and South Korea are
likely to experience the second wave of the pandemic. A real-time exercise shows
that the proposed structure delivers timely and precise information on the
current stance of the pandemic ahead of the competitors that use rolling
window. This, in turn, transforms into accurate short-term predictions of the
active cases. We further modify the model to allow for unreported cases.
Results suggest that the effects of the presence of these cases on the
estimation results diminish towards the end of the sample as the amount of
testing increases.
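A hedged sketch of the underlying dynamics: a discrete-time SIRD system with a time-varying infection rate. The paper's generalized autoregressive score recursion fit to count data is replaced here by an exogenous beta path purely for illustration.

```python
# Minimal sketch: a discrete-time SIRD system with a time-varying infection rate.
# The exogenous beta path below stands in for the paper's score-driven recursion.
import numpy as np

T, N = 200, 1_000_000
gamma, nu = 0.10, 0.002                       # recovery and death rates (illustrative)
beta = 0.35 * np.exp(-0.015 * np.arange(T))   # slowly declining infection rate

S, I, R, D = np.zeros(T), np.zeros(T), np.zeros(T), np.zeros(T)
S[0], I[0] = N - 100, 100
for t in range(T - 1):
    new_inf = beta[t] * S[t] * I[t] / N
    S[t + 1] = S[t] - new_inf
    I[t + 1] = I[t] + new_inf - (gamma + nu) * I[t]
    R[t + 1] = R[t] + gamma * I[t]
    D[t + 1] = D[t] + nu * I[t]

Rt = beta / (gamma + nu) * S / N              # effective reproduction number path
print(f"peak active cases: {I.max():,.0f}, final deaths: {D[-1]:,.0f}, R_0: {Rt[0]:.2f}")
```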
arXiv link: http://arxiv.org/abs/2007.02726v2
When are Google data useful to nowcast GDP? An approach via pre-selection and shrinkage
with machine learning--based tools. The latter are often applied without a
complete picture of their theoretical nowcasting properties. Against this
background, this paper proposes a theoretically grounded nowcasting methodology
that allows researchers to incorporate alternative Google Search Data (GSD)
among the predictors and that combines targeted preselection, Ridge
regularization, and Generalized Cross Validation. Breaking with most existing
literature, which focuses on asymptotic in-sample theoretical properties, we
establish the theoretical out-of-sample properties of our methodology and
support them by Monte-Carlo simulations. We apply our methodology to GSD to
nowcast GDP growth rate of several countries during various economic periods.
Our empirical findings support the idea that GSD tend to increase nowcasting
accuracy, even after controlling for official variables, but that the gain
differs between periods of recessions and of macroeconomic stability.
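A minimal sketch under simplifying assumptions: correlation-based targeted pre-selection followed by ridge regression with the penalty chosen by generalized cross-validation computed from the SVD. The simulated predictors stand in for Google Search Data and official variables; the grid and screening rule are arbitrary.

```python
# Minimal sketch: targeted pre-selection of predictors by marginal correlation,
# then ridge regression with the penalty chosen by generalized cross-validation
# (GCV) computed from the SVD. Data are simulated stand-ins.
import numpy as np

rng = np.random.default_rng(13)
n, p, k = 80, 200, 30                          # quarters, candidate predictors, kept
X = rng.standard_normal((n, p))
y = X[:, :5] @ np.array([0.6, 0.5, -0.4, 0.3, 0.2]) + rng.standard_normal(n)

# Targeted pre-selection: keep the k predictors most correlated with the target
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
keep = np.argsort(corr)[-k:]
Xk = X[:, keep]

# Ridge + GCV via the SVD of the pre-selected design
U, s, Vt = np.linalg.svd(Xk, full_matrices=False)
Uy = U.T @ y
best = None
for lam in np.logspace(-3, 3, 50):
    shrink = s ** 2 / (s ** 2 + lam)
    fitted = U @ (shrink * Uy)
    df = shrink.sum()                          # effective degrees of freedom
    gcv = np.mean((y - fitted) ** 2) / (1 - df / n) ** 2
    if best is None or gcv < best[0]:
        best = (gcv, lam)
beta = Vt.T @ (s / (s ** 2 + best[1]) * Uy)    # ridge coefficients at the GCV-optimal lambda
print(f"GCV-optimal lambda: {best[1]:.3f}")
print("first few ridge coefficients:", np.round(beta[:5], 2))
```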
arXiv link: http://arxiv.org/abs/2007.00273v3
Regression Discontinuity Design with Multivalued Treatments
(RDD) with a multivalued treatment variable. We also allow for the inclusion of
covariates. We show that without additional information, treatment effects are
not identified. We give necessary and sufficient conditions that lead to
identification of LATEs as well as of weighted averages of the conditional
LATEs. We show that if the first stage discontinuities of the multiple
treatments conditional on covariates are linearly independent, then it is
possible to identify multivariate weighted averages of the treatment effects
with convenient identifiable weights. If, moreover, treatment effects do not
vary with some covariates or a flexible parametric structure can be assumed, it
is possible to identify (in fact, over-identify) all the treatment effects. The
over-identification can be used to test these assumptions. We propose a simple
estimator, which can be programmed in packaged software as a Two-Stage Least
Squares regression, and packaged standard errors and tests can also be used.
Finally, we implement our approach to identify the effects of different types
of insurance coverage on health care utilization, as in Card, Dobkin and
Maestas (2008).
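As an illustration of the suggested implementation, the sketch below runs the Two-Stage Least Squares estimator by hand for two treatment margins, instrumenting them with the above-cutoff indicator and its interaction with a covariate; the instrument construction and data-generating process are invented for the example.

```python
# Minimal sketch: hand-rolled 2SLS for an RDD with two treatment margins.
# Instruments and the data-generating process are purely illustrative.
import numpy as np

rng = np.random.default_rng(14)
n = 5000
r = rng.uniform(-1, 1, n)                    # running variable, cutoff at 0
above = (r >= 0).astype(float)
w = rng.standard_normal(n)                   # covariate
u = rng.standard_normal(n)                   # unobservable driving selection

# Two treatment margins with different first-stage discontinuities
v1 = 0.6 * above + 0.2 * above * w + 0.3 * u + rng.standard_normal(n)
v2 = 0.2 * above + 0.6 * above * w + 0.3 * u + rng.standard_normal(n)
d1, d2 = (v1 > 0.5).astype(float), (v2 > 0.5).astype(float)
y = 1.0 + 1.0 * d1 + 2.0 * d2 + 0.5 * r + 0.3 * w + u + rng.standard_normal(n)

controls = np.column_stack([np.ones(n), r, r * above, w])    # local linear trend, covariate
X = np.column_stack([d1, d2, controls])                      # endogenous treatments + controls
Z = np.column_stack([above, above * w, controls])            # instruments + controls

# 2SLS: beta = (X' P_Z X)^{-1} X' P_Z y, with P_Z the projection onto Z
PZX = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta = np.linalg.solve(PZX.T @ X, PZX.T @ y)
print("2SLS estimates for (d1, d2):", np.round(beta[:2], 2))
```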
arXiv link: http://arxiv.org/abs/2007.00185v1
Inference in Difference-in-Differences with Few Treated Units and Spatial Correlation
there are few treated units and errors are spatially correlated. We first show
that, when there is a single treated unit, some existing inference methods
designed for settings with few treated and many control units remain
asymptotically valid when errors are weakly dependent. However, these methods
may be invalid with more than one treated unit. We propose alternatives that
are asymptotically valid in this setting, even when the relevant distance
metric across units is unavailable.
arXiv link: http://arxiv.org/abs/2006.16997v7
Inference in Bayesian Additive Vector Autoregressive Tree Models
variables and their lags. This assumption might be overly restrictive and could
have a deleterious impact on forecasting accuracy. As a solution, we propose
combining VAR with Bayesian additive regression tree (BART) models. The
resulting Bayesian additive vector autoregressive tree (BAVART) model is
capable of capturing arbitrary non-linear relations between the endogenous
variables and the covariates without much input from the researcher. Since
controlling for heteroscedasticity is key for producing precise density
forecasts, our model allows for stochastic volatility in the errors. We apply
our model to two datasets. The first application shows that the BAVART model
yields highly competitive forecasts of the US term structure of interest rates.
In a second application, we estimate our model using a moderately sized
Eurozone dataset to investigate the dynamic effects of uncertainty on the
economy.
arXiv link: http://arxiv.org/abs/2006.16333v2
Estimation of Covid-19 Prevalence from Serology Tests: A Partial Identification Approach
from serology studies. Our data are results from antibody tests in some
population sample, where the test parameters, such as the true/false positive
rates, are unknown. Our method scans the entire parameter space, and rejects
parameter values using the joint data density as the test statistic. The
proposed method is conservative for marginal inference, in general, but its key
advantage over more standard approaches is that it is valid in finite samples
even when the underlying model is not point identified. Moreover, our method
requires only independence of serology test results, and does not rely on
asymptotic arguments, normality assumptions, or other approximations. We use
recent Covid-19 serology studies in the US, and show that the parameter
confidence set is generally wide, and cannot support definite conclusions.
Specifically, recent serology studies from California suggest a prevalence
anywhere in the range 0%-2% (at the time of study), and are therefore
inconclusive. However, this range could be narrowed down to 0.7%-1.5% if the
actual false positive rate of the antibody test was indeed near its empirical
estimate (0.5%). In another study from New York State, Covid-19 prevalence is
confidently estimated in the range 13%-17% in mid-April of 2020, which also
suggests significant geographic variation in Covid-19 exposure across the US.
Combining all datasets yields a 5%-8% prevalence range. Our results overall
suggest that serology testing on a massive scale can give crucial information
for future policy design, even when such tests are imperfect and their
parameters unknown.
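An illustrative sketch, not the paper's code, of the scan-and-reject idea above: for each candidate (prevalence, false-positive rate, sensitivity) we use the binomial density of the observed positive count as the test statistic and keep prevalence values that survive for some value of the nuisance rates. The study size, positive count, and grids are hypothetical.

```python
# Finite-sample partial-identification scan for prevalence from serology data.
import numpy as np
from scipy.stats import binom

n_tests, n_positive, alpha = 1000, 15, 0.05      # hypothetical serology study
prev_grid = np.linspace(0.0, 0.05, 51)
fpr_grid = np.linspace(0.0, 0.02, 21)
tpr_grid = np.linspace(0.80, 1.00, 11)

kept = []
for prev in prev_grid:
    survives = False
    for fpr in fpr_grid:
        for tpr in tpr_grid:
            q = prev * tpr + (1 - prev) * fpr    # P(positive test result)
            pmf = binom.pmf(np.arange(n_tests + 1), n_tests, q)
            # exact p-value: total probability of outcomes as unlikely as the
            # observed one under this candidate parameter value
            if pmf[pmf <= pmf[n_positive]].sum() >= alpha:
                survives = True
                break
        if survives:
            break
    if survives:
        kept.append(prev)

print(f"prevalence confidence set ~ [{min(kept):.3f}, {max(kept):.3f}]")
```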
arXiv link: http://arxiv.org/abs/2006.16214v1
Visualizing and comparing distributions with half-disk density strips
(HDDS), for visualizing and comparing probability density functions. The HDDS
exploits color shading for representing a distribution in an intuitive way. In
univariate settings, the half-disk density strip allows one to immediately
discern the key characteristics of a density, such as symmetry, dispersion, and
multi-modality. In multivariate settings, we define HDDS tables to generalize
the concept of contingency tables: an HDDS table is an array of half-disk
density strips that compactly displays the univariate marginal and conditional
densities of a variable of interest, together with the joint and marginal
densities of the conditioning variables. Moreover, HDDSs are by construction
well suited to comparing pairs of densities. To highlight the
concrete benefits of the proposed methods, we show how to use HDDSs for
analyzing income distribution and life-satisfaction, conditionally on
continuous and categorical controls, from survey data. The code for
implementing HDDS methods is made available through a dedicated R package.
arXiv link: http://arxiv.org/abs/2006.16063v1
Treatment Effects in Interactive Fixed Effects Models with a Small Number of Time Periods
on the Treated (ATT) when untreated potential outcomes are generated by an
interactive fixed effects model. That is, in addition to time-period and
individual fixed effects, we consider the case where there is an unobserved
time invariant variable whose effect on untreated potential outcomes may change
over time and which can therefore cause outcomes (in the absence of
participating in the treatment) to follow different paths for the treated group
relative to the untreated group. The models that we consider in this paper
generalize many commonly used models in the treatment effects literature
including difference in differences and individual-specific linear trend
models. Unlike the majority of the literature on interactive fixed effects
models, we do not require the number of time periods to go to infinity to
consistently estimate the ATT. Our main identification result relies on having
the effect of some time invariant covariate (e.g., race or sex) not vary over
time. Using our approach, we show that the ATT can be identified with as few as
three time periods and with panel or repeated cross sections data.
arXiv link: http://arxiv.org/abs/2006.15780v3
Quantitative Statistical Robustness for Tail-Dependent Law Invariant Risk Measures
Carlo simulations via a tail-dependent law invariant risk measure such as the
Conditional Value-at-Risk (CVaR), it is important to ensure the robustness of
the statistical estimator, particularly when the data contain noise.
Krätschmer et al. [1] propose a new framework for examining the qualitative
robustness of estimators for tail-dependent law invariant risk measures on
Orlicz spaces, which goes a step further than earlier work on the robustness of
risk measurement procedures by Cont et al. [2]. In this paper, we follow this
stream of research and propose a quantitative approach for verifying the
statistical
robustness of tail-dependent law invariant risk measures. A distinct feature of
our approach is that we use the Fortet-Mourier metric to quantify the variation
of the true underlying probability measure in the analysis of the discrepancy
between the laws of the plug-in estimators of law invariant risk measure based
on the true data and perturbed data, which enables us to derive an explicit
error bound for the discrepancy when the risk functional is Lipschitz
continuous with respect to a class of admissible laws. Moreover, the newly
introduced notion of Lipschitz continuity allows us to examine the degree of
robustness for tail-dependent risk measures. Finally, we apply our quantitative
approach to some well-known risk measures to illustrate our theory.
arXiv link: http://arxiv.org/abs/2006.15491v1
Real-Time Real Economic Activity: Entering and Exiting the Pandemic Recession of 2020
real-activity signals provided by a leading nowcast, the ADS Index of Business
Conditions produced and released in real time by the Federal Reserve Bank of
Philadelphia. I track the evolution of real-time vintage beliefs and compare
them to a later-vintage chronology. Real-time ADS plunges and then swings as
its underlying economic indicators swing, but the ADS paths quickly converge to
indicate a return to brisk positive growth by mid-May. I show, moreover, that
the daily real activity path was highly correlated with the daily COVID-19
cases. Finally, I provide a comparative assessment of the real-time ADS signals
provided when exiting the Great Recession.
arXiv link: http://arxiv.org/abs/2006.15183v4
Endogenous Treatment Effect Estimation with some Invalid and Irrelevant Instruments
of the endogenous treatment effects. Conventional IV methods require that all
instruments be relevant and valid. However, this is impractical, especially in
high-dimensional models with a large set of candidate IVs. In this paper, we
propose an IV estimator that is robust to the existence of both invalid and
irrelevant instruments (called R2IVE) for the estimation of endogenous
treatment effects. This paper extends the scope of Kang et al. (2016) by
considering a true high-dimensional IV model and a nonparametric reduced-form
equation. It is shown that our procedure selects the relevant and valid
instruments consistently and that the proposed R2IVE is root-n consistent and
asymptotically normal. Monte Carlo simulations demonstrate that the R2IVE
performs favorably compared to existing high-dimensional IV estimators
(such as NAIVE (Fan and Zhong, 2018) and sisVIVE (Kang et al., 2016)) when
invalid instruments exist. In the empirical study, we revisit the classic
question of trade and growth (Frankel and Romer, 1999).
arXiv link: http://arxiv.org/abs/2006.14998v1
Identification and Formal Privacy Guarantees
datasets. At the same time, increasing availability of public individual-level
data makes it possible for adversaries to potentially de-identify anonymized
records in sensitive research datasets. The most commonly accepted formal
definition of an individual non-disclosure guarantee is differential privacy.
It restricts researchers' interaction with the data to issuing queries; the
differential privacy mechanism then replaces the actual outcome of each query
with a randomised outcome.
The impact of differential privacy on the identification of empirical
economic models and on the performance of estimators in nonlinear empirical
econometric models has not been sufficiently studied. Since privacy protection
mechanisms are inherently finite-sample procedures, we define the notion of
identifiability of the parameter of interest under differential privacy as a
property of the limit of experiments. It is naturally characterized by concepts
from random set theory.
We show that particular instances of regression discontinuity design may be
problematic for inference with differential privacy as parameters turn out to
be neither point nor partially identified. The set of differentially private
estimators converges weakly to a random set. Our analysis suggests that many
other estimators that rely on nuisance parameters may have similar properties
under the requirement of differential privacy. We show that identification
becomes possible if the target parameter can be deterministically located
within the random set. In that case, a full exploration of the random set of
the weak limits of differentially private estimators can allow the data curator
to select a sequence of instances of differentially private estimators
converging to the target parameter in probability.
arXiv link: http://arxiv.org/abs/2006.14732v2
Empirical MSE Minimization to Estimate a Scalar Parameter
available. The first is always consistent. The second is inconsistent in
general, but has a smaller asymptotic variance than the first, and may be
consistent if an assumption is satisfied. We propose to use the weighted sum of
the two estimators with the lowest estimated mean-squared error (MSE). We show
that this third estimator dominates the other two from a minimax-regret
perspective: the maximum asymptotic MSE gain from using this estimator rather
than one of the other two is larger than the maximum asymptotic MSE loss one
may incur.
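A hedged illustration of the weighting idea above: combine a consistent but noisier estimator with a possibly biased but more precise one, choosing the weight that minimizes a plug-in estimate of the asymptotic MSE. The data-generating process and the simple plug-in formula (which ignores the covariance between the two estimators) are ours, not necessarily the paper's exact construction.

```python
# Weighted combination of two estimators via estimated-MSE minimization.
import numpy as np

rng = np.random.default_rng(2)
theta, n = 1.0, 400
x = theta + rng.standard_normal(n)

theta1, v1 = x.mean(), x.var(ddof=1) / n          # consistent, higher variance
theta2, v2 = 0.9 * x.mean(), 0.81 * v1            # shrunk: biased, less variable

bias_sq = (theta2 - theta1) ** 2                  # crude plug-in for bias^2
w = np.linspace(0.0, 1.0, 201)                    # weight placed on theta2
mse = (1 - w) ** 2 * v1 + w ** 2 * (v2 + bias_sq) # covariance term omitted
w_star = w[mse.argmin()]
theta_hat = (1 - w_star) * theta1 + w_star * theta2
print(f"w* = {w_star:.2f}, combined estimate = {theta_hat:.3f}")
```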
arXiv link: http://arxiv.org/abs/2006.14667v1
Inference without smoothing for large panels with cross-sectional and temporal dependence
both cross-sectional and temporal dependence of unknown form. We are interested
in making inferences that do not rely on the choice of any smoothing parameter
as is the case with the often employed "HAC" estimator for the covariance
matrix. To that end, we propose a cluster estimator for the asymptotic
covariance of the estimators and valid bootstrap schemes that do not require
the selection of a bandwidth or smoothing parameter and accommodate the
nonparametric nature of both temporal and cross-sectional dependence. Our
approach is based on the observation that the spectral representation of the
fixed effect panel data model is such that the errors become approximately
temporally uncorrelated. Our proposed bootstrap schemes can be viewed as wild
bootstraps in the frequency domain. We present Monte Carlo simulations to shed
light on the small-sample performance of our inferential procedure.
arXiv link: http://arxiv.org/abs/2006.14409v1
Matching Multidimensional Types: Theory and Application
describe agents. For this framework, he establishes the conditions under which
positive sorting between agents' attributes is the unique market outcome.
Becker's celebrated sorting result has been applied to address many economic
questions. However, recent empirical studies in the fields of health,
household, and labor economics suggest that agents have multiple
outcome-relevant attributes. In this paper, I study a matching model with
multidimensional types. I offer multidimensional generalizations of concordance
and supermodularity to construct three multidimensional sorting patterns and
two classes of multidimensional complementarities. For each of these sorting
patterns, I identify sufficient conditions that guarantee its optimality.
In practice, we observe sorting patterns between observed attributes that are
aggregated over unobserved characteristics. To reconcile theory with practice,
I establish the link between production complementarities and the aggregated
sorting patterns. Finally, I examine the relationship between agents' health
status and their spouses' education levels among U.S. households within the
framework of multidimensional matching markets. Preliminary analysis reveals a
weak positive association between agents' health status and their spouses'
education levels. This weak positive association is estimated to be a product
of three factors: (a) an attraction between better-educated individuals, (b) an
attraction between healthier individuals, and (c) a weak positive association
between agents' health status and their education levels. The attraction
channel suggests that the insurance risk associated with a two-person family
plan is higher than the aggregate risk associated with two individual policies.
arXiv link: http://arxiv.org/abs/2006.14243v1
Cointegration in large VARs
for the cases when both the number of coordinates, $N$, and the number of time
periods, $T$, are large and of the same order. We propose a way to examine a
VAR of order $1$ for the presence of cointegration based on a modification of
the Johansen likelihood ratio test. The advantage of our procedure over the
original Johansen test and its finite sample corrections is that our test does
not suffer from over-rejection. This is achieved through novel asymptotic
theorems for eigenvalues of matrices in the test statistic in the regime of
proportionally growing $N$ and $T$. Our theoretical findings are supported by
Monte Carlo simulations and an empirical illustration. Moreover, we find a
surprising connection with multivariate analysis of variance (MANOVA) and
explain why it emerges.
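As a baseline sketch, the classical Johansen trace test (available in statsmodels) that the paper modifies for the proportional-N, T regime; the synthetic system is small, so the standard asymptotics are still the relevant benchmark here, and the paper's correction is not implemented.

```python
# Classical Johansen trace test on a small synthetic cointegrated system.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(11)
T, N = 400, 4
common_trend = np.cumsum(rng.standard_normal(T))
# four series sharing one stochastic trend => cointegration rank 3
data = np.column_stack([common_trend + rng.standard_normal(T) for _ in range(N)])

res = coint_johansen(data, det_order=0, k_ar_diff=1)
print("trace statistics:     ", np.round(res.lr1, 1))
print("95% critical values:  ", np.round(res.cvt[:, 1], 1))
```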
arXiv link: http://arxiv.org/abs/2006.14179v4
Robust and Efficient Approximate Bayesian Computation: A Minimum Distance Approach
hampered by two practical features: 1) the requirement to project the data down
to a low-dimensional summary, including the choice of this projection, which
ultimately yields inefficient inference; 2) a possible lack of robustness to
deviations from the underlying model structure. Motivated by these efficiency
and robustness concerns, we construct a new Bayesian method that can deliver
efficient estimators when the underlying model is well-specified, and which is
simultaneously robust to certain forms of model misspecification. This new
approach bypasses the calculation of summaries by considering a norm between
empirical and simulated probability measures. For specific choices of the norm,
we demonstrate that this approach can deliver point estimators that are as
efficient as those obtained using exact Bayesian inference, while also
simultaneously displaying robustness to deviations from the underlying model
assumptions.
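A minimal ABC-style sketch in the spirit of the summary-free approach above: compare the full empirical distribution of the data with simulated samples via the 1-Wasserstein distance and keep the parameter draws that come closest. The specific norm, prior, model, and acceptance rule are our illustrative choices.

```python
# Summary-free approximate Bayesian computation via a distance between
# empirical distributions (1-Wasserstein), accepting the closest draws.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(3)
data = rng.normal(loc=1.5, scale=1.0, size=500)     # observed sample

n_draws, keep_frac = 5000, 0.02
mu_prior = rng.uniform(-5, 5, n_draws)              # prior draws for the mean
dists = np.array([
    wasserstein_distance(data, rng.normal(mu, 1.0, size=500))
    for mu in mu_prior
])
accepted = mu_prior[dists <= np.quantile(dists, keep_frac)]
print("approximate posterior mean:", accepted.mean())
```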
arXiv link: http://arxiv.org/abs/2006.14126v1
A Model of the Fed's View on Inflation
dynamics that is consistent with the view - often expressed by central banks -
that three components are important: a trend anchored by long-run expectations,
a Phillips curve and temporary fluctuations in energy prices. We find that a
stable long-term inflation trend and a well-identified steep Phillips curve are
consistent with the data, but they imply potential output declining since the
new millennium and energy prices affecting headline inflation not only via the
Phillips curve but also via an independent expectational channel. A
high-frequency energy price cycle can be related to global factors affecting
the commodity market, and often overpowers the Phillips curve thereby
explaining the inflation puzzles of the last ten years.
arXiv link: http://arxiv.org/abs/2006.14110v1
Dynamic Effects of Persistent Shocks
are persistent. We show that the two leading methods to estimate impulse
responses to an independently identified shock (local projections and
distributed lag models) treat persistence differently, hence identifying
different objects. We propose corrections to re-establish the equivalence
between local projections and distributed lag models, providing applied
researchers with methods and guidance to estimate their desired object of
interest. We apply these methods to well-known empirical work and find that how
persistence is treated has a sizable impact on the estimates of dynamic
effects.
arXiv link: http://arxiv.org/abs/2006.14047v1
Asset Prices and Capital Share Risks: Theory and Evidence
been found to successfully explain U.S. stock returns. Our paper adopts a
recursive preference utility framework to derive a heterogeneous asset pricing
model with capital share risks. While modeling capital share risks, we account
for the elevated consumption volatility of high income stockholders. Capital
risks have strong volatility effects in our recursive asset pricing model.
Empirical evidence is presented in which capital share growth is also a source
of risk for stock return volatility. We uncover contrasting unconditional and
conditional asset pricing evidence for capital share risks.
arXiv link: http://arxiv.org/abs/2006.14023v1
Interdependence in active mobility adoption: Joint modelling and motivational spill-over in walking, cycling and bike-sharing
benefits. However, with the proliferation of the sharing economy, new
nonmotorized means of transport are entering the fold, complementing some
existing mobility options while competing with others. The purpose of this
research study is to investigate the adoption of three active travel modes,
namely walking, cycling and bike-sharing, in a joint modeling framework. The
analysis is based on an adaptation of the stages-of-change framework, which
originates from the health behavior sciences. Multivariate ordered probit
modeling drawing on U.S. survey data provides much-needed insights into
individuals' preparedness to adopt multiple active modes as a function of
personal, neighborhood and psychosocial factors. The research suggests three
important findings. 1) The joint model structure confirms interdependence among
different active mobility choices. The strongest complementarity is found for
walking and cycling adoption. 2) Each mode has a distinctive adoption path with
either three or four separate stages. We discuss the implications of derived
stage-thresholds and plot adoption contours for selected scenarios. 3)
Psychological and neighborhood variables generate more coupling among active
modes than individual and household factors. Specifically, identifying strongly
with active mobility aspirations, experiences with multimodal travel,
possessing better navigational skills, along with supportive local community
norms are the factors that appear to drive the joint adoption decisions. This
study contributes to the understanding of how decisions within the same
functional domain are related and helps to design policies that promote active
mobility by identifying positive spillovers and joint determinants.
arXiv link: http://arxiv.org/abs/2006.16920v2
Unified Principal Component Analysis for Sparse and Dense Functional Data under Spatial Dependency
geostatistics setting, where locations are sampled from a spatial point
process. The functional response is the sum of a spatially dependent functional
effect and a spatially independent functional nugget effect. Observations on
each function are made on discrete time points and contaminated with
measurement errors. Under the assumption of spatial stationarity and isotropy,
we propose a tensor product spline estimator for the spatio-temporal covariance
function. When a coregionalization covariance structure is further assumed, we
propose a new functional principal component analysis method that borrows
information from neighboring functions. The proposed method also generates
nonparametric estimators for the spatial covariance functions, which can be
used for functional kriging. Under a unified framework for sparse and dense
functional data, infill and increasing domain asymptotic paradigms, we develop
the asymptotic convergence rates for the proposed estimators. Advantages of the
proposed approach are demonstrated through simulation studies and two real data
applications representing sparse and dense functional data, respectively.
arXiv link: http://arxiv.org/abs/2006.13489v2
Design and Evaluation of Personalized Free Trials
product for free, are a commonly used customer acquisition strategy in the
Software as a Service (SaaS) industry. We examine how trial length affects
users' responsiveness, and seek to quantify the gains from personalizing the
length of free trial promotions. Our data come from a large-scale field
experiment conducted by a leading SaaS firm, where new users were randomly
assigned to 7, 14, or 30 days of free trial. First, we show that offering the
7-day trial to all consumers is the best uniform policy, with a 5.59% increase
in subscriptions. Next, we develop a three-pronged framework for personalized
policy design and evaluation. Using our framework, we develop seven
personalized targeting policies based on linear regression, lasso, CART, random
forest, XGBoost, causal tree, and causal forest, and evaluate their
performances using the Inverse Propensity Score (IPS) estimator. We find that
the personalized policy based on lasso performs the best, followed by the one
based on XGBoost. In contrast, policies based on causal tree and causal forest
perform poorly. We then link a method's effectiveness in designing policy with
its ability to personalize the treatment sufficiently without over-fitting
(i.e., without capturing spurious heterogeneity). Next, we segment consumers
based on
their optimal trial length and derive some substantive insights on the drivers
of user behavior in this context. Finally, we show that policies designed to
maximize short-run conversions also perform well on long-run outcomes such as
consumer loyalty and profitability.
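A hedged sketch of off-policy evaluation with the Inverse Propensity Score (IPS) estimator mentioned above: the value of a candidate personalized trial-length rule is estimated from data in which lengths were assigned uniformly at random. The covariates, outcome model, and candidate policy are synthetic stand-ins.

```python
# IPS evaluation of a personalized trial-length policy on randomized data.
import numpy as np

rng = np.random.default_rng(4)
n = 10000                                            # arms 0, 1, 2 = 7-, 14-, 30-day trial
x = rng.standard_normal((n, 3))                      # user covariates
assigned = rng.integers(0, 3, size=n)                # randomized assignment
propensity = np.full(n, 1 / 3)                       # known in the experiment
# synthetic subscription outcome: short trials work better for "engaged" users
p_subscribe = (0.1
               + 0.05 * (x[:, 0] > 0) * (assigned == 0)
               + 0.05 * (x[:, 0] <= 0) * (assigned == 2))
y = rng.binomial(1, p_subscribe)

def candidate_policy(x):
    """Assign 7-day trials to 'engaged' users, 30-day trials otherwise."""
    return np.where(x[:, 0] > 0, 0, 2)

pi = candidate_policy(x)
ips_value = np.mean((pi == assigned) * y / propensity)
print("estimated subscription rate under the policy:", round(ips_value, 4))
```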
arXiv link: http://arxiv.org/abs/2006.13420v1
Bootstrapping $\ell_p$-Statistics in High Dimensions
of high-dimensional $\ell_p$-statistics, i.e. the $\ell_p$-norms of the sum of
$n$ independent $d$-dimensional random vectors with $d \gg n$ and $p \in [1,
\infty]$. We provide a non-asymptotic characterization of the sampling
distribution of $\ell_p$-statistics based on Gaussian approximation and show
that the bootstrap procedure is consistent in the Kolmogorov-Smirnov distance
under mild conditions on the covariance structure of the data. As an
application of the general theory we propose a bootstrap hypothesis test for
simultaneous inference on high-dimensional mean vectors. We establish its
asymptotic correctness and consistency under high-dimensional alternatives, and
discuss the power of the test as well as the size of associated confidence
sets. We illustrate the bootstrap and testing procedure numerically on
simulated data.
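A minimal sketch of the bootstrap described above, here for the sup-norm (p = infinity) of a scaled sample mean with d much larger than n, using a Gaussian multiplier bootstrap; the concrete application is a simultaneous test of a zero mean vector on synthetic data.

```python
# Gaussian multiplier bootstrap for the ell_infinity norm of a scaled mean.
import numpy as np

rng = np.random.default_rng(5)
n, d, B = 50, 1000, 2000
X = rng.standard_normal((n, d))                          # H0: all d means are zero

stat = np.sqrt(n) * np.abs(X.mean(axis=0)).max()         # ell_inf test statistic
Xc = X - X.mean(axis=0)                                  # centered data
multipliers = rng.standard_normal((B, n))                # Gaussian multipliers
boot = np.abs(multipliers @ Xc).max(axis=1) / np.sqrt(n) # bootstrap draws
crit = np.quantile(boot, 0.95)
print(f"statistic = {stat:.2f}, 95% bootstrap critical value = {crit:.2f}, "
      f"reject H0: {stat > crit}")
```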
arXiv link: http://arxiv.org/abs/2006.13099v3
The Macroeconomy as a Random Forest
canonical Machine Learning (ML) tool to flexibly model evolving parameters in a
linear macro equation. Its main output, Generalized Time-Varying Parameters
(GTVPs), is a versatile device nesting many popular nonlinearities
(threshold/switching, smooth transition, structural breaks/change) and allowing
for sophisticated new ones. The approach delivers clear forecasting gains over
numerous alternatives, predicts the 2008 drastic rise in unemployment, and
performs well for inflation. Unlike most ML-based methods, MRF is directly
interpretable -- via its GTVPs. For instance, the successful unemployment
forecast is due to the influence of forward-looking variables (e.g., term
spreads, housing starts) nearly doubling before every recession. Interestingly,
the Phillips curve has indeed flattened, and its might is highly cyclical.
arXiv link: http://arxiv.org/abs/2006.12724v3
Locally trimmed least squares: conventional inference in possibly nonstationary models
developed which yields estimators with (mixed) Gaussian limit distributions in
situations where the data may be weakly or strongly persistent. In particular,
we allow for nonlinear predictive-type regressions where the regressor can be
a stationary short/long memory process, a nonstationary long memory process, or
a nearly integrated array. The resultant t-tests have conventional limit
distributions (i.e., N(0,1)) free of (near-to-unity and long memory) nuisance
parameters. In the case where the regressor is a fractional process, no
preliminary estimator for the memory parameter is required. Therefore, the
practitioner can conduct inference while being agnostic about the exact
dependence structure in the data. The LTLS estimator is obtained by applying a
certain chronological trimming to the OLS instrument via appropriate kernel
functions of time trend variables. The finite sample
performance of LTLS based t-tests is investigated with the aid of a simulation
experiment. An empirical application to the predictability of stock returns is
also provided.
arXiv link: http://arxiv.org/abs/2006.12595v1
A Pipeline for Variable Selection and False Discovery Rate Control With an Application in Labor Economics
particular, we apply a recently introduced aggregation scheme for false
discovery rate (FDR) control to German administrative data to determine the
parts of the individual employment histories that are relevant for the career
outcomes of women. Our results suggest that career outcomes can be predicted
based on a small set of variables, such as daily earnings, wage increases in
combination with a high level of education, employment status, and working
experience.
arXiv link: http://arxiv.org/abs/2006.12296v2
Vocational Training Programs and Youth Labor Market Outcomes: Evidence from Nepal
levels of unemployment and poverty. In response, policymakers often initiate
vocational training programs in an effort to enhance skill formation among the
youth. Using a regression-discontinuity design, we examine a large youth
training intervention in Nepal. We find, twelve months after the start of the
training program, that the intervention generated an increase in non-farm
employment of 10 percentage points (ITT estimates) and up to 31 percentage
points for program compliers (LATE estimates). We also detect sizeable gains in
monthly earnings. Women who start self-employment activities inside their homes
largely drive these impacts. We argue that low baseline levels of education and
non-farm employment, together with Nepal's social and cultural norms towards
women, drive our large program impacts. Our results suggest that the program
enables
otherwise underemployed women to earn an income while staying at home - close
to household errands and in line with the socio-cultural norms that prevent
them from taking up employment outside the house.
arXiv link: http://arxiv.org/abs/2006.13036v1
Unified Discrete-Time Factor Stochastic Volatility and Continuous-Time Ito Models for Combining Inference Based on Low-Frequency and High-Frequency
process, which can accommodate both continuous-time Ito diffusion and
discrete-time stochastic volatility (SV) models by embedding the discrete SV
model in the continuous instantaneous factor volatility process. We call it the
SV-Ito model. Based on the series of daily integrated factor volatility matrix
estimators, we propose quasi-maximum likelihood and least squares estimation
methods. Their asymptotic properties are established. We apply the proposed
method to predict the future vast volatility matrix, whose asymptotic behavior
is studied. A simulation study is conducted to check the finite sample
performance
of the proposed estimation and prediction method. An empirical analysis is
carried out to demonstrate the advantage of the SV-Ito model in volatility
prediction and portfolio allocation problems.
arXiv link: http://arxiv.org/abs/2006.12039v1
Mitigating Bias in Online Microfinance Platforms: A Case Study on Kiva.org
disintermediation has occurred on a global scale. Traditionally, even for a
small supply of funds, banks would act as the conduit between the funds and the
borrowers. With the advent of online platforms like Kiva, Prosper, and
LendingClub, it has now become possible to overcome some of the obstacles
associated with such supply of funds. Kiva, for example, works with Micro
Finance Institutions (MFIs) in developing countries to build Internet profiles
of borrowers with a brief biography, loan requested, loan term, and purpose.
Kiva, in particular, allows lenders to fund projects in different sectors
through group or individual funding. Traditional research studies have
investigated various factors behind lender preferences purely from the
perspective of loan attributes, and only recently have some cross-country
cultural preferences
been investigated. In this paper, we investigate lender perceptions of economic
factors of the borrower countries in relation to their preferences towards
loans associated with different sectors. We find that the influence from
economic factors and loan attributes can have substantially different roles to
play for different sectors in achieving faster funding. We formally investigate
and quantify the hidden biases prevalent in different loan sectors using recent
tools from causal inference and regression models that rely on Bayesian
variable selection methods. We then extend these models to incorporate fairness
constraints based on our empirical analysis and find that such models can still
achieve near comparable results with respect to baseline regression models.
arXiv link: http://arxiv.org/abs/2006.12995v1
Valid Causal Inference with (Some) Invalid Instruments
causal effects in the presence of unobserved confounding. But a key challenge
when applying them is the reliance on untestable "exclusion" assumptions that
rule out any relationship between the instrumental variable and the response that
is not mediated by the treatment. In this paper, we show how to perform
consistent IV estimation despite violations of the exclusion assumption. In
particular, we show that when one has multiple candidate instruments, only a
majority of these candidates---or, more generally, the modal candidate-response
relationship---needs to be valid to estimate the causal effect. Our approach
uses an estimate of the modal prediction from an ensemble of instrumental
variable estimators. The technique is simple to apply and is "black-box" in the
sense that it may be used with any instrumental variable estimator as long as
the treatment effect is identified for each valid instrument independently. As
such, it is compatible with recent machine-learning based estimators that allow
for the estimation of conditional average treatment effects (CATE) on complex,
high dimensional data. Experimentally, we achieve accurate estimates of
conditional average treatment effects using an ensemble of deep network-based
estimators, including on a challenging simulated Mendelian Randomization
problem.
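A hedged sketch of the modal-estimate idea above: compute one IV (Wald-type) estimate per candidate instrument and take the mode of the resulting ensemble via a kernel density, so that the valid majority (or mode) of instruments dominates. The linear DGP and the kernel-density mode finder are our illustrative choices.

```python
# Modal aggregation of per-instrument IV estimates with some invalid instruments.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(6)
n, n_iv = 5000, 7
Z = rng.standard_normal((n, n_iv))
u = rng.standard_normal(n)                       # unobserved confounder
d = Z @ np.full(n_iv, 0.5) + u + rng.standard_normal(n)
direct = np.array([0, 0, 0, 0, 1.0, -1.5, 2.0])  # last three instruments invalid
y = 2.0 * d + Z @ direct + 2 * u + rng.standard_normal(n)

wald = [(Z[:, j] @ y) / (Z[:, j] @ d) for j in range(n_iv)]  # per-IV estimates
grid = np.linspace(min(wald) - 1, max(wald) + 1, 2000)
mode_estimate = grid[gaussian_kde(wald).evaluate(grid).argmax()]
print("per-instrument estimates:", np.round(wald, 2))
print("modal (ensemble) estimate:", round(mode_estimate, 2), "(truth: 2.0)")
```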
arXiv link: http://arxiv.org/abs/2006.11386v1
Do Methodological Birds of a Feather Flock Together?
researchers develop causal inference tools for settings in which randomization
is infeasible. Two popular such methods, difference-in-differences (DID) and
comparative interrupted time series (CITS), compare observations before and
after an intervention in a treated group to an untreated comparison group
observed over the same period. Both methods rely on strong, untestable
counterfactual assumptions. Despite their similarities, the methodological
literature on CITS lacks the mathematical formality of DID. In this paper, we
use the potential outcomes framework to formalize two versions of CITS - a
general version described by Bloom (2005) and a linear version often used in
health services research. We then compare these to two corresponding DID
formulations - one with time fixed effects and one with time fixed effects and
group trends. We also re-analyze three previously published studies using these
methods. We demonstrate that the most general versions of CITS and DID impute
the same counterfactuals and estimate the same treatment effects. The only
difference between these two designs is the language used to describe them and
their popularity in distinct disciplines. We also show that these designs
diverge when one constrains them using linearity (CITS) or parallel trends
(DID). We recommend defaulting to the more flexible versions and provide advice
to practitioners on choosing between the more constrained versions by
considering the data-generating mechanism. We also recommend greater attention
to specifying the outcome model and counterfactuals in papers, allowing for
transparent evaluation of the plausibility of causal assumptions.
arXiv link: http://arxiv.org/abs/2006.11346v2
Proper scoring rules for evaluating asymmetry in density forecasting
for evaluating and comparing density forecasts. It extends the proposed score
and defines a weighted version, which emphasizes regions of interest, such as
the tails or the center of a variable's range. A test is also introduced to
statistically compare the predictive ability of different forecasts. The ACPS
is of general use in any situation where the decision maker has asymmetric
preferences in the evaluation of the forecasts. In an artificial experiment,
the implications of varying the level of asymmetry in the ACPS are illustrated.
Then, the proposed score and test are applied to assess and compare density
forecasts of relevant macroeconomic series (US employment growth) and of
commodity prices (oil and electricity prices), with particular focus on the
recent COVID-19 crisis period.
arXiv link: http://arxiv.org/abs/2006.11265v2
Sparse Quantile Regression
regression estimators. For the $\ell _{0}$-penalized estimator, we derive an
exponential inequality on the tail probability of excess quantile prediction
risk and apply it to obtain non-asymptotic upper bounds on the mean-square
parameter and regression function estimation errors. We also derive analogous
results for the $\ell _{0}$-constrained estimator. The resulting rates of
convergence are nearly minimax-optimal and the same as those for $\ell
_{1}$-penalized and non-convex penalized estimators. Further, we characterize
expected Hamming loss for the $\ell _{0}$-penalized estimator. We implement the
proposed procedure via mixed integer linear programming and also a more
scalable first-order approximation algorithm. We illustrate the finite-sample
performance of our approach in Monte Carlo experiments and its usefulness in a
real data application concerning conformal prediction of infant birth weights
(with $n\approx 10^{3}$ and up to $p>10^{3}$). In sum, our $\ell _{0}$-based
method produces a much sparser estimator than the $\ell _{1}$-penalized and
non-convex penalized approaches without compromising precision.
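A hedged sketch of $\ell_0$-constrained quantile regression via brute-force best-subset search over a small number of candidate predictors; the paper uses mixed integer programming (and a first-order approximation) to scale this idea up, which is not attempted here.

```python
# Best-subset quantile regression by enumerating all size-s subsets (small p).
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n, p, s, tau = 300, 8, 2, 0.5
X = rng.standard_normal((n, p))
y = 1.5 * X[:, 0] - 2.0 * X[:, 3] + rng.standard_normal(n)   # two true predictors

def check_loss(u, tau):
    """Average pinball (check) loss of residuals u at quantile tau."""
    return np.mean(u * (tau - (u < 0)))

best = None
for subset in itertools.combinations(range(p), s):           # all size-s subsets
    Xs = sm.add_constant(X[:, subset])
    fit = sm.QuantReg(y, Xs).fit(q=tau)
    loss = check_loss(y - Xs @ fit.params, tau)
    if best is None or loss < best[0]:
        best = (loss, subset)
print("selected predictors:", best[1])
```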
arXiv link: http://arxiv.org/abs/2006.11201v4
On the Time Trend of COVID-19: A Panel Data Study
level, and draw attention to some existing econometric tools which are
potentially helpful to understand the trend better in future studies. In our
empirical study, we find that European countries overall flattened the curves
more effectively than the other regions, and Asia & Oceania also achieved some
success, but the situation is not as optimistic elsewhere. Africa and America
are still facing serious challenges in managing the spread of the virus and
reducing the death rate, although in Africa the virus spreads more slowly and
has a lower death rate than in the other regions. By comparing the performance
of different countries, our results incidentally agree with Gu et al. (2020),
though different approaches and models are considered. For example, both works
agree that countries such as the USA, the UK and Italy perform relatively
poorly; on the other hand, Australia, China, Japan, Korea, and Singapore
perform relatively better.
arXiv link: http://arxiv.org/abs/2006.11060v2
COVID-19 response needs to broaden financial inclusion to curb the rise in poverty
reducing global poverty. In this paper, we explore to what extent financial
inclusion could help mitigate the increase in poverty using cross-country data
across 78 low- and lower-middle-income countries. Unlike other recent
cross-country studies, we show that financial inclusion is a key driver of
poverty reduction in these countries. This effect is not direct, but indirect,
by mitigating the detrimental effect that inequality has on poverty. Our
findings are consistent across all the different measures of poverty used. Our
forecasts suggest that the share of the world's population living on less than
$1.90 per day could increase from 8% to 14% by 2021, pushing nearly 400 million
people into
poverty. However, urgent improvements in financial inclusion could
substantially reduce the impact on poverty.
arXiv link: http://arxiv.org/abs/2006.10706v1
Conflict in Africa during COVID-19: social distancing, food vulnerability and welfare response
labour COVID-19 policy responses on riots, violence against civilians and
food-related conflicts. Our analysis uses georeferenced data for 24 African
countries with monthly local prices and real-time conflict data reported in the
Armed Conflict Location and Event Data Project (ACLED) from January 2015 until
early May 2020. Lockdowns and recent welfare policies have been implemented in
light of COVID-19, but in some contexts also likely in response to ongoing
conflicts. To mitigate the potential risk of endogeneity, we use instrumental
variables. We exploit the exogeneity of global commodity prices, together with
three variables that increase the risk of COVID-19 and the efficiency of the
response, such as countries' colonial heritage, the male mortality rate
attributed to air pollution, and the prevalence of diabetes in adults. We find
that the probability of
experiencing riots, violence against civilians, food-related conflicts and food
looting has increased since lockdowns. Food vulnerability has been a
contributing factor. A 10% increase in the local price index is associated with
an increase of 0.7 percentage points in violence against civilians.
Nonetheless, for every additional anti-poverty measure implemented in response
to COVID-19 the probability of experiencing violence against civilians, riots
and food-related conflicts declines by approximately 0.2 percentage points.
These anti-poverty measures also reduce the number of fatalities associated
with these conflicts. Overall, our findings reveal that food vulnerability has
increased conflict risks, but also offer an optimistic view of the importance
of the state in providing an extensive welfare safety net.
arXiv link: http://arxiv.org/abs/2006.10696v1
Sparse HP Filter: Finding Kinks in the COVID-19 Contact Rate
Susceptible-Infected-Recovered (SIR) model. Our measurement of the contact rate
is constructed using data on actively infected, recovered and deceased cases.
We propose a new trend filtering method that is a variant of the
Hodrick-Prescott (HP) filter, constrained by the number of possible kinks. We
term it the sparse HP filter and apply it to daily data from five
countries: Canada, China, South Korea, the UK and the US. Our new method yields
kinks that are well aligned with actual events in each country. We find
that the sparse HP filter provides fewer kinks than the $\ell_1$ trend
filter, while both methods fit the data equally well. Theoretically, we
establish risk consistency of both the sparse HP and $\ell_1$ trend filters.
Ultimately, we propose to use time-varying contact growth rates to
document and monitor outbreaks of COVID-19.
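The sparse HP filter constrains the number of kinks directly; as a simpler convex cousin, the sketch below fits the $\ell_1$ trend filter mentioned above, which penalizes second differences and yields a piecewise-linear trend whose kinks can then be read off. The data are a synthetic stand-in for a contact-rate series, and the cvxpy package is assumed available.

```python
# l1 trend filtering: piecewise-linear trend via an l1 penalty on second
# differences (a convex relative of the kink-constrained sparse HP filter).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(7)
T = 120
true_trend = np.concatenate([np.linspace(0, 3, 40),
                             np.linspace(3, 1, 40),
                             np.linspace(1, 1.5, 40)])   # two kinks by design
y = true_trend + 0.3 * rng.standard_normal(T)

beta = cp.Variable(T)
lam = 20.0                                               # illustrative tuning value
second_diff = beta[2:] - 2 * beta[1:-1] + beta[:-2]
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - beta)
                                 + lam * cp.norm1(second_diff)))
problem.solve()
# kinks = points where the fitted second difference is (numerically) nonzero
kinks = np.where(np.abs(np.diff(beta.value, 2)) > 1e-4)[0] + 1
print("estimated kink locations:", kinks)
```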
arXiv link: http://arxiv.org/abs/2006.10555v2
Approximate Maximum Likelihood for Complex Structural Models
parametric models whose likelihood function is intractable; however, the
statistical efficiency of I-I estimation is questionable. While the efficient
method of moments (Gallant and Tauchen, 1996) promises efficiency, the price to
pay for this efficiency is a loss of parsimony and thereby a potential lack of
robustness to model misspecification. This stands in contrast to simpler I-I
estimation strategies, which are known to display less sensitivity to model
misspecification precisely because they focus on specific elements of the
underlying structural model. In this research, we propose a new
simulation-based approach that maintains the parsimony of I-I estimation, which
is often critical in empirical applications, but can also deliver estimators
that are nearly as efficient as maximum likelihood. The approach is based on a
constrained approximation to the structural model, which ensures
identification. We demonstrate the approach through several examples and show
that it delivers estimators that are nearly as efficient as maximum likelihood
when the latter is feasible, while remaining applicable in many situations
where maximum likelihood is infeasible.
arXiv link: http://arxiv.org/abs/2006.10245v1
Flexible Mixture Priors for Large Time-varying Parameter Models
according to a random walk. This assumption, however, might be questionable
since it implies that coefficients change smoothly and in an unbounded manner.
In this paper, we relax this assumption by proposing a flexible law of motion
for the TVPs in large-scale vector autoregressions (VARs). Instead of imposing
a restrictive random walk evolution of the latent states, we carefully design
hierarchical mixture priors on the coefficients in the state equation. These
priors effectively allow for discriminating between periods where coefficients
evolve according to a random walk and times where the TVPs are better
characterized by a stationary stochastic process. Moreover, this approach is
capable of introducing dynamic sparsity by pushing small parameter changes
towards zero if necessary. The merits of the model are illustrated by means of
two applications. Using synthetic data we show that our approach yields precise
parameter estimates. When applied to US data, the model reveals interesting
patterns of low-frequency dynamics in coefficients and forecasts well relative
to a wide range of competing models.
arXiv link: http://arxiv.org/abs/2006.10088v2
Using Experiments to Correct for Selection in Observational Studies
observational datasets where treatment (e.g., class size) is not randomized but
several primary outcomes (e.g., graduation rates) and secondary outcomes (e.g.,
test scores) are observed and (ii) experimental data in which treatment is
randomized but only secondary outcomes are observed. We develop a new method to
estimate treatment effects on primary outcomes in such settings. We use the
difference between the secondary outcome and its predicted value based on the
experimental treatment effect to measure selection bias in the observational
data. Controlling for this estimate of selection bias yields an unbiased
estimate of the treatment effect on the primary outcome under a new assumption
that we term latent unconfoundedness, which requires that the same confounders
affect the primary and secondary outcomes. Latent unconfoundedness weakens the
assumptions underlying commonly used surrogate estimators. We apply our
estimator to identify the effect of third grade class size on students'
outcomes. Estimated impacts on test scores using OLS regressions in
observational school district data have the opposite sign of estimates from the
Tennessee STAR experiment. In contrast, selection-corrected estimates in the
observational data replicate the experimental estimates. Our estimator reveals
that reducing class sizes by 25% increases high school graduation rates by 0.7
percentage points. Controlling for observables does not change the OLS
estimates, demonstrating that experimental selection correction can remove
biases that cannot be addressed with standard controls.
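A deliberately stylized sketch of the two-sample idea above, not the paper's estimator: the experiment pins down the treatment effect on the secondary outcome; in the observational sample, the secondary outcome net of that effect proxies for the confounder and is added as a control when estimating the effect on the primary outcome. Here the secondary outcome loads on the confounder without extra noise, so the correction works exactly by construction.

```python
# Two-sample selection correction using an experimentally estimated effect on
# a secondary outcome as the basis for a selection-bias control.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)

# experimental sample: treatment randomized, only the secondary outcome observed
n_e = 2000
d_e = rng.binomial(1, 0.5, n_e)
u_e = rng.standard_normal(n_e)                      # confounder
s_e = 1.0 * d_e + u_e                               # secondary outcome
tau_s = sm.OLS(s_e, sm.add_constant(d_e)).fit().params[1]

# observational sample: treatment selected on the confounder
n_o = 2000
u_o = rng.standard_normal(n_o)
d_o = rng.binomial(1, 1.0 / (1.0 + np.exp(-2.0 * u_o)))
s_o = 1.0 * d_o + u_o
y_o = 2.0 * d_o + u_o + 0.5 * rng.standard_normal(n_o)   # primary outcome

bias_control = s_o - tau_s * d_o                    # secondary outcome net of effect
naive = sm.OLS(y_o, sm.add_constant(d_o)).fit().params[1]
corrected = sm.OLS(
    y_o, sm.add_constant(np.column_stack([d_o, bias_control]))
).fit().params[1]
print(f"naive: {naive:.2f}, selection-corrected: {corrected:.2f} (truth: 2.0)")
```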
arXiv link: http://arxiv.org/abs/2006.09676v2
Adaptive, Rate-Optimal Hypothesis Testing in Nonparametric IV Models
convexity) and equality (e.g., parametric, semiparametric) restrictions on a
structural function in a nonparametric instrumental variables (NPIV) model. Our
test statistic is based on a modified leave-one-out sample analog of a
quadratic distance between the restricted and unrestricted sieve two-stage
least squares estimators. We provide computationally simple, data-driven
choices of sieve tuning parameters and Bonferroni adjusted chi-squared critical
values. Our test adapts to the unknown smoothness of alternative functions in
the presence of unknown degree of endogeneity and unknown strength of the
instruments. It attains the adaptive minimax rate of testing in $L^{2}$. That
is, the sum of the supremum of type I error over the composite null and the
supremum of type II error over nonparametric alternative models cannot be
improved upon by any other test for NPIV models of unknown regularities.
Confidence sets in $L^{2}$ are obtained by inverting the adaptive test.
Simulations confirm that, across different strengths of instruments and sample
sizes, our adaptive test controls size and its finite-sample power greatly
exceeds existing non-adaptive tests for monotonicity and parametric
restrictions in NPIV models. Empirical applications to test for shape
restrictions of differentiated products demand and of Engel curves are
presented.
arXiv link: http://arxiv.org/abs/2006.09587v6
Measuring Macroeconomic Uncertainty: The Labor Channel of Uncertainty from a Cross-Country Perspective
uncertainty. Our econometric framework extracts uncertainty from revisions in
data obtained from standardized national accounts. Applying our model to
post-WWII real-time data, we estimate macroeconomic uncertainty for 39
countries. The cross-country dimension of our uncertainty data allows us to
study the impact of uncertainty shocks under different employment protection
legislation. Our empirical findings suggest that the effects of uncertainty
shocks are stronger and more persistent in countries with low employment
protection compared to countries with high employment protection. These
empirical findings are in line with a theoretical model with varying firing
costs.
arXiv link: http://arxiv.org/abs/2006.09007v2
Nonparametric Tests of Tail Behavior in Stochastic Frontier Models
frontier model, where one component has bounded support on one side, and the
other has unbounded support on both sides. Under weak assumptions on the error
components, we derive nonparametric tests that the unbounded component
distribution has thin tails and that the component tails are equivalent. The
tests are useful diagnostic tools for stochastic frontier analysis. A
simulation study and an application to a stochastic cost frontier for 6,100 US
banks from 1998 to 2005 are provided. The new tests reject the normal or
Laplace distributional assumptions, which are commonly imposed in the existing
literature.
arXiv link: http://arxiv.org/abs/2006.07780v1
Synthetic Interventions
evaluation in panel data applications. Researchers commonly justify the SC
framework with a low-rank matrix factor model that assumes the potential
outcomes are described by low-dimensional unit and time specific latent
factors. In the recent work of [Abadie '20], one of the pioneering authors of
the SC method posed the question of how the SC framework can be extended to
multiple treatments. This article offers one resolution to this open question
that we call synthetic interventions (SI). Fundamental to the SI framework is a
low-rank tensor factor model, which extends the matrix factor model by
including a latent factorization over treatments. Under this model, we propose
a generalization of the standard SC-based estimators. We prove the consistency
for one instantiation of our approach and provide conditions under which it is
asymptotically normal. Moreover, we conduct a representative simulation to
study its prediction performance and revisit the canonical SC case study of
[Abadie-Diamond-Hainmueller '10] on the impact of anti-tobacco legislation by
exploring related questions not previously investigated.
arXiv link: http://arxiv.org/abs/2006.07691v7
Horseshoe Prior Bayesian Quantile Regression
quantile regression (HS-BQR) and provides a fast sampling algorithm for
computation in high dimensions. The performance of the proposed HS-BQR is
evaluated on Monte Carlo simulations and a high dimensional Growth-at-Risk
(GaR) forecasting application for the U.S. The Monte Carlo design considers
several sparsity and error structures. Compared to alternative shrinkage
priors, the proposed HS-BQR yields better (or at worst similar) performance in
coefficient bias and forecast error. The HS-BQR is particularly potent in
sparse designs and in estimating extreme quantiles. As expected, the
simulations also highlight that identifying quantile specific location and
scale effects for individual regressors in dense DGPs requires substantial
data. In the GaR application, we forecast tail risks as well as complete
forecast densities using the McCracken and Ng (2020) database. Quantile
specific and density calibration score functions show that the HS-BQR provides
the best performance, especially at short and medium-run horizons. The ability
to produce well-calibrated density forecasts and accurate downside risk
measures in large data contexts makes the HS-BQR a promising tool for
nowcasting applications and recession modelling.
arXiv link: http://arxiv.org/abs/2006.07655v2
Detangling robustness in high dimensions: composite versus model-averaged estimation
in the context of regularized estimation and high dimensions. Even simple
questions become challenging very quickly. For example, classical statistical
theory identifies equivalence between model-averaged and composite quantile
estimation. However, little to nothing is known about such equivalence between
methods that encourage sparsity. This paper provides a toolbox to further study
robustness in these settings and focuses on prediction. In particular, we study
optimally weighted model-averaged as well as composite $l_1$-regularized
estimation. Optimal weights are determined by minimizing the asymptotic mean
squared error. This approach incorporates the effects of regularization,
without the assumption of perfect selection, as is often used in practice. Such
weights are then optimal for prediction quality. Through an extensive
simulation study, we show that no single method systematically outperforms
others. We find, however, that model-averaged and composite quantile estimators
often outperform least-squares methods, even in the case of Gaussian model
noise. A real-data application demonstrates the methods' practical use through
the reconstruction of compressed audio signals.
arXiv link: http://arxiv.org/abs/2006.07457v1
Minimax Estimation of Conditional Moment Models
restrictions, with a prototypical application being non-parametric instrumental
variable regression. We introduce a min-max criterion function, under which the
estimation problem can be thought of as solving a zero-sum game between a
modeler who is optimizing over the hypothesis space of the target model and an
adversary who identifies violating moments over a test function space. We
analyze the statistical estimation rate of the resulting estimator for
arbitrary hypothesis spaces, with respect to an appropriate analogue of the
mean squared error metric, for ill-posed inverse problems. We show that when
the minimax criterion is regularized with a second moment penalty on the test
function and the test function space is sufficiently rich, then the estimation
rate scales with the critical radius of the hypothesis and test function
spaces, a quantity which typically gives tight fast rates. Our main result
follows from a novel localized Rademacher analysis of statistical learning
problems defined via minimax objectives. We provide applications of our main
results for several hypothesis spaces used in practice such as: reproducing
kernel Hilbert spaces, high dimensional sparse linear functions, spaces defined
via shape constraints, ensemble estimators such as random forests, and neural
networks. For each of these applications we provide computationally efficient
optimization methods for solving the corresponding minimax problem (e.g.
stochastic first-order heuristics for neural networks). In several
applications, we show how our modified mean squared error rate, combined with
conditions that bound the ill-posedness of the inverse problem, leads to mean
squared error rates. We conclude with an extensive experimental analysis of the
proposed methods.
arXiv link: http://arxiv.org/abs/2006.07201v1
Seemingly Unrelated Regression with Measurement Error: Estimation via Markov chain Monte Carlo and Mean Field Variational Bayes Approximation
studied topic; however, the statistics/econometrics literature is almost silent
on estimating a multi-equation model with measurement error. This paper
considers a seemingly unrelated regression model with measurement error in the
covariates and introduces two novel estimation methods: a pure Bayesian
algorithm (based on Markov chain Monte Carlo techniques) and its mean field
variational Bayes (MFVB) approximation. The MFVB method has the added advantage
of being computationally fast and can handle big data. An issue pertinent to
measurement error models is parameter identification, and this is resolved by
employing a prior distribution on the measurement error variance. The methods
are shown to perform well in multiple simulation studies, where we analyze the
impact on posterior estimates arising due to different values of reliability
ratio or variance of the true unobserved quantity used in the data generating
process. The paper further implements the proposed algorithms in an application
drawn from the health literature and shows that modeling measurement error in
the data can improve model fitting.
arXiv link: http://arxiv.org/abs/2006.07074v1
Confidence Interval for Off-Policy Evaluation from Dependent Samples via Bandit Algorithm: Approach from Standardized Martingales
dependent samples obtained via the bandit algorithm. The goal of OPE is to
evaluate a new policy using historical data obtained from behavior policies
generated by the bandit algorithm. Because the bandit algorithm updates the
policy based on past observations, the samples are not independent and
identically distributed (i.i.d.). However, several existing methods for OPE do
not take this issue into account and are based on the assumption that samples
are i.i.d. In this study, we address this problem by constructing an estimator
from a standardized martingale difference sequence. To standardize the
sequence, we consider using evaluation data or sample splitting with a two-step
estimation. This technique produces an estimator with asymptotic normality
without restricting a class of behavior policies. In an experiment, the
proposed estimator performs better than existing methods, which assume that the
behavior policy converges to a time-invariant policy.
arXiv link: http://arxiv.org/abs/2006.06982v1
Confidence sets for dynamic poverty indexes
the dynamic Headcount ratio, the dynamic income-gap ratio, the dynamic Gini and
the dynamic Sen, proposed in D'Amico and Regnault (2018). The contribution is
twofold. First, we extend the computation of the dynamic Gini index, and hence
of the Sen index, to include the inequality within each poverty class into
which people are classified according to their income. Second, for each poverty
index we establish a central limit theorem that allows us to construct
confidence sets. An application to Italian income data from 1998 to 2012
confirms the effectiveness of the approach and its ability to track the
evolution of poverty and inequality in real economies.
arXiv link: http://arxiv.org/abs/2006.06595v1
Reserve Price Optimization for First Price Auctions
first-price auctions as its primary mechanism for ad allocation and pricing. In
light of this, publishers need to re-evaluate and optimize their auction
parameters, notably reserve prices. In this paper, we propose a gradient-based
algorithm to adaptively update and optimize reserve prices based on estimates
of bidders' responsiveness to experimental shocks in reserves. Our key
innovation is to draw on the inherent structure of the revenue objective in
order to reduce the variance of gradient estimates and improve convergence
rates in both theory and practice. We show that revenue in a first-price
auction can be usefully decomposed into a demand component and a
bidding component, and introduce techniques to reduce the variance of
each component. We characterize the bias-variance trade-offs of these
techniques and validate the performance of our proposed algorithm through
experiments on synthetic data and real display ad auctions data from Google ad
exchange.
arXiv link: http://arxiv.org/abs/2006.06519v2
Text as data: a machine learning-based approach to measuring uncertainty
both academics and policy practitioners. Here, we analyse news feed data to
construct a simple, general measure of uncertainty in the United States using a
highly cited machine learning methodology. Over the period January 1996 through
May 2020, we show that the series unequivocally Granger-causes the Economic
Policy Uncertainty (EPU) index, with no Granger-causality in the reverse
direction.
arXiv link: http://arxiv.org/abs/2006.06457v1
What Drives Inflation and How: Evidence from Additive Mixed Models Selected by cAIC
from 1997 to 2015 with 37 regressors. 98 models motivated by economic theory
are compared to a gradient boosting algorithm; non-linearities and structural
breaks are considered. We show that the typical estimation methods are likely
to lead to fallacious policy conclusions, which motivates the new approach
proposed in this paper. The boosting algorithm outperforms
theory-based models. We confirm that energy prices are important but what
really matters for inflation is their non-linear interplay with energy rents.
Demographic developments also make a difference. Globalization and technology,
public debt, central bank independence and political characteristics are less
relevant. GDP per capita is more relevant than the output gap, credit growth
more than M2 growth.
arXiv link: http://arxiv.org/abs/2006.06274v4
Trading Privacy for the Greater Social Good: How Did America React During COVID-19?
location data are two prime examples of non-therapeutic interventions used in
many countries to mitigate the impact of the COVID-19 pandemic. While many
understand the importance of trading personal privacy for the public good,
others have been alarmed at the potential for surveillance via measures enabled
through location tracking on smartphones. In our research, we analyzed massive
yet atomic individual-level location data containing over 22 billion records
from ten Blue (Democratic) and ten Red (Republican) cities in the U.S., based
on which we present, herein, some of the first evidence of how Americans
responded to the increasing concerns that government authorities, the private
sector, and public health experts might use individual-level location data to
track the COVID-19 spread. First, we found a significantly decreasing trend in
mobile-app location-sharing opt-outs. Whereas areas with more Democrats were
more privacy-concerned than areas with more Republicans before the advent of
the COVID-19 pandemic, there was a significant decrease in the overall opt-out
rates after COVID-19, and this effect was more salient among Democratic than
Republican cities. Second, people who practiced social distancing (i.e., those
who traveled less and interacted with fewer close contacts during the pandemic)
were also less likely to opt-out, whereas the converse was true for people who
practiced less social distancing. This relationship was also more salient among
Democratic than Republican cities. Third, high-income populations and males,
compared with low-income populations and females, were more
privacy-conscientious and more likely to opt-out of location tracking.
arXiv link: http://arxiv.org/abs/2006.05859v1
Heterogeneous Effects of Job Displacement on Earnings
different individuals. In particular, our interest centers on features of the
distribution of the individual-level effect of job displacement. Identifying
features of this distribution is particularly challenging -- e.g., even if we
could randomly assign workers to be displaced or not, many of the parameters
that we consider would not be point identified. We exploit our access to panel
data, and our approach relies on comparing outcomes of displaced workers to
outcomes the same workers would have experienced if they had not been displaced
and if they maintained the same rank in the distribution of earnings as they
had before they were displaced. Using data from the Displaced Workers Survey,
we find that displaced workers earn about $157 per week less, on average, than
they would have earned if they had not been displaced. We also find that there
is substantial heterogeneity. We estimate that 42% of workers have higher
earnings than they would have had if they had not been displaced and that a
large fraction of workers have experienced substantially more negative effects
than the average effect of displacement. Finally, we also document major
differences in the distribution of the effect of job displacement across
education levels, sex, age, and counterfactual earnings levels. Throughout the
paper, we rely heavily on quantile regression. First, we use quantile
regression as a flexible (yet feasible) first step estimator of conditional
distributions and quantile functions that our main results build on. We also
use quantile regression to study how covariates affect the distribution of the
individual-level effect of job displacement.
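The quantile-regression first step mentioned above can be sketched as follows; the data, covariates, and ranks are purely illustrative and do not come from the Displaced Workers Survey.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 2000
    df = pd.DataFrame({
        "educ": rng.integers(10, 18, n),
        "age": rng.integers(25, 60, n),
    })
    # Hypothetical weekly earnings.
    df["earnings"] = 200 + 30 * df["educ"] + 5 * df["age"] + rng.normal(0, 50, n)

    # Conditional quantile functions of earnings at several ranks, the kind of
    # first-step objects on which rank-preserving counterfactual comparisons build.
    for tau in [0.1, 0.25, 0.5, 0.75, 0.9]:
        fit = smf.quantreg("earnings ~ educ + age", df).fit(q=tau)
        print(tau, np.round(fit.params.values, 2))
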
arXiv link: http://arxiv.org/abs/2006.04968v2
False (and Missed) Discoveries in Financial Economics
factor selection. We propose a new way to calibrate both Type I and Type II
errors. Next, using a double-bootstrap method, we establish a t-statistic
hurdle that is associated with a specific false discovery rate (e.g., 5%). We
also establish a hurdle that is associated with a certain acceptable ratio of
misses to false discoveries (Type II error scaled by Type I error), which
effectively allows for differential costs of the two types of mistakes.
Evaluating current methods, we find that they lack power to detect
outperforming managers.
arXiv link: http://arxiv.org/abs/2006.04269v1
Ensemble Learning with Statistical and Structural Models
analysis. In this paper, we propose a set of novel methods for combining
statistical and structural models for improved prediction and causal inference.
Our first proposed estimator has the doubly robustness property in that it only
requires the correct specification of either the statistical or the structural
model. Our second proposed estimator is a weighted ensemble that has the
ability to outperform both models when they are both misspecified. Experiments
demonstrate the potential of our estimators in various settings, including
first-price auctions, dynamic models of entry and exit, and demand estimation
with instrumental variables.
arXiv link: http://arxiv.org/abs/2006.05308v1
Inflation Dynamics of Financial Shocks
using a Bayesian structural vector autoregressive (SVAR) model that exploits
the non-normalities in the data. We use this method to uniquely identify the
model and employ inequality constraints to single out financial shocks. The
results point to the existence of two distinct financial shocks that have
opposing effects on inflation, which supports the idea that financial shocks
are transmitted to the real economy through both demand and supply side
channels.
arXiv link: http://arxiv.org/abs/2006.03301v1
Evaluating the Effectiveness of Regional Lockdown Policies in the Containment of Covid-19: Evidence from Pakistan
imposed complete and partial lockdown restrictions on socio-economic
activities, religious congregations, and human movement. Here we examine the
impact of regional lockdown strategies on Covid-19 outcomes. After conducting
econometric analyses (Regression Discontinuity and Negative Binomial
Regressions) on official data from the National Institute of Health (NIH)
Pakistan, we find that the strategies did not lead to a similar level of
Covid-19 caseload (positive cases and deaths) in all regions. In terms of
reduction in the overall caseload (positive cases and deaths), compared to no
lockdown, complete and partial lockdown appeared to be effective in four
regions: Balochistan, Gilgit Baltistan (GT), Islamabad Capital Territory (ICT),
and Azad Jammu and Kashmir (AJK). By contrast, complete and partial lockdowns
did not appear to be effective in containing the virus in the three largest
provinces of Punjab, Sindh, and Khyber Pakhtunkhwa (KPK). The observed regional
heterogeneity in the effectiveness of lockdowns argues for a careful use of
lockdown strategies based on demographic, social, and economic factors.
arXiv link: http://arxiv.org/abs/2006.02987v1
The pain of a new idea: Do Late Bloomers Respond to Extension Service in Rural Ethiopia?
chemical fertilisers in Ethiopia between 1994 and 2004. Fertiliser adoption
provides a suitable strategy to ensure and stabilize food production in remote
vulnerable areas. Extension service programs have a long history in supporting
the application of fertiliser. However, their efficiency is questioned. In our
analysis, we focus on seven villages with a considerable time lag in fertiliser
diffusion. Using matching techniques avoids sample selection bias in the
comparison of treated households (those that received extension services) and
control households. In addition to common factors, measures of culture, proxied
by ethnicity and religion, control for potential tensions between extension
agents and peasants that hamper the efficiency of the program. We find a
considerable impact of extension service on first fertiliser adoption. The
impact is consistent for five of seven villages.
arXiv link: http://arxiv.org/abs/2006.02846v1
Tensor Factor Model Estimation by Iterative Projection
observations, has become ubiquitous. It typically exhibits high dimensionality.
One approach for dimension reduction is to use a factor model structure, in a
form similar to Tucker tensor decomposition, except that the time dimension is
treated as a dynamic process with a time dependent structure. In this paper we
introduce two approaches to estimate such a tensor factor model by using
iterative orthogonal projections of the original tensor time series. These
approaches extend the existing estimation procedures and improve the estimation
accuracy and convergence rate significantly as proven in our theoretical
investigation. Our algorithms are similar to the higher order orthogonal
projection method for tensor decomposition, but with significant differences
due to the need to unfold tensors in the iterations and the use of
autocorrelation. Consequently, our analysis is significantly different from the
existing ones. Computational and statistical lower bounds are derived to prove
the optimality of the sample size requirement and convergence rate for the
proposed methods. A simulation study is conducted to further illustrate the
statistical properties of these estimators.
arXiv link: http://arxiv.org/abs/2006.02611v3
Testing Finite Moment Conditions for the Consistency and the Root-N Asymptotic Normality of the GMM and M Estimators
empirical economic analysis are based on the consistency and the root-n
asymptotic normality of the GMM and M estimators. The canonical consistency
(respectively, root-n asymptotic normality) for these classes of estimators
requires at least the first (respectively, second) moment of the score to be
finite. In this article, we present a method of testing these conditions for
the consistency and the root-n asymptotic normality of the GMM and M
estimators. The proposed test controls size nearly uniformly over the set of
data generating processes that are compatible with the null hypothesis.
Simulation studies support this theoretical result. Applying the proposed test
to the market share data from the Dominick's Finer Foods retail chain, we find
that a common ad hoc procedure to deal with zero market shares in the
analysis of differentiated products markets results in a failure to satisfy the
conditions for both the consistency and the root-n asymptotic normality.
arXiv link: http://arxiv.org/abs/2006.02541v3
Capital and Labor Income Pareto Exponents across Time and Space
observations that span 52 countries over half a century (1967-2018). We
document two stylized facts: (i) capital income is more unequally distributed
than labor income in the tail; namely, the capital exponent (1-3, median 1.46)
is smaller than the labor exponent (2-5, median 3.35), and (ii) the two exponents
are nearly uncorrelated. To explain these findings, we build an incomplete
market model with job ladders and capital income risk that gives rise to a
capital income Pareto exponent smaller than but nearly unrelated to the labor
exponent. Our results suggest the importance of distinguishing income and
wealth inequality.
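For readers who want to reproduce the flavor of such tail estimates, the Hill estimator is a standard way to estimate a Pareto exponent from the upper tail; the sketch below is a generic illustration on simulated data, not necessarily the estimator or the data used in the paper.

    import numpy as np

    def hill_exponent(x, k):
        # Hill estimator of the Pareto exponent from the k largest observations.
        x = np.sort(np.asarray(x, dtype=float))
        tail, threshold = x[-k:], x[-k - 1]
        return k / np.sum(np.log(tail / threshold))

    rng = np.random.default_rng(0)
    capital_income = rng.pareto(1.5, 100_000) + 1.0   # true tail exponent 1.5
    labor_income = rng.pareto(3.3, 100_000) + 1.0     # true tail exponent 3.3
    print(hill_exponent(capital_income, k=1000))      # roughly 1.5
    print(hill_exponent(labor_income, k=1000))        # roughly 3.3
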
arXiv link: http://arxiv.org/abs/2006.03441v3
A Negative Correlation Strategy for Bracketing in Difference-in-Differences
causal effect of policy interventions in observational studies. DID employs a
before and after comparison of the treated and control units to remove bias due
to time-invariant unmeasured confounders under the parallel trends assumption.
Estimates from DID, however, will be biased if the outcomes for the treated and
control units evolve differently in the absence of treatment, namely if the
parallel trends assumption is violated. We propose a general identification
strategy that leverages two groups of control units whose outcomes relative to
the treated units exhibit a negative correlation, and achieves partial
identification of the average treatment effect for the treated. The identified
set is of a union bounds form that involves the minimum and maximum operators,
which makes the canonical bootstrap generally inconsistent and naive methods
overly conservative. By utilizing the directional inconsistency of the
bootstrap distribution, we develop a novel bootstrap method to construct
uniformly valid confidence intervals for the identified set and parameter of
interest when the identified set is of a union bounds form, and we establish
the method's theoretical properties. We develop a simple falsification test and
sensitivity analysis. We apply the proposed strategy for bracketing to study
whether minimum wage laws affect employment levels.
arXiv link: http://arxiv.org/abs/2006.02423v3
Evaluating Public Supports to the Investment Activities of Business Firms: A Multilevel Meta-Regression Analysis of Italian Studies
evaluations from Italy, considering both published and grey literature on
enterprise and innovation policies. We specify a multilevel model for the
probability of finding positive effect estimates, also assessing correlation
possibly induced by co-authorship networks. We find that the probability of
positive effects is considerable, especially for weaker firms and outcomes that
are directly targeted by public programmes. However, these policies are less
likely to trigger change in the long run.
arXiv link: http://arxiv.org/abs/2006.01880v1
Subjective Complexity Under Uncertainty
feature of many of the environments in which departures from expected utility
theory are observed. I propose and axiomatize a model of choice under
uncertainty in which the size of the partition with respect to which an act is
measurable arises endogenously as a measure of subjective complexity. I derive
a representation of incomplete Simple Bounds preferences in which acts that are
complex from the perspective of the decision maker are bracketed by simple acts
to which they are related by statewise dominance. The key axioms are motivated
by a model of learning from limited data. I then consider choice behavior
characterized by a "cautious completion" of Simple Bounds preferences, and
discuss the relationship between this model and models of ambiguity aversion. I
develop general comparative statics results, and explore applications to
portfolio choice, contracting, and insurance choice.
arXiv link: http://arxiv.org/abs/2006.01852v3
On the plausibility of the latent ignorability assumption
instrumental variable (IV) is often complicated by attrition, sample selection,
or non-response in the outcome of interest. To tackle the latter problem, the
latent ignorability (LI) assumption imposes that attrition/sample selection is
independent of the outcome conditional on the treatment compliance type (i.e.
how the treatment behaves as a function of the instrument), the instrument, and
possibly further observed covariates. As a word of caution, this note formally
discusses the strong behavioral implications of LI in rather standard IV
models. We also provide an empirical illustration based on the Job Corps
experimental study, in which the sensitivity of the estimated program effect to
LI and alternative assumptions about outcome attrition is investigated.
arXiv link: http://arxiv.org/abs/2006.01703v2
Explaining the distribution of energy consumption at slow charging infrastructure for electric vehicles from socio-economic data
activities, function, and characteristics of the environment surrounding the
slow charging infrastructure impact the distribution of the electricity
consumed at slow charging infrastructure. To gain a basic insight, we analysed
the probabilistic distribution of energy consumption and its relation to
indicators characterizing charging events. We collected geospatial datasets
and, using statistical methods for data pre-processing, prepared features
modelling the spatial context in which the charging infrastructure operates. To
enhance the statistical reliability of results, we applied the bootstrap method
together with the Lasso method that combines regression with variable selection
ability. We evaluate the statistical distributions of the selected regression
coefficients. We identified the most influential features correlated with
energy consumption, indicating that the spatial context of the charging
infrastructure affects its utilization pattern. Many of these features are
related to the economic prosperity of residents. Applying the methodology to a
specific class of charging infrastructure enables the differentiation of
selected features, e.g. by the rollout strategy used. Overall, the paper
demonstrates the application of statistical methodologies to energy data and
provides insights into factors potentially shaping energy consumption that
could be used when developing models to inform charging infrastructure
deployment and the planning of power grids.
arXiv link: http://arxiv.org/abs/2006.01672v2
Estimates of derivatives of (log) densities and related objects
approximation to the logarithm of an unknown density $f$. The estimator is
guaranteed to be nonnegative and achieves the same optimal rate of convergence
in the interior as well as the boundary of the support of $f$. The estimator is
therefore well-suited to applications in which nonnegative density estimates
are required, such as in semiparametric maximum likelihood estimation. In
addition, we show that our estimator compares favorably with other kernel-based
methods, both in terms of asymptotic performance and computational ease.
Simulation results confirm that our method can perform similarly in finite
samples to these alternative methods when they are used with optimal inputs,
i.e. an Epanechnikov kernel and optimally chosen bandwidth sequence. Further
simulation evidence demonstrates that, if the researcher modifies the inputs
and chooses a larger bandwidth, our approach can even improve upon these
optimized alternatives, asymptotically. We provide code in several languages.
arXiv link: http://arxiv.org/abs/2006.01328v1
Revisiting money and labor for valuing environmental goods and services in developing countries
low willingness to pay (WTP) for a wide range of goods and services. However,
recent studies in these countries indicate that this may partly be a result of
the choice of payment vehicle, not the preference for the good. Thus, low WTP
may not indicate a low welfare effect for public projects in developing
countries. We argue that in a setting where 1) there is imperfect
substitutability between money and other measures of wealth (e.g. labor), and
2) institutions are perceived to be corrupt, the inclusion of payment vehicles
that are currently available to the individual and less prone to corruption may
be needed to obtain valid welfare estimates. Otherwise, we risk underestimating
the welfare benefit of projects. We demonstrate this through a rural household
contingent valuation (CV) survey designed to elicit the value of access to
reliable irrigation water in Ethiopia. Of the total average annual WTP for
access to reliable irrigation service, cash contributions comprise only
24.41%. The implication is that socially desirable projects might be rejected
in cost-benefit analysis because welfare gains are underestimated when the
payment vehicle is poorly matched in the valuation study.
arXiv link: http://arxiv.org/abs/2006.01290v3
New Approaches to Robust Inference on Market (Non-)Efficiency, Volatility Clustering and Nonlinear Dependence
nonlinear dependence, heterogeneity and heavy-tailedness. These properties may
complicate the analysis of (non-)efficiency and volatility clustering in
economic and financial markets using traditional approaches that appeal to
asymptotic normality of sample autocorrelation functions of returns and their
squares.
This paper presents new approaches to deal with the above problems. We
provide results that motivate the use of measures of market
(non-)efficiency and volatility clustering based on (small) powers of absolute
returns and their signed versions.
We further provide new approaches to robust inference on the measures in the
case of general time series, including GARCH-type processes. The approaches are
based on robust $t$-statistic tests, and new results on their applicability are
presented. In the approaches, parameter estimates (e.g., estimates of measures
of nonlinear dependence) are computed for groups of data, and the inference is
based on $t$-statistics in the resulting group estimates. This results in valid
robust inference under heterogeneity and dependence assumptions satisfied in
real-world financial markets. Numerical results and empirical applications
confirm the advantages and wide applicability of the proposed approaches.
arXiv link: http://arxiv.org/abs/2006.01212v4
New robust inference for predictive regressions
predictive regression models under heterogeneous and persistent volatility as
well as endogenous, persistent and/or fat-tailed regressors and errors. The
proposed robust testing approaches are applicable both in the case of discrete
and continuous time models. Both of the methods use the Cauchy estimator to
effectively handle the problems of endogeneity, persistence and/or
fat-tailedness in regressors and errors. The difference between our two methods
is how the heterogeneous volatility is controlled. The first method relies on
robust t-statistic inference using group estimators of a regression parameter
of interest proposed in Ibragimov and Muller (2010). It is simple to implement,
but requires the exogenous volatility assumption. To relax the exogenous
volatility assumption, we propose another method which relies on the
nonparametric correction of volatility. The proposed methods perform well
compared with widely used alternative inference procedures in terms of their
finite sample properties.
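The group-based t-statistic step can be illustrated as follows: split the sample into q groups, estimate the parameter of interest on each group, and apply a small-sample t-test to the group estimates, in the spirit of Ibragimov and Muller (2010). The group estimator below is a plain sample mean used only for illustration; the paper's Cauchy estimator and nonparametric volatility correction are not reproduced.

    import numpy as np
    from scipy import stats

    def group_t_inference(group_estimates, null_value=0.0):
        # t-test applied to q group estimates of the same scalar parameter.
        est = np.asarray(group_estimates, dtype=float)
        q = len(est)
        t_stat = np.sqrt(q) * (est.mean() - null_value) / est.std(ddof=1)
        p_value = 2 * stats.t.sf(abs(t_stat), df=q - 1)
        return t_stat, p_value

    rng = np.random.default_rng(1)
    data = rng.standard_t(df=3, size=4000)          # heavy-tailed "returns"
    groups = np.array_split(data, 8)                # q = 8 consecutive groups
    group_means = [g.mean() for g in groups]
    print(group_t_inference(group_means))
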
arXiv link: http://arxiv.org/abs/2006.01191v4
Do Public Program Benefits Crowd Out Private Transfers in Developing Countries? A Critical Review of Recent Evidence
and longevity gains, social protection programs have been on the rise in low-
and middle-income countries (LMICs) in the last three decades. However, the
introduction of public benefits could displace informal mechanisms for
risk-protection, which are especially prevalent in LMICs. If the displacement
of private transfers is considerably large, the expansion of social protection
programs could even lead to social welfare loss. In this paper, we critically
survey the recent empirical literature on crowd-out effects in response to
public policies, specifically in the context of LMICs. We review and synthesize
patterns from the behavioral response to various types of social protection
programs. Furthermore, we specifically examine heterogeneous treatment
effects by important socioeconomic characteristics. We conclude by drawing on
lessons from our synthesis of studies. If poverty reduction objectives are
considered, along with careful program targeting that accounts for potential
crowd-out effects, there may well be a net social gain.
arXiv link: http://arxiv.org/abs/2006.00737v2
Influence via Ethos: On the Persuasive Power of Reputation in Deliberation Online
opinions that drive votes, purchases, donations and other critical offline
behavior. Yet, the determinants of opinion-change via persuasion in
deliberation online remain largely unexplored. Our research examines the
persuasive power of $ethos$ -- an individual's "reputation" -- using a
7-year panel of over a million debates from an argumentation platform
containing explicit indicators of successful persuasion. We identify the causal
effect of reputation on persuasion by constructing an instrument for reputation
from a measure of past debate competition, and by controlling for unstructured
argument text using neural models of language in the double machine-learning
framework. We find that an individual's reputation significantly impacts their
persuasion rate above and beyond the validity, strength and presentation of
their arguments. In our setting, we find that having 10 additional reputation
points causes a 31% increase in the probability of successful persuasion over
the platform average. We also find that the impact of reputation is moderated
by characteristics of the argument content, in a manner consistent with a
theoretical model that attributes the persuasive power of reputation to
heuristic information-processing under cognitive overload. We discuss
managerial implications for platforms that facilitate deliberative
decision-making for public and private organizations online.
arXiv link: http://arxiv.org/abs/2006.00707v1
Lockdown Strategies, Mobility Patterns and COVID-19
variation in the timing, type and level of intensity of various public policies
to study their dynamic effects on the daily incidence of COVID-19 and on
population mobility patterns across 135 countries. We remove concurrent policy
bias by taking into account the contemporaneous presence of multiple
interventions. The main result of the paper is that cancelling public events
and imposing restrictions on private gatherings followed by school closures
have quantitatively the most pronounced effects on reducing the daily incidence
of COVID-19. They are followed by workplace as well as stay-at-home
requirements, whose statistical significance and levels of effect are not as
pronounced. Instead, we find no effects for international travel controls,
public transport closures and restrictions on movements across cities and
regions. We establish that these findings are mediated by their effect on
population mobility patterns in a manner consistent with time-use and
epidemiological factors.
arXiv link: http://arxiv.org/abs/2006.00531v1
Statistical Decision Properties of Imprecise Trials Assessing COVID-19 Drugs
randomized trials comparing standard care with care augmented by experimental
drugs. The trials have small sample sizes, so estimates of treatment effects
are imprecise. Seeing imprecision, clinicians reading research articles may
find it difficult to decide when to treat patients with experimental drugs.
Whatever decision criterion one uses, there is always some probability that
random variation in trial outcomes will lead to prescribing sub-optimal
treatments. A conventional practice when comparing standard care and an
innovation is to choose the innovation only if the estimated treatment effect
is positive and statistically significant. This practice defers to standard
care as the status quo. To evaluate decision criteria, we use the concept of
near-optimality, which jointly considers the probability and magnitude of
decision errors. An appealing decision criterion from this perspective is the
empirical success rule, which chooses the treatment with the highest observed
average patient outcome in the trial. Considering the design of recent and
ongoing COVID-19 trials, we show that the empirical success rule yields
treatment results that are much closer to optimal than those generated by
prevailing decision criteria based on hypothesis tests.
arXiv link: http://arxiv.org/abs/2006.00343v1
Parametric Modeling of Quantile Regression Coefficient Functions with Longitudinal Data
one at a time. An alternative approach, which is referred to as quantile
regression coefficients modeling (QRCM), is to model quantile regression
coefficients as parametric functions of the order of the quantile. In this
paper, we describe how the QRCM paradigm can be applied to longitudinal data.
We introduce a two-level quantile function, in which two different quantile
regression models are used to describe the (conditional) distribution of the
within-subject response and that of the individual effects. We propose a novel
type of penalized fixed-effects estimator, and discuss its advantages over
standard methods based on $\ell_1$ and $\ell_2$ penalization. We provide model
identifiability conditions, derive asymptotic properties, describe
goodness-of-fit measures and model selection criteria, present simulation
results, and discuss an application. The proposed method has been implemented
in the R package qrcm.
arXiv link: http://arxiv.org/abs/2006.00160v1
The impacts of asymmetry on modeling and forecasting realized volatility in Japanese stock markets
forecasting of realized volatility in the Japanese futures and spot stock
markets. We employ heterogeneous autoregressive (HAR) models allowing for three
types of asymmetry: positive and negative realized semivariance (RSV),
asymmetric jumps, and leverage effects. The estimation results show that
leverage effects clearly influence the modeling of realized volatility.
Leverage effects exist for both the spot and futures markets in the Nikkei 225.
Although realized semivariance aids better modeling, the estimates of RSV
models depend on whether these models include leverage effects. Asymmetric jump
components do not have a clear influence on realized volatility models. While
leverage effects and realized semivariance also improve the out-of-sample
forecast performance of volatility models, asymmetric jumps do not improve
predictive ability. The empirical results of this study indicate that
asymmetric information, in particular, leverage effects and realized
semivariance, yield better modeling and more accurate forecast performance.
Accordingly, asymmetric information should be included when we model and
forecast the realized volatility of Japanese stock markets.
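A bare-bones HAR-type regression with a realized-semivariance and a leverage term looks roughly as follows; the variable construction and simulated inputs are illustrative and simpler than the specifications estimated in the paper.

    import numpy as np
    import statsmodels.api as sm

    def har_design(rv, rsv_neg, ret):
        # Daily/weekly/monthly HAR regressors plus negative RSV and a leverage term.
        rows, target = [], []
        for t in range(22, len(rv) - 1):
            daily = rv[t]
            weekly = rv[t - 4:t + 1].mean()
            monthly = rv[t - 21:t + 1].mean()
            leverage = min(ret[t], 0.0)            # negative-return (leverage) term
            rows.append([daily, weekly, monthly, rsv_neg[t], leverage])
            target.append(rv[t + 1])
        return np.asarray(rows), np.asarray(target)

    rng = np.random.default_rng(0)
    T = 1000
    rv = np.abs(rng.normal(1.0, 0.3, T))                    # simulated realized variance
    rsv_neg = 0.5 * rv + 0.05 * np.abs(rng.normal(size=T))  # simulated negative semivariance
    ret = rng.normal(0.0, 1.0, T)                           # simulated daily returns
    X, y = har_design(rv, rsv_neg, ret)
    print(sm.OLS(y, sm.add_constant(X)).fit().params)
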
arXiv link: http://arxiv.org/abs/2006.00158v1
Causal Impact of Masks, Policies, Behavior on Early Covid-19 Pandemic in the U.S
states on the growth rates of confirmed Covid-19 cases and deaths as well as
social distancing behavior measured by Google Mobility Reports, where we take
into consideration people's voluntary behavioral response to new information
of transmission risks. Our analysis finds that both policies and information on
transmission risks are important determinants of Covid-19 cases and deaths and
shows that a change in policies explains a large fraction of observed changes
in social distancing behavior. Our counterfactual experiments suggest that
nationally mandating face masks for employees on April 1st could have reduced
the growth rate of cases and deaths by more than 10 percentage points in late
April, and could have led to as much as 17 to 55 percent fewer deaths nationally
by the end of May, which roughly translates into 17 to 55 thousand saved lives.
Our estimates imply that removing non-essential business closures (while
maintaining school closures, restrictions on movie theaters and restaurants)
could have led to -20 to 60 percent more cases and deaths by the end of May. We
also find that, without stay-at-home orders, cases would have been larger by 25
to 170 percent, which implies that 0.5 to 3.4 million more Americans could have
been infected if stay-at-home orders had not been implemented. Finally, not
having implemented any policies could have led to at least a 7-fold increase
with an uninformative upper bound in cases (and deaths) by the end of May in
the US, with considerable uncertainty over the effects of school closures,
which had little cross-sectional variation.
arXiv link: http://arxiv.org/abs/2005.14168v4
Machine Learning Time Series Regressions with an Application to Nowcasting
high-dimensional time series data potentially sampled at different frequencies.
The sparse-group LASSO estimator can take advantage of such time series data
structures and outperforms the unstructured LASSO. We establish oracle
inequalities for the sparse-group LASSO estimator within a framework that
allows for mixing processes and recognizes that financial and macroeconomic
data may have heavier-than-exponential tails. An empirical
application to nowcasting US GDP growth indicates that the estimator performs
favorably compared to other alternatives and that text data can be a useful
addition to more traditional numerical data.
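The sparse-group LASSO penalty itself can be handled with a short proximal-gradient routine; the sketch below is a generic solver on simulated data and does not reproduce the paper's mixed-frequency design, oracle-inequality framework or tuning.

    import numpy as np

    def sgl_prox(b, groups, lam, alpha, step):
        # Prox of the sparse-group LASSO penalty: elementwise soft-thresholding
        # (L1 part) followed by groupwise shrinkage (group-L2 part).
        b = np.sign(b) * np.maximum(np.abs(b) - step * lam * alpha, 0.0)
        for g in groups:
            norm_g = np.linalg.norm(b[g])
            if norm_g > 0:
                scale = step * lam * (1 - alpha) * np.sqrt(len(g)) / norm_g
                b[g] *= max(0.0, 1.0 - scale)
        return b

    def sparse_group_lasso(X, y, groups, lam=0.1, alpha=0.5, n_iter=2000):
        n, p = X.shape
        b = np.zeros(p)
        step = n / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
        for _ in range(n_iter):
            grad = -X.T @ (y - X @ b) / n
            b = sgl_prox(b - step * grad, groups, lam, alpha, step)
        return b

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))
    beta = np.zeros(20)
    beta[:3] = [1.0, -1.0, 0.5]                   # only the first group is active
    y = X @ beta + 0.5 * rng.normal(size=200)
    groups = [list(range(4 * g, 4 * g + 4)) for g in range(5)]
    print(np.round(sparse_group_lasso(X, y, groups), 2))
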
arXiv link: http://arxiv.org/abs/2005.14057v4
Breiman's "Two Cultures" Revisited and Reconciled
standoff between two cultures of data modeling: parametric statistical and
algorithmic machine learning. The cultural division between these two
statistical learning frameworks has been growing at a steady pace in recent
years. What is the way forward? It has become blatantly obvious that this
widening gap between "the two cultures" cannot be averted unless we find a way
to blend them into a coherent whole. This article presents a solution by
establishing a link between the two cultures. Through examples, we describe the
challenges and potential gains of this new integrated statistical thinking.
arXiv link: http://arxiv.org/abs/2005.13596v1
Probabilistic multivariate electricity price forecasting using implicit generative ensemble post-processing
risk-sensitive optimal decision making. In this paper, we propose implicit
generative ensemble post-processing, a novel framework for multivariate
probabilistic electricity price forecasting. We use a likelihood-free implicit
generative model based on an ensemble of point forecasting models to generate
multivariate electricity price scenarios with a coherent dependency structure
as a representation of the joint predictive distribution. Our ensemble
post-processing method outperforms well-established model combination
benchmarks. This is demonstrated on a data set from the German day-ahead
market. As our method works on top of an ensemble of domain-specific expert
models, it can readily be deployed to other forecasting tasks.
arXiv link: http://arxiv.org/abs/2005.13417v1
Fair Policy Targeting
welfare programs is discrimination: individualized treatments may induce
disparities across sensitive attributes such as age, gender, or race. This
paper addresses the question of the design of fair and efficient treatment
allocation rules. We adopt the non-maleficence perspective of first do no harm:
we select the fairest allocation within the Pareto frontier. We cast the
optimization into a mixed-integer linear program formulation, which can be
solved using off-the-shelf algorithms. We derive regret bounds on the
unfairness of the estimated policy function and small sample guarantees on the
Pareto frontier under general notions of fairness. Finally, we illustrate our
method using an application from education economics.
arXiv link: http://arxiv.org/abs/2005.12395v3
An alternative to synthetic control for models with many covariates under sparsity
effects when only one unit is treated. While initially aimed at evaluating the
effect of large-scale macroeconomic changes with very few available control
units, it has increasingly been used in place of more well-known
microeconometric tools in a broad range of applications, but its properties in
this context are unknown. This paper introduces an alternative to the synthetic
control method, which is developed both in the usual asymptotic framework and
in the high-dimensional scenario. We propose an estimator of average treatment
effect that is doubly robust, consistent and asymptotically normal. It is also
immunized against first-step selection mistakes. We illustrate these properties
using Monte Carlo simulations and applications to both standard and potentially
high-dimensional settings, and offer a comparison with the synthetic control
method.
arXiv link: http://arxiv.org/abs/2005.12225v2
Bootstrap Inference for Quantile Treatment Effects in Randomized Experiments with Matched Pairs
effects (QTEs) in randomized experiments with matched-pairs designs (MPDs).
Standard multiplier bootstrap inference fails to capture the negative
dependence of observations within each pair and is therefore conservative.
Analytical inference involves estimating multiple functional quantities that
require several tuning parameters. Instead, this paper proposes two bootstrap
methods that can consistently approximate the limit distribution of the
original QTE estimator and lessen the burden of tuning parameter choice. In
particular, the inverse propensity score weighted multiplier bootstrap can be
implemented without knowledge of pair identities.
arXiv link: http://arxiv.org/abs/2005.11967v4
Macroeconomic factors for inflation in Argentina 2013-2019
order to identify the role of the relevant macroeconomic variables in driving
inflation. The macroeconomic predictors that usually affect inflation are
summarized into a small number of factors constructed by principal components.
This allows us to identify the crucial role of money growth, inflation
expectations and the exchange rate in driving inflation. We then use these
factors to build econometric models to forecast inflation. Specifically, we use
univariate and multivariate models such as classical autoregressions, factor
models and FAVAR models. The forecasting results suggest that models which
incorporate more economic information outperform the benchmark. Furthermore,
causality tests and impulse response analyses are performed in order to examine
the short-run dynamics of inflation in response to shocks in the principal
factors.
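A minimal version of this kind of factor-based forecasting pipeline, principal-component factors followed by a factor-augmented autoregression, might look like this; the data are simulated placeholders rather than the Argentine series used in the paper.

    import numpy as np
    import statsmodels.api as sm
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    T, N = 200, 30
    macro = rng.normal(size=(T, N))              # standardized macro predictors (placeholder)
    inflation = np.zeros(T)
    for t in range(1, T):
        inflation[t] = 0.6 * inflation[t - 1] + 0.3 * macro[t, 0] + rng.normal(scale=0.5)

    # Summarize the predictors into a few principal-component factors.
    factors = PCA(n_components=3).fit_transform(macro)

    # Factor-augmented autoregression: inflation on its own lag and lagged factors.
    X = sm.add_constant(np.column_stack([inflation[:-1], factors[:-1]]))
    fit = sm.OLS(inflation[1:], X).fit()
    x_last = np.concatenate([[1.0, inflation[-1]], factors[-1]])
    print("one-step-ahead forecast:", x_last @ fit.params)
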
arXiv link: http://arxiv.org/abs/2005.11455v1
The probability of a robust inference for internal validity and its applications in regression models
this study, we define the unobserved sample based on the counterfactuals and
formalize its relationship with the null hypothesis statistical testing (NHST)
for regression models. The probability of a robust inference for internal
validity, i.e., the PIV, is the probability of rejecting the null hypothesis
again based on the ideal sample, which is defined as the combination of the
observed and unobserved samples, provided the same null hypothesis has already
been rejected for the observed sample. When the unconfoundedness assumption is
dubious, one can bound the PIV of an inference based on bounded beliefs about
the mean counterfactual outcomes, which is often needed in this case.
Essentially, the PIV is the statistical power of the NHST conducted on the
ideal sample. We summarize the process of evaluating internal
validity with the PIV into a six-step procedure and illustrate it with an
empirical example (i.e., Hong and Raudenbush (2005)).
arXiv link: http://arxiv.org/abs/2005.12784v1
On the Nuisance of Control Variables in Regression Analysis
effect of a treatment on an outcome. In this paper, we argue that the estimated
effect sizes of controls are unlikely to have a causal interpretation
themselves. This is because even valid controls are possibly endogenous
and represent a combination of several different causal mechanisms operating
jointly on the outcome, which is hard to interpret theoretically. Therefore, we
recommend refraining from interpreting marginal effects of controls and
focusing on the main variables of interest, for which a plausible
identification argument can be established. To prevent erroneous managerial or
policy implications, coefficients of control variables should be clearly marked
as not having a causal interpretation or omitted from regression tables
altogether. Moreover, we advise against using control variable estimates for
subsequent theory building and meta-analyses.
arXiv link: http://arxiv.org/abs/2005.10314v5
Stochastic modeling of assets and liabilities with mortality risk
returns and liability cash-flows of a typical pensions insurer. On the asset
side, we model the investment returns on equities and various classes of
fixed-income instruments including short- and long-maturity fixed-rate bonds as
well as index-linked and corporate bonds. On the liability side, the risks are
driven by future mortality developments as well as price and wage inflation.
All the risk factors are modeled as a multivariate stochastic process that
captures the dynamics and the dependencies across different risk factors. The
model is easy to interpret and to calibrate to both historical data and to
forecasts or expert views concerning the future. The simple structure of the
model allows for efficient computations. The construction of a million
scenarios takes only a few minutes on a personal computer. The approach is
illustrated with an asset-liability analysis of a defined benefit pension fund.
arXiv link: http://arxiv.org/abs/2005.09974v1
Uniform Rates for Kernel Estimators of Weakly Dependent Data
absolutely regular stationary processes that are uniform in the bandwidth and
in infinite-dimensional classes of dependent variables and regressors. Our
results are useful for establishing asymptotic theory for two-step
semiparametric estimators in time series models. We apply our results to obtain
nonparametric estimates and their rates for Expected Shortfall processes.
arXiv link: http://arxiv.org/abs/2005.09951v1
Treatment recommendation with distributional targets
treatment recommendation based on an experiment. The desirability of the
outcome distribution resulting from the policy recommendation is measured
through a functional capturing the distributional characteristic that the
decision maker is interested in optimizing. This could be, e.g., its inherent
inequality, welfare, level of poverty or its distance to a desired outcome
distribution. If the functional of interest is not quasi-convex or if there are
constraints, the optimal recommendation may be a mixture of treatments. This
vastly expands the set of recommendations that must be considered. We
characterize the difficulty of the problem by obtaining maximal expected regret
lower bounds. Furthermore, we propose two (near) regret-optimal policies. The
first policy is static and thus applicable irrespective of whether subjects
arrive sequentially during the experimentation phase. The second policy
exploits the sequential arrival of subjects by successively eliminating
inferior treatments, thus spending the sampling effort where it is most needed.
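The sequential policy can be pictured with a stylized successive-elimination loop; here the distributional functional is taken to be the mean outcome for simplicity (the paper allows general distributional functionals), and the confidence radii and treatment parameters are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)
    true_means = [0.20, 0.50, 0.45, 0.10]       # hypothetical treatment effects
    active = list(range(len(true_means)))
    samples = {k: [] for k in active}

    for phase in range(1, 6):
        # Sample each surviving treatment a fixed number of times in this phase.
        for k in active:
            samples[k].extend(rng.normal(true_means[k], 1.0, 50))
        # Eliminate treatments whose upper confidence bound falls below the best
        # lower confidence bound (a simple Hoeffding-style radius is used here).
        means = {k: np.mean(samples[k]) for k in active}
        radius = {k: np.sqrt(2 * np.log(10 * phase) / len(samples[k])) for k in active}
        best_lcb = max(means[k] - radius[k] for k in active)
        active = [k for k in active if means[k] + radius[k] >= best_lcb]

    print("surviving treatments:", active)
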
arXiv link: http://arxiv.org/abs/2005.09717v4
Evaluating Policies Early in a Pandemic: Bounding Policy Effects with Nonrandomly Missing Data
governments introduced a number of policies to combat the spread of Covid-19.
In this paper, we propose a new approach to bound the effects of such
early-pandemic policies on Covid-19 cases and other outcomes while dealing with
complications arising from (i) limited availability of Covid-19 tests, (ii)
differential availability of Covid-19 tests across locations, and (iii)
eligibility requirements for individuals to be tested. We use our approach to
study the effects of Tennessee's expansion of Covid-19 testing early in the
pandemic and find that the policy decreased Covid-19 cases.
arXiv link: http://arxiv.org/abs/2005.09605v6
Instrumental Variables with Treatment-Induced Selection: Exact Bias Results
analysis conditions on the treatment. Judea Pearl's early graphical definition
of instrumental variables explicitly prohibited conditioning on the treatment.
Nonetheless, the practice remains common. In this paper, we derive exact
analytic expressions for IV selection bias across a range of data-generating
models, and for various selection-inducing procedures. We present four sets of
results for linear models. First, IV selection bias depends on the conditioning
procedure (covariate adjustment vs. sample truncation). Second, IV selection
bias due to covariate adjustment is the limiting case of IV selection bias due
to sample truncation. Third, in certain models, the IV and OLS estimators under
selection bound the true causal effect in large samples. Fourth, we
characterize situations where IV remains preferred to OLS despite selection on
the treatment. These results broaden the notion of IV selection bias beyond
sample truncation, replace prior simulation findings with exact analytic
formulas, and enable formal sensitivity analyses.
arXiv link: http://arxiv.org/abs/2005.09583v1
A Flexible Stochastic Conditional Duration Model
markets. We argue that widely accepted rules for aggregating seemingly related
trades mislead inference pertaining to durations between unrelated trades:
while any two trades executed in the same second are probably related, it is
extremely unlikely that all such pairs of trades are, in a typical sample. By
placing uncertainty about which trades are related within our model, we improve
inference for the distribution of durations between unrelated trades,
especially near zero. We introduce a normalized conditional distribution for
durations between unrelated trades that is both flexible and amenable to
shrinkage towards an exponential distribution, which we argue is an appropriate
first-order model. Thanks to highly efficient draws of state variables,
numerical efficiency of posterior simulation is much higher than in previous
studies. In an empirical application, we find that the conditional hazard
function for durations between unrelated trades varies much less than what most
studies find. We claim that this is because we avoid statistical artifacts that
arise from deterministic trade-aggregation rules and unsuitable parametric
distributions.
arXiv link: http://arxiv.org/abs/2005.09166v1
Is being an only child harmful to psychological health?: Evidence from an instrumental variable analysis of China's One-Child Policy
psychological health, leveraging data on the One-Child Policy in China. We use
an instrumental variable approach to address the potential unmeasured
confounding between the fertility decision and psychological health, where the
instrumental variable is an index on the intensity of the implementation of the
One-Child Policy. We establish an analytical link between the local
instrumental variable approach and principal stratification to accommodate the
continuous instrumental variable. Within the principal stratification
framework, we postulate a Bayesian hierarchical model to infer various causal
estimands of policy interest while adjusting for the clustering data structure.
We apply the method to the data from the China Family Panel Studies and find
small but statistically significant negative effects of being an only child on
self-reported psychological health for some subpopulations. Our analysis
reveals treatment effect heterogeneity with respect to both observed and
unobserved characteristics. In particular, urban males suffer the most from
being only children, and the negative effect has larger magnitude if the
families were more resistant to the One-Child Policy. We also conduct
sensitivity analysis to assess the key instrumental variable assumption.
arXiv link: http://arxiv.org/abs/2005.09130v2
Role models and revealed gender-specific costs of STEM in an extended Roy model of major choice
extended Roy model of sector selection. We interpret this non-consumption
utility component as a compensating wage differential. The bounds are derived
under the assumption that potential utilities in each sector are (jointly)
stochastically monotone with respect to an observed selection shifter. The
research is motivated by the analysis of women's choice of university major,
their under-representation in mathematics-intensive fields, and the impact of
role models on choices and outcomes. To illustrate our methodology, we
investigate the cost of STEM fields with data from a German graduate survey,
and using the mother's education level and the proportion of women on the STEM
faculty at the time of major choice as selection shifters.
arXiv link: http://arxiv.org/abs/2005.09095v4
Irregular Identification of Structural Models with Nonparametric Unobserved Heterogeneity
pervasiveness of heterogeneity in economic behaviour (cf. Heckman 2001). This
paper shows that cumulative distribution functions and quantiles of the
nonparametric unobserved heterogeneity have an infinite efficiency bound in
many structural economic models of interest. The paper presents a relatively
simple check of this fact. The usefulness of the theory is demonstrated with
several relevant examples in economics, including, among others, the proportion
of individuals with severe long term unemployment duration, the average
marginal effect and the proportion of individuals with a positive marginal
effect in a correlated random coefficient model with heterogeneous first-stage
effects, and the distribution and quantiles of random coefficients in linear,
binary and the Mixed Logit models. Monte Carlo simulations illustrate the
finite sample implications of our findings for the distribution and quantiles
of the random coefficients in the Mixed Logit model.
arXiv link: http://arxiv.org/abs/2005.08611v1
Nested Model Averaging on Solution Path for High-dimensional Linear Regression
high-dimensional linear regression problem. In particular, we propose to
combine model averaging with regularized estimators (e.g., lasso and SLOPE) on
the solution path for high-dimensional linear regression. In simulation
studies, we first conduct a systematic investigation on the impact of predictor
ordering on the behavior of nested model averaging, then show that nested model
averaging with lasso and SLOPE compares favorably with other competing methods,
including the infeasible lasso and SLOPE with the tuning parameter optimally
selected. A real data analysis on predicting the per capita violent crime in
the United States shows an outstanding performance of the nested model
averaging with lasso.
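A stripped-down version of averaging along the lasso solution path is sketched below; the validation-error-based weights are an illustrative choice and not necessarily the weighting scheme proposed in the paper.

    import numpy as np
    from sklearn.linear_model import lasso_path

    rng = np.random.default_rng(0)
    n, p = 200, 50
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
    y = X @ beta + rng.normal(size=n)

    X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

    # The lasso solution path defines a sequence of (typically nested) models.
    alphas, coefs, _ = lasso_path(X_tr, y_tr, n_alphas=50)   # coefs: (p, n_alphas)

    # Weight each path model by its validation error and average the coefficients.
    preds = X_val @ coefs
    mse = ((preds - y_val[:, None]) ** 2).mean(axis=0)
    w = np.exp(-(mse - mse.min()))
    w /= w.sum()
    avg_coef = coefs @ w
    print(np.round(avg_coef[:8], 2))
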
arXiv link: http://arxiv.org/abs/2005.08057v1
Conformal Prediction: a Unified Review of Theory and New Challenges
Conformal Prediction -- an innovative distribution-free, non-parametric
forecasting method, based on minimal assumptions -- that is able to yield, in a
very straightforward way, prediction sets that are valid in a statistical sense
even in the finite-sample case. The in-depth discussion provided in the
paper covers the theoretical underpinnings of Conformal Prediction, and then
proceeds to list the more advanced developments and adaptations of the original
idea.
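For concreteness, the split-conformal variant of the idea can be written in a few lines; the regression model and data below are placeholders, and the construction follows the standard split-conformal recipe rather than any single method surveyed in the paper.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=500)

    # Split into a proper training set and a calibration set.
    X_tr, y_tr, X_cal, y_cal = X[:250], y[:250], X[250:], y[250:]
    model = LinearRegression().fit(X_tr, y_tr)

    # Nonconformity scores on the calibration set: absolute residuals.
    scores = np.sort(np.abs(y_cal - model.predict(X_cal)))
    n, alpha = len(scores), 0.1
    q = scores[int(np.ceil((n + 1) * (1 - alpha))) - 1]   # finite-sample-valid quantile

    x_new = rng.normal(size=(1, 3))
    center = model.predict(x_new)[0]
    print(f"90% prediction interval: [{center - q:.2f}, {center + q:.2f}]")
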
arXiv link: http://arxiv.org/abs/2005.07972v2
Fast and Accurate Variational Inference for Models with Many Latent Variables
utilize the information in big or complex data. However, they can be difficult
to estimate using standard approaches, and variational inference methods are a
popular alternative. Key to the success of these methods is the selection of an
approximation to the target density that is accurate, tractable and fast to
calibrate using optimization methods. Most existing choices can be inaccurate
or slow to calibrate when there are many latent variables. Here, we propose a
family of tractable variational approximations that are more accurate and
faster to calibrate for this case. It combines a parsimonious parametric
approximation for the parameter posterior, with the exact conditional posterior
of the latent variables. We derive a simplified expression for the
re-parameterization gradient of the variational lower bound, which is the main
ingredient of efficient optimization algorithms used to implement variational
estimation. To do so only requires the ability to generate exactly or
approximately from the conditional posterior of the latent variables, rather
than to compute its density. We illustrate using two complex contemporary
econometric examples. The first is a nonlinear multivariate state space model
for U.S. macroeconomic variables. The second is a random coefficients tobit
model applied to two million sales by 20,000 individuals in a large consumer
panel from a marketing study. In both cases, we show that our approximating
family is considerably more accurate than mean field or structured Gaussian
approximations, and faster than Markov chain Monte Carlo. Last, we show how to
implement data sub-sampling in variational inference for our approximation,
which can lead to a further reduction in computation time. MATLAB code
implementing the method for our examples is included in supplementary material.
arXiv link: http://arxiv.org/abs/2005.07430v3
Dynamic shrinkage in time-varying parameter stochastic volatility in mean models
flexibility. This is often achieved by employing suitable shrinkage priors that
penalize model complexity but also reward model fit. In this note, we modify
the stochastic volatility in mean (SVM) model proposed in Chan (2017) by
introducing state-of-the-art shrinkage techniques that allow for time-variation
in the degree of shrinkage. Using a real-time inflation forecast exercise, we
show that employing more flexible prior distributions on several key parameters
slightly improves forecast performance for the United States (US), the United
Kingdom (UK) and the Euro Area (EA). Comparing in-sample results reveals that
our proposed model yields qualitatively similar insights to the original
version of the model.
arXiv link: http://arxiv.org/abs/2005.06851v1
Combining Population and Study Data for Inference on Event Rates
share of individuals in some subgroup of a population that experience some
event. The specific complication is that the size of the subgroup needs to be
estimated, whereas the number of individuals that experience the event is
known. The problem is motivated by the recent study of Streeck et al. (2020),
who estimate the infection fatality rate (IFR) of SARS-CoV-2 infection in a
German town that experienced a super-spreading event in mid-February 2020. In
their case the subgroup of interest is comprised of all infected individuals,
and the event is death caused by the infection. We clarify issues with the
precise definition of the target parameter in this context, and propose
confidence intervals (CIs) based on classical statistical principles that
result in good coverage properties.
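To make the complication concrete, a naive delta-method interval for this kind of ratio (known deaths over an estimated number of infections) can be written as follows; the numbers are made up and this is not the confidence interval proposed in the paper.

    import numpy as np

    N = 12_000          # population of the town (hypothetical)
    D = 7               # known number of deaths among the infected (hypothetical)
    n, x = 900, 140     # survey sample size and number of positives (hypothetical)

    p_hat = x / n                       # estimated infection rate
    ifr_hat = D / (N * p_hat)           # estimated infection fatality rate

    # Delta method: IFR(p) = D / (N p), so se(IFR_hat) ~ (D / (N p^2)) * se(p_hat).
    se_p = np.sqrt(p_hat * (1 - p_hat) / n)
    se_ifr = D / (N * p_hat ** 2) * se_p
    print(f"IFR estimate {ifr_hat:.4f}, naive 95% CI "
          f"[{ifr_hat - 1.96 * se_ifr:.4f}, {ifr_hat + 1.96 * se_ifr:.4f}]")
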
arXiv link: http://arxiv.org/abs/2005.06769v1
Moment Conditions for Dynamic Panel Logit Models with Fixed Effects
choice panel data with individual specific fixed effects. We describe how to
systematically explore the existence of moment conditions that do not depend on
the fixed effects, and we demonstrate how to construct them when they exist.
Our approach is closely related to the numerical "functional differencing"
construction in Bonhomme (2012), but our emphasis is to find explicit analytic
expressions for the moment functions. We first explain the construction and
give examples of such moment conditions in various models. Then, we focus on
the dynamic binary choice logit model and explore the implications of the
moment conditions for identification and estimation of the model parameters
that are common to all individuals.
arXiv link: http://arxiv.org/abs/2005.05942v7
Fractional trends and cycles in macroeconomic time series
avoids prior assumptions about the long-run dynamic characteristics by
modelling the permanent component as a fractionally integrated process and
incorporating a fractional lag operator into the autoregressive polynomial of
the cyclical component. The model allows for an endogenous estimation of the
integration order jointly with the other model parameters and, therefore, no
prior specification tests with respect to persistence are required. We relate
the model to the Beveridge-Nelson decomposition and derive a modified Kalman
filter estimator for the fractional components. Identification, consistency,
and asymptotic normality of the maximum likelihood estimator are shown. For US
macroeconomic data we demonstrate that, unlike $I(1)$ correlated unobserved
components models, the new model estimates a smooth trend together with a cycle
hitting all NBER recessions. While $I(1)$ unobserved components models yield an
upward-biased signal-to-noise ratio whenever the integration order of the
data-generating mechanism is greater than one, the fractionally integrated
model attributes less variation to the long-run shocks due to the fractional
trend specification and a higher variation to the cycle shocks due to the
fractional lag operator, leading to more persistent cycles and smooth trend
estimates that reflect macroeconomic common sense.
arXiv link: http://arxiv.org/abs/2005.05266v2
Macroeconomic Forecasting with Fractional Factor Models
and derive models where nonstationary, potentially cointegrated data of
different persistence is modelled as a function of common fractionally
integrated factors. A two-stage estimator, that combines principal components
and the Kalman filter, is proposed. The forecast performance is studied for a
high-dimensional US macroeconomic data set, where we find that benefits from
the fractional factor models can be substantial, as they outperform univariate
autoregressions, principal components, and the factor-augmented
error-correction model.
arXiv link: http://arxiv.org/abs/2005.04897v1
Posterior Probabilities for Lorenz and Stochastic Dominance of Australian Income Distributions
posterior probabilities for dominance for all pairwise comparisons of income
distributions in these years. The dominance criteria considered are Lorenz
dominance and first and second order stochastic dominance. The income
distributions are estimated using an infinite mixture of gamma density
functions, with posterior probabilities computed as the proportion of Markov
chain Monte Carlo draws that satisfy the inequalities that define the dominance
criteria. We find welfare improvements from 2001 to 2006 and qualified
improvements from 2006 to the three later years. Evidence of an ordering among 2010, 2014 and 2017 cannot be established.
arXiv link: http://arxiv.org/abs/2005.04870v2
Probabilistic Multi-Step-Ahead Short-Term Water Demand Forecasting with Lasso
decision making. Hence, the development of accurate forecasts is a valuable
field of research to further improve the efficiency of water utilities.
Focusing on probabilistic multi-step-ahead forecasting, a time series model is
introduced, to capture typical autoregressive, calendar and seasonal effects,
to account for time-varying variance, and to quantify the uncertainty and
path-dependency of the water demand process. To deal with the high complexity
of the water demand process a high-dimensional feature space is applied, which
is efficiently tuned by an automatic shrinkage and selection operator (lasso).
This yields an accurate, easily interpretable and fast-to-compute forecasting model, which is well suited for real-time applications. The complete probabilistic forecasting framework allows not only for simulating the mean and the marginal properties, but also for the correlation structure between
hours within the forecasting horizon. For practitioners, complete probabilistic
multi-step-ahead forecasts are of considerable relevance as they provide
additional information about the expected aggregated or cumulative water
demand, so that a statement can be made about the probability with which a
water storage capacity can guarantee the supply over a certain period of time.
This information makes it possible to better control storage capacities and to better ensure the smooth operation of pumps. To appropriately evaluate the forecasting performance of the considered models, the energy score (ES), a strictly proper multidimensional evaluation criterion, is introduced. The methodology is
applied to the hourly water demand data of a German water supplier.
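A minimal sketch of the general idea, lasso selection over a large set of autoregressive and calendar features, is given below on synthetic hourly data; the data-generating process, feature set and sklearn's LassoCV are stand-ins, not the paper's specification, and the held-out evaluation is one-step rather than a full probabilistic multi-step forecast.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)

# Synthetic hourly "demand" with daily seasonality and autoregression (illustrative only).
idx = pd.date_range("2023-01-01", periods=24 * 200, freq="h")
hours = idx.hour.to_numpy()
seasonal = 10 + 3 * np.sin(2 * np.pi * hours / 24)
y = seasonal + rng.normal(0, 0.5, len(idx))
for t in range(1, len(y)):
    y[t] += 0.4 * (y[t - 1] - seasonal[t - 1])

df = pd.DataFrame({"y": y}, index=idx)
# High-dimensional feature space (lags plus calendar dummies), pruned by the lasso.
for lag in [1, 2, 3, 24, 48, 168]:
    df[f"lag_{lag}"] = df["y"].shift(lag)
df = df.join(pd.get_dummies(idx.hour, prefix="hour", dtype=float).set_index(idx))
df = df.join(pd.get_dummies(idx.dayofweek, prefix="dow", dtype=float).set_index(idx))
df = df.dropna()

train, test = df.iloc[:-24], df.iloc[-24:]          # last day held out
X_cols = [c for c in df.columns if c != "y"]
model = LassoCV(cv=5, max_iter=5000).fit(train[X_cols], train["y"])
mae = float(np.mean(np.abs(model.predict(test[X_cols]) - test["y"].to_numpy())))
print("selected features:", int(np.sum(model.coef_ != 0)), "of", len(X_cols))
print("held-out one-step MAE:", round(mae, 3))
```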
arXiv link: http://arxiv.org/abs/2005.04522v1
Critical Values Robust to P-hacking
testing theory. As a consequence, significant results are much more common than
they are supposed to be when the null hypothesis is in fact true. In this
paper, we build a model of hypothesis testing with p-hacking. From the model,
we construct critical values such that, if the values are used to determine
significance, and if scientists' p-hacking behavior adjusts to the new
significance standards, significant results occur with the desired frequency.
Such robust critical values allow for p-hacking, so they are larger than
classical critical values. To illustrate the amount of correction that
p-hacking might require, we calibrate the model using evidence from the medical
sciences. In the calibrated model the robust critical value for any test
statistic is the classical critical value for the same test statistic with one
fifth of the significance level.
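Applying the calibrated rule stated above (the robust critical value is the classical critical value at one fifth of the significance level) to a two-sided z-test gives, for example:

```python
from scipy import stats

alpha = 0.05
classical = stats.norm.ppf(1 - alpha / 2)        # two-sided z critical value at the 5% level
robust = stats.norm.ppf(1 - (alpha / 5) / 2)     # same statistic at one fifth the level (1%)
print(f"classical: {classical:.3f}, p-hacking-robust: {robust:.3f}")
# -> classical: 1.960, p-hacking-robust: 2.576
```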
arXiv link: http://arxiv.org/abs/2005.04141v8
How Reliable are Bootstrap-based Heteroskedasticity Robust Tests?
bootstrap-based heteroskedasticity robust tests in linear regression models. In
particular, these results provide an efficient diagnostic check, which can be
used to weed out tests that are unreliable for a given testing problem in the
sense that they overreject substantially. This allows us to assess the
reliability of a large variety of wild bootstrap-based tests in an extensive
numerical study.
arXiv link: http://arxiv.org/abs/2005.04089v2
Fractional trends in unobserved components models
wide range of long-run dynamics by modelling the permanent component as a
fractionally integrated process. The model does not require stationarity and
can be cast in state space form. In a multivariate setup, fractional trends may
yield a cointegrated system. We derive the Kalman filter estimator for the
common fractionally integrated component and establish consistency and
asymptotic (mixed) normality of the maximum likelihood estimator. We apply the
model to extract a common long-run component of three US inflation measures,
where we show that the $I(1)$ assumption is likely to be violated for the
common trend.
arXiv link: http://arxiv.org/abs/2005.03988v2
Dynamic Shrinkage Priors for Large Time-varying Parameter Regressions using Scalable Markov Chain Monte Carlo Methods
coefficients. Careful prior elicitation is required to yield sensible posterior
and predictive inferences. In addition, the computational demands of Markov
Chain Monte Carlo (MCMC) methods mean their use is limited to the case where
the number of predictors is not too large. In light of these two concerns, this
paper proposes a new dynamic shrinkage prior which reflects the empirical
regularity that TVPs are typically sparse (i.e. time variation may occur only
episodically and only for some of the coefficients). A scalable MCMC algorithm
is developed which is capable of handling very high dimensional TVP regressions
or TVP Vector Autoregressions. In an exercise using artificial data we
demonstrate the accuracy and computational efficiency of our methods. In an
application involving the term structure of interest rates in the eurozone, we
find our dynamic shrinkage prior to effectively pick out small amounts of
parameter change and our methods to forecast well.
arXiv link: http://arxiv.org/abs/2005.03906v2
Know Your Clients' behaviours: a cluster analysis of financial transactions
securities commissions and self-regulatory organizations--charged with direct
regulation over investment dealers and mutual fund dealers--to respectively
collect and maintain Know Your Client (KYC) information, such as their age or
risk tolerance, for investor accounts. With this information, investors, under
their advisor's guidance, make decisions on their investments which are
presumed to be beneficial to their investment goals. Our unique dataset is
provided by a financial investment dealer with over 50,000 accounts for over
23,000 clients. We use a modified behavioural finance recency, frequency,
monetary model for engineering features that quantify investor behaviours, and
machine learning clustering algorithms to find groups of investors that behave
similarly. We show that the KYC information collected does not explain client
behaviours, whereas trade and transaction frequency and volume are most
informative. We believe the results shown herein encourage financial regulators
and advisors to use more advanced metrics to better understand and predict
investor behaviours.
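A minimal sketch of recency-frequency-monetary feature engineering followed by k-means clustering is shown below on synthetic transactions; the variables and cluster count are illustrative, not those of the proprietary dealer data or the paper's modified RFM model.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Synthetic transaction log (stand-in for the dealer's 50,000+ accounts).
n_tx = 5000
tx = pd.DataFrame({
    "account": rng.integers(0, 400, n_tx),
    "days_ago": rng.integers(1, 365, n_tx),
    "amount": rng.lognormal(6, 1, n_tx),
})

# Recency-frequency-monetary style features per account.
rfm = tx.groupby("account").agg(
    recency=("days_ago", "min"),
    frequency=("days_ago", "size"),
    monetary=("amount", "sum"),
)

X = StandardScaler().fit_transform(np.log1p(rfm))
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(rfm.assign(cluster=labels).groupby("cluster").median().round(1))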
arXiv link: http://arxiv.org/abs/2005.03625v2
Diffusion Copulas: Identification and Estimation
diffusions, where the observed process is a nonparametric transformation of an
underlying parametric diffusion (UPD). This modelling strategy yields a general
class of semiparametric Markov diffusion models with parametric dynamic copulas
and nonparametric marginal distributions. We provide primitive conditions for
the identification of the UPD parameters together with the unknown
transformations from discrete samples. Likelihood-based estimators of both
parametric and nonparametric components are developed and we analyze the
asymptotic properties of these. Kernel-based drift and diffusion estimators are
also proposed and shown to be normally distributed in large samples. A
simulation study investigates the finite sample performance of our estimators
in the context of modelling US short-term interest rates. We also present a
simple application of the proposed method for modelling the CBOE volatility
index data.
arXiv link: http://arxiv.org/abs/2005.03513v1
Distributional robustness of K-class estimators and the PULSE
arbitrarily strong interventions, they may not be optimal when the
interventions are bounded. We prove that the classical K-class estimator
satisfies such optimality by establishing a connection between K-class
estimators and anchor regression. This connection further motivates a novel
estimator in instrumental variable settings that minimizes the mean squared
prediction error subject to the constraint that the estimator lies in an
asymptotically valid confidence region of the causal coefficient. We call this
estimator PULSE (p-uncorrelated least squares estimator), relate it to work on
invariance, show that it can be computed efficiently as a data-driven K-class
estimator, even though the underlying optimization problem is non-convex, and
prove consistency. We evaluate the estimators on real data and perform
simulation experiments illustrating that PULSE suffers from less variability.
There are several settings, including weak-instrument settings, in which it outperforms other estimators.
arXiv link: http://arxiv.org/abs/2005.03353v3
Detecting Latent Communities in Network Formation Models
allows for assortative matching on observed individual characteristics and the
presence of edge-wise fixed effects. We model the coefficients of observed
characteristics to have a latent community structure and the edge-wise fixed
effects to be of low rank. We propose a multi-step estimation procedure
involving nuclear norm regularization, sample splitting, iterative logistic
regression and spectral clustering to detect the latent communities. We show
that the latent communities can be exactly recovered when the expected degree
of the network is of order log n or higher, where n is the number of nodes in
the network. The finite sample performance of the new estimation and inference
methods is illustrated through both simulated and real datasets.
arXiv link: http://arxiv.org/abs/2005.03226v3
Spatial dependence in the rank-size distribution of cities
The Zipf law for cities is one of those. The study examines whether this global regularity is independent of different spatial distributions of cities. For that purpose, a typical Zipfian rank-size
distribution of cities is generated with random numbers. This distribution is
then cast into different settings of spatial coordinates. For the estimation,
the variables rank and size are supplemented by spatial spillover effects in a
standard spatial econometric approach. Results suggest that distance and
contiguity effects matter. This finding is further corroborated by three
country analyses.
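The aspatial part of this exercise, generating a Zipfian rank-size distribution with random numbers and estimating the log-log slope, can be sketched as follows; the spatial-econometric specification with spillover effects is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate city sizes from a Pareto (Zipf-consistent) distribution and estimate the
# rank-size slope by OLS in logs; a slope near -1 corresponds to Zipf's law.
sizes = np.sort(rng.pareto(1.0, 500) + 1.0)[::-1] * 10_000
ranks = np.arange(1, len(sizes) + 1)

slope, intercept = np.polyfit(np.log(ranks), np.log(sizes), 1)
print(f"estimated rank-size slope: {slope:.3f} (Zipf's law predicts about -1)")
```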
arXiv link: http://arxiv.org/abs/2005.02836v1
Arctic Amplification of Anthropogenic Forcing: A Vector Autoregressive Analysis
in history and keeps trending downward. The understanding of how feedback loops
amplify the effects of external CO2 forcing is still limited. We propose the
VARCTIC, which is a Vector Autoregression (VAR) designed to capture and
extrapolate Arctic feedback loops. VARs are dynamic simultaneous systems of
equations, routinely estimated to predict and understand the interactions of
multiple macroeconomic time series. The VARCTIC is a parsimonious compromise
between full-blown climate models and purely statistical approaches that
usually offer little explanation of the underlying mechanism. Our completely
unconditional forecast has SIE hitting 0 in September by the 2060s. Impulse
response functions reveal that anthropogenic CO2 emission shocks have an
unusually durable effect on SIE -- a property shared by no other shock. We find
Albedo- and Thickness-based feedbacks to be the main amplification channels
through which CO2 anomalies impact SIE in the short/medium run. Further,
conditional forecast analyses reveal that the future path of SIE crucially
depends on the evolution of CO2 emissions, with outcomes ranging from
recovering SIE to it reaching 0 in the 2050s. Finally, Albedo and Thickness
feedbacks are shown to play an important role in accelerating the speed at
which predicted SIE is heading towards 0.
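A minimal sketch of fitting a VAR and inspecting impulse responses with statsmodels is given below on synthetic stand-in series; the VARCTIC itself uses observed Arctic and emissions data and a richer specification.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(4)

# Synthetic stand-ins for CO2, temperature and sea-ice-extent (SIE) anomalies.
T = 300
e = rng.normal(size=(T, 3))
co2 = np.zeros(T)
temp = np.zeros(T)
sie = np.zeros(T)
for t in range(1, T):
    co2[t] = 0.8 * co2[t - 1] + e[t, 0]
    temp[t] = 0.5 * temp[t - 1] + 0.3 * co2[t - 1] + e[t, 1]
    sie[t] = 0.7 * sie[t - 1] - 0.4 * temp[t - 1] + e[t, 2]
data = pd.DataFrame({"co2": co2, "temp": temp, "sie": sie})

res = VAR(data).fit(maxlags=6, ic="aic")    # lag length chosen by information criterion
print("selected lag order:", res.k_ar)
irf = res.irf(24)                           # impulse responses up to 24 periods ahead
irf.plot(impulse="co2", response="sie")     # response of SIE to a CO2 shock (needs matplotlib)
```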
arXiv link: http://arxiv.org/abs/2005.02535v4
Modeling High-Dimensional Unit-Root Time Series
high-dimensional unit-root time series by postulating that a $p$-dimensional
unit-root process is a nonsingular linear transformation of a set of unit-root
processes, a set of stationary common factors, which are dynamically dependent,
and some idiosyncratic white noise components. For the stationary components,
we assume that the factor process captures the temporal-dependence and the
idiosyncratic white noise series explains, jointly with the factors, the
cross-sectional dependence. The estimation of nonsingular linear loading spaces
is carried out in two steps. First, we use an eigenanalysis of a nonnegative
definite matrix of the data to separate the unit-root processes from the
stationary ones and a modified method to specify the number of unit roots. We
then employ another eigenanalysis and a projected principal component analysis
to identify the stationary common factors and the white noise series. We
propose a new procedure to specify the number of white noise series and, hence,
the number of stationary common factors, establish asymptotic properties of the
proposed method for both fixed and diverging $p$ as the sample size $n$
increases, and use simulation and a real example to demonstrate the performance
of the proposed method in finite samples. We also compare our method with some
commonly used ones in the literature regarding the forecast ability of the
extracted factors and find that the proposed method performs well in
out-of-sample forecasting of a 508-dimensional PM$_{2.5}$ series in Taiwan.
arXiv link: http://arxiv.org/abs/2005.03496v2
Stocks Vote with Their Feet: Can a Piece of Paper Document Fights the COVID-19 Pandemic?
essential for both policymakers and stock investors, but challenging because
the crisis has unfolded with extreme speed and the previous index was not
suitable for measuring policy effectiveness for COVID-19. This paper builds an index of policy effectiveness in fighting the COVID-19 pandemic, constructed similarly to the Policy Uncertainty index, based on province-level paper documents released in China from Jan. 1st to Apr. 16th, 2020. It also studies the relationships among COVID-19 daily confirmed cases, stock market volatility, and document-based policy effectiveness in China, using the DCC-GARCH model to capture the time-varying conditional covariances of the multiple series. Finally, it tests four hypotheses about the spatio-temporal variation of policy effectiveness and its spillover effects on both the COVID-19 pandemic and the stock market. Through the interactions within this triad, we can offer more specific and scientific suggestions for maintaining stability in the stock market at such exceptional times.
arXiv link: http://arxiv.org/abs/2005.02034v1
Identifying Preferences when Households are Financially Constrained
financially constrained households can narrow down the set of admissible
preferences in a large class of macroeconomic models. Estimates based on
Spanish aggregate data provide further empirical support for this result and
suggest that accounting for this margin can bring estimates closer to
microeconometric evidence. Accounting for financial constraints and the
extensive margin is shown to matter for empirical asset pricing and quantifying
distortions in financial markets.
arXiv link: http://arxiv.org/abs/2005.02010v6
The Murphy Decomposition and the Calibration-Resolution Principle: A New Perspective on Forecast Evaluation
accurate forecasts of all types, from simple point to complete probabilistic
forecasts, in terms of two fundamental underlying properties, autocalibration
and resolution, which can be interpreted as describing a lack of systematic
mistakes and a high information content. This "calibration-resolution
principle" gives a new insight into the nature of forecasting and generalizes
the famous sharpness principle by Gneiting et al. (2007) from probabilistic to
all types of forecasts. Among other things, it exposes the shortcomings of several
widely used forecast evaluation methods. The principle is based on a fully
general version of the Murphy decomposition of loss functions, which I provide.
Special cases of this decomposition are well-known and widely used in
meteorology.
Besides using the decomposition in this new theoretical way, after introducing it and the underlying properties in a proper theoretical framework, accompanied by an illustrative example, I also employ it in its classical sense as a forecast evaluation method, as meteorologists do: as such, it unveils
the driving forces behind forecast errors and complements classical forecast
evaluation methods. I discuss estimation of the decomposition via kernel
regression and then apply it to popular economic forecasts. Analysis of mean
forecasts from the US Survey of Professional Forecasters and quantile forecasts
derived from Bank of England fan charts indeed yield interesting new insights
and highlight the potential of the method.
arXiv link: http://arxiv.org/abs/2005.01835v1
Neural Networks and Value at Risk
simulations of asset returns for Value at Risk threshold estimation. Using
equity markets and long term bonds as test assets in the global, US, Euro area
and UK setting over an up to 1,250 weeks sample horizon ending in August 2018,
we investigate neural networks along three design steps relating (i) to the
initialization of the neural network, (ii) its incentive function according to
which it has been trained and (iii) the amount of data we feed. First, we
compare neural networks with random seeding with networks that are initialized
via estimations from the best-established model (i.e. the Hidden Markov model). We find the latter to outperform in terms of the frequency of VaR breaches (i.e. the
realized return falling short of the estimated VaR threshold). Second, we
balance the incentive structure of the loss function of our networks by adding
a second objective to the training instructions so that the neural networks
optimize for accuracy while also aiming to stay in empirically realistic regime
distributions (i.e. bull vs. bear market frequencies). In particular this
design feature enables the balanced incentive recurrent neural network (RNN) to
outperform the single incentive RNN as well as any other neural network or
established approach by statistically and economically significant levels.
Third, we halve our training data set of 2,000 days. We find that our networks, when fed with substantially less data (i.e. 1,000 days), perform significantly worse, which highlights a crucial weakness of neural networks in their dependence on very large data sets ...
arXiv link: http://arxiv.org/abs/2005.01686v2
The Information Content of Taster's Valuation in Tea Auctions of India
online. Before the auction, a sample of the tea lot is sent to potential
bidders and a group of tea tasters. The seller's reserve price is a
confidential function of the tea taster's valuation, which also possibly acts
as a signal to the bidders.
In this paper, we work with the dataset from a single tea auction house, J
Thomas, of tea dust category, on 49 weeks in the time span of 2018-2019, with
the following objectives in mind:
$\bullet$ Objective classification of the various categories of tea dust (25) into a more manageable and robust classification, based on source and grades.
$\bullet$ Predict which tea lots would be sold in the auction market, and a
model for the final price conditioned on sale.
$\bullet$ To study the distribution of price and ratio of the sold tea
auction lots.
$\bullet$ Make a detailed analysis of the information obtained from the tea
taster's valuation and its impact on the final auction price.
The model used has shown various promising results on cross-validation. The importance of the valuation is firmly established through an analysis of the causal relationship between the valuation and the actual price. The authors hope that this study of the properties, and the detailed analysis of the role played by the various factors, will be significant in the decision-making process for the players of the auction game, pave the way to removing manual interference in an attempt to automate the auction procedure, and help improve tea quality in markets.
arXiv link: http://arxiv.org/abs/2005.02814v1
Ensemble Forecasting for Intraday Electricity Prices: Simulating Trajectories
evidence that the hourly German Intraday Continuous Market is weak-form
efficient. Therefore, we take a novel, advanced approach to the problem. A
probabilistic forecasting of the hourly intraday electricity prices is
performed by simulating trajectories in every trading window to receive a
realistic ensemble to allow for more efficient intraday trading and redispatch.
A generalized additive model is fitted to the price differences with the
assumption that they follow a zero-inflated distribution, precisely a mixture
of the Dirac and the Student's t-distributions. Moreover, the mixing term is
estimated using a high-dimensional logistic regression with lasso penalty. We
model the expected value and volatility of the series using, among others, autoregressive and no-trade effects as well as load, wind and solar generation forecasts, accounting for the non-linearities in e.g. time to maturity. Both the in-sample
characteristics and forecasting performance are analysed using a rolling window
forecasting study. Multiple versions of the model are compared to several
benchmark models and evaluated using probabilistic forecasting measures and
significance tests. The study aims to forecast the price distribution in the
German Intraday Continuous Market in the last 3 hours of trading, but the
approach allows for application to other continuous markets, especially in
Europe. The results demonstrate the superiority of the mixture model over the benchmarks, with the largest gains coming from the modelling of the volatility. They also indicate that
the introduction of XBID reduced the market volatility.
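The zero-inflated mixture at the core of the trajectory simulation can be sketched as follows with purely illustrative parameters; in the paper, these quantities are estimated from the data via a generalized additive model and a penalized logistic regression.

```python
import numpy as np

rng = np.random.default_rng(5)

# Price-difference trajectories from a zero-inflated Student's t mixture: with
# probability p_zero the change is exactly zero (the Dirac part), otherwise a
# scaled t draw. Parameter values are illustrative, not intraday-market estimates.
n_paths, horizon = 1000, 180
p_zero, dof, scale = 0.6, 4.0, 0.8

is_zero = rng.random((n_paths, horizon)) < p_zero
jumps = scale * rng.standard_t(dof, size=(n_paths, horizon))
price_diffs = np.where(is_zero, 0.0, jumps)

last_price = 42.0                                   # hypothetical last observed price
trajectories = last_price + np.cumsum(price_diffs, axis=1)
print("5%/50%/95% quantiles of the simulated final price:",
      np.quantile(trajectories[:, -1], [0.05, 0.5, 0.95]).round(2))
```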
arXiv link: http://arxiv.org/abs/2005.01365v3
Two Burning Questions on COVID-19: Did shutting down the economy help? Can we (partially) reopen the economy without risking the second wave?
facing us is: can we even partially reopen the economy without risking a second
wave? We first need to understand if shutting down the economy helped. And if
it did, is it possible to achieve similar gains in the war against the pandemic
while partially opening up the economy? To do so, it is critical to understand
the effects of the various interventions that can be put into place and their
corresponding health and economic implications. Since many interventions exist,
the key challenge facing policy makers is understanding the potential
trade-offs between them, and choosing the particular set of interventions that
works best for their circumstance. In this memo, we provide an overview of
Synthetic Interventions (a natural generalization of Synthetic Control), a
data-driven and statistically principled method to perform what-if scenario
planning, i.e., for policy makers to understand the trade-offs between
different interventions before having to actually enact them. In essence, the
method leverages information from different interventions that have already
been enacted across the world and fits it to a policy maker's setting of
interest, e.g., to estimate the effect of mobility-restricting interventions on
the U.S., we use daily death data from countries that enforced severe mobility
restrictions to create a "synthetic low mobility U.S." and predict the
counterfactual trajectory of the U.S. if it had indeed applied a similar
intervention. Using Synthetic Interventions, we find that lifting severe
mobility restrictions and only retaining moderate mobility restrictions (at
retail and transit locations) seems to effectively flatten the curve. We hope
this provides guidance on weighing the trade-offs between the safety of the
population, strain on the healthcare system, and impact on the economy.
arXiv link: http://arxiv.org/abs/2005.00072v2
The Interaction Between Credit Constraints and Uncertainty Shocks
activity? This question is answered by using a novel method to identify shocks
to uncertainty in access to credit. Time-variation in uncertainty about credit
availability is estimated using particle Markov Chain Monte Carlo. We extract
shocks to time-varying credit uncertainty and decompose them into two parts: the
first captures the "pure" effect of a shock to the second moment; the second
captures total effects of uncertainty including effects on the first moment.
Using state-dependent local projections, we find that the "pure" effect by
itself generates a sharp slowdown in real activity and the effects are largely
countercyclical. We feed the estimated shocks into a flexible price real
business cycle model with a collateral constraint and show that when the
collateral constraint binds, an uncertainty shock about credit access is
recessionary leading to a simultaneous decline in consumption, investment, and
output.
arXiv link: http://arxiv.org/abs/2004.14719v1
Causal Inference on Networks under Continuous Treatment Interference
also affects other units' outcome. When interference is at work, policy
evaluation mostly relies on the use of randomized experiments under cluster
interference and binary treatment. Instead, we consider a non-experimental
setting under continuous treatment and network interference. In particular, we
define spillover effects by specifying the exposure to network treatment as a
weighted average of the treatment received by units connected through physical,
social or economic interactions. We provide a generalized propensity
score-based estimator to estimate both direct and spillover effects of a
continuous treatment. Our estimator also allows us to consider asymmetric network
connections characterized by heterogeneous intensities. To showcase this
methodology, we investigate whether and how spillover effects shape the optimal
level of policy interventions in agricultural markets. Our results show that,
in this context, neglecting interference may underestimate the degree of policy
effectiveness.
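The exposure construction described here, a weighted average of neighbours' continuous treatments under a possibly asymmetric network, can be sketched as follows; the adjacency matrix and treatments are simulated, and the generalized propensity score step is omitted.

```python
import numpy as np

rng = np.random.default_rng(6)

# Exposure to network treatment as a weighted average of neighbours' (continuous)
# treatments; W is an illustrative random, asymmetric, weighted adjacency matrix.
n = 50
W = rng.random((n, n)) * (rng.random((n, n)) < 0.1)
np.fill_diagonal(W, 0.0)

treatment = rng.gamma(2.0, 1.0, n)                  # continuous individual treatment
row_sums = W.sum(axis=1)
exposure = np.divide(W @ treatment, row_sums,
                     out=np.zeros(n), where=row_sums > 0)
print("first five exposures:", exposure[:5].round(2))
```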
arXiv link: http://arxiv.org/abs/2004.13459v2
Measuring wage inequality under right censoring
the last two decades in the probability mass of the right tail of the wage
distribution, through the analysis of the corresponding tail index. Specifically, a conditional tail index estimator is introduced which explicitly
allows for right tail censoring (top-coding), which is a feature of the widely
used current population survey (CPS), as well as of other surveys. Ignoring the
top-coding may lead to inconsistent estimates of the tail index and to under- or overstatement of inequality and of its evolution over time. Thus, having a
tail index estimator that explicitly accounts for this sample characteristic is
of importance to better understand and compute the tail index dynamics in the
censored right tail of the wage distribution. The contribution of this paper is
threefold: i) we introduce a conditional tail index estimator that explicitly
handles the top-coding problem, and evaluate its finite sample performance and
compare it with competing methods; ii) we highlight that the factor values used
to adjust the top-coded wage have changed over time and depend on the
characteristics of individuals, occupations and industries, and propose
suitable values; and iii) we provide an in-depth empirical analysis of the
dynamics of the US wage distribution's right tail using the public-use CPS
database from 1992 to 2017.
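For reference, a plain Hill estimator on synthetic Pareto-tailed wages looks as follows; unlike the paper's conditional estimator, it makes no correction for top-coding and no use of covariates.

```python
import numpy as np

rng = np.random.default_rng(7)

# Baseline Hill estimator of the tail index on synthetic Pareto-tailed "wages".
wages = (rng.pareto(2.5, 20_000) + 1.0) * 30_000       # true tail index: 2.5
k = 500                                                # number of upper order statistics
order = np.sort(wages)
threshold = order[-(k + 1)]                            # (k+1)-th largest observation
hill_gamma = np.mean(np.log(order[-k:]) - np.log(threshold))
print(f"Hill tail index estimate: {1.0 / hill_gamma:.2f} (true value 2.5)")
```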
arXiv link: http://arxiv.org/abs/2004.12856v1
State Dependence and Unobserved Heterogeneity in the Extensive Margin of Trade
bilateral trade. Motivated by a stylized heterogeneous firms model of
international trade with market entry costs, we consider dynamic three-way
fixed effects binary choice models and study the corresponding incidental
parameter problem. The standard maximum likelihood estimator is consistent
under asymptotics where all panel dimensions grow at a constant rate, but it
has an asymptotic bias in its limiting distribution, invalidating inference
even in situations where the bias appears to be small. Thus, we propose two
different bias-corrected estimators. Monte Carlo simulations confirm their
desirable statistical properties. We apply these estimators in a reassessment
of the most commonly studied determinants of the extensive margin of trade.
Both true state dependence and unobserved heterogeneity contribute considerably
to trade persistence and taking this persistence into account matters
significantly in identifying the effects of trade policies on the extensive
margin.
arXiv link: http://arxiv.org/abs/2004.12655v2
Structural Regularization
on economic theory as regularizers for statistical models. We show that even if
a structural model is misspecified, as long as it is informative about the
data-generating mechanism, our method can outperform both the (misspecified)
structural model and un-structural-regularized statistical models. Our method
permits a Bayesian interpretation of theory as prior knowledge and can be used
both for statistical prediction and causal inference. It contributes to
transfer learning by showing how incorporating theory into statistical modeling
can significantly improve out-of-domain predictions and offers a way to
synthesize reduced-form and structural approaches for causal effect estimation.
Simulation experiments demonstrate the potential of our method in various
settings, including first-price auctions, dynamic models of entry and exit, and
demand estimation with instrumental variables. Our method has potential
applications not only in economics, but in other scientific disciplines whose
theoretical models offer important insight but are subject to significant
misspecification concerns.
arXiv link: http://arxiv.org/abs/2004.12601v4
Reducing Interference Bias in Online Marketplace Pricing Experiments
of proposed product changes. However, given that marketplaces are inherently
connected, total average treatment effect estimates obtained through Bernoulli
randomized experiments are often biased due to violations of the stable unit
treatment value assumption. This can be particularly problematic for
experiments that impact sellers' strategic choices, affect buyers' preferences
over items in their consideration set, or change buyers' consideration sets
altogether. In this work, we measure and reduce bias due to interference in
online marketplace experiments by using observational data to create clusters
of similar listings, and then using those clusters to conduct
cluster-randomized field experiments. We provide a lower bound on the magnitude
of bias due to interference by conducting a meta-experiment that randomizes
over two experiment designs: one Bernoulli randomized, one cluster randomized.
In both meta-experiment arms, treatment sellers are subject to a different
platform fee policy than control sellers, resulting in different prices for
buyers. By conducting a joint analysis of the two meta-experiment arms, we find
a large and statistically significant difference between the total average
treatment effect estimates obtained with the two designs, and estimate that
32.60% of the Bernoulli-randomized treatment effect estimate is due to
interference bias. We also find weak evidence that the magnitude and/or
direction of interference bias depends on the extent to which a marketplace is
supply- or demand-constrained, and analyze a second meta-experiment to
highlight the difficulty of detecting interference bias when treatment
interventions require intention-to-treat analysis.
arXiv link: http://arxiv.org/abs/2004.12489v1
Inference with Many Weak Instruments
number of instruments can grow at the same rate or slower than the sample size.
We propose a jackknifed version of the classical weak identification-robust
Anderson-Rubin (AR) test statistic. Large-sample inference based on the
jackknifed AR is valid under heteroscedasticity and weak identification. The
feasible version of this statistic uses a novel variance estimator. The test
has uniformly correct size and good power properties. We also develop a
pre-test for weak identification that is related to the size property of a Wald
test based on the Jackknife Instrumental Variable Estimator (JIVE). This new
pre-test is valid under heteroscedasticity and with many instruments.
arXiv link: http://arxiv.org/abs/2004.12445v3
Maximum Likelihood Estimation of Stochastic Frontier Models with Endogeneity
models with endogeneity in cross-section data when the composite error term may
be correlated with inputs and environmental variables. Our framework is a
generalization of the normal half-normal stochastic frontier model with
endogeneity. We derive the likelihood function in closed form using three
fundamental assumptions: the existence of control functions that fully capture
the dependence between regressors and unobservables; the conditional
independence of the two error components given the control functions; and the
conditional distribution of the stochastic inefficiency term given the control
functions being a folded normal distribution. We also provide a Battese-Coelli
estimator of technical efficiency. Our estimator is computationally fast and
easy to implement. We study some of its asymptotic properties, and we showcase
its finite sample behavior in Monte-Carlo simulations and an empirical
application to farmers in Nepal.
arXiv link: http://arxiv.org/abs/2004.12369v3
Limiting Bias from Test-Control Interference in Online Marketplace Experiments
treatment effect (TATE), which measures the difference between the average
outcome if all users were treated and the average outcome if all users were
untreated. However, a simple difference-in-means estimator will give a biased
estimate of the TATE when outcomes of control units depend on the outcomes of
treatment units, an issue we refer to as test-control interference. Using a
simulation built on top of data from Airbnb, this paper considers the use of
methods from the network interference literature for online marketplace
experimentation. We model the marketplace as a network in which an edge exists
between two sellers if their goods substitute for one another. We then simulate
seller outcomes, specifically considering a "status quo" context and
"treatment" context that forces all sellers to lower their prices. We use the
same simulation framework to approximate TATE distributions produced by using
blocked graph cluster randomization, exposure modeling, and the Hajek estimator
for the difference in means. We find that while blocked graph cluster
randomization reduces the bias of the naive difference-in-means estimator by as
much as 62%, it also significantly increases the variance of the estimator. On
the other hand, the use of more sophisticated estimators produces mixed
results. While some provide (small) additional reductions in bias and small
reductions in variance, others lead to increased bias and variance. Overall,
our results suggest that experiment design and analysis techniques from the
network experimentation literature are promising tools for reducing bias due to
test-control interference in marketplace experiments.
arXiv link: http://arxiv.org/abs/2004.12162v1
Sensitivity to Calibrated Parameters
of model parameters and keep them fixed when estimating the remaining
parameters. Calibrated parameters likely affect conclusions based on the model
but estimation time often makes a systematic investigation of the sensitivity
to calibrated parameters infeasible. I propose a simple and computationally
low-cost measure of the sensitivity of parameters and other objects of interest
to the calibrated parameters. In the main empirical application, I revisit the
analysis of life-cycle savings motives in Gourinchas and Parker (2002) and show
that some estimates are sensitive to calibrations.
arXiv link: http://arxiv.org/abs/2004.12100v2
Bayesian Clustered Coefficients Regression with Auxiliary Covariates Assistant Random Effects
similarities between regions, and estimate their shared coefficients in
economics models. In this article, we propose a mixture of finite mixtures
(MFM) clustered regression model with auxiliary covariates that account for
similarities in demographic or economic characteristics over a spatial domain.
Our Bayesian construction provides both inference for the number of clusters and the clustering configuration, and estimation of the parameters for each cluster.
Empirical performance of the proposed model is illustrated through simulation
experiments, and further applied to a study of influential factors for monthly
housing cost in Georgia.
arXiv link: http://arxiv.org/abs/2004.12022v2
From orders to prices: A stochastic description of the limit order book to forecast intraday returns
events in the limit order book (LOB): order arrivals and cancellations. It is
based on an operator algebra for individual orders and describes their effect
on the LOB. The model inputs are arrival and cancellation rate distributions
that emerge from individual behavior of traders, and we show how prices and
liquidity arise from the LOB dynamics. In a simulation study we illustrate how
the model works and highlight its sensitivity with respect to assumptions
regarding the collective behavior of market participants. Empirically, we test
the model on a LOB snapshot of XETRA, estimate several linearized model
specifications, and conduct in- and out-of-sample forecasts. The in-sample
results based on contemporaneous information suggest that our model describes
returns very well, resulting in an adjusted $R^2$ of roughly 80%. In the more
realistic setting where only past information enters the model, we observe an
adjusted $R^2$ around 15%. The direction of the next return can be predicted
(out-of-sample) with an accuracy above 75% for time horizons below 10 minutes.
On average, we obtain an RMSPE that is 10 times lower than values documented in
the literature.
arXiv link: http://arxiv.org/abs/2004.11953v2
Microeconometrics with Partial Identification
identification, focusing on the developments of the last thirty years. The
topics presented illustrate that the available data combined with credible
maintained assumptions may yield much information about a parameter of
interest, even if they do not reveal it exactly. Special attention is devoted
to discussing the challenges associated with, and some of the solutions put
forward to, (1) obtain a tractable characterization of the values for the
parameters of interest which are observationally equivalent, given the
available data and maintained assumptions; (2) estimate this set of values; (3)
conduct tests of hypotheses and make confidence statements. The chapter reviews
advances in partial identification analysis both as applied to learning
(functionals of) probability distributions that are well-defined in the absence
of models, as well as to learning parameters that are well-defined only in the
context of particular models. A simple organizing principle is highlighted: the
source of the identification problem can often be traced to a collection of
random variables that are consistent with the available data and maintained
assumptions. This collection may be part of the observed data or be a model
implication. In either case, it can be formalized as a random set. Random set
theory is then used as a mathematical framework to unify a number of special
results and produce a general methodology to carry out partial identification
analysis.
arXiv link: http://arxiv.org/abs/2004.11751v1
A Comparison of Methods for Treatment Assignment with an Application to Playlist Generation
treatment assignment, a general problem that arises in many applications and
has received significant attention from economists, computer scientists, and
social scientists. We group the various methods proposed in the literature into
three general classes of algorithms (or metalearners): learning models to
predict outcomes (the O-learner), learning models to predict causal effects
(the E-learner), and learning models to predict optimal treatment assignments
(the A-learner). We compare the metalearners in terms of (1) their level of
generality and (2) the objective function they use to learn models from data;
we then discuss the implications that these characteristics have for modeling
and decision making. Notably, we demonstrate analytically and empirically that
optimizing for the prediction of outcomes or causal effects is not the same as
optimizing for treatment assignments, suggesting that in general the A-learner
should lead to better treatment assignments than the other metalearners. We
demonstrate the practical implications of our findings in the context of
choosing, for each user, the best algorithm for playlist generation in order to
optimize engagement. This is the first comparison of the three different
metalearners on a real-world application at scale (based on more than half a
billion individual treatment assignments). In addition to supporting our
analytical findings, the results show how large A/B tests can provide
substantial value for learning treatment assignment policies, rather than
simply choosing the variant that performs best on average.
arXiv link: http://arxiv.org/abs/2004.11532v5
Machine Learning Econometrics: Bayesian algorithms and methods
vastly, a challenge for future generations of econometricians will be to master
efficient algorithms for inference in empirical models with large information
sets. This Chapter provides a review of popular estimation algorithms for
Bayesian inference in econometrics and surveys alternative algorithms developed
in machine learning and computing science that allow for efficient computation
in high-dimensional settings. The focus is on scalability and parallelizability
of each algorithm, as well as their ability to be adopted in various empirical
settings in economics and finance.
arXiv link: http://arxiv.org/abs/2004.11486v1
High-dimensional macroeconomic forecasting using message passing algorithms
large information sets and structural instabilities. First, it treats a
regression model with time-varying coefficients, stochastic volatility and
exogenous predictors, as an equivalent high-dimensional static regression
problem with thousands of covariates. Inference in this specification proceeds
using Bayesian hierarchical priors that shrink the high-dimensional vector of
coefficients either towards zero or time-invariance. Second, it introduces the
frameworks of factor graphs and message passing as a means of designing
efficient Bayesian estimation algorithms. In particular, a Generalized
Approximate Message Passing (GAMP) algorithm is derived that has low
algorithmic complexity and is trivially parallelizable. The result is a
comprehensive methodology that can be used to estimate time-varying parameter
regressions with an arbitrarily large number of exogenous predictors. In a
forecasting exercise for U.S. price inflation this methodology is shown to work
very well.
arXiv link: http://arxiv.org/abs/2004.11485v1
Does Subjective Well-being Contribute to Our Understanding of Mexican Well-being?
question surveys can improve our understanding of well-being in Mexico. The
research uses data at the level of the 32 federal entities or States, taking
advantage of the heterogeneity in development indicator readings between and
within geographical areas, the product of socioeconomic inequality. The data
come principally from two innovative subjective questionnaires, BIARE and
ENVIPE, which intersect in their fully representative state-wide applications
in 2014, but also from conventional objective indicator sources such as the HDI
and conventional surveys. This study uses two approaches, a descriptive
analysis of a state-by-state landscape of indicators, both subjective and
objective, in an initial search for stand-out well-being patterns, and an
econometric study of a large selection of mainly subjective indicators inspired
by theory and the findings of previous Mexican research. Descriptive analysis
confirms that subjective well-being correlates strongly with and complements
objective data, providing interesting directions for analysis. The econometrics
literature indicates that happiness increases with income and the satisfaction of material needs, as theory suggests, but also that Mexicans are relatively happy considering their mediocre incomes and high levels of insecurity, the last of which, when categorized according to satisfaction with life, can be shown to
impact poorer people disproportionately. The article suggests that well-being
is a complex, multidimensional construct which can be revealed by using
exploratory multiple-regression and partial-correlation models which juxtapose
subjective and objective indicators.
arXiv link: http://arxiv.org/abs/2004.11420v1
Bayesian Optimization of Hyperparameters from Noisy Marginal Likelihood Estimates
maximizing the marginal likelihood. Bayesian optimization is a popular
iterative method where a Gaussian process posterior of the underlying function
is sequentially updated by new function evaluations. An acquisition strategy
uses this posterior distribution to decide where to place the next function
evaluation. We propose a novel Bayesian optimization framework for situations
where the user controls the computational effort, and therefore the precision
of the function evaluations. This is a common situation in econometrics where
the marginal likelihood is often computed by Markov chain Monte Carlo (MCMC) or
importance sampling methods, with the precision of the marginal likelihood
estimator determined by the number of samples. The new acquisition strategy
gives the optimizer the option to explore the function with cheap noisy
evaluations and therefore find the optimum faster. The method is applied to
estimating the prior hyperparameters in two popular models on US macroeconomic
time series data: the steady-state Bayesian vector autoregressive (BVAR) and
the time-varying parameter BVAR with stochastic volatility. The proposed method
is shown to find the optimum much quicker than traditional Bayesian
optimization or grid search.
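A generic Gaussian-process Bayesian optimization loop with an expected-improvement acquisition, where the user chooses the precision (number of samples) of each noisy evaluation, can be sketched as follows; the toy objective and the acquisition schedule are illustrative and not the paper's strategy.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(8)

# Toy stand-in for a marginal likelihood that is only available with noise whose
# size the user controls through the number of samples; hyperparameter x is scalar.
def noisy_log_ml(x, n_samples):
    return -(x - 0.3) ** 2 + rng.normal(0, 1.0 / np.sqrt(n_samples))

X = [[0.0], [0.5], [1.0]]                       # cheap initial design
y = [noisy_log_ml(x[0], n_samples=50) for x in X]
grid = np.linspace(0, 1, 200).reshape(-1, 1)

for it in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5) + WhiteKernel(),
                                  normalize_y=True).fit(np.array(X), np.array(y))
    mu, sd = gp.predict(grid, return_std=True)
    best = max(y)
    z = (mu - best) / np.maximum(sd, 1e-12)
    ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
    x_next = float(grid[np.argmax(ei), 0])
    # Cheap, noisy evaluations early; a more precise (costly) one at the end.
    X.append([x_next])
    y.append(noisy_log_ml(x_next, n_samples=50 if it < 10 else 2000))

print("selected hyperparameter:", round(X[int(np.argmax(y))][0], 3), "(optimum at 0.3)")
```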
arXiv link: http://arxiv.org/abs/2004.10092v2
Revealing Cluster Structures Based on Mixed Sampling Frequencies
develops a framework to infer clusters in a panel regression with mixed
frequency data. The linearized MIDAS estimation method is more flexible and
substantially simpler to implement than competing approaches. We show that the
proposed clustering algorithm successfully recovers true membership in the
cross-section, both in theory and in simulations, without requiring prior
knowledge of the number of clusters. This methodology is applied to a
mixed-frequency Okun's law model for state-level data in the U.S. and uncovers
four meaningful clusters based on the dynamic features of state-level labor
markets.
arXiv link: http://arxiv.org/abs/2004.09770v2
Inference by Stochastic Optimization: A Free-Lunch Bootstrap
the asymptotic variance is not analytically tractable. Bootstrap inference
offers a feasible solution but can be computationally costly especially when
the model is complex. This paper uses iterates of a specially designed
stochastic optimization algorithm as draws from which both point estimates and
bootstrap standard errors can be computed in a single run. The draws are
generated by the gradient and Hessian computed from batches of data that are
resampled at each iteration. We show that these draws yield consistent
estimates and asymptotically valid frequentist inference for a large class of
regular problems. The algorithm provides accurate standard errors in simulation
examples and empirical applications at low computational costs. The draws from
the algorithm also provide a convenient way to detect data irregularities.
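The idea can be sketched for OLS as follows: iterates of SGD on resampled batches are collected and their dispersion is read off as standard errors. The step-size choice tying the iterate dispersion to the sampling scale is a heuristic for this simplified illustration; the paper derives the proper gradient-and-Hessian construction.

```python
import numpy as np

rng = np.random.default_rng(9)

# Resampled-batch SGD for OLS: post-burn-in iterates double as draws from which both
# the point estimate and standard errors are read off in a single run.
n = 2000
beta_true = np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ beta_true + rng.normal(size=n)

batch = 64
lr = batch / n           # heuristic step size: puts iterate noise on the sampling scale
beta = np.zeros(2)
burn_in, total, draws = 2_000, 12_000, []
for it in range(total):
    idx = rng.integers(0, n, batch)              # batch resampled with replacement
    Xb, yb = X[idx], y[idx]
    grad = -2.0 * Xb.T @ (yb - Xb @ beta) / batch
    beta = beta - lr * grad
    if it >= burn_in:
        draws.append(beta.copy())

draws = np.array(draws)
print("point estimate:          ", draws.mean(axis=0).round(3))
print("SGD-draw std. errors:    ", draws.std(axis=0).round(4))
print("analytic OLS std. errors:", np.sqrt(np.diag(np.linalg.inv(X.T @ X))).round(4))
```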
arXiv link: http://arxiv.org/abs/2004.09627v3
Noise-Induced Randomization in Regression Discontinuity Designs
treatment is determined by whether an observed running variable crosses a
pre-specified threshold. Here we propose a new approach to identification,
estimation, and inference in regression discontinuity designs that uses
knowledge about exogenous noise (e.g., measurement error) in the running
variable. In our strategy, we weight treated and control units to balance a
latent variable of which the running variable is a noisy measure. Our approach
is driven by effective randomization provided by the noise in the running
variable, and complements standard formal analyses that appeal to continuity
arguments while ignoring the stochastic nature of the assignment mechanism.
arXiv link: http://arxiv.org/abs/2004.09458v5
Awareness of crash risk improves Kelly strategies in simulated financial time series
crashes proposed in Kreuser and Sornette (2018). The price process is defined
as a geometric random walk combined with jumps modelled by separate, discrete
distributions associated with positive (and negative) bubbles. The key
ingredient of the model is to assume that the sizes of the jumps are
proportional to the bubble size. Thus, the jumps tend to efficiently bring back
excess bubble prices close to a normal or fundamental value (efficient
crashes). This is different from existing processes studied that assume jumps
that are independent of the mispricing. The present model is simplified
compared to Kreuser and Sornette (2018) in that we ignore the possibility of a
change of the probability of a crash as the price accelerates above the normal
price. We study the behaviour of investment strategies that maximize the
expected log of wealth (Kelly criterion) for the risky asset and a risk-free
asset. We show that the method behaves similarly to Kelly on Geometric Brownian
Motion in that it outperforms other methods in the long term and beats classical Kelly. We identify knowledge of the presence of crashes as the primary source of outperformance, but interestingly find that knowledge of only their size, and not their time of occurrence, already provides a significant and robust edge. We then perform an error analysis to show that the method is
robust with respect to variations in the parameters. The method is most
sensitive to errors in the expected return.
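For reference, the classical Kelly fraction for a geometric-Brownian-motion asset, which the crash-aware strategy modifies, is computed below with purely illustrative numbers.

```python
import numpy as np

# Classical Kelly fraction for a geometric-Brownian-motion risky asset:
# f* = (mu - r) / sigma^2, with log-growth rate g* = r + (mu - r)^2 / (2 sigma^2).
# The numbers are illustrative; the paper's strategy additionally conditions on the
# estimated bubble/crash-size component.
mu, r, sigma = 0.08, 0.02, 0.20
f_star = (mu - r) / sigma ** 2
growth = r + (mu - r) ** 2 / (2 * sigma ** 2)
print(f"Kelly fraction: {f_star:.2f}, optimal log-growth rate: {growth:.4f}")
```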
arXiv link: http://arxiv.org/abs/2004.09368v1
Multi-frequency-band tests for white noise under heteroskedasticity
white noise hypothesis by using the maximum overlap discrete wavelet packet
transform (MODWPT). The MODWPT allows the variance of a process to be
decomposed into the variance of its components on different equal-length
frequency sub-bands, and the MFB tests then measure the distance between the
MODWPT-based variance ratio and its theoretical null value jointly over several
frequency sub-bands. The resulting MFB tests have the chi-squared asymptotic
null distributions under mild conditions, which allow the data to be
heteroskedastic. The MFB tests are shown by simulation studies to have desirable size and power performance, and their usefulness is further illustrated
by two applications.
arXiv link: http://arxiv.org/abs/2004.09161v1
Consistent Calibration of Economic Scenario Generators: The Case for Conditional Simulation
forward in time for risk management and asset allocation purposes. It is often
not feasible to calibrate the dynamics of all variables within the ESG to
historical data alone. Calibration to forward-information such as future
scenarios and return expectations is needed for stress testing and portfolio
optimization, but no generally accepted methodology is available. This paper
introduces the Conditional Scenario Simulator, which is a framework for
consistently calibrating simulations and projections of economic and financial
variables both to historical data and forward-looking information. The
framework can be viewed as a multi-period, multi-factor generalization of the
Black-Litterman model, and can embed a wide array of financial and
macroeconomic models. Two practical examples demonstrate this in a frequentist
and Bayesian setting.
arXiv link: http://arxiv.org/abs/2004.09042v1
Estimating High-Dimensional Discrete Choice Model of Differentiated Products with Random Coefficients
differentiated products with possibly high-dimensional product attributes. In
our model, high-dimensional attributes can be determinants of both mean and
variance of the indirect utility of a product. The key restriction in our model
is that the high-dimensional attributes affect the variance of indirect
utilities only through finitely many indices. In a framework of the
random-coefficients logit model, we show a bound on the error rate of a
$l_1$-regularized minimum distance estimator and prove the asymptotic linearity
of the de-biased estimator.
arXiv link: http://arxiv.org/abs/2004.08791v1
Loss aversion and the welfare ranking of policy interventions
policy interventions in terms of welfare when individuals are loss-averse. Our
new criterion for "loss aversion-sensitive dominance" defines a weak partial
ordering of the distributions of policy-induced gains and losses. It applies to
the class of welfare functions which model individual preferences with
non-decreasing and loss-averse attitudes towards changes in outcomes. We also
develop new statistical methods to test loss aversion-sensitive dominance in
practice, using nonparametric plug-in estimates; these allow inference to be
conducted through a special resampling procedure. Since point-identification of
the distribution of policy-induced gains and losses may require strong
assumptions, we extend our comparison criteria, test statistics, and resampling
procedures to the partially-identified case. We illustrate our methods with a
simple empirical application to the welfare comparison of alternative income
support programs in the US.
arXiv link: http://arxiv.org/abs/2004.08468v4
Estimating and Projecting Air Passenger Traffic during the COVID-19 Coronavirus Outbreak and its Socio-Economic Impact
traffic worldwide with the aim of analyzing the impact of travel bans on the aviation sector. Based on historical data from January 2010 to October 2019, a forecasting model is implemented in order to set a reference baseline. Making use of airplane movements extracted from online flight tracking platforms and online booking systems, this study also presents a first assessment of recent changes in flight activity around the world as a result of the COVID-19 pandemic. To study the effects of the air travel ban on aviation and, in turn, its socio-economic impact, several scenarios are constructed based on past pandemic crises and the observed flight volumes. It turns out that, according to these hypothetical scenarios, in the first quarter of 2020 aviation losses could have reduced world GDP by 0.02% to 0.12% according to the observed data and, in the worst-case scenarios, the loss could be as high as 1.41-1.67% by the end of 2020, with job losses reaching 25-30 million. Focusing on the EU27, the GDP loss may amount to 1.66-1.98% by the end of 2020, with 4.2 to 5 million job losses in the worst-case scenarios. Some countries will be more affected than others in the short run, and most European airline companies will suffer from the travel ban.
arXiv link: http://arxiv.org/abs/2004.08460v2
Causal Inference under Outcome-Based Sampling with Monotonicity Assumptions
Specifically, we focus on the binary-outcome and binary-treatment case, where
the parameters of interest are causal relative and attributable risks defined
via the potential outcome framework. It is shown that strong ignorability is
not always as powerful as it is under random sampling and that certain
monotonicity assumptions yield comparable results in terms of sharp identified
intervals. In particular, the usual odds ratio is shown to be a sharp identified
upper bound on causal relative risk under the monotone treatment response and
monotone treatment selection assumptions. We offer algorithms for inference on
the causal parameters that are aggregated over the true population distribution
of the covariates. We show the usefulness of our approach by studying three
empirical examples: the benefit of attending private school for entering a
prestigious university in Pakistan; the relationship between staying in school
and getting involved with drug-trafficking gangs in Brazil; and the link
between physicians' hours and size of the group practice in the United States.
arXiv link: http://arxiv.org/abs/2004.08318v6
The direct and spillover effects of a nationwide socio-emotional learning program for disruptive students
improve their classroom behavior. Small-scale programs in high-income countries
have been shown to improve treated students' behavior and academic outcomes.
Using a randomized experiment, we show that a nationwide SEL program in Chile
has no effect on eligible students. We find evidence that very disruptive
students may hamper the program's effectiveness. ADHD, a disorder correlated
with disruptiveness, is much more prevalent in Chile than in high-income
countries, so very disruptive students may be more present in Chile than in the
contexts where SEL programs have been shown to work.
arXiv link: http://arxiv.org/abs/2004.08126v1
Short-Term Covid-19 Forecast for Latecomers
the availability of reliable forecasts for the number of cases in the coming
days is of fundamental importance. We propose a simple statistical method for
short-term real-time forecasting of the number of Covid-19 cases and fatalities
in countries that are latecomers -- i.e., countries where cases of the disease
started to appear some time after others. In particular, we propose a penalized
(LASSO) regression with an error correction mechanism to construct a model of a
latecomer in terms of the other countries that were at a similar stage of the
pandemic some days before. By tracking the number of cases and deaths in those
countries, we forecast through an adaptive rolling-window scheme the number of
cases and deaths in the latecomer. We apply this methodology to Brazil, and
show that (so far) it has been performing very well. These forecasts aim to
foster a better short-run management of the health system capacity.
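As a rough illustration of the forecasting step, the sketch below fits a LASSO regression in unrestricted error-correction form on a rolling window, regressing the latecomer's daily changes on lag-aligned reference-country series; the function name, window length, and penalty level are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def rolling_ecm_lasso(y, X, window=30, alpha=0.1):
    """One-step-ahead forecasts of the latecomer series y (length T) from
    lag-aligned reference series X (T x K): the change dy_t is regressed on
    the contemporaneous changes dX_t and the lagged levels y_{t-1}, X_{t-1},
    with a LASSO penalty, refit on a rolling window.  Hypothetical
    simplification of the paper's procedure."""
    y, X = np.asarray(y, float), np.asarray(X, float)
    dy, dX = np.diff(y), np.diff(X, axis=0)
    Z = np.column_stack([dX, y[:-1], X[:-1]])       # regressors for dy
    forecasts = {}
    for t in range(window, len(dy)):
        fit = Lasso(alpha=alpha, max_iter=10000).fit(Z[t - window:t], dy[t - window:t])
        forecasts[t + 1] = y[t] + fit.predict(Z[t:t + 1])[0]   # forecast of y[t+1]
    return forecasts
```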
arXiv link: http://arxiv.org/abs/2004.07977v3
Identification of a class of index models: A topological approach
models using a novel approach that relies on general topological results. Our
proof strategy requires substantially weaker conditions on the functions and
distributions characterizing the model compared to existing strategies; in
particular, it does not require any large support conditions on the regressors
of our model. We apply the general identification result to additive random
utility and competing risk models.
arXiv link: http://arxiv.org/abs/2004.07900v1
Non-linear interlinkages and key objectives amongst the Paris Agreement and the Sustainable Development Goals
development are manifested in the Paris Agreement and the Sustainable
Development Goals (SDGs), respectively. These are inherently inter-linked as
progress towards some of these objectives may accelerate or hinder progress
towards others. We investigate how these two agendas influence each other by
defining networks of 18 nodes, consisting of the 17 SDGs and climate change,
for various groupings of countries. We compute a non-linear measure of
conditional dependence, the partial distance correlation, given any subset of
the remaining 16 variables. These correlations are treated as weights on edges,
and weighted eigenvector centralities are calculated to determine the most
important nodes. We find that SDG 6, clean water and sanitation, and SDG 4,
quality education, are most central across nearly all groupings of countries.
In developing regions, SDG 17, partnerships for the goals, is strongly
connected to the progress of other objectives in the two agendas whilst,
somewhat surprisingly, SDG 8, decent work and economic growth, is not as
important in terms of eigenvector centrality.
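A minimal sketch of the network construction is given below; it assumes the third-party `dcor` package for the partial distance correlation and `networkx` for the weighted eigenvector centrality, and it simplifies the conditioning step by conditioning each pair on all remaining series.

```python
import numpy as np
import networkx as nx
import dcor  # assumed dependency providing partial_distance_correlation

def sdg_centralities(data, names):
    """Weighted eigenvector centralities of an 18-node network whose edge
    weights are partial distance correlations between each pair of series,
    conditioning on all remaining series.  `data` is a T x 18 array; a
    simplified sketch of the pipeline described in the abstract."""
    n = data.shape[1]
    G = nx.Graph()
    G.add_nodes_from(names)
    for i in range(n):
        for j in range(i + 1, n):
            rest = np.delete(data, [i, j], axis=1)
            w = dcor.partial_distance_correlation(data[:, i], data[:, j], rest)
            G.add_edge(names[i], names[j], weight=max(w, 0.0))
    return nx.eigenvector_centrality_numpy(G, weight="weight")
```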
arXiv link: http://arxiv.org/abs/2004.09318v1
Epidemic control via stochastic optimal control
of this type are used in mathematical epidemiology to capture the time
evolution of highly infectious diseases such as COVID-19. Our approach relies
on reformulating the Hamilton-Jacobi-Bellman equation as a stochastic minimum
principle. This results in a system of forward backward stochastic differential
equations, which is amenable to numerical solution via Monte Carlo simulations.
We present a number of numerical solutions of the system under a variety of
scenarios.
arXiv link: http://arxiv.org/abs/2004.06680v3
On Vickrey's Income Averaging
continuity, and the boundary condition for the present. These properties yield
a unique averaging function that is the density of the reflected Brownian
motion with a drift started at the current income and moving over the past
incomes. When averaging is done over the short past, the weighting function
converges asymptotically to a Gaussian. When averaging is done over the long
horizon, the weighting function converges to the exponential distribution. For
all intermediate averaging scales, we derive an explicit solution that
interpolates between the two.
arXiv link: http://arxiv.org/abs/2004.06289v1
Estimating the COVID-19 Infection Rate: Anatomy of an Inference Problem
accuracy of tests, reported rates of population infection by the SARS CoV-2
virus are lower than actual rates of infection. Hence, reported rates of severe
illness conditional on infection are higher than actual rates. Understanding
the time path of the COVID-19 pandemic has been hampered by the absence of
bounds on infection rates that are credible and informative. This paper
explains the logical problem of bounding these rates and reports illustrative
findings, using data from Illinois, New York, and Italy. We combine the data
with assumptions on the infection rate in the untested population and on the
accuracy of the tests that appear credible in the current context. We find that
the infection rate might be substantially higher than reported. We also find
that the infection fatality rate in Italy is substantially lower than reported.
arXiv link: http://arxiv.org/abs/2004.06178v1
A Machine Learning Approach for Flagging Incomplete Bid-rigging Cartels
useful for detecting incomplete bid-rigging cartels. Our approach combines
screens, i.e. statistics derived from the distribution of bids in a tender,
with machine learning to predict the probability of collusion. As a
methodological innovation, we calculate such screens for all possible subgroups
of three or four bids within a tender and use summary statistics like the mean,
median, maximum, and minimum of each screen as predictors in the machine
learning algorithm. This approach tackles the issue that competitive bids in
incomplete cartels distort the statistical signals produced by bid rigging. We
demonstrate that our algorithm outperforms previously suggested methods in
applications to incomplete cartels based on empirical data from Switzerland.
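The feature-construction step can be sketched as follows; the two screens used here (coefficient of variation and relative distance between the two lowest bids) and the random-forest classifier are illustrative stand-ins for the paper's exact screen set and learner, and `tender_bid_lists` and `collusion_labels` are hypothetical inputs.

```python
import numpy as np
from itertools import combinations
from sklearn.ensemble import RandomForestClassifier

def subgroup_screens(bids, sizes=(3, 4)):
    """Mean/median/max/min of two common screens, computed over all subgroups
    of 3 or 4 bids within a tender, as features for a collusion classifier."""
    cv, rd = [], []
    for k in sizes:
        for sub in combinations(bids, k):
            s = np.sort(np.asarray(sub, dtype=float))
            cv.append(s.std() / s.mean())                       # coefficient of variation
            losers = s[1:]
            rd.append((s[1] - s[0]) / losers.std() if losers.std() > 0 else 0.0)
    feats = []
    for screen in (cv, rd):
        feats += [np.mean(screen), np.median(screen), np.max(screen), np.min(screen)]
    return np.array(feats)

# Hypothetical usage: one feature vector per tender, then a classifier.
# X = np.vstack([subgroup_screens(b) for b in tender_bid_lists])
# clf = RandomForestClassifier(n_estimators=500).fit(X, collusion_labels)
```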
arXiv link: http://arxiv.org/abs/2004.05629v1
Wild Bootstrap Inference for Penalized Quantile Regression for Longitudinal Data
has focused primarily on point estimation. In this work, we investigate
statistical inference. We propose a wild residual bootstrap procedure and show
that it is asymptotically valid for approximating the distribution of the
penalized estimator. The model puts no restrictions on individual effects, and
the estimator achieves consistency by letting the shrinkage decay in importance
asymptotically. The new method is easy to implement and simulation studies show
that it has accurate small sample behavior in comparison with existing
procedures. Finally, we illustrate the new approach using U.S. Census data to
estimate a model that includes more than eighty thousand parameters.
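A cross-sectional sketch of the wild residual bootstrap is shown below for the median case, where the weights reduce to Rademacher draws times the absolute residuals; general quantile levels require the tilted two-point weights of Feng, He and Hu (2011), and the penalty level and number of replications are illustrative.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

def wild_bootstrap_medreg(X, y, alpha=0.1, B=500, seed=0):
    """Wild residual bootstrap confidence intervals for a penalized median
    regression; a cross-sectional simplification of the longitudinal setting."""
    rng = np.random.default_rng(seed)
    fit = QuantileRegressor(quantile=0.5, alpha=alpha).fit(X, y)
    resid = y - fit.predict(X)
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        w = rng.choice([-1.0, 1.0], size=len(y))        # Rademacher weights (tau = 0.5)
        y_star = fit.predict(X) + w * np.abs(resid)     # wild bootstrap sample
        draws[b] = QuantileRegressor(quantile=0.5, alpha=alpha).fit(X, y_star).coef_
    return fit.coef_, np.percentile(draws, [2.5, 97.5], axis=0)  # pointwise 95% bands
```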
arXiv link: http://arxiv.org/abs/2004.05127v3
mFLICA: An R package for Inferring Leadership of Coordination From Time Series
collective goals. One of special cases of leadership is the coordinated pattern
initiation. In this context, leaders are initiators who initiate coordinated
patterns that everyone follows. Given a set of individual-multivariate time
series of real numbers, the mFLICA package provides a framework for R users to
infer coordination events within time series, initiators and followers of these
coordination events, as well as dynamics of group merging and splitting. The
mFLICA package also has a visualization function to make results of leadership
inference more understandable. The package is available on Comprehensive R
Archive Network (CRAN) at https://CRAN.R-project.org/package=mFLICA.
arXiv link: http://arxiv.org/abs/2004.06092v3
Direct and spillover effects of a new tramway line on the commercial vitality of peripheral streets. A synthetic-control approach
rails can cause changes on a very detailed spatial scale, with different
stories unfolding next to each other within a same urban neighborhood. We study
the direct effect of a light rail line built in Florence (Italy) on the retail
density of the street where it was built, and its spillover effect on other
streets in the treated street's neighborhood. To this aim, we investigate the
use of the Synthetic Control Group (SCG) methods in panel comparative case
studies where interference between the treated and the untreated units is
plausible, an issue still little researched in the SCG methodological
literature. We frame our discussion in the potential outcomes approach. Under a
partial interference assumption, we formally define relevant direct and
spillover causal effects. We also consider the “unrealized” spillover effect
on the treated street in the hypothetical scenario that another street in the
treated unit's neighborhood had been assigned to the intervention.
arXiv link: http://arxiv.org/abs/2004.05027v5
Forecasts with Bayesian vector autoregressions under real time conditions
taking a real time versus pseudo out-of-sample perspective. We use monthly
vintages for the United States (US) and the Euro Area (EA) and estimate a set
of vector autoregressive (VAR) models of different sizes with constant and
time-varying parameters (TVPs) and stochastic volatility (SV). Our results
suggest differences in the relative ordering of model performance for point and
density forecasts depending on whether real time data or truncated final
vintages in pseudo out-of-sample simulations are used for evaluating forecasts.
No clearly superior specification for the US or the EA across variable types
and forecast horizons can be identified, although larger models featuring TVPs
appear to be affected the least by missing values and data revisions. We
identify substantial differences in performance metrics with respect to whether
forecasts are produced for the US or the EA.
arXiv link: http://arxiv.org/abs/2004.04984v1
On the Factors Influencing the Choices of Weekly Telecommuting Frequencies of Post-secondary Students in Toronto
choices by post-secondary students in Toronto. It uses a dataset collected
through a large-scale travel survey conducted on post-secondary students of
four major universities in Toronto and it employs multiple alternative
econometric modelling techniques for the empirical investigation. The results
contribute on two fronts. First, the paper presents empirical investigations of
the factors affecting telecommuting frequency choices of post-secondary
students, which are rare in the literature. Second, it identifies a
better-performing econometric modelling technique for modelling telecommuting
frequency choices.
Empirical investigation clearly reveals that telecommuting for school related
activities is prevalent among post-secondary students in Toronto. Around 80
percent of the region's 0.18 million post-secondary students, who make roughly
36,000 trips per day, telecommute at least once a week.
Considering that large numbers of students need to spend a long time travelling
from home to campus with around 33 percent spending more than two hours a day
on travelling, telecommuting has potential to enhance their quality of life.
Empirical investigations reveal that car ownership and living farther from the
campus have similar positive effects on the choice of higher frequency of
telecommuting. Students who use a bicycle for regular travel are least likely
to telecommute, compared to those using transit or a private car.
arXiv link: http://arxiv.org/abs/2004.04683v1
Bias optimal vol-of-vol estimation: the role of window overlapping
parameters involved in estimating the integrated volatility of the spot
volatility via the simple realized estimator by Barndorff-Nielsen and Veraart
(2009). Our analytic results are obtained assuming that the spot volatility is
a continuous mean-reverting process and that consecutive local windows for
estimating the spot volatility are allowed to overlap in a finite sample
setting. Moreover, our analytic results support some optimal selections of
tuning parameters prescribed in the literature, based on numerical evidence.
Interestingly, it emerges that window-overlapping is crucial for optimizing the
finite-sample bias of volatility-of-volatility estimates.
arXiv link: http://arxiv.org/abs/2004.04013v2
Manipulation-Proof Machine Learning
In many settings, from consumer credit to criminal justice, those decisions are
made by applying an estimator to data on an individual's observed behavior. But
when consequential decisions are encoded in rules, individuals may
strategically alter their behavior to achieve desired outcomes. This paper
develops a new class of estimators that are stable under manipulation, even when
the decision rule is fully transparent. We explicitly model the costs of
manipulating different behaviors, and identify decision rules that are stable
in equilibrium. Through a large field experiment in Kenya, we show that
decision rules estimated with our strategy-robust method outperform those based
on standard supervised learning approaches.
arXiv link: http://arxiv.org/abs/2004.03865v1
Robust Empirical Bayes Confidence Intervals
means problem. The intervals are centered at the usual linear empirical Bayes
estimator, but use a critical value accounting for shrinkage. Parametric EBCIs
that assume a normal distribution for the means (Morris, 1983b) may
substantially undercover when this assumption is violated. In contrast, our
EBCIs control coverage regardless of the means distribution, while remaining
close in length to the parametric EBCIs when the means are indeed Gaussian. If
the means are treated as fixed, our EBCIs have an average coverage guarantee:
the coverage probability is at least $1 - \alpha$ on average across the $n$
EBCIs for each of the means. Our empirical application considers the effects of
U.S. neighborhoods on intergenerational mobility.
arXiv link: http://arxiv.org/abs/2004.03448v4
Inference in Unbalanced Panel Data Models with Interactive Fixed Effects
estimator in unbalanced panels where the source of attrition is conditionally
random. For inference, we propose a method of alternating projections algorithm
based on straightforward scalar expressions to compute the residualized
variables required for the estimation of the bias terms and the covariance
matrix. Simulation experiments confirm our asymptotic results as reliable
finite sample approximations. Furthermore, we reassess Acemoglu et al. (2019).
Allowing for a more general form of unobserved heterogeneity, we confirm
significant effects of democratization on growth.
arXiv link: http://arxiv.org/abs/2004.03414v2
Visualising the Evolution of English Covid-19 Cases with Topological Data Analysis Ball Mapper
trends and maps. Whilst these are helpful, they neglect important
multi-dimensional interactions between characteristics of communities. Using
the Topological Data Analysis Ball Mapper algorithm we construct an abstract
representation of NUTS3 level economic data, overlaying onto it the confirmed
cases of Covid-19 in England. In so doing we may understand how the disease
spreads on different socio-economical dimensions. It is observed that some
areas of the characteristic space have quickly raced to the highest levels of
infection, while others close by in the characteristic space do not show large
infection growth. Likewise, we see patterns emerging in very different areas
that command more monitoring. A strong contribution for Topological Data
Analysis, and the Ball Mapper algorithm especially, in comprehending dynamic
epidemic data is signposted.
arXiv link: http://arxiv.org/abs/2004.03282v2
Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments
treatment variables, under unconfoundedness and with nonparametric or
high-dimensional nuisance functions. Our double debiased machine learning (DML)
estimators for the average dose-response function (or the average structural
function) and the partial effects are asymptotically normal with non-parametric
convergence rates. The first-step estimators for the nuisance conditional
expectation function and the conditional density can be nonparametric or ML
methods. Utilizing a kernel-based doubly robust moment function and
cross-fitting, we give high-level conditions under which the nuisance function
estimators do not affect the first-order large sample distribution of the DML
estimators. We provide sufficient low-level conditions for kernel, series, and
deep neural networks. We justify the use of kernel to localize the continuous
treatment at a given value by the Gateaux derivative. We implement various ML
methods in Monte Carlo simulations and in an empirical application to a job
training program evaluation.
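A compact sketch of the cross-fitted, kernel-localized doubly robust estimator is given below; the random-forest outcome regression and the Gaussian approximation to the generalized propensity score are simplifying assumptions for illustration, not the paper's recommended nuisance estimators, and the reported standard error is a naive plug-in.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dr_dose_response(Y, T, X, t0, h, n_splits=2, seed=0):
    """Cross-fitted, kernel-localized doubly robust estimate of the average
    dose-response E[Y(t0)] for a continuous treatment T."""
    psi = np.zeros(len(Y))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        mu = RandomForestRegressor(random_state=seed).fit(
            np.column_stack([T[train], X[train]]), Y[train])     # outcome regression
        m_t = RandomForestRegressor(random_state=seed).fit(X[train], T[train])
        sigma = np.std(T[train] - m_t.predict(X[train]))
        f_t0 = norm.pdf(t0, loc=m_t.predict(X[test]), scale=sigma)   # crude GPS at t0
        mu_t0 = mu.predict(np.column_stack([np.full(len(test), t0), X[test]]))
        K = norm.pdf((T[test] - t0) / h) / h                          # Gaussian kernel
        psi[test] = mu_t0 + K * (Y[test] - mu_t0) / f_t0              # DR moment
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(Y))
```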
arXiv link: http://arxiv.org/abs/2004.03036v8
What do online listings tell us about the housing market?
limitations, that recently started to be overcome using data coming from
housing sales advertisements (ads) websites. In this paper, using a large
dataset of ads in Italy, we provide the first comprehensive analysis of the
problems and potential of these data. The main problem is that multiple ads
("duplicates") can correspond to the same housing unit. We show that this issue
is mainly caused by sellers' attempt to increase visibility of their listings.
Duplicates lead to misrepresentation of the volume and composition of housing
supply, but this bias can be corrected by identifying duplicates with machine
learning tools. We then focus on the potential of these data. We show that the
timeliness, granularity, and online nature of these data allow monitoring of
housing demand, supply and liquidity, and that the (asking) prices posted on
the website can be more informative than transaction prices.
arXiv link: http://arxiv.org/abs/2004.02706v1
Spanning analysis of stock market anomalies under Prospect Stochastic Dominance
securities or relaxing investment constraints improves the investment
opportunity set for prospect investors. We formulate a new testing procedure
for prospect spanning for two nested portfolio sets based on subsampling and
Linear Programming. In an application, we use the prospect spanning framework
to evaluate whether well-known anomalies are spanned by standard factors. We
find that, of the strategies considered, many expand the opportunity set of
prospect-type investors and thus have real economic value for them. In-sample and
out-of-sample results prove remarkably consistent in identifying genuine
anomalies for prospect investors.
arXiv link: http://arxiv.org/abs/2004.02670v1
Kernel Estimation of Spot Volatility with Microstructure Noise Using Pre-Averaging
semimartingale using a kernel estimator. We prove a Central Limit Theorem with
optimal convergence rate for a general two-sided kernel. Next, we introduce a
new pre-averaging/kernel estimator for spot volatility to handle the
microstructure noise of ultra high-frequency observations. We prove a Central
Limit Theorem for the estimation error with an optimal rate and study the
optimal selection of the bandwidth and kernel functions. We show that the
pre-averaging/kernel estimator's asymptotic variance is minimal for exponential
kernels, hence justifying the need for working with kernels of unbounded
support as proposed in this work. We also develop a feasible implementation of
the proposed estimators with optimal bandwidth. Monte Carlo experiments confirm
the superior performance of the devised method.
arXiv link: http://arxiv.org/abs/2004.01865v3
Estimation and Uniform Inference in Sparse High-Dimensional Additive Models
nonparametric component $f_1$ in the sparse additive model $Y=f_1(X_1)+\ldots +
f_p(X_p) + \varepsilon$ in a high-dimensional setting. Our method integrates
sieve estimation into a high-dimensional Z-estimation framework, facilitating
the construction of uniformly valid confidence bands for the target component
$f_1$. To form these confidence bands, we employ a multiplier bootstrap
procedure. Additionally, we provide rates for the uniform lasso estimation in
high dimensions, which may be of independent interest. Through simulation
studies, we demonstrate that our proposed method delivers reliable results in
terms of estimation and coverage, even in small samples.
arXiv link: http://arxiv.org/abs/2004.01623v2
Targeting predictors in random forest regression
of high-dimensional data. Nonetheless, its benefits may be lessened in sparse
settings due to weak predictors, and a pre-estimation dimension reduction
(targeting) step is required. We show that proper targeting controls the
probability of placing splits along strong predictors, thus providing an
important complement to RF's feature sampling. This is supported by simulations
using representative finite samples. Moreover, we quantify the immediate gain
from targeting in terms of increased strength of individual trees.
Macroeconomic and financial applications show that the bias-variance trade-off
implied by targeting, due to increased correlation among trees in the forest,
is balanced at a medium degree of targeting, selecting the best 10--30% of
commonly applied predictors. Improvements in predictive accuracy of targeted RF
relative to ordinary RF are considerable, up to 12-13%, occurring both in
recessions and expansions, particularly at long horizons.
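The targeting idea reduces to a pre-selection step before the forest is grown, as in the sketch below; ranking predictors by absolute marginal correlation is an illustrative stand-in for the paper's targeting rule, and `keep_frac` mirrors the 10-30% selection fraction reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def targeted_rf(X, y, keep_frac=0.2, **rf_kwargs):
    """Targeted random forest: screen out weak predictors before fitting the
    forest, so that feature subsampling draws splits from strong predictors."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    k = max(1, int(keep_frac * X.shape[1]))
    keep = np.argsort(scores)[-k:]                 # indices of the targeted predictors
    rf = RandomForestRegressor(**rf_kwargs).fit(X[:, keep], y)
    return rf, keep
```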
arXiv link: http://arxiv.org/abs/2004.01411v4
Machine Learning Algorithms for Financial Asset Price Forecasting
algorithms and techniques that can be used for financial asset price
forecasting. The prediction and forecasting of asset prices and returns remains
one of the most challenging and exciting problems for quantitative finance
researchers and practitioners alike. The massive increase in data generated and captured in
recent years presents an opportunity to leverage Machine Learning algorithms.
This study directly compares and contrasts state-of-the-art implementations of
modern Machine Learning algorithms on high performance computing (HPC)
infrastructures versus the traditional and highly popular Capital Asset Pricing
Model (CAPM) on U.S. equities data. The implemented Machine Learning models,
trained on time series data for an entire stock universe (in addition to
exogenous macroeconomic variables), significantly outperform the CAPM on
out-of-sample (OOS) test data.
arXiv link: http://arxiv.org/abs/2004.01504v1
Optimal Combination of Arctic Sea Ice Extent Measures: A Dynamic Factor Modeling Approach
as well as an accelerant for future global warming. Since 1978, Arctic sea ice
has been measured using satellite-based microwave sensing; however, different
measures of Arctic sea ice extent have been made available based on differing
algorithmic transformations of the raw satellite data. We propose and estimate
a dynamic factor model that combines four of these measures in an optimal way
that accounts for their differing volatility and cross-correlations. We then
use the Kalman smoother to extract an optimal combined measure of Arctic sea
ice extent. It turns out that almost all weight is put on the NSIDC Sea Ice
Index, confirming and enhancing confidence in the Sea Ice Index and the NASA
Team algorithm on which it is based.
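A minimal version of the combination step can be written with a one-factor dynamic factor model and the Kalman smoother, for example via `statsmodels`; the specification below (single AR(1) factor, standardized measures) is a simplification of the paper's model.

```python
import pandas as pd
import statsmodels.api as sm

def combined_ice_index(df):
    """Extract a single latent sea-ice extent factor from several observed
    extent measures (columns of df) with a one-factor dynamic factor model,
    using the Kalman smoother for the combined series."""
    endog = (df - df.mean()) / df.std()            # standardize the measures
    mod = sm.tsa.DynamicFactor(endog, k_factors=1, factor_order=1)
    res = mod.fit(disp=False)
    # Estimated loadings (in res.params) indicate each measure's weight.
    return pd.Series(res.smoothed_state[0], index=df.index, name="combined_extent")
```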
arXiv link: http://arxiv.org/abs/2003.14276v2
A wavelet analysis of inter-dependence, contagion and long memory among global equity markets
equity markets from a time-frequency perspective. An analysis grounded on this
framework allows one to capture information from a different dimension, as
opposed to the traditional time domain analyses, where multiscale structures of
financial markets are clearly extracted. In financial time series, multiscale
features arise from the presence of multiple time horizons. Because market
structures are not homogeneous across horizons, each time horizon requires
separate investigation, and the varying levels of complexity call for a
heterogeneous market perspective in which market players operate at different
investment horizons. This thesis extends the application of
time-frequency based wavelet techniques to: i) analyse the interdependence of
global equity markets from a heterogeneous investor perspective with a special
focus on the Indian stock market, ii) investigate the contagion effect, if any,
of financial crises on Indian stock market, and iii) to study fractality and
scaling properties of global equity markets and analyse the efficiency of
Indian stock markets using wavelet based long memory methods.
arXiv link: http://arxiv.org/abs/2003.14110v1
Specification tests for generalized propensity scores using double projections
specification of models based on conditional moment restrictions, paying
particular attention to generalized propensity score models. The test procedure
is based on two different projection arguments, leading to test statistics that
are suitable to setups with many covariates, and are (asymptotically) invariant
to the estimation method used to estimate the nuisance parameters. We show that
our proposed tests are able to detect a broad class of local alternatives
converging to the null at the usual parametric rate and illustrate their
attractive power properties via simulations. We also extend our proposal to
test parametric or semiparametric single-index-type models.
arXiv link: http://arxiv.org/abs/2003.13803v2
High-dimensional mixed-frequency IV regression
sampled at mixed frequencies. We show that the high-dimensional slope parameter
of a high-frequency covariate can be identified and accurately estimated
leveraging on a low-frequency instrumental variable. The distinguishing feature
of the model is that it allows handling high-dimensional datasets without
imposing the approximate sparsity restrictions. We propose a
Tikhonov-regularized estimator and derive the convergence rate of its
mean-integrated squared error for time series data. The estimator has a
closed-form expression that is easy to compute and demonstrates excellent
performance in our Monte Carlo experiments. We estimate the real-time price
elasticity of supply on the Australian electricity spot market. Our estimates
suggest that the supply is relatively inelastic and that its elasticity is
heterogeneous throughout the day.
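The estimator can be illustrated with a generic Tikhonov-regularized IV formula based on the moment condition E[z(y - x'beta)] = 0; the sketch below does not reproduce the paper's mixed-frequency construction or its choice of the regularization parameter.

```python
import numpy as np

def tikhonov_iv(y, X, Z, lam=1.0):
    """Generic Tikhonov-regularized IV estimator for a possibly
    high-dimensional slope vector:
        beta_hat = (Sxz' Sxz + lam I)^{-1} Sxz' Szy,
    where Szx = Z'X/n and Szy = Z'y/n.  Illustrative only."""
    n = len(y)
    Szx, Szy = Z.T @ X / n, Z.T @ y / n
    A = Szx.T @ Szx + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, Szx.T @ Szy)
```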
arXiv link: http://arxiv.org/abs/2003.13478v1
Sequential monitoring for cointegrating regressions
null of no breaks against the alternatives that there is either a change in the
slope, or a change to non-cointegration. After observing the regression for a
calibration sample m, we study a CUSUM-type statistic to detect the presence of
change during a monitoring horizon m+1,...,T. Our procedures use a class of
boundary functions which depend on a parameter whose value affects the delay in
detecting the possible break. Technically, these procedures are based on almost
sure limiting theorems whose derivation is not straightforward. We therefore
define a monitoring function which - at every point in time - diverges to
infinity under the null, and drifts to zero under alternatives. We cast this
sequence in a randomised procedure to construct an i.i.d. sequence, which we
then employ to define the detector function. Our monitoring procedure rejects
the null of no break (when correct) with a small probability, whilst it rejects
with probability one over the monitoring horizon in the presence of breaks.
arXiv link: http://arxiv.org/abs/2003.12182v1
Estimating Treatment Effects with Observed Confounders and Mediators
functionals of the observational joint distribution that can be estimated
empirically. Sometimes the do-calculus identifies multiple valid formulae,
prompting us to compare the statistical properties of the corresponding
estimators. For example, the backdoor formula applies when all confounders are
observed and the frontdoor formula applies when an observed mediator transmits
the causal effect. In this paper, we investigate the over-identified scenario
where both confounders and mediators are observed, rendering both estimators
valid. Addressing the linear Gaussian causal model, we demonstrate that either
estimator can dominate the other by an unbounded constant factor. Next, we
derive an optimal estimator, which leverages all observed variables, and bound
its finite-sample variance. We show that it strictly outperforms the backdoor
and frontdoor estimators and that this improvement can be unbounded. We also
present a procedure for combining two datasets, one with observed confounders
and another with observed mediators. Finally, we evaluate our methods on both
simulated data and the IHDP and JTPA datasets.
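In the linear Gaussian case the two competing estimators reduce to simple OLS quantities, as in the sketch below; the optimal combined estimator derived in the paper is not reproduced here.

```python
import numpy as np

def backdoor_frontdoor(t, w, m, y):
    """Backdoor and frontdoor point estimates of the effect of t on y in a
    linear-Gaussian model with an observed confounder w and mediator m."""
    def ols(regressors, target):
        Xmat = np.column_stack([np.ones(len(target))] + list(regressors))
        return np.linalg.lstsq(Xmat, target, rcond=None)[0]
    # Backdoor: adjust for the observed confounder
    backdoor = ols([t, w], y)[1]
    # Frontdoor: effect of t on m times effect of m on y holding t fixed
    t_to_m = ols([t], m)[1]
    m_to_y = ols([m, t], y)[1]
    return backdoor, t_to_m * m_to_y
```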
arXiv link: http://arxiv.org/abs/2003.11991v3
Rationalizing Rational Expectations: Characterization and Tests
marginal distributions of realizations and subjective beliefs. This test is
widely applicable, including in the common situation where realizations and
beliefs are observed in two different datasets that cannot be matched. We show
that whether one can rationalize rational expectations is equivalent to the
distribution of realizations being a mean-preserving spread of the distribution
of beliefs. The null hypothesis can then be rewritten as a system of many
moment inequality and equality constraints, for which tests have been recently
developed in the literature. The test is robust to measurement errors under
some restrictions and can be extended to account for aggregate shocks. Finally,
we apply our methodology to test for rational expectations about future
earnings. While individuals tend to be right on average about their future
earnings, our test strongly rejects rational expectations.
arXiv link: http://arxiv.org/abs/2003.11537v3
Missing at Random or Not: A Semiparametric Testing Approach
been developed concerning the validity and/or efficiency of statistical
procedures. A central focus has been the mechanism governing data missingness,
and correctly identifying the appropriate mechanism is crucial for conducting
proper practical investigations.
The conventional notions include the three common potential classes -- missing
completely at random, missing at random, and missing not at random. In this
paper, we present a new hypothesis testing approach for deciding between
missing at random and missing not at random. Since the potential alternatives
of missing at random are broad, we focus our investigation on a general class
of models with instrumental variables for data missing not at random. Our
setting is broadly applicable because the model for the missing data is
nonparametric, requiring no explicit specification of the data
missingness. The foundational idea is to develop appropriate discrepancy
measures between estimators whose properties significantly differ only when
missing at random does not hold. We show that our new hypothesis testing
approach achieves an objective, data-oriented choice between missing at random
or not. We demonstrate the feasibility, validity, and efficacy of the new test
by theoretical analysis, simulation studies, and a real data analysis.
arXiv link: http://arxiv.org/abs/2003.11181v1
A Correlated Random Coefficient Panel Model with Time-Varying Endogeneity
We do not restrict the joint distribution of the time-invariant unobserved
heterogeneity and the covariates. We investigate identification of the average
partial effect (APE) when fixed-effect techniques cannot be used to control for
the correlation between the regressors and the time-varying disturbances.
Relying on control variables, we develop a constructive two-step identification
argument. The first step identifies nonparametrically the conditional
expectation of the disturbances given the regressors and the control variables,
and the second step uses “between-group” variations, correcting for
endogeneity, to identify the APE. We propose a natural semiparametric estimator
of the APE, show its $\sqrt{n}$ asymptotic normality and compute its asymptotic
variance. The estimator is computationally easy to implement, and Monte Carlo
simulations show favorable finite sample properties. Control variables arise in
various economic and econometric models, and we propose applications of our
argument in several models. As an empirical illustration, we estimate the
average elasticity of intertemporal substitution in a labor supply model with
random coefficients.
arXiv link: http://arxiv.org/abs/2003.09367v2
Causal Simulation Experiments: Lessons from Bias Amplification
of variables which, when conditioned on, may further amplify existing
unmeasured confounding bias (bias amplification). Despite this theoretical
work, existing simulations of bias amplification in clinical settings have
suggested bias amplification may not be as important in many practical cases as
suggested in the theoretical literature. We resolve this tension by using tools
from the semi-parametric regression literature leading to a general
characterization in terms of the geometry of OLS estimators which allows us to
extend current results to a larger class of DAGs, functional forms, and
distributional assumptions. We further use these results to understand the
limitations of current simulation approaches and to propose a new framework for
performing causal simulation experiments to compare estimators. We then
evaluate the challenges and benefits of extending this simulation approach to
the context of a real clinical data set with a binary treatment, laying the
groundwork for a principled approach to sensitivity analysis for bias
amplification in the presence of unmeasured confounding.
arXiv link: http://arxiv.org/abs/2003.08449v1
Experimental Design under Network Interference
spillover effects when the researcher aims to conduct precise inference on
treatment effects. We consider units connected through a single network, local
dependence among individuals, and a general class of estimands encompassing
average treatment and average spillover effects. We introduce a statistical
framework for designing two-wave experiments with networks, where the
researcher optimizes over participants and treatment assignments to minimize
the variance of the estimators of interest, using a first-wave (pilot)
experiment to estimate the variance. We derive guarantees for inference on
treatment effects and regret guarantees on the variance obtained from the
proposed design mechanism. Our results illustrate the existence of a trade-off
in the choice of the pilot study and formally characterize the pilot's size
relative to the main experiment. Simulations using simulated and real-world
networks illustrate the advantages of the method.
arXiv link: http://arxiv.org/abs/2003.08421v4
Interpretable Personalization via Policy Learning with Linear Decision Boundaries
information about consumers, effective personalization of goods and services
has become a core business focus for companies to improve revenues and maintain
a competitive edge. This paper studies the personalization problem through the
lens of policy learning, where the goal is to learn a decision-making rule (a
policy) that maps from consumer and product characteristics (features) to
recommendations (actions) in order to optimize outcomes (rewards). We focus on
using available historical data for offline learning with unknown data
collection procedures, where a key challenge is the non-random assignment of
recommendations. Moreover, in many business and medical applications,
interpretability of a policy is essential. We study the class of policies with
linear decision boundaries to ensure interpretability, and propose learning
algorithms using tools from causal inference to address unbalanced treatments.
We study several optimization schemes to solve the associated non-convex,
non-smooth optimization problem, and find that a Bayesian optimization
algorithm is effective. We test our algorithm with extensive simulation studies
and apply it to an anonymized online marketplace customer purchase dataset,
where the learned policy outputs a personalized discount recommendation based
on customer and product features in order to maximize gross merchandise value
(GMV) for sellers. Our learned policy improves upon the platform's baseline by
88.2% in net sales revenue, while also providing informative insights on which
features are important for the decision-making process. Our findings suggest
that our proposed policy learning framework using tools from causal inference
and Bayesian optimization provides a promising practical approach to
interpretable personalization across a wide range of applications.
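The policy-learning step can be sketched as maximizing an inverse-probability-weighted estimate of the policy value over linear decision boundaries; the propensity model, parameter bounds, and the use of differential evolution in place of the paper's Bayesian optimization are illustrative choices.

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.linear_model import LogisticRegression

def learn_linear_policy(X, a, r, seed=0):
    """Offline policy learning with a linear decision boundary: recommend
    action 1 when x'theta > 0, choosing theta to maximize an IPW estimate of
    the policy value.  A minimal sketch for a binary action a and reward r."""
    e = LogisticRegression(max_iter=1000).fit(X, a).predict_proba(X)[:, 1]
    prop = np.where(a == 1, e, 1 - e)                 # P(observed action | x)
    Xb = np.column_stack([np.ones(len(a)), X])        # add intercept to the boundary

    def neg_value(theta):
        pi = (Xb @ theta > 0).astype(int)             # policy's recommended action
        return -np.mean((pi == a) * r / prop)         # negative IPW policy value

    bounds = [(-5, 5)] * Xb.shape[1]
    res = differential_evolution(neg_value, bounds, seed=seed)
    return res.x, -res.fun
```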
arXiv link: http://arxiv.org/abs/2003.07545v4
Anomalous supply shortages from dynamic pricing in on-demand mobility
maintain a self-organized balance of demand and supply. However, throughout
complex dynamical systems, unintended collective states exist that may
compromise their function. Here we reveal how dynamic pricing may induce
demand-supply imbalances instead of preventing them. Combining game theory and
time series analysis of dynamic pricing data from on-demand ride-hailing
services, we explain this apparent contradiction. We derive a phase diagram
demonstrating how and under which conditions dynamic pricing incentivizes
collective action of ride-hailing drivers to induce anomalous supply shortages.
By disentangling different timescales in price time series of ride-hailing
services at 137 locations across the globe, we identify characteristic patterns
in the price dynamics reflecting these anomalous supply shortages. Our results
provide systemic insights for the regulation of dynamic pricing, in particular
in publicly accessible mobility systems, by unraveling under which conditions
dynamic pricing schemes promote anomalous supply shortages.
arXiv link: http://arxiv.org/abs/2003.07736v1
Testing Many Restrictions Under Heteroskedasticity
heteroskedastic linear regression model. The test compares the conventional F
statistic to a critical value that corrects for many restrictions and
conditional heteroskedasticity. This correction uses leave-one-out estimation
to correctly center the critical value and leave-three-out estimation to
appropriately scale it. The large sample properties of the test are established
in an asymptotic framework where the number of tested restrictions may be fixed
or may grow with the sample size, and can even be proportional to the number of
observations. We show that the test is asymptotically valid and has non-trivial
asymptotic power against the same local alternatives as the exact F test when
the latter is valid. Simulations corroborate these theoretical findings and
suggest excellent size control in moderately small samples, even under strong
heteroskedasticity.
arXiv link: http://arxiv.org/abs/2003.07320v3
Stochastic Frontier Analysis with Generalized Errors: inference, model comparison and averaging
generalized model for stochastic frontier analysis (SFA) that nests virtually
all forms used and includes some that have not been considered so far. The
model is based on the generalized t distribution for the observation error and
the generalized beta distribution of the second kind for the
inefficiency-related term. We use this general error structure framework for
formal testing, to compare alternative specifications and to conduct model
averaging. This allows us to deal with model specification uncertainty, which
is one of the main unresolved issues in SFA, and to relax a number of
potentially restrictive assumptions embedded within existing SF models. We also
develop Bayesian inference methods that are less restrictive compared to the
ones used so far and demonstrate feasible approximate alternatives based on
maximum likelihood.
arXiv link: http://arxiv.org/abs/2003.07150v2
Targeting customers under response-dependent costs
the cost for a marketing action depends on the customer response and proposes a
framework to estimate the decision variables for campaign profit optimization.
Targeting a customer is profitable if the impact and associated profit of the
marketing treatment are higher than its cost. Despite the growing literature on
uplift models to identify the strongest treatment-responders, no research has
investigated optimal targeting when the costs of the treatment are unknown at
the time of the targeting decision. Stochastic costs are ubiquitous in direct
marketing and customer retention campaigns because marketing incentives are
conditioned on a positive customer response. This study makes two contributions
to the literature, which are evaluated on an e-commerce coupon targeting
campaign. First, we formally analyze the targeting decision problem under
response-dependent costs. Profit-optimal targeting requires an estimate of the
treatment effect on the customer and an estimate of the customer response
probability under treatment. The empirical results demonstrate that the
consideration of treatment cost substantially increases campaign profit when
used for customer targeting in combination with an estimate of the average or
customer-level treatment effect. Second, we propose a framework to jointly
estimate the treatment effect and the response probability by combining methods
for causal inference with a hurdle mixture model. The proposed causal hurdle
model achieves competitive campaign profit while streamlining model building.
Code is available at https://github.com/Humboldt-WI/response-dependent-costs.
arXiv link: http://arxiv.org/abs/2003.06271v2
Causal Spillover Effects Using Instrumental Variables
instrumental variables. I characterize the population compliance types in a
setting in which spillovers can occur on both treatment take-up and outcomes,
and provide conditions for identification of the marginal distribution of
compliance types. I show that intention-to-treat (ITT) parameters aggregate
multiple direct and spillover effects for different compliance types, and hence
do not have a clear link to causally interpretable parameters. Moreover,
rescaling ITT parameters by first-stage estimands generally recovers a weighted
combination of average effects where the sum of weights is larger than one. I
then analyze identification of causal direct and spillover effects under
one-sided noncompliance, and show that causal effects can be estimated by 2SLS
in this case. I illustrate the proposed methods using data from an experiment
on social interactions and voting behavior. I also introduce an alternative
assumption, independence of peers' types, that identifies parameters of
interest under two-sided noncompliance by restricting the amount of
heterogeneity in average potential outcomes.
arXiv link: http://arxiv.org/abs/2003.06023v5
A mixture autoregressive model based on Gaussian and Student's $t$-distributions
Student's $t$ mixture components. The model has very attractive properties
analogous to the Gaussian and Student's $t$ mixture autoregressive models, but
it is more flexible, as it can model series that consist of both
conditionally homoscedastic Gaussian regimes and conditionally heteroscedastic
Student's $t$ regimes. The usefulness of our model is demonstrated in an
empirical application to the monthly U.S. interest rate spread between the
3-month Treasury bill rate and the effective federal funds rate.
arXiv link: http://arxiv.org/abs/2003.05221v3
Identification and Estimation of Weakly Separable Models Without Monotonicity
weakly separable models. In their seminal work, Vytlacil and Yildiz (2007)
showed how to identify and estimate the average treatment effect of a dummy
endogenous variable when the outcome is weakly separable in a single index.
Their identification result builds on a monotonicity condition with respect to
this single index. In comparison, we consider similar weakly separable models
with multiple indices, and relax the monotonicity condition for identification.
Unlike Vytlacil and Yildiz (2007), we exploit the full information in the
distribution of the outcome variable, instead of just its mean. Indeed, when
the outcome distribution function is more informative than the mean, our method
is applicable to more general settings than theirs; in particular we do not
rely on their monotonicity assumption and at the same time we also allow for
multiple indices. To illustrate the advantage of our approach, we provide
examples of models where our approach can identify parameters of interest
whereas existing methods would fail. These examples include models with
multiple unobserved disturbance terms such as the Roy model and multinomial
choice models with dummy endogenous variables, as well as potential outcome
models with endogenous random coefficients. Our method is easy to implement and
can be applied to a wide class of models. We establish standard asymptotic
properties such as consistency and asymptotic normality.
arXiv link: http://arxiv.org/abs/2003.04337v2
Fast Bayesian Record Linkage With Record-Specific Disagreement Parameters
that lack a common unique identifier. Matching procedures often struggle to
match records with common names, birthplaces or other field values.
Computational feasibility is also a challenge, particularly when linking large
datasets. We develop a Bayesian method for automated probabilistic record
linkage and show it recovers more than 50% more true matches, holding accuracy
constant, than comparable methods in a matching of military recruitment data to
the 1900 US Census for which expert-labelled matches are available. Our
approach, which builds on a recent state-of-the-art Bayesian method, refines
the modelling of comparison data, allowing disagreement probability parameters
conditional on non-match status to be record-specific in the smaller of the two
datasets. This flexibility significantly improves matching when many records
share common field values. We show that our method is computationally feasible
in practice, despite the added complexity, with an R/C++ implementation that
achieves significant improvement in speed over comparable recent methods. We
also suggest a lightweight method for treatment of very common names and show
how to estimate true positive rate and positive predictive value when true
match status is unavailable.
arXiv link: http://arxiv.org/abs/2003.04238v2
Unit Root Testing with Slowly Varying Trends
deterministic trend component. It is shown that asymptotically the pooled OLS
estimator of overlapping blocks filters out any trend component that satisfies
some Lipschitz condition. Under both fixed-$b$ and small-$b$ block asymptotics,
the limiting distribution of the t-statistic for the unit root hypothesis is
derived. Nuisance parameter corrections provide heteroskedasticity-robust
tests, and serial correlation is accounted for by pre-whitening. A Monte Carlo
study that considers slowly varying trends yields both good size and improved
power results for the proposed tests when compared to conventional unit root
tests.
arXiv link: http://arxiv.org/abs/2003.04066v3
Complete Subset Averaging for Quantile Regressions
subset averaging (CSA) for quantile regressions. All models under consideration
are potentially misspecified and the dimension of regressors goes to infinity
as the sample size increases. Since we average over the complete subsets, the
number of models is much larger than the usual model averaging method which
adopts sophisticated weighting schemes. We propose to use an equal weight but
select the proper size of the complete subset based on the leave-one-out
cross-validation method. Building upon the theory of Lu and Su (2015), we
investigate the large sample properties of CSA and show the asymptotic
optimality in the sense of Li (1987). We check the finite sample performance
via Monte Carlo simulations and empirical applications.
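A simplified sketch of the averaging step is shown below; it caps the number of subsets at a random draw for tractability and takes the subset size k as given, whereas the paper averages over all complete subsets and selects k by leave-one-out cross-validation.

```python
import numpy as np
import statsmodels.api as sm
from itertools import combinations

def csa_quantile_predict(X, y, X_new, tau=0.5, k=3, max_subsets=200, seed=0):
    """Equal-weight average of tau-quantile-regression predictions over
    subsets of k regressors; X and X_new are 2-D arrays."""
    rng = np.random.default_rng(seed)

    def with_const(A):
        A = np.atleast_2d(A)
        return np.column_stack([np.ones(len(A)), A])

    subsets = list(combinations(range(X.shape[1]), k))
    if len(subsets) > max_subsets:                    # random draw for tractability
        idx = rng.choice(len(subsets), max_subsets, replace=False)
        subsets = [subsets[i] for i in idx]
    preds = [sm.QuantReg(y, with_const(X[:, s])).fit(q=tau)
               .predict(with_const(X_new[:, s])) for s in subsets]
    return np.mean(preds, axis=0)
```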
arXiv link: http://arxiv.org/abs/2003.03299v3
Double Machine Learning based Program Evaluation under Unconfoundedness
Double Machine Learning (DML) with a focus on program evaluation under
unconfoundedness. DML based methods leverage flexible prediction models to
adjust for confounding variables in the estimation of (i) standard average
effects, (ii) different forms of heterogeneous effects, and (iii) optimal
treatment assignment rules. An evaluation of multiple programs of the Swiss
Active Labour Market Policy illustrates how DML based methods enable a
comprehensive program evaluation. Motivated by extreme individualised treatment
effect estimates of the DR-learner, we propose the normalised DR-learner
(NDR-learner) to address this issue. The NDR-learner acknowledges that
individualised effect estimates can be stabilised by an individualised
normalisation of inverse probability weights.
arXiv link: http://arxiv.org/abs/2003.03191v5
Equal Predictive Ability Tests Based on Panel Data with Applications to OECD and IMF Forecasts
compare the predictions made by two forecasters. The first type, namely
$S$-statistics, focuses on the overall EPA hypothesis which states that the EPA
holds on average over all panel units and over time. The second, called
$C$-statistics, focuses on the clustered EPA hypothesis where the EPA holds
jointly for a fixed number of clusters of panel units. The asymptotic
properties of the proposed tests are evaluated under weak and strong
cross-sectional dependence. An extensive Monte Carlo simulation shows that the
proposed tests have very good finite sample properties even with little
information about the cross-sectional dependence in the data. The proposed
framework is applied to compare the economic growth forecasts of the OECD and
the IMF, and to evaluate the performance of the consumer price inflation
forecasts of the IMF.
arXiv link: http://arxiv.org/abs/2003.02803v3
Backward CUSUM for Testing and Monitoring Structural Change with an Application to COVID-19 Pandemic Data
from low power and large detection delay. In order to improve the power of the
test, we propose two alternative statistics. The backward CUSUM detector
considers the recursive residuals in reverse chronological order, whereas the
stacked backward CUSUM detector sequentially cumulates a triangular array of
backwardly cumulated residuals. A multivariate invariance principle for partial
sums of recursive residuals is given, and the limiting distributions of the
test statistics are derived under local alternatives. In the retrospective
context, the local power of the tests is shown to be substantially higher than
that of the conventional CUSUM test if a break occurs in the middle or at the
end of the sample. When applied to monitoring schemes, the detection delay of
the stacked backward CUSUM is found to be much shorter than that of the
conventional monitoring CUSUM procedure. Furthermore, we propose an estimator
of the break date based on the backward CUSUM detector and show that in
monitoring exercises this estimator tends to outperform the usual maximum
likelihood estimator. Finally, an application of the methodology to COVID-19
data is presented.
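The backward CUSUM detector itself is straightforward to compute from standardized one-step-ahead recursive residuals, as in the sketch below; the boundary functions and monitoring rules used for the formal tests are not reproduced.

```python
import numpy as np

def backward_cusum(y, X):
    """Backward CUSUM detector: standardized recursive residuals are cumulated
    in reverse chronological order, so a break near the end of the sample
    shows up quickly.  Minimal sketch for a regression y on X (T x k)."""
    y, X = np.asarray(y, float), np.asarray(X, float)
    T, k = X.shape
    w = np.full(T, np.nan)
    for t in range(k, T):                              # one-step-ahead recursive residuals
        Xt, yt = X[:t], y[:t]
        beta = np.linalg.lstsq(Xt, yt, rcond=None)[0]
        fvar = 1.0 + X[t] @ np.linalg.inv(Xt.T @ Xt) @ X[t]
        w[t] = (y[t] - X[t] @ beta) / np.sqrt(fvar)
    w = w[k:]
    sigma = w.std(ddof=1)
    # cumulate from the end of the sample backwards and normalize
    return np.cumsum(w[::-1])[::-1] / (sigma * np.sqrt(len(w)))
```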
arXiv link: http://arxiv.org/abs/2003.02682v3
Impact of Congestion Charge and Minimum Wage on TNCs: A Case Study for San Francisco
the imposition of a congestion charge and a driver minimum wage. The impact is
assessed using a market equilibrium model to calculate the changes in the
number of passenger trips and trip fare, number of drivers employed, the TNC
platform profit, the number of TNC vehicles, and city revenue. Two charges are
considered: (a) a charge per TNC trip similar to an excise tax, and (b) a
charge per vehicle operating hour (whether or not it has a passenger) similar
to a road tax. Both charges reduce the number of TNC trips, but this reduction
is limited by the wage floor, and the number of TNC vehicles reduced is not
significant. The time-based charge is preferable to the trip-based charge
since, by penalizing idle vehicle time, the former increases vehicle occupancy.
In a case study for San Francisco, the time-based charge is found to be Pareto
superior to the trip-based charge as it yields higher passenger surplus, higher
platform profits, and higher tax revenue for the city.
arXiv link: http://arxiv.org/abs/2003.02550v4
Joint Estimation of Discrete Choice Model and Arrival Rate with Unobserved Stock-out Events
and the arrival rate of potential customers when unobserved stock-out events
occur. In this paper, we generalize [Anupindi et al., 1998] and [Conlon and
Mortimer, 2013] in the sense that (1) we work with generic choice models, (2)
we allow arbitrary numbers of products and stock-out events, and (3) we
consider the existence of the null alternative and estimate the overall
arrival rate of potential customers. In addition, we point out that the
modeling in [Conlon and Mortimer, 2013] is problematic, and present the correct
formulation.
arXiv link: http://arxiv.org/abs/2003.02313v1
Estimating the Effect of Central Bank Independence on Inflation Using Longitudinal Targeted Maximum Likelihood Estimation
a controversial hypothesis. To date, it has not been possible to satisfactorily
answer this question because the complex macroeconomic structure that gives
rise to the data has not been adequately incorporated into statistical
analyses. We develop a causal model that summarizes the economic process of
inflation. Based on this causal model and recent data, we discuss and identify
the assumptions under which the effect of central bank independence on
inflation can be identified and estimated. Given these and alternative
assumptions, we estimate this effect using modern doubly robust effect
estimators, i.e., longitudinal targeted maximum likelihood estimators. The
estimation procedure incorporates machine learning algorithms and is tailored
to address the challenges associated with complex longitudinal macroeconomic
data. We do not find strong support for the hypothesis that having an
independent central bank for a long period of time necessarily lowers
inflation. Simulation studies evaluate the sensitivity of the proposed methods
in complex settings when certain assumptions are violated and highlight the
importance of working with appropriate learning algorithms for estimation.
arXiv link: http://arxiv.org/abs/2003.02208v7
Identification of Random Coefficient Latent Utility Models
coefficient distributions in perturbed utility models. We cover discrete and
continuous choice models. We establish identification using variation in mean
quantities, and the results apply when an analyst observes aggregate demands
but not whether goods are chosen together. We require exclusion restrictions
and independence between random slope coefficients and random intercepts. We do
not require regressors to have large supports or parametric assumptions.
arXiv link: http://arxiv.org/abs/2003.00276v1
Causal mediation analysis with double machine learning
control for observed confounders in a data-driven way under a
selection-on-observables assumption in a high-dimensional setting. We consider
the average indirect effect of a binary treatment operating through an
intermediate variable (or mediator) on the causal path between the treatment
and the outcome, as well as the unmediated direct effect. Estimation is based
on efficient score functions, which possess a multiple robustness property
w.r.t. misspecifications of the outcome, mediator, and treatment models. This
property is key for selecting these models by double machine learning, which is
combined with data splitting to prevent overfitting in the estimation of the
effects of interest. We demonstrate that the direct and indirect effect
estimators are asymptotically normal and root-n consistent under specific
regularity conditions and investigate the finite sample properties of the
suggested methods in a simulation study using the lasso as the machine learner.
We also provide an empirical application to the U.S. National
Longitudinal Survey of Youth, assessing the indirect effect of health insurance
coverage on general health operating via routine checkups as mediator, as well
as the direct effect. We find a moderate short term effect of health insurance
coverage on general health which is, however, not mediated by routine checkups.
arXiv link: http://arxiv.org/abs/2002.12710v6
Modelling Network Interference with Multi-valued Treatments: the Causal Effect of Immigration Policy on Crime Rates
intervention, face some statistical challenges: in real-world settings
treatments are not randomly assigned and the analysis might be further
complicated by the presence of interference between units. Researchers have
started to develop novel methods that make it possible to manage spillover mechanisms in
observational studies; recent works focus primarily on binary treatments.
However, many policy evaluation studies deal with more complex interventions.
For instance, in political science, evaluating the impact of policies
implemented by administrative entities often implies a multivariate approach,
as a policy towards a specific issue operates at many different levels and can
be defined along a number of dimensions. In this work, we extend the
statistical framework about causal inference under network interference in
observational studies, allowing for a multi-valued individual treatment and an
interference structure shaped by a weighted network. The estimation strategy is
based on a joint multiple generalized propensity score and allows one to
estimate direct effects, controlling for both individual and network
covariates. We follow the proposed methodology to analyze the impact of the
national immigration policy on the crime rate. We define a multi-valued
characterization of political attitudes towards migrants and we assume that the
extent to which each country can be influenced by another country is modeled by
an appropriate indicator, summarizing their cultural and geographical
proximity. Results suggest that implementing a highly restrictive immigration
policy leads to an increase in the crime rate, and the estimated effect is
larger when interference from other countries is taken into account.
arXiv link: http://arxiv.org/abs/2003.10525v3
Off-Policy Evaluation and Learning for External Validity under a Covariate Shift
using the historical data obtained from a different policy. The goal of
off-policy evaluation (OPE) is to estimate the expected reward of a new policy
over the evaluation data, and that of off-policy learning (OPL) is to find a
new policy that maximizes the expected reward over the evaluation data.
Although standard OPE and OPL assume that the covariate distribution is the same
in the historical and evaluation data, a covariate shift often exists,
i.e., the distribution of the covariate of the historical data is different
from that of the evaluation data. In this paper, we derive the efficiency bound
of OPE under a covariate shift. Then, we propose doubly robust and efficient
estimators for OPE and OPL under a covariate shift by using a nonparametric
estimator of the density ratio between the historical and evaluation data
distributions. We also discuss other possible estimators and compare their
theoretical properties. Finally, we confirm the effectiveness of the proposed
estimators through experiments.
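As a rough illustration of the kind of estimator described above, the sketch below combines a plug-in
outcome regression with an importance-sampling correction that is reweighted by an estimated density
ratio between the historical and evaluation covariate distributions. The interface (q_model,
behavior_prob, target_policy) and the classifier-based density-ratio step are illustrative assumptions,
not the authors' implementation.

    # Minimal sketch of doubly robust OPE under a covariate shift (illustrative interfaces).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def dr_ope_covariate_shift(X_hist, A_hist, R_hist, X_eval,
                               behavior_prob, target_policy, q_model):
        """X_hist, X_eval: covariate matrices; A_hist, R_hist: logged actions and rewards.
        behavior_prob(x, a): probability the logging policy chose a at x.
        target_policy(x): action the new (deterministic) policy would take at x.
        q_model: fitted reward regression with .predict on [x, a] feature rows."""
        # Density ratio w(x) = p_eval(x) / p_hist(x) via a probabilistic classifier.
        Z = np.vstack([X_hist, X_eval])
        lbl = np.r_[np.zeros(len(X_hist)), np.ones(len(X_eval))]
        clf = LogisticRegression(max_iter=1000).fit(Z, lbl)
        p = clf.predict_proba(X_hist)[:, 1]
        w = (p / (1 - p)) * (len(X_hist) / len(X_eval))

        # Direct (plug-in) part, evaluated on the evaluation covariates.
        a_new = np.array([target_policy(x) for x in X_eval])
        direct = q_model.predict(np.column_stack([X_eval, a_new])).mean()

        # Importance-sampling correction on the historical data, reweighted by w(x).
        a_pi = np.array([target_policy(x) for x in X_hist])
        match = (A_hist == a_pi).astype(float)
        e = np.array([behavior_prob(x, a) for x, a in zip(X_hist, A_hist)])
        resid = R_hist - q_model.predict(np.column_stack([X_hist, A_hist]))
        return direct + np.mean(w * match / e * resid)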
arXiv link: http://arxiv.org/abs/2002.11642v3
Econometric issues with Laubach and Williams' estimates of the natural rate of interest
interest are driven by the downward trending behaviour of 'other factor'
$z_{t}$. I show that their implementation of Stock and Watson's (1998) Median
Unbiased Estimation (MUE) to determine the size of the $\lambda _{z}$ parameter
which drives this downward trend in $z_{t}$ is unsound. It cannot recover the
ratio of interest $\lambda _{z}=a_{r}\sigma _{z}/\sigma _{y}$ from MUE
required for the estimation of the full structural model. This failure is due
to an 'unnecessary' misspecification in Holston et al.'s (2017) formulation of
the Stage 2 model. More importantly, their implementation of MUE on this
misspecified Stage 2 model spuriously amplifies the point estimate of $\lambda
_{z}$. Using a simulation experiment, I show that their procedure generates
excessively large estimates of $\lambda _{z}$ when applied to data generated
from a model where the true $\lambda _{z}$ is equal to zero. Correcting the
misspecification in their Stage 2 model and the implementation of MUE leads to
a substantially smaller $\lambda _{z}$ estimate, and with this, a more subdued
downward trending influence of 'other factor' $z_{t}$ on the natural rate.
Moreover, the $\lambda _{z}$ point estimate is statistically highly
insignificant, suggesting that there is no role for 'other factor' $z_{t}$ in
this model. I also discuss various other estimation issues that arise in
Holston et al.'s (2017) model of the natural rate that make it unsuitable for
policy analysis.
arXiv link: http://arxiv.org/abs/2002.11583v2
Hours Worked and the U.S. Distribution of Real Annual Earnings 1976-2019
decomposing changes in the real annual earnings distribution into composition,
structural and hours effects. We do so via a nonseparable simultaneous model of
hours, wages and earnings. Using the Current Population Survey for the survey
years 1976--2019, we find that changes in the female distribution of annual
hours of work are important in explaining movements in inequality in female
annual earnings. This captures the substantial changes in their employment
behavior over this period. Movements in the male hours distribution only affect
the lower part of their earnings distribution and reflect the sensitivity of
these workers' annual hours of work to cyclical factors.
arXiv link: http://arxiv.org/abs/2002.11211v3
A Practical Approach to Social Learning
structures often deprived of micro-foundations. Both models are limited when
analyzing interim results or performing empirical analysis. We present a method
of generating signal structures which are richer than the binary model, yet are
tractable enough to perform simulations and empirical analysis. We demonstrate
the method's usability by revisiting two classical papers: (1) we discuss the
economic significance of unbounded signals in Smith and Sorensen (2000); (2) we
use experimental data from Anderson and Holt (1997) to perform econometric
analysis. Additionally, we provide a necessary and sufficient condition for the
occurrence of action cascades.
arXiv link: http://arxiv.org/abs/2002.11017v1
Estimating Economic Models with Testable Assumptions: Theory and Applications
problem in complete and incomplete economic models with testable assumptions.
Testable assumptions ($A$) give strong and interpretable empirical content to
the models but they also carry the possibility that some distribution of
observed outcomes may reject these assumptions. A natural way to avoid this is
to find a set of relaxed assumptions ($\tilde{A}$) that cannot be rejected by
any distribution of observed outcome and the identified set of the parameter of
interest is not changed when the original assumption is not rejected. The main
contribution of this paper is to characterize the properties of such a relaxed
assumption $\tilde{A}$ using a generalized definition of refutability and
confirmability. I also propose a general method to construct such $\tilde{A}$.
A general estimation and inference procedure is proposed and can be applied to
most incomplete economic models. I apply my methodology to the instrument
monotonicity assumption in Local Average Treatment Effect (LATE) estimation and
to the sector selection assumption in a binary outcome Roy model of employment
sector choice. In the LATE application, I use my general method to construct a
set of relaxed assumptions $\tilde{A}$ that can never be rejected, and the
identified set of LATE is the same as imposing $A$ when $A$ is not rejected.
LATE is point identified under my extension $\tilde{A}$ in the LATE
application. In the binary outcome Roy model, I use my method of incomplete
models to relax Roy's sector selection assumption and characterize the
identified set of the binary potential outcome as a polyhedron.
arXiv link: http://arxiv.org/abs/2002.10415v3
Bayesian Inference in High-Dimensional Time-varying Parameter Models using Integrated Rotated Gaussian Approximations
regressions which involve a large number of explanatory variables. Including
prior information to mitigate over-parameterization concerns has led to many
using Bayesian methods. However, Bayesian Markov Chain Monte Carlo (MCMC)
methods can be very computationally demanding. In this paper, we develop
computationally efficient Bayesian methods for estimating TVP models using an
integrated rotated Gaussian approximation (IRGA). This exploits the fact that
whereas constant coefficients on regressors are often important, most of the
TVPs are often unimportant. Since Gaussian distributions are invariant to
rotations, we can split the posterior into two parts: one involving the
constant coefficients, the other involving the TVPs. Approximate methods are
used on the latter and, conditional on these, the former are estimated with
precision using MCMC methods. In empirical exercises involving artificial data
and a large macroeconomic data set, we show the accuracy and computational
benefits of IRGA methods.
arXiv link: http://arxiv.org/abs/2002.10274v1
Estimation and Inference about Tail Features with Tail Censored Data
observations beyond some threshold are censored. We first show that ignoring
such tail censoring could lead to substantial bias and size distortion, even if
the censoring probability is tiny. Second, we propose a new maximum likelihood
estimator (MLE) based on the Pareto tail approximation and derive its
asymptotic properties. Third, we provide a small sample modification to the MLE
by resorting to Extreme Value theory. The MLE with this modification delivers
excellent small sample performance, as shown by Monte Carlo simulations. We
illustrate its empirical relevance by estimating (i) the tail index and the
extreme quantiles of the US individual earnings with the Current Population
Survey dataset and (ii) the tail index of the distribution of macroeconomic
disasters and the coefficient of risk aversion using the dataset collected by
Barro and Urs{\'u}a (2008). Our new empirical findings are substantially
different from the existing literature.
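For intuition, the snippet below works out the simplest version of such an estimator: the closed-form
MLE of a Pareto tail index when exceedances above a threshold u are right-censored at a point c. It is
a textbook censored-Pareto likelihood, not the paper's estimator or its small-sample EVT modification,
and the threshold and censoring values in the example are arbitrary.

    # Closed-form Pareto tail-index MLE with right-censoring (illustrative only).
    import numpy as np

    def pareto_tail_mle(x, u, c):
        """x: sample (censored values recorded as c); u: tail threshold (u < c); c: censoring point.
        Returns the MLE of the Pareto tail index for the exceedances above u."""
        exc = x[x > u]
        uncens = exc[exc < c]                 # fully observed exceedances
        n_cens = np.sum(exc >= c)             # observations only known to exceed c
        # Root of the censored-Pareto score equation:
        # alpha_hat = n_uncens / (sum_i log(x_i / u) + n_cens * log(c / u))
        return len(uncens) / (np.sum(np.log(uncens / u)) + n_cens * np.log(c / u))

    # Check on simulated Pareto(alpha = 3) exceedances censored at c.
    rng = np.random.default_rng(0)
    alpha, u, c = 3.0, 1.0, 4.0
    x = u * (1.0 - rng.uniform(size=50_000)) ** (-1.0 / alpha)
    print(pareto_tail_mle(np.minimum(x, c), u, c))       # close to 3.0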
arXiv link: http://arxiv.org/abs/2002.09982v1
Testing for threshold regulation in presence of measurement error with an application to the PPP hypothesis
and can be tested within the threshold autoregressive setting, with the null
hypothesis being a global non-stationary process. Nonetheless, this setting is
debatable since data are often corrupted by measurement errors. Thus, it is
more appropriate to consider a threshold autoregressive moving-average model as
the general hypothesis. We implement this new setting with the integrated
moving-average model of order one as the null hypothesis. We derive a Lagrange
multiplier test which has an asymptotically similar null distribution and
provide the first rigorous proof of tightness pertaining to testing for
threshold nonlinearity against difference stationarity, which is of independent
interest. Simulation studies show that the proposed approach enjoys less bias
and higher power in detecting threshold regulation than existing tests when
there are measurement errors. We apply the new approach to the daily real
exchange rates of Eurozone countries. It lends support to the purchasing power
parity hypothesis, via a nonlinear mean-reversion mechanism triggered upon
crossing a threshold located in the extreme upper tail. Furthermore, we analyse
the Eurozone series and propose a threshold autoregressive moving-average
specification, which sheds new light on the purchasing power parity debate.
arXiv link: http://arxiv.org/abs/2002.09968v3
Survey Bandits with Regret Guarantees
contextual bandits, when a user arrives we get the user's complete feature
vector and then assign a treatment (arm) to that user. In a number of
applications (like healthcare), collecting features from users can be costly.
To address this issue, we propose algorithms that avoid needless feature
collection while maintaining strong regret guarantees.
arXiv link: http://arxiv.org/abs/2002.09814v1
Kernel Conditional Moment Test via Maximum Moment Restriction
moment (KCM) tests. Our tests are built on a novel representation of
conditional moment restrictions in a reproducing kernel Hilbert space (RKHS)
called conditional moment embedding (CMME). After transforming the conditional
moment restrictions into a continuum of unconditional counterparts, the test
statistic is defined as the maximum moment restriction (MMR) within the unit
ball of the RKHS. We show that the MMR not only fully characterizes the
original conditional moment restrictions, leading to consistency in both
hypothesis testing and parameter estimation, but also has an analytic
expression that is easy to compute as well as closed-form asymptotic
distributions. Our empirical studies show that the KCM test has a promising
finite-sample performance compared to existing tests.
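To make the construction concrete, the following sketch computes the V-statistic form of the MMR for a
user-supplied moment function and a Gaussian kernel. The kernel choice, the bandwidth and the toy
linear-regression example are illustrative, and the critical values needed for an actual test are
omitted.

    # V-statistic estimate of the maximum moment restriction (illustrative kernel and data).
    import numpy as np

    def gaussian_kernel(X, bandwidth=1.0):
        sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq / (2.0 * bandwidth ** 2))

    def mmr_statistic(moments, X, bandwidth=1.0):
        """moments: (n, d) values of the moment function m(Z_i; theta);
        X: (n, p) conditioning variables. The population MMR is zero exactly when
        E[m(Z; theta) | X] = 0 almost surely; this returns its V-statistic estimate."""
        K = gaussian_kernel(X, bandwidth)    # k(x_i, x_j)
        G = moments @ moments.T              # m(z_i)' m(z_j)
        return float(np.mean(K * G))         # (1/n^2) sum_{i,j} m_i' m_j k(x_i, x_j)

    # Toy usage: residual moment of a correctly specified linear regression.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 2))
    Y = X @ np.array([1.0, -0.5]) + rng.normal(size=200)
    m = (Y - X @ np.array([1.0, -0.5]))[:, None]
    print(mmr_statistic(m, X))               # small, as the restriction holds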
arXiv link: http://arxiv.org/abs/2002.09225v3
Forecasting the Intra-Day Spread Densities of Electricity Prices
electric vehicle operators. This paper formulates dynamic density functions,
based upon skewed-t and similar representations, to model and forecast the
German electricity price spreads between different hours of the day, as
revealed in the day-ahead auctions. The four specifications of the density
functions are dynamic and conditional upon exogenous drivers, thereby
permitting the location, scale and shape parameters of the densities to respond
hourly to such factors as weather and demand forecasts. The best fitting and
forecasting specifications for each spread are selected based on the Pinball
Loss function, following the closed-form analytical solutions of the cumulative
distribution functions.
arXiv link: http://arxiv.org/abs/2002.10566v1
Combining Shrinkage and Sparsity in Conjugate Vector Autoregressive Models
autoregressive (VAR) models but, at the same time, introduce the restriction
that each equation features the same set of explanatory variables. This paper
proposes a straightforward means of post-processing posterior estimates of a
conjugate Bayesian VAR to effectively perform equation-specific covariate
selection. Compared to existing techniques using shrinkage alone, our approach
combines shrinkage and sparsity in both the VAR coefficients and the error
variance-covariance matrices, greatly reducing estimation uncertainty in large
dimensions while maintaining computational tractability. We illustrate our
approach by means of two applications. The first application uses synthetic
data to investigate the properties of the model across different
data-generating processes, while the second analyzes the predictive gains
from sparsification in a forecasting exercise for US data.
arXiv link: http://arxiv.org/abs/2002.08760v2
Debiased Off-Policy Evaluation for Recommendation Systems
interactive bandit and reinforcement learning systems such as recommendation
systems. A/B tests are reliable, but are time- and money-consuming, and entail
a risk of failure. In this paper, we develop an alternative method, which
predicts the performance of algorithms given historical data that may have been
generated by a different algorithm. Our estimator has the property that its
prediction converges in probability to the true performance of a counterfactual
algorithm at a rate of $\sqrt{N}$, as the sample size $N$ increases. We also
show a correct way to estimate the variance of our prediction, thus allowing
the analyst to quantify the uncertainty in the prediction. These properties
hold even when the analyst does not know which among a large number of
potentially important state variables are actually important. We validate our
method by a simulation experiment about reinforcement learning. We finally
apply it to improve advertisement design by a major advertisement company. We
find that our method produces smaller mean squared errors than state-of-the-art
methods.
arXiv link: http://arxiv.org/abs/2002.08536v3
Forecasting Foreign Exchange Rate: A Multivariate Comparative Analysis between Traditional Econometric, Contemporary Machine Learning & Deep Learning Techniques
such as the foreign exchange rate or at least estimating the trend
correctly is of key importance for any future investment. In recent times, the
use of computational intelligence-based techniques for forecasting
macroeconomic variables has been proven highly successful. This paper tries to
come up with a multivariate time series approach to forecast the exchange rate
(USD/INR) while parallelly comparing the performance of three multivariate
prediction modelling techniques: Vector Auto Regression (a Traditional
Econometric Technique), Support Vector Machine (a Contemporary Machine Learning
Technique), and Recurrent Neural Networks (a Contemporary Deep Learning
Technique). We have used monthly historical data for several macroeconomic
variables from April 1994 to December 2018 for USA and India to predict USD-INR
Foreign Exchange Rate. The results clearly depict that contemporary techniques
of SVM and RNN (Long Short-Term Memory) outperform the widely used traditional
method of Vector Auto Regression. The RNN model with Long Short-Term Memory
(LSTM) provides the highest accuracy (97.83%), followed by the SVM model
(97.17%) and the VAR model (96.31%). Finally, we present a brief analysis of the correlation and
interdependencies of the variables used for forecasting.
arXiv link: http://arxiv.org/abs/2002.10247v1
Cointegration without Unit Roots
cointegrating relationships break down entirely when autoregressive roots are
near but not exactly equal to unity. We consider this problem within the
framework of a structural VAR, arguing that it is as much a problem of
identification failure as it is of inference. We develop a characterisation of
cointegration based on the impulse response function, which allows long-run
equilibrium relationships to remain identified even in the absence of exact
unit roots. Our approach also provides a framework in which the structural
shocks driving the common persistent components continue to be identified via
long-run restrictions, just as in an SVAR with exact unit roots. We show that
inference on the cointegrating relationships is affected by nuisance
parameters, in a manner familiar from predictive regression; indeed the two
problems are asymptotically equivalent. By adapting the approach of Elliott,
M\"uller and Watson (2015) to our setting, we develop tests that robustly
control size while sacrificing little power (relative to tests that are
efficient in the presence of exact unit roots).
arXiv link: http://arxiv.org/abs/2002.08092v2
Seasonal and Trend Forecasting of Tourist Arrivals: An Adaptive Multiscale Ensemble Learning Approach
challenging task. Seasonal and trend forecasting of tourist arrivals is
important, yet it has received limited attention in previous research. In this
study, a new adaptive multiscale ensemble (AME)
learning approach incorporating variational mode decomposition (VMD) and least
square support vector regression (LSSVR) is developed for short-, medium-, and
long-term seasonal and trend forecasting of tourist arrivals. In the
formulation of our developed AME learning approach, the original tourist
arrivals series are first decomposed into trend, seasonal and remainder
volatility components. Then, the ARIMA model is used to forecast the trend
component, the SARIMA model is used to forecast the seasonal component with a
12-month cycle, while
the LSSVR is used to forecast remainder volatility components. Finally, the
forecasting results of the three components are aggregated to generate an
ensemble forecasting of tourist arrivals by the LSSVR based nonlinear ensemble
approach. Furthermore, a direct strategy is used to implement multi-step-ahead
forecasting. Using two accuracy measures and the Diebold-Mariano test, the
empirical results demonstrate that our proposed AME learning approach can
achieve higher level and directional forecasting accuracy compared with other
benchmarks used in this study, indicating that our proposed approach is a
promising model for forecasting tourist arrivals with high seasonality and
volatility.
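A simplified sketch of the decompose-forecast-ensemble idea is given below. Here an STL decomposition
stands in for the paper's VMD step, sklearn's SVR for the LSSVR remainder model, and the components
are recombined by a plain sum rather than the LSSVR-based nonlinear ensemble; series names, lag and
order choices are illustrative.

    # Decompose, forecast each component, recombine (simplified stand-in for the AME idea).
    import numpy as np
    from statsmodels.tsa.seasonal import STL
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.tsa.statespace.sarimax import SARIMAX
    from sklearn.svm import SVR

    def decompose_forecast(y, horizon=12, period=12):
        """y: pandas Series of monthly tourist arrivals; returns a horizon-length forecast."""
        parts = STL(y, period=period).fit()                      # trend / seasonal / remainder
        trend_fc = ARIMA(parts.trend.dropna(), order=(1, 1, 1)).fit().forecast(horizon)
        seas_fc = SARIMAX(parts.seasonal, order=(0, 0, 0),
                          seasonal_order=(1, 0, 0, period)).fit(disp=False).forecast(horizon)
        # Remainder: iterated one-step-ahead support vector regression on its own lags.
        r = parts.resid.dropna().to_numpy()
        lags = 3
        Xr = np.column_stack([r[i:len(r) - lags + i] for i in range(lags)])
        svr = SVR().fit(Xr, r[lags:])
        window, rem_fc = list(r[-lags:]), []
        for _ in range(horizon):
            rem_fc.append(svr.predict(np.array(window[-lags:])[None, :])[0])
            window.append(rem_fc[-1])
        return np.asarray(trend_fc) + np.asarray(seas_fc) + np.asarray(rem_fc)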
arXiv link: http://arxiv.org/abs/2002.08021v2
Tourism Demand Forecasting: An Ensemble Deep Learning Approach
improve the accuracy of tourism demand forecasting, but presents significant
challenges for forecasting, including the curse of dimensionality and high model
complexity. A novel bagging-based multivariate ensemble deep learning approach
integrating stacked autoencoders and kernel-based extreme learning machines
(B-SAKE) is proposed to address these challenges in this study. By using
historical tourist arrival data, economic variable data and search intensity
index (SII) data, we forecast tourist arrivals in Beijing from four countries.
The consistent results of multiple schemes suggest that our proposed B-SAKE
approach outperforms benchmark models in terms of level accuracy, directional
accuracy and even statistical significance. Both bagging and stacked
autoencoder can effectively alleviate the challenges brought by tourism big
data and improve the forecasting performance of the models. The ensemble deep
learning model we propose contributes to tourism forecasting literature and
benefits relevant government officials and tourism practitioners.
arXiv link: http://arxiv.org/abs/2002.07964v3
Fair Prediction with Endogenous Behavior
algorithms deployed in consequential domains (e.g. in criminal justice) treat
different demographic groups "fairly." However, there are several proposed
notions of fairness, typically mutually incompatible. Using criminal justice as
an example, we study a model in which society chooses an incarceration rule.
Agents of different demographic groups differ in their outside options (e.g.
opportunity for legal employment) and decide whether to commit crimes. We show
that equalizing type I and type II errors across groups is consistent with the
goal of minimizing the overall crime rate; other popular notions of fairness
are not.
arXiv link: http://arxiv.org/abs/2002.07147v1
Hopf Bifurcation from new-Keynesian Taylor rule to Ramsey Optimal Policy
new-Keynesian setting. We can show that a shift from Ramsey optimal policy
under short-term commitment (based on a negative feedback mechanism) to a
Taylor rule (based on a positive feedback mechanism) corresponds to a Hopf
bifurcation with opposite policy advice and a change of the dynamic properties.
This bifurcation occurs because of the ad hoc assumption that the interest rate is
a forward-looking variable when policy targets (inflation and output gap) are
forward-looking variables in the new-Keynesian theory.
arXiv link: http://arxiv.org/abs/2002.07479v1
Double/Debiased Machine Learning for Dynamic Treatment Effects via g-Estimation
treatments are assigned over time and treatments can have a causal effect on
future outcomes or the state of the treated unit. We propose an extension of
the double/debiased machine learning framework to estimate the dynamic effects
of treatments, which can be viewed as a Neyman orthogonal (locally robust)
cross-fitted version of $g$-estimation in the dynamic treatment regime. Our
method applies to a general class of non-linear dynamic treatment models known
as Structural Nested Mean Models and allows the use of machine learning methods
to control for potentially high dimensional state variables, subject to a mean
square error guarantee, while still allowing parametric estimation and
construction of confidence intervals for the structural parameters of interest.
These structural parameters can be used for off-policy evaluation of any target
dynamic policy at parametric rates, subject to semi-parametric restrictions on
the data generating process. Our work is based on a recursive peeling process,
typical in $g$-estimation, and formulates a strongly convex objective at each
stage, which allows us to extend the $g$-estimation framework in multiple
directions: i) to provide finite sample guarantees, ii) to estimate non-linear
effect heterogeneity with respect to fixed unit characteristics, within
arbitrary function spaces, enabling a dynamic analogue of the RLearner
algorithm for heterogeneous effects, iii) to allow for high-dimensional sparse
parameterizations of the target structural functions, enabling automated model
selection via a recursive lasso algorithm. We also provide guarantees for data
stemming from a single treated unit over a long horizon and under stationarity
conditions.
arXiv link: http://arxiv.org/abs/2002.07285v5
Fairness through Experimentation: Inequality in A/B testing as an approach to responsible design
individuals being left behind. Many businesses are striving to adopt
responsible design practices and avoid any unintended consequences of their
products and services, ranging from privacy vulnerabilities to algorithmic
bias. We propose a novel approach to fairness and inclusiveness based on
experimentation. We use experimentation because we want to assess not only the
intrinsic properties of products and algorithms but also their impact on
people. We do this by introducing an inequality approach to A/B testing,
leveraging the Atkinson index from the economics literature. We show how to
perform causal inference over this inequality measure. We also introduce the
concept of site-wide inequality impact, which captures the inclusiveness impact
of targeting specific subpopulations for experiments, and show how to conduct
statistical inference on this impact. We provide real examples from LinkedIn,
as well as an open-source, highly scalable implementation of the computation of
the Atkinson index and its variance in Spark/Scala. We also provide over a
year's worth of learnings -- gathered by deploying our method at scale and
analyzing thousands of experiments -- on which areas and which kinds of product
innovations seem to inherently foster fairness through inclusiveness.
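For reference, the Atkinson index at the center of this approach is only a few lines of code; the
sketch below computes it for an inequality-aversion parameter eps, with eps = 1 handled via the
geometric mean. The variance estimator and the Spark/Scala scaling from the open-source implementation
are not reproduced here, and the lognormal example data are synthetic.

    # Atkinson inequality index (the causal-inference machinery of the paper is omitted).
    import numpy as np

    def atkinson_index(x, eps=1.0):
        """x: positive-valued per-member outcomes; eps: inequality-aversion parameter."""
        x = np.asarray(x, dtype=float)
        if eps == 1.0:
            ede = np.exp(np.mean(np.log(x)))                     # equally distributed equivalent
        else:
            ede = np.mean(x ** (1.0 - eps)) ** (1.0 / (1.0 - eps))
        return 1.0 - ede / x.mean()                              # 0 = perfect equality

    # Example: the index is scale-invariant, so a uniform 10% lift leaves it unchanged.
    rng = np.random.default_rng(2)
    base = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
    print(atkinson_index(base), atkinson_index(base * 1.1))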
arXiv link: http://arxiv.org/abs/2002.05819v1
Experimental Design in Two-Sided Platforms: An Analysis of Bias
marketplaces. Many of these experiments exhibit interference, where an
intervention applied to one market participant influences the behavior of
another participant. This interference leads to biased estimates of the
treatment effect of the intervention. We develop a stochastic market model and
associated mean field limit to capture dynamics in such experiments, and use
our model to investigate how the performance of different designs and
estimators is affected by marketplace interference effects. Platforms typically
use two common experimental designs: demand-side ("customer") randomization
(CR) and supply-side ("listing") randomization (LR), along with their
associated estimators. We show that good experimental design depends on market
balance: in highly demand-constrained markets, CR is unbiased, while LR is
biased; conversely, in highly supply-constrained markets, LR is unbiased, while
CR is biased. We also introduce and study a novel experimental design based on
two-sided randomization (TSR) where both customers and listings are randomized
to treatment and control. We show that appropriate choices of TSR designs can
be unbiased in both extremes of market balance, while yielding relatively low
bias in intermediate regimes of market balance.
arXiv link: http://arxiv.org/abs/2002.05670v5
Long-term prediction intervals of economic time series
of univariate economic time series. We propose computational adjustments of the
existing methods to improve coverage probability under a small sample
constraint. A pseudo-out-of-sample evaluation shows that our methods perform at
least as well as selected alternative methods based on model-implied Bayesian
approaches and bootstrapping. Our most successful method yields prediction
intervals for eight macroeconomic indicators over a horizon spanning several
decades.
arXiv link: http://arxiv.org/abs/2002.05384v1
Efficient Adaptive Experimental Design for Average Treatment Effect Estimation
adaptive experiments. In adaptive experiments, experimenters sequentially
assign treatments to experimental units while updating treatment assignment
probabilities based on past data. We start by defining the efficient
treatment-assignment probability, which minimizes the semiparametric efficiency
bound for ATE estimation. Our proposed experimental design estimates and uses
the efficient treatment-assignment probability to assign treatments. At the end
of the proposed design, the experimenter estimates the ATE using a newly
proposed Adaptive Augmented Inverse Probability Weighting (A2IPW) estimator. We
show that the asymptotic variance of the A2IPW estimator using data from the
proposed design achieves the minimized semiparametric efficiency bound. We also
analyze the estimator's finite-sample properties and develop nonparametric and
nonasymptotic confidence intervals that are valid at any round of the proposed
design. These anytime valid confidence intervals allow us to conduct
rate-optimal sequential hypothesis testing, allowing for early stopping and
reducing necessary sample size.
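The point estimate at the heart of the design can be sketched as below: each round's AIPW-type score is
built from nuisance estimates fitted only on earlier rounds, and the scores are averaged. The
fit_outcome/fit_propensity interfaces, the burn-in and the per-round refitting are illustrative
simplifications; the efficient assignment-probability update and the anytime-valid confidence intervals
are omitted.

    # Adaptive AIPW-style ATE estimate with nuisances fitted on past data only (sketch).
    import numpy as np

    def a2ipw_estimate(X, A, Y, fit_outcome, fit_propensity, burn_in=50):
        """X, A, Y: arrays collected sequentially during the adaptive experiment.
        fit_outcome(Xp, Ap, Yp) -> mu(x, a) and fit_propensity(Xp, Ap) -> e(x) are
        user-supplied nuisance estimators fitted only on the history handed to them."""
        scores = []
        for t in range(burn_in, len(Y)):
            mu = fit_outcome(X[:t], A[:t], Y[:t])      # outcome regression from past data
            e = fit_propensity(X[:t], A[:t])           # assignment probability from past data
            m1, m0, et = mu(X[t], 1), mu(X[t], 0), e(X[t])
            scores.append(A[t] * (Y[t] - m1) / et
                          - (1 - A[t]) * (Y[t] - m0) / (1 - et)
                          + m1 - m0)
        return float(np.mean(scores))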
arXiv link: http://arxiv.org/abs/2002.05308v7
Bounds on direct and indirect effects under treatment/mediator endogeneity and outcome attrition
indirect mechanism operating through an intermediate outcome or mediator, as
well as the direct effect of the treatment on the outcome of interest. However,
the evaluation of direct and indirect effects is frequently complicated by
non-ignorable selection into the treatment and/or mediator, even after
controlling for observables, as well as sample selection/outcome attrition. We
propose a method for bounding direct and indirect effects in the presence of
such complications that is based on a sequence of linear
programming problems. Considering inverse probability weighting by propensity
scores, we compute the weights that would yield identification in the absence
of complications and perturb them by an entropy parameter reflecting a specific
amount of propensity score misspecification to set-identify the effects of
interest. We apply our method to data from the National Longitudinal Survey of
Youth 1979 to derive bounds on the explained and unexplained components of a
gender wage gap decomposition that is likely prone to non-ignorable mediator
selection and outcome attrition.
arXiv link: http://arxiv.org/abs/2002.05253v3
A Hierarchy of Limitations in Machine Learning
Machine learning has focused on the usefulness of probability models for
prediction in social systems, but is only now coming to grips with the ways in
which these models are wrong---and the consequences of those shortcomings. This
paper attempts a comprehensive, structured overview of the specific conceptual,
procedural, and statistical limitations of models in machine learning when
applied to society. Machine learning modelers themselves can use the described
hierarchy to identify possible failure points and think through how to address
them, and consumers of machine learning models can know what to question when
confronted with the decision about if, where, and how to apply machine
learning. The limitations go from commitments inherent in quantification
itself, through to showing how unmodeled dependencies can lead to
cross-validation being overly optimistic as a way of assessing model
performance.
arXiv link: http://arxiv.org/abs/2002.05193v2
Efficient Policy Learning from Surrogate-Loss Classification Reductions
importance of efficient policy evaluation and has proposed reductions to
weighted (cost-sensitive) classification. But, efficient policy evaluation need
not yield efficient estimation of policy parameters. We consider the estimation
problem given by a weighted surrogate-loss classification reduction of policy
learning with any score function, either direct, inverse-propensity weighted,
or doubly robust. We show that, under a correct specification assumption, the
weighted classification formulation need not be efficient for policy
parameters. We draw a contrast to actual (possibly weighted) binary
classification, where correct specification implies a parametric model, while
for policy learning it only implies a semiparametric model. In light of this,
we instead propose an estimation approach based on generalized method of
moments, which is efficient for the policy parameters. We propose a particular
method based on recent developments on solving moment problems using neural
networks and demonstrate the efficiency and regret benefits of this method
empirically.
arXiv link: http://arxiv.org/abs/2002.05153v1
Generalized Poisson Difference Autoregressive Processes
integers with sign. The increments of the process are Poisson differences and
the dynamics have an autoregressive structure. We study the properties of the
process and exploit the thinning representation to derive stationarity
conditions and the stationary distribution of the process. We provide a
Bayesian inference method and an efficient posterior approximation procedure
based on Monte Carlo. Numerical illustrations on both simulated and real data
show the effectiveness of the proposed inference.
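As a toy illustration of the kind of dynamics involved, the snippet below simulates a signed
integer-valued series whose innovations are differences of two Poisson draws (Skellam increments)
around an autoregressive mean. This is a stylized stand-in, not the paper's thinning-based
construction or its Bayesian estimation procedure.

    # Stylized simulation of an integer-valued AR with Poisson-difference innovations.
    import numpy as np

    def simulate_skellam_ar(n, phi=0.5, lam1=2.0, lam2=2.0, seed=0):
        """Y_t = round(phi * Y_{t-1}) + (N1_t - N2_t) with N1, N2 independent Poisson draws,
        so the increments are Poisson differences (Skellam) around an AR(1)-type mean."""
        rng = np.random.default_rng(seed)
        y = np.zeros(n, dtype=int)
        for t in range(1, n):
            y[t] = int(round(phi * y[t - 1])) + rng.poisson(lam1) - rng.poisson(lam2)
        return y

    y = simulate_skellam_ar(500)
    print(y[:10], y.mean(), y.var())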
arXiv link: http://arxiv.org/abs/2002.04470v1
The Dimension of the Set of Causal Solutions of Linear Multivariate Rational Expectations Models
structural difference equation obtained from a linear multivariate rational
expectations model. First, it is shown that the number of free parameters
depends on the structure of the zeros at zero of a certain matrix polynomial of
the structural difference equation and the number of inputs of the rational
expectations model. Second, the implications of requiring that some components
of the endogenous variables be predetermined are analysed. Third, a condition
for existence and uniqueness of a causal stationary solution is given.
arXiv link: http://arxiv.org/abs/2002.04369v1
Identifiability and Estimation of Possibly Non-Invertible SVARMA Models: A New Parametrisation
likelihood (ML) estimation of possibly non-invertible structural vector
autoregressive moving average (SVARMA) models driven by independent and
non-Gaussian shocks. In contrast to previous literature, the novel
representation of the MA polynomial matrix using the Wiener-Hopf factorisation
(WHF) focuses on the multivariate nature of the model, generates insights into
its structure, and uses this structure for devising optimisation algorithms. In
particular, it allows one to parameterise the location of determinantal zeros
inside and outside the unit circle, and it allows for MA zeros at zero, which
can be interpreted as informational delays. This is highly relevant for
data-driven evaluation of Dynamic Stochastic General Equilibrium (DSGE) models.
Typically imposed identifying restrictions on the shock transmission matrix as
well as on the determinantal root location are made testable. Furthermore, we
provide low level conditions for asymptotic normality of the ML estimator and
analytic expressions for the score and the information matrix. As an application,
we estimate the Blanchard and Quah model and show that our method provides
further insights regarding non-invertibility using a standard macroeconometric
model. These and further analyses are implemented in a well documented
R-package.
arXiv link: http://arxiv.org/abs/2002.04346v2
Sequential Monitoring of Changes in Housing Prices
estate markets. The changes in the real estate prices are modeled by a
combination of linear and autoregressive terms. The monitoring scheme is based
on a detector and a suitably chosen boundary function. If the detector crosses
the boundary function, a structural break is detected. We provide the
asymptotics for the procedure under the stability null hypothesis and the
stopping time under the change point alternative. Monte Carlo simulation is
used to show the size and the power of our method under several conditions. We
study the real estate markets in Boston, Los Angeles and at the national U.S.
level. We find structural breaks in the markets, and we segment the data into
stationary segments. It is observed that the autoregressive parameter is
increasing but stays below 1.
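A generic version of such a monitoring loop is sketched below: residuals from a stable training window
are accumulated as new observations arrive and compared against a boundary function, with an alarm
raised at the first crossing. The linear boundary c*sqrt(m)*(1 + k/m) and the constant c follow a
classic Chu-Stinchcombe-White-style setup and are illustrative, not the paper's exact detector and
boundary pair.

    # Generic CUSUM-style monitoring with a boundary function (illustrative choices).
    import numpy as np

    def monitor_cusum(train_resid, new_obs, predict, c=2.0):
        """train_resid: residuals from the stable training window of size m.
        new_obs: iterable of (x_t, y_t) pairs arriving during the monitoring period.
        predict(x_t): prediction of the model fitted on the training window."""
        m = len(train_resid)
        sigma = np.std(train_resid, ddof=1)
        cusum = 0.0
        for k, (x, y) in enumerate(new_obs, start=1):
            cusum += (y - predict(x)) / sigma            # standardized monitoring residual
            boundary = c * np.sqrt(m) * (1.0 + k / m)    # linear boundary function
            if abs(cusum) > boundary:
                return k          # alarm: structural break signalled at monitoring step k
        return None               # boundary never crossed: no break detected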
arXiv link: http://arxiv.org/abs/2002.04101v1
The Effect of Weather Conditions on Fertilizer Applications: A Spatial Dynamic Panel Data Analysis
analyses the effect of climatic variations on this economic sector, by
considering both a huge dataset and a flexible spatio-temporal model
specification. In particular, we study the response of N-fertilizer application
to abnormal weather conditions, while accounting for other relevant control
variables. The dataset consists of gridded data spanning over 21 years
(1993-2013), while the methodological strategy makes use of a spatial dynamic
panel data (SDPD) model that accounts for both space and time fixed effects,
besides dealing with both space and time dependences. Time-invariant short and
long term effects, as well as time-varying marginal effects are also properly
defined, revealing interesting results on the impact of both GDP and weather
conditions on fertilizer utilization. The analysis considers four
macro-regions -- Europe, South America, South-East Asia and Africa -- to allow
for comparisons among different socio-economic societies. In addition to
finding both spatial (in the form of knowledge spillover effects) and temporal
dependences as well as a good support for the existence of an environmental
Kuznets curve for fertilizer application, the paper shows peculiar responses of
N-fertilization to deviations from normal weather conditions of moisture for
each selected region, calling for ad hoc policy interventions.
arXiv link: http://arxiv.org/abs/2002.03922v2
Markov Switching
time-variation in the parameters in the form of their state- or regime-specific
values. Importantly, this time-variation is governed by a discrete-valued
latent stochastic process with limited memory. More specifically, the current
value of the state indicator is determined only by the value of the state
indicator from the previous period, thus the Markov property, and the
transition matrix. The latter characterizes the properties of the Markov
process by determining with what probability each of the states can be visited
next period, given the state in the current period. This setup gives rise to the
two main advantages of Markov switching models: the estimation of
the probability of state occurrences in each of the sample periods by using
filtering and smoothing methods and the estimation of the state-specific
parameters. These two features open the possibility for improved
interpretations of the parameters associated with specific regimes combined
with the corresponding regime probabilities, as well as for improved
forecasting performance based on persistent regimes and parameters
characterizing them.
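The filtering step mentioned above is compact enough to sketch: for a two-state Gaussian model with
known state-specific means, variances and transition matrix, the Hamilton filter below returns the
filtered regime probabilities period by period. Smoothing, parameter estimation and forecasting are
omitted, and the numbers in the toy example are arbitrary.

    # Hamilton filter for a two-state Gaussian Markov switching model (filtering only).
    import numpy as np
    from scipy.stats import norm

    def hamilton_filter(y, mu, sigma, P):
        """y: (T,) observations; mu, sigma: length-2 state-specific means and std devs;
        P: 2x2 transition matrix with P[i, j] = Pr(s_t = j | s_{t-1} = i)."""
        filt = np.zeros((len(y), 2))
        prob = np.array([0.5, 0.5])                      # initial state distribution
        for t in range(len(y)):
            pred = prob @ P                              # one-step-ahead state probabilities
            lik = norm.pdf(y[t], loc=mu, scale=sigma)    # state-conditional densities
            prob = pred * lik / np.sum(pred * lik)       # Bayes update (filtering step)
            filt[t] = prob
        return filt

    # Toy usage: a calm regime and a volatile regime with a persistent transition matrix.
    rng = np.random.default_rng(3)
    y = np.r_[rng.normal(0, 1, 200), rng.normal(2, 3, 100)]
    P = np.array([[0.98, 0.02], [0.03, 0.97]])
    print(hamilton_filter(y, mu=[0.0, 2.0], sigma=[1.0, 3.0], P=P)[-3:])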
arXiv link: http://arxiv.org/abs/2002.03598v1
Asymptotically Optimal Control of a Centralized Dynamic Matching Market with General Utilities
independent Poisson processes at the same rate and independently abandon the
market if not matched after an exponential amount of time with the same mean.
In this centralized market, the utility for the system manager from matching
any buyer and any seller is a general random variable. We consider a sequence
of systems indexed by $n$ where the arrivals in the $n^{th}$ system
are sped up by a factor of $n$. We analyze two families of one-parameter
policies: the population threshold policy immediately matches an arriving agent
to its best available mate only if the number of mates in the system is above a
threshold, and the utility threshold policy matches an arriving agent to its
best available mate only if the corresponding utility is above a threshold.
Using a fluid analysis of the two-dimensional Markov process of buyers and
sellers, we show that when the matching utility distribution is light-tailed,
the population threshold policy with threshold $n{\ln n}$ is
asymptotically optimal among all policies that make matches only at agent
arrival epochs. In the heavy-tailed case, we characterize the optimal threshold
level for both policies. We also study the utility threshold policy in an
unbalanced matching market with heavy-tailed matching utilities and find that
the buyers and sellers have the same asymptotically optimal utility threshold.
We derive optimal thresholds when the matching utility distribution is
exponential, uniform, Pareto, and correlated Pareto. We find that as the right
tail of the matching utility distribution gets heavier, the threshold level of
each policy (and hence market thickness) increases, as does the magnitude by
which the utility threshold policy outperforms the population threshold policy.
arXiv link: http://arxiv.org/abs/2002.03205v2
On Ridership and Frequency
States had attained its lowest level since 1973. If transit agencies hope to
reverse this trend, they must understand how their service allocation policies
affect ridership. This paper is among the first to model ridership trends on a
hyper-local level over time. A Poisson fixed-effects model is developed to
evaluate the ridership elasticity to frequency on weekdays using passenger
count data from Portland, Miami, Minneapolis/St-Paul, and Atlanta between 2012
and 2018. In every agency, ridership is found to be elastic to frequency when
observing the variation between individual route-segments at one point in time.
In other words, the most frequent routes are already the most productive in
terms of passengers per vehicle-trip. When observing the variation within each
route-segment over time, however, ridership is inelastic; each additional
vehicle-trip is expected to generate less ridership than the average bus
already on the route. In three of the four agencies, the elasticity is a
decreasing function of prior frequency, meaning that low-frequency routes are
the most sensitive to changes in frequency. This paper can help transit
agencies anticipate the marginal effect of shifting service throughout the
network. As the quality and availability of passenger count data improve, this
paper can serve as the methodological basis to explore the dynamics of bus
ridership.
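A stripped-down version of the elasticity regression reads as follows: a Poisson model of boardings on
log frequency with route-segment and period fixed effects, whose log-frequency coefficient is read as
the ridership elasticity. The column names and formula are illustrative and do not reproduce the
paper's exact specification, controls or standard errors.

    # Poisson fixed-effects elasticity regression (illustrative column names).
    import numpy as np
    import statsmodels.formula.api as smf

    def ridership_elasticity(df):
        """df: pandas DataFrame with (illustrative) columns boardings, trips_per_day,
        segment_id and year, one row per route-segment and survey period."""
        df = df.assign(log_freq=np.log(df["trips_per_day"]))
        fit = smf.poisson("boardings ~ log_freq + C(segment_id) + C(year)",
                          data=df).fit(disp=False)
        return fit.params["log_freq"]     # elasticity of ridership with respect to frequency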
arXiv link: http://arxiv.org/abs/2002.02493v3
Dependence-Robust Inference Using Resampled Statistics
The procedures utilize test statistics constructed by resampling in a manner
that does not depend on the unknown correlation structure of the data. We prove
that the statistics are asymptotically normal under the weak requirement that
the target parameter can be consistently estimated at the parametric rate. This
holds for regular estimators under many well-known forms of weak dependence and
justifies the claim of dependence-robustness. We consider applications to
settings with unknown or complicated forms of dependence, with various forms of
network dependence as leading examples. We develop tests for both moment
equalities and inequalities.
arXiv link: http://arxiv.org/abs/2002.02097v4
Sharpe Ratio Analysis in High Dimensions: Residual-Based Nodewise Regression in Factor Models
fitted factor model are used. We apply our results to the analysis of the
consistency of Sharpe ratio estimators when there are many assets in a
portfolio. We allow for an increasing number of assets as well as time
observations of the portfolio. Since the nodewise regression is not feasible
due to the unknown nature of idiosyncratic errors, we provide a
feasible-residual-based nodewise regression to estimate the precision matrix of
errors, which is consistent even when the number of assets, p, exceeds the time span
of the portfolio, n. In another new development, we also show that the
precision matrix of returns can be estimated consistently, even with an
increasing number of factors and p>n. We show that: (1) with p>n, the Sharpe
ratio estimators are consistent in global minimum-variance and mean-variance
portfolios; and (2) with p>n, the maximum Sharpe ratio estimator is consistent
when the portfolio weights sum to one; and (3) with p<<n, the
maximum-out-of-sample Sharpe ratio estimator is consistent.
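The nodewise construction underlying the estimator can be sketched in a few lines: lasso-regress each
residual series on all the others and assemble the coefficients into rows of the precision matrix.
The penalty level here is a fixed illustrative constant; the feasibility argument and tuning in the
paper, the factor-model step that produces the residuals, and the Sharpe-ratio formulas are not shown.

    # Residual-based nodewise regression for the precision matrix (illustrative tuning).
    import numpy as np
    from sklearn.linear_model import Lasso

    def nodewise_precision(resid, lam=0.1):
        """resid: (n, p) matrix of factor-model residuals; returns a p x p estimate
        of the precision matrix of the idiosyncratic errors."""
        _, p = resid.shape
        theta = np.zeros((p, p))
        for j in range(p):
            others = np.delete(resid, j, axis=1)
            gamma = Lasso(alpha=lam, fit_intercept=False).fit(others, resid[:, j]).coef_
            tau2 = np.mean((resid[:, j] - others @ gamma) ** 2) + lam * np.sum(np.abs(gamma))
            row = np.zeros(p)
            row[j] = 1.0
            row[np.arange(p) != j] = -gamma
            theta[j] = row / tau2
        return theta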
arXiv link: http://arxiv.org/abs/2002.01800v5
A Neural-embedded Choice Model: TasteNet-MNL Modeling Taste Heterogeneity with Flexibility and Interpretability
functions, especially how tastes vary across individuals. Utility
misspecification may lead to biased estimates, inaccurate interpretations and
limited predictability. In this paper, we utilize a neural network to learn
taste representation. Our formulation consists of two modules: a neural network
(TasteNet) that learns taste parameters (e.g., time coefficient) as flexible
functions of individual characteristics; and a multinomial logit (MNL) model
with utility functions defined with expert knowledge. Taste parameters learned
by the neural network are fed into the choice model and link the two modules.
Our approach extends the L-MNL model (Sifringer et al., 2020) by allowing the
neural network to learn the interactions between individual characteristics and
alternative attributes. Moreover, we formalize and strengthen the
interpretability condition - requiring realistic estimates of behavior
indicators (e.g., value-of-time, elasticity) at the disaggregated level, which
is crucial for a model to be suitable for scenario analysis and policy
decisions. Through a unique network architecture and parameter transformation,
we incorporate prior knowledge and guide the neural network to output realistic
behavior indicators at the disaggregated level. We show that TasteNet-MNL
reaches the ground-truth model's predictability and recovers the nonlinear
taste functions on synthetic data. Its estimated value-of-time and choice
elasticities at the individual level are close to the ground truth. On a
publicly available Swissmetro dataset, TasteNet-MNL outperforms benchmark MNL and
Mixed Logit models in predictability. It learns a broader spectrum of
taste variations within the population and suggests a higher average
value-of-time.
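A minimal PyTorch-style sketch of the two-module idea is given below: a small network maps individual
characteristics to a taste parameter (a time coefficient kept negative via a softplus), which then
enters a hand-specified MNL utility. The architecture, the sign/parameter transformations and the
attribute names are illustrative; training and the regularization used in the paper are omitted.

    # Neural taste parameters feeding an expert-specified MNL utility (illustrative sketch).
    import torch
    import torch.nn as nn

    class TasteNetMNL(nn.Module):
        """TasteNet module: characteristics -> taste parameter; MNL module: expert utility."""
        def __init__(self, n_char, hidden=16):
            super().__init__()
            self.tastenet = nn.Sequential(
                nn.Linear(n_char, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            self.asc = nn.Parameter(torch.zeros(1))          # constant for the first alternative
            self.beta_cost = nn.Parameter(torch.tensor(-1.0))

        def forward(self, chars, time, cost):
            """chars: (N, n_char) individual characteristics; time, cost: (N, J) attributes."""
            beta_time = -nn.functional.softplus(self.tastenet(chars))  # individual-specific, < 0
            v = self.beta_cost * cost + beta_time * time               # expert-specified utility
            v = v + torch.cat([self.asc, torch.zeros(time.shape[1] - 1)])
            return torch.log_softmax(v, dim=1)               # MNL log choice probabilities

    # Training would minimize nn.NLLLoss() between these log-probabilities and the observed
    # choices; the per-individual beta_time is the learned taste parameter.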
arXiv link: http://arxiv.org/abs/2002.00922v2
Profit-oriented sales forecasting: a comparison of forecasting techniques from a business perspective
problem that arises in any forecasting application. Decades of research have
resulted into an enormous amount of forecasting methods that stem from
statistics, econometrics and machine learning (ML), which leads to a very
difficult and elaborate choice to make in any forecasting exercise. This paper
aims to facilitate this process for high-level tactical sales forecasts by
comparing a large array of techniques for 35 time series that consist of both
industry data from the Coca-Cola Company and publicly available datasets.
However, instead of solely focusing on the accuracy of the resulting forecasts,
this paper introduces a novel and completely automated profit-driven approach
that takes into account the expected profit that a technique can create during
both the model building and evaluation process. The expected profit function
that is used for this purpose, is easy to understand and adaptable to any
situation by combining forecasting accuracy with business expertise.
Furthermore, we examine the added value of ML techniques, the inclusion of
external factors and the use of seasonal models in order to ascertain which
type of model works best in tactical sales forecasting. Our findings show that
simple seasonal time series models consistently outperform other methodologies
and that the profit-driven approach can lead to selecting a different
forecasting model.
arXiv link: http://arxiv.org/abs/2002.00949v1
NAPLES: Mining the Lead-lag Relationship from Non-synchronous and High-frequency Data
delayed effect on a given time series caused by another time series. Lead-lag
effects are ubiquitous in practice and are specifically critical in formulating
investment strategies in high-frequency trading. At present, there are three
major challenges in analyzing the lead-lag effects. First, in practical
applications, not all time series are observed synchronously. Second, the size
of the relevant dataset and rate of change of the environment is increasingly
faster, and it is becoming more difficult to complete the computation within a
particular time limit. Third, some lead-lag effects are time-varying and only
last for a short period, and their delay lengths are often affected by external
factors. In this paper, we propose NAPLES (Negative And Positive lead-lag
EStimator), a new statistical measure that resolves all these problems. Through
experiments on artificial and real datasets, we demonstrate that NAPLES has a
strong correlation with the actual lead-lag effects, including those triggered
by significant macroeconomic announcements.
arXiv link: http://arxiv.org/abs/2002.00724v1
Efficient representation of supply and demand curves on day-ahead electricity markets
auction in a parsimonious way. Our main task is to build an appropriate
algorithm to present the information about electricity prices and demands with
far fewer parameters than the original one. We represent each curve using
mesh-free interpolation techniques based on radial basis function
approximation. We describe results of this method for the day-ahead IPEX spot
price of Italy.
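In spirit, the compression step can be as simple as the sketch below: sample the raw cumulative bid
curve at a coarse grid of volume nodes and fit a radial-basis-function interpolant, so each hourly
curve is carried by a handful of coefficients. The node placement, node count and kernel are
illustrative choices, not the specific basis functions or fitting criterion used by the authors.

    # RBF compression of a (monotone) cumulative supply/demand curve (illustrative settings).
    import numpy as np
    from scipy.interpolate import RBFInterpolator

    def compress_curve(volumes, prices, n_nodes=20):
        """volumes, prices: one hour's cumulative bid curve from the auction.
        Returns an RBF interpolant carrying roughly n_nodes parameters."""
        nodes = np.linspace(volumes.min(), volumes.max(), n_nodes)
        node_prices = np.interp(nodes, volumes, prices)      # sample the step curve at the nodes
        return RBFInterpolator(nodes[:, None], node_prices, kernel="thin_plate_spline")

    # Reconstruction at arbitrary volumes: compress_curve(v, p)(query_volumes[:, None])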
arXiv link: http://arxiv.org/abs/2002.00507v1
Variable-lag Granger Causality and Transfer Entropy for Time Series Analysis
series data, commonly used in the social and biological sciences. Typical
operationalizations of Granger causality make a strong assumption that every
time point of the effect time series is influenced by a combination of other
time series with a fixed time delay. The assumption of fixed time delay also
exists in Transfer Entropy, which is considered to be a non-linear version of
Granger causality. However, the assumption of the fixed time delay does not
hold in many applications, such as collective behavior, financial markets, and
many natural phenomena. To address this issue, we develop Variable-lag Granger
causality and Variable-lag Transfer Entropy, generalizations of both Granger
causality and Transfer Entropy that relax the assumption of the fixed time
delay and allow causes to influence effects with arbitrary time delays. In
addition, we propose methods for inferring both variable-lag Granger causality
and Transfer Entropy relations. In our approaches, we utilize an optimal
warping path of Dynamic Time Warping (DTW) to infer variable-lag causal
relations. We demonstrate our approaches on an application for studying
coordinated collective behavior and other real-world causal-inference datasets
and show that our proposed approaches perform better than several existing
methods in both simulated and real-world datasets. Our approaches can be
applied in any domain of time series analysis. The software of this work is
available in the R-CRAN package: VLTimeCausality.
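The alignment-plus-regression core of the idea can be sketched as below: warp the candidate cause onto
the effect with dynamic time warping and check whether the aligned (hence variably lagged) series
improves an autoregressive fit of the effect. This is a simplified illustration with ad hoc choices
(AR(1) baseline, a raw RSS comparison, a basic DTW); the full procedure, including formal tests, is
in the authors' VLTimeCausality package.

    # DTW alignment followed by an augmented autoregression (simplified illustration).
    import numpy as np

    def dtw_path(x, y):
        """Classic O(n*m) dynamic time warping; returns, for each index j of y,
        one aligned index of x along the optimal warping path."""
        n, m = len(x), len(y)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(x[i - 1] - y[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        align = np.zeros(m, dtype=int)
        i, j = n, m
        while i > 1 or j > 1:
            align[j - 1] = i - 1
            step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
        align[0] = 0
        return align

    def variable_lag_granger_rss(x, y):
        """Residual sum of squares of an AR(1) fit of y, with and without the
        DTW-aligned x as an extra regressor; a clear drop in the second value
        is suggestive of a variable-lag influence of x on y."""
        xa = np.asarray(x)[dtw_path(np.asarray(x), np.asarray(y))]
        Y = np.asarray(y)[1:]
        X0 = np.column_stack([np.ones(len(Y)), np.asarray(y)[:-1]])
        X1 = np.column_stack([X0, xa[:-1]])
        rss = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
        return rss(X0), rss(X1)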
arXiv link: http://arxiv.org/abs/2002.00208v3
Natural Experiments
refers to an experiment where a treatment is randomly assigned by someone other
than the researcher. In another interpretation, it refers to a study in which
there is no controlled random assignment, but treatment is assigned by some
external factor in a way that loosely resembles a randomized experiment---often
described as an "as if random" assignment. In yet another interpretation, it
refers to any non-randomized study that compares a treatment to a control
group, without any specific requirements on how the treatment is assigned. I
introduce an alternative definition that seeks to clarify the integral features
of natural experiments and at the same time distinguish them from randomized
controlled experiments. I define a natural experiment as a research study where
the treatment assignment mechanism (i) is neither designed nor implemented by
the researcher, (ii) is unknown to the researcher, and (iii) is probabilistic
by virtue of depending on an external factor. The main message of this
definition is that the difference between a randomized controlled experiment
and a natural experiment is not a matter of degree, but of essence, and thus
conceptualizing a natural experiment as a research design akin to a randomized
experiment is neither rigorous nor a useful guide to empirical analysis. Using
my alternative definition, I discuss how a natural experiment differs from a
traditional observational study, and offer practical recommendations for
researchers who wish to use natural experiments to study causal effects.
arXiv link: http://arxiv.org/abs/2002.00202v1
Estimating Welfare Effects in a Nonparametric Choice Model: The Case of School Vouchers
willingness to pay for a price subsidy and its effects on demand given
exogenous, discrete variation in prices. Our starting point is a nonparametric,
nonseparable model of choice. We exploit the insight that our welfare
parameters in this model can be expressed as functions of demand for the
different alternatives. However, while the variation in the data reveals the
value of demand at the observed prices, the parameters generally depend on its
values beyond these prices. We show how to sharply characterize what we can
learn when demand is specified to be entirely nonparametric or to be
parameterized in a flexible manner, both of which imply that the parameters are
not necessarily point identified. We use our tools to analyze the welfare
effects of price subsidies provided by school vouchers in the DC Opportunity
Scholarship Program. We find that the provision of the status quo voucher and a
wide range of counterfactual vouchers of different amounts can have positive
and potentially large benefits net of costs. The positive effect can be
explained by the popularity of low-tuition schools in the program; removing
them from the program can result in a negative net benefit. We also find that
various standard logit specifications, in comparison, limit attention to demand
functions with low demand for the voucher, which do not capture the large
magnitudes of benefits credibly consistent with the data.
arXiv link: http://arxiv.org/abs/2002.00103v6
Blocked Clusterwise Regression
heterogeneity in panel data by assigning each cross-sectional unit a
one-dimensional, discrete latent type. Such models have been shown to allow
estimation and inference by regression clustering methods. This paper is
motivated by the finding that the clustered heterogeneity models studied in
this literature can be badly misspecified, even when the panel has significant
discrete cross-sectional structure. To address this issue, we generalize
previous approaches to discrete unobserved heterogeneity by allowing each unit
to have multiple, imperfectly-correlated latent variables that describe its
response-type to different covariates. We give inference results for a k-means
style estimator of our model and develop information criteria to jointly select
the number of clusters for each latent variable. Monte Carlo simulations confirm
our theoretical results and give intuition about the finite-sample performance
of estimation and model selection. We also contribute to the theory of
clustering with an over-specified number of clusters and derive new convergence
rates for this setting. Our results suggest that over-fitting can be severe in
k-means style estimators when the number of clusters is over-specified.
arXiv link: http://arxiv.org/abs/2001.11130v1
Functional Sequential Treatment Allocation with Covariates
of the covariate vector, instead of targeting the treatment with highest
conditional expectation, the decision maker targets the treatment which
maximizes a general functional of the conditional potential outcome
distribution, e.g., a conditional quantile, trimmed mean, or a socio-economic
functional such as an inequality, welfare or poverty measure. We develop
expected regret lower bounds for this problem, and construct a near minimax
optimal assignment policy.
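As a concrete illustration of targeting a distributional functional rather than a conditional mean, the hedged sketch below picks, within a covariate cell, the arm with the highest empirical quantile of past outcomes. The function name, exploration rule, and cell structure are illustrative assumptions, not the paper's near minimax optimal policy.
```python
import numpy as np

def choose_arm(history, x_bin, tau=0.5, n_arms=2, rng=None):
    """Pick the arm with the highest empirical tau-quantile of outcomes
    observed so far in covariate bin x_bin (thin cells trigger exploration).

    All names and rules here are illustrative; this is not the near minimax
    optimal assignment policy constructed in the paper.
    """
    rng = rng or np.random.default_rng()
    scores = np.full(n_arms, -np.inf)
    for a in range(n_arms):
        y = np.array([o for (b, arm, o) in history if b == x_bin and arm == a])
        if len(y) < 5:                    # force some exploration in thin cells
            return int(rng.integers(n_arms))
        scores[a] = np.quantile(y, tau)   # functional target: a quantile,
                                          # not the conditional mean
    return int(np.argmax(scores))

# toy usage: two arms, one covariate bin
rng = np.random.default_rng(0)
hist = [(0, a, rng.normal(loc=a, scale=1 + a)) for a in (0, 1) for _ in range(50)]
print(choose_arm(hist, x_bin=0, tau=0.25))   # arm best for the lower quartile
```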
arXiv link: http://arxiv.org/abs/2001.10996v1
Frequentist Shrinkage under Inequality Constraints
constraints motivated by economic theory. We propose an Inequality Constrained
Shrinkage Estimator (ICSE) which takes the form of a weighted average between
the unconstrained and inequality constrained estimators with a data-dependent
weight. The weight drives both the direction and degree of shrinkage. We use a
local asymptotic framework to derive the asymptotic distribution and risk of
the ICSE. We provide conditions under which the asymptotic risk of the ICSE is
strictly less than that of the unrestricted extremum estimator. The degree of
shrinkage cannot be consistently estimated under the local asymptotic
framework. To address this issue, we propose a feasible plug-in estimator and
investigate its finite sample behavior. We also apply our framework to gasoline
demand estimation under the Slutsky restriction.
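A minimal sketch of the weighted-average form of the ICSE follows; the plug-in weight used here (a simple decreasing function of the standardized distance between the two estimators) is an illustrative placeholder, not the paper's data-dependent weight.
```python
import numpy as np

def icse(beta_unc, beta_con, V_unc, c=1.0):
    """Inequality Constrained Shrinkage Estimator form:
        beta = w * beta_con + (1 - w) * beta_unc.

    The weight below is a placeholder assumption for illustration only;
    the paper derives its own data-dependent weight and its local
    asymptotic risk properties.
    """
    d = beta_unc - beta_con
    dist = float(d @ np.linalg.solve(V_unc, d))      # standardized distance
    w = min(1.0, c / max(dist, 1e-12))               # shrink more when the two agree
    return w * beta_con + (1.0 - w) * beta_unc, w

# toy usage: scalar coefficient constrained to be nonnegative
beta_unc = np.array([-0.3])
beta_con = np.array([0.0])
V_unc = np.array([[0.04]])
print(icse(beta_unc, beta_con, V_unc))
```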
arXiv link: http://arxiv.org/abs/2001.10586v1
Skills to not fall behind in school
social-emotional skills can be in determining people's quality of life.
Although skills are of great importance in many aspects, in this paper we will
focus our efforts to better understand the relationship between several types
of skills with academic progress delay. Our dataset contains the same students
in 2012 and 2017, and we consider that there was an academic progress delay for
a specific student if he or she progressed less than expected in school grades.
Our methodology primarily includes the use of a Bayesian logistic regression
model and our results suggest that both cognitive and social-emotional skills
may impact the conditional probability of falling behind in school, and the
magnitude of the impact between the two types of skills can be comparable.
arXiv link: http://arxiv.org/abs/2001.10519v1
Risk Fluctuation Characteristics of Internet Finance: Combining Industry Characteristics with Ecological Value
development. Due to the pressure of competition, most technology companies,
including Internet finance companies, continue to explore new markets and new
business. Funding subsidies and resource inputs have led to significant
business income tendencies in financial statements. This tendency of business
income often manifests as part of the business loss or long-term
unprofitability. We propose a risk change indicator (RFR) and compare the risk
indicator of fourteen representative companies. This model combines extreme
risk value with slope, and the combination method is simple and effective. The
results of the experiments show the potential of this model. The risk volatility of
technology enterprises including Internet finance enterprises is highly
cyclical, and the risk volatility of emerging Internet fintech companies is
much higher than that of other technology companies.
arXiv link: http://arxiv.org/abs/2001.09798v1
Estimating Marginal Treatment Effects under Unobserved Group Heterogeneity
classified into unobserved groups based on heterogeneous treatment rules. Using
a finite mixture approach, we propose a marginal treatment effect (MTE)
framework in which the treatment choice and outcome equations can be
heterogeneous across groups. Under the availability of instrumental variables
specific to each group, we show that the MTE for each group can be separately
identified. Based on our identification result, we propose a two-step
semiparametric procedure for estimating the group-wise MTE. We illustrate the
usefulness of the proposed method with an application to economic returns to
college education.
arXiv link: http://arxiv.org/abs/2001.09560v6
Bayesian Panel Quantile Regression for Binary Outcomes with Correlated Random Effects: An Application on Crime Recidivism in Canada
regression with binary outcomes in the presence of correlated random effects.
We construct a working likelihood using an asymmetric Laplace (AL) error
distribution and combine it with suitable prior distributions to obtain the
complete joint posterior distribution. For posterior inference, we propose two
Markov chain Monte Carlo (MCMC) algorithms but prefer the algorithm that
exploits the blocking procedure to produce lower autocorrelation in the MCMC
draws. We also explain how to use the MCMC draws to calculate the marginal
effects, relative risk and odds ratio. The performance of our preferred
algorithm is demonstrated in multiple simulation studies and shown to perform
extremely well. Furthermore, we implement the proposed framework to study crime
recidivism in Quebec, a Canadian province, using novel data from the
administrative correctional files. Our results suggest that the recently
implemented "tough-on-crime" policy of the Canadian government has been largely
successful in reducing the probability of repeat offenses in the post-policy
period. Besides, our results support existing findings on crime recidivism and
offer new insights at various quantiles.
arXiv link: http://arxiv.org/abs/2001.09295v1
Saddlepoint approximations for spatial panel data models
likelihood estimator in a spatial panel data model, with fixed effects,
time-varying covariates, and spatially correlated errors. Our saddlepoint
density and tail area approximation feature relative error of order
$O(1/(n(T-1)))$ with $n$ being the cross-sectional dimension and $T$ the
time-series dimension. The main theoretical tool is the tilted-Edgeworth
technique in a non-identically distributed setting. The density approximation
is always non-negative, does not need resampling, and is accurate in the tails.
Monte Carlo experiments on density approximation and testing in the presence of
nuisance parameters illustrate the good performance of our approximation over
first-order asymptotics and Edgeworth expansions. An empirical application to
the investment-saving relationship in OECD (Organisation for Economic
Co-operation and Development) countries shows disagreement between testing
results based on first-order asymptotics and saddlepoint techniques.
arXiv link: http://arxiv.org/abs/2001.10377v3
Oracle Efficient Estimation of Structural Breaks in Cointegrating Regressions
estimate structural breaks in cointegrating regressions. It is well-known that
the group lasso estimator is not simultaneously estimation consistent and model
selection consistent in structural break settings. Hence, we use a first step
group lasso estimation of a diverging number of breakpoint candidates to
produce weights for a second adaptive group lasso estimation. We prove that
parameter changes are estimated consistently by group lasso and show that the
number of estimated breaks is greater than the true number but still
sufficiently close to it. Then, we use these results and prove that the
adaptive group lasso has oracle properties if weights are obtained from our
first step estimation. Simulation results show that the proposed estimator
delivers the expected results. An economic application to the long-run US money
demand function demonstrates the practical importance of this methodology.
arXiv link: http://arxiv.org/abs/2001.07949v4
Fundamental Limits of Testing the Independence of Irrelevant Alternatives in Discrete Choice
Independence of Irrelevant Alternatives (IIA), are together the most widely
used tools of discrete choice. The MNL model serves as the workhorse model for
a variety of fields, but is also widely criticized, with a large body of
experimental literature claiming to document real-world settings where IIA
fails to hold. Testing IIA as a modelling assumption has been the subject of
many practical tests focusing on specific deviations from IIA over the past
several decades, but the formal size properties of hypothesis tests of IIA are
still not well understood. In this work we replace some of the
ambiguity in this literature with rigorous pessimism, demonstrating that any
general test for IIA with low worst-case error would require a number of
samples exponential in the number of alternatives of the choice problem. A
major benefit of our analysis over previous work is that it lies entirely in
the finite-sample domain, a feature crucial to understanding the behavior of
tests in the common data-poor settings of discrete choice. Our lower bounds are
structure-dependent, and as a potential cause for optimism, we find that if one
restricts the test of IIA to violations that can occur in a specific collection
of choice sets (e.g., pairs), one obtains structure-dependent lower bounds that
are much less pessimistic. Our analysis of this testing problem is unorthodox
in being highly combinatorial, counting Eulerian orientations of cycle
decompositions of a particular bipartite graph constructed from a data set of
choices. By identifying fundamental relationships between the comparison
structure of a given testing problem and its sample efficiency, we hope these
relationships will help lay the groundwork for a rigorous rethinking of the IIA
testing problem as well as other testing problems in discrete choice.
arXiv link: http://arxiv.org/abs/2001.07042v1
Efficient and Robust Estimation of the Generalized LATE Model
local average treatment effect (GLATE) model, a generalization of the classical
LATE model encompassing multi-valued treatment and instrument. We derive the
efficient influence function (EIF) and the semiparametric efficiency bound
(SPEB) for two types of parameters: local average structural function (LASF)
and local average structural function for the treated (LASF-T). The moment
condition generated by the EIF satisfies two robustness properties: double
robustness and Neyman orthogonality. Based on the robust moment condition, we
propose the double/debiased machine learning (DML) estimators for LASF and
LASF-T. The DML estimator is semiparametric efficient and suitable for high
dimensional settings. We also propose null-restricted inference methods that
are robust against weak identification issues. As an empirical application, we
study the effects across different sources of health insurance by applying the
developed methods to the Oregon Health Insurance Experiment.
arXiv link: http://arxiv.org/abs/2001.06746v2
A tail dependence-based MST and their topological indicators in modelling systemic risk in the European insurance sector
insurance companies that result from market price channels. In our analysis we
assume that the stock quotations of insurance companies reflect market
sentiments which constitute a very important systemic risk factor.
Interlinkages between insurers and their dynamics have a direct impact on
systemic risk contagion in the insurance sector. We propose herein a new hybrid
approach to the analysis of interlinkages dynamics based on combining the
copula-DCC-GARCH model and Minimum Spanning Trees (MST). Using the
copula-DCC-GARCH model we determine the tail dependence coefficients. Then, for
each analysed period we construct an MST based on these coefficients. The dynamics
is analysed by means of time series of selected topological indicators of the
MSTs in the years 2005-2019. Our empirical results show the usefulness of the
proposed approach to the analysis of systemic risk in the insurance sector. The
time series obtained from the proposed hybrid approach reflect the phenomena
occurring on the market. The analysed MST topological indicators can be
considered as systemic risk predictors.
arXiv link: http://arxiv.org/abs/2001.06567v2
Entropy Balancing for Continuous Treatments
extending the original entropy balancing methodology of Hainmueller (2012). In
order to estimate balancing weights, the proposed approach solves a globally
convex constrained optimization problem. EBCT weights reliably eradicate
Pearson correlations between covariates and the continuous treatment variable.
This is the case even when other methods based on the generalized propensity
score tend to yield insufficient balance due to strong selection into different
treatment intensities. Moreover, the optimization procedure is more successful
in avoiding extreme weights attached to a single unit. Extensive Monte-Carlo
simulations show that treatment effect estimates using EBCT display similar or
lower bias and uniformly lower root mean squared error. These properties make
EBCT an attractive method for the evaluation of continuous treatments.
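The sketch below illustrates the core idea under simplifying assumptions: weights that minimize the KL divergence to uniform weights subject to a zero weighted covariance between the treatment and each covariate. The paper's EBCT handles a richer set of moment constraints (and typically works with the dual problem); the solver and centering here are illustrative choices.
```python
import numpy as np
from scipy.optimize import minimize

def ebct_weights(X, t):
    """Entropy-balancing-style weights for a continuous treatment: minimize
    KL divergence to uniform weights subject to (i) weights summing to one
    and (ii) zero weighted covariance between treatment and each covariate
    (centred at the unweighted sample means, a simplification)."""
    n, k = X.shape
    Xc = X - X.mean(axis=0)
    tc = t - t.mean()

    def kl(w):                       # KL(w || uniform)
        return np.sum(w * np.log(w * n))

    cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
    for j in range(k):
        cons.append({"type": "eq",
                     "fun": (lambda w, j=j: w @ (Xc[:, j] * tc))})

    w0 = np.full(n, 1.0 / n)
    res = minimize(kl, w0, constraints=cons, bounds=[(1e-10, 1.0)] * n,
                   method="SLSQP")
    return res.x

# toy usage: treatment correlated with one confounder
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
t = 0.8 * X[:, 0] + rng.normal(size=100)
w = ebct_weights(X, t)
print(np.corrcoef(X[:, 0], t)[0, 1])                      # raw correlation
print(w @ ((X[:, 0] - X[:, 0].mean()) * (t - t.mean())))  # ~ 0 after weighting
```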
arXiv link: http://arxiv.org/abs/2001.06281v2
Distributional synthetic controls
evaluating causal effects of policy changes to quantile functions. The proposed
method provides a geometrically faithful estimate of the entire counterfactual
quantile function of the treated unit. Its appeal stems from an efficient
implementation via a constrained quantile-on-quantile regression. This
constitutes a novel concept of independent interest. The method provides a
unique counterfactual quantile function in any scenario: for continuous,
discrete or mixed distributions. It operates in both repeated cross-sections
and panel data with as little as a single pre-treatment period. The article
also provides abstract identification results by showing that any synthetic
controls method, classical or our generalization, provides the correct
counterfactual for causal models that preserve distances between the outcome
distributions. Working with whole quantile functions instead of aggregate
values allows for tests of equality and stochastic dominance of the
counterfactual and the observed distribution. It can provide causal inference
on standard outcomes like average- or quantile treatment effects, but also more
general concepts such as counterfactual Lorenz curves or interquartile ranges.
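A hedged sketch of the quantile-on-quantile idea: choose simplex weights so that a weighted average of donor quantile functions matches the treated unit's pre-treatment quantile function on a grid of quantile levels. Function names, the quantile grid, and the SLSQP solver are illustrative choices, not the paper's implementation.
```python
import numpy as np
from scipy.optimize import minimize

def dsc_weights(y_treated, donors, taus=np.linspace(0.05, 0.95, 19)):
    """Simplex weights that best reproduce the treated unit's pre-treatment
    quantile function as a weighted average of donor quantile functions
    (a minimal sketch of the distributional synthetic control idea)."""
    Qt = np.quantile(y_treated, taus)                      # treated quantiles
    Qd = np.column_stack([np.quantile(y, taus) for y in donors])

    J = Qd.shape[1]
    obj = lambda w: np.sum((Qt - Qd @ w) ** 2)
    cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
    res = minimize(obj, np.full(J, 1.0 / J), bounds=[(0.0, 1.0)] * J,
                   constraints=cons, method="SLSQP")
    return res.x, taus, Qd @ res.x                         # counterfactual quantiles

# toy usage
rng = np.random.default_rng(2)
donors = [rng.normal(loc=m, size=300) for m in (0.0, 1.0, 2.0)]
treated = rng.normal(loc=0.6, size=300)
w, taus, q_hat = dsc_weights(treated, donors)
print(np.round(w, 2))
```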
arXiv link: http://arxiv.org/abs/2001.06118v5
Recovering Network Structure from Aggregated Relational Data using Penalized Regression
aggregated relational data (ARD) as a low-cost substitute that can be used to
recover the structure of a latent social network when it is generated by a
specific parametric random effects model. Our main observation is that many
economic network formation models produce networks that are effectively
low-rank. As a consequence, network recovery from ARD is generally possible
without parametric assumptions using a nuclear-norm penalized regression. We
demonstrate how to implement this method and provide finite-sample bounds on
the mean squared error of the resulting estimator for the distribution of
network links. Computation takes seconds for samples with hundreds of
observations. Easy-to-use code in R and Python can be found at
https://github.com/mpleung/ARD.
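Below is a stylized sketch of nuclear-norm penalized regression by proximal gradient descent with singular-value thresholding, applied to a toy ARD-style design; the exact regression design, tuning, and inference in the paper differ, and the authors' own implementation is at the repository linked above.
```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def ard_lowrank(Y, X, lam=1.0, iters=500):
    """Nuclear-norm penalized least squares
        min_G  0.5 * ||Y - G X||_F^2 + lam * ||G||_*
    by proximal gradient descent.  Stylized: Y holds ARD counts
    (respondents x traits), X the trait indicators (nodes x traits),
    G the estimated low-rank link matrix."""
    n = X.shape[0]
    G = np.zeros((Y.shape[0], n))
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        grad = (G @ X - Y) @ X.T
        G = svt(G - step * grad, step * lam)
    return G

# toy usage: a rank-2 "network" observed through 8 traits
rng = np.random.default_rng(3)
n, k = 60, 8
G_true = rng.normal(size=(n, 2)) @ rng.normal(size=(2, n))
X = (rng.random((n, k)) < 0.3).astype(float)
Y = G_true @ X + rng.normal(scale=0.1, size=(n, k))
G_hat = ard_lowrank(Y, X, lam=5.0)
print(np.linalg.norm(G_hat - G_true) / np.linalg.norm(G_true))   # relative error
```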
arXiv link: http://arxiv.org/abs/2001.06052v1
Sparse Covariance Estimation in Logit Mixture Models
covariance matrices of the random coefficients in logit mixture models.
Researchers typically specify covariance matrices in logit mixture models under
one of two extreme assumptions: either an unrestricted full covariance matrix
(allowing correlations between all random coefficients), or a restricted
diagonal matrix (allowing no correlations at all). Our objective is to find
optimal subsets of correlated coefficients for which we estimate covariances.
We propose a new estimator, called MISC, that uses a mixed-integer optimization
(MIO) program to find an optimal block diagonal structure specification for the
covariance matrix, corresponding to subsets of correlated coefficients, for any
desired sparsity level using Markov Chain Monte Carlo (MCMC) posterior draws
from the unrestricted full covariance matrix. The optimal sparsity level of the
covariance matrix is determined using out-of-sample validation. We demonstrate
the ability of MISC to correctly recover the true covariance structure from
synthetic data. In an empirical illustration using a stated preference survey
on modes of transportation, we use MISC to obtain a sparse covariance matrix
indicating how preferences for attributes are related to one another.
arXiv link: http://arxiv.org/abs/2001.05034v1
A Higher-Order Correct Fast Moving-Average Bootstrap for Dependent Data
scheme is based on the i.i.d. resampling of the smoothed moment indicators. We
characterize the class of parametric and semi-parametric estimation problems
for which the method is valid. We show the asymptotic refinements of the
proposed procedure, proving that it is higher-order correct under mild
assumptions on the time series, the estimating functions, and the smoothing
kernel. We illustrate the applicability and the advantages of our procedure for
Generalized Empirical Likelihood estimation. As a by-product, our fast
bootstrap provides higher-order correct asymptotic confidence distributions.
Monte Carlo simulations on an autoregressive conditional duration model provide
numerical evidence that the novel bootstrap yields higher-order accurate
confidence intervals. A real-data application on dynamics of trading volume of
stocks illustrates the advantage of our method over the routinely-applied
first-order asymptotic theory, when the underlying distribution of the test
statistic is skewed or fat-tailed.
arXiv link: http://arxiv.org/abs/2001.04867v2
Panel Data Quantile Regression for Treatment Effect Models
effects (QTE) under rank invariance and rank stationarity assumptions. Ishihara
(2020) explores identification of the nonseparable panel data model under these
assumptions and proposes a parametric estimation based on the minimum distance
method. However, when the dimensionality of the covariates is large, the
minimum distance estimation using this process is computationally demanding. To
overcome this problem, we propose a two-step estimation method based on the
quantile regression and minimum distance methods. We then show the uniform
asymptotic properties of our estimator and the validity of the nonparametric
bootstrap. The Monte Carlo studies indicate that our estimator performs well in
finite samples. Finally, we present two empirical illustrations, estimating
the distributional effects of insurance provision on household production and
of TV watching on child cognitive development.
arXiv link: http://arxiv.org/abs/2001.04324v3
A multi-country dynamic factor model with stochastic volatility for euro area business cycle analysis
country-specific information on output and inflation to estimate an area-wide
measure of the output gap. Our model assumes that output and inflation can be
decomposed into country-specific stochastic trends and a common cyclical
component. Comovement in the trends is introduced by imposing a factor
structure on the shocks to the latent states. We moreover introduce flexible
stochastic volatility specifications to control for heteroscedasticity in the
measurement errors and innovations to the latent states. Carefully specified
shrinkage priors allow for pushing the model towards a homoscedastic
specification, if supported by the data. Our measure of the output gap closely
tracks other commonly adopted measures, with small differences in magnitudes
and timing. To assess whether the model-based output gap helps in forecasting
inflation, we perform an out-of-sample forecasting exercise. The findings
indicate that our approach yields superior inflation forecasts, both in terms
of point and density predictions.
arXiv link: http://arxiv.org/abs/2001.03935v1
Two-Step Estimation of a Strategic Network Formation Model with Clustering
using data from a single large network. We allow the utility function to be
nonseparable in an individual's link choices to capture the spillover effects
from friends in common. In a network with n individuals, an individual with a
nonseparable utility function chooses between 2^{n-1} overlapping portfolios of
links. We develop a novel approach that applies the Legendre transform to the
utility function so that the optimal link choices can be represented as a
sequence of correlated binary choices. The link dependence that results from
the preference for friends in common is captured by an auxiliary variable
introduced by the Legendre transform. We propose a two-step estimator that is
consistent and asymptotically normal. We also derive a limiting approximation
of the game as n grows large that simplifies the computation in large networks.
We apply these methods to favor exchange networks in rural India and find that
the direction of support from a mutual link matters in facilitating favor
provision.
arXiv link: http://arxiv.org/abs/2001.03838v4
Bayesian Median Autoregression for Robust Time Series Forecasting
forecasting. The proposed method utilizes time-varying quantile regression at
the median, favorably inheriting the robustness of median regression in
contrast to the widely used mean-based methods. Motivated by a working Laplace
likelihood approach in Bayesian quantile regression, BayesMAR adopts a
parametric model bearing the same structure as autoregressive models by
altering the Gaussian error to Laplace, leading to a simple, robust, and
interpretable modeling strategy for time series forecasting. We estimate model
parameters by Markov chain Monte Carlo. Bayesian model averaging is used to
account for model uncertainty, including the uncertainty in the autoregressive
order, in addition to a Bayesian model selection approach. The proposed methods
are illustrated using simulations and real data applications. An application to
U.S. macroeconomic data forecasting shows that BayesMAR leads to favorable and
often superior predictive performance compared to the selected mean-based
alternatives under various loss functions that encompass both point and
probabilistic forecasts. The proposed methods are generic and can be used to
complement a rich class of methods that build on autoregressive models.
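Since a Laplace working likelihood at the median corresponds to least-absolute-deviations estimation, a frequentist stand-in for BayesMAR is median (quantile) regression of the series on its own lags. The sketch below, using statsmodels' QuantReg, omits the MCMC, the Bayesian model averaging over the AR order, and the probabilistic forecasts described in the paper.
```python
import numpy as np
import statsmodels.api as sm

def median_ar_forecast(y, p=2, horizon=1):
    """Median autoregression: tau = 0.5 quantile regression of y_t on its own
    lags (the point-estimation analogue of the Laplace working likelihood)."""
    y = np.asarray(y, dtype=float)
    Y = y[p:]
    X = sm.add_constant(np.column_stack([y[p - j:-j] for j in range(1, p + 1)]))
    fit = sm.QuantReg(Y, X).fit(q=0.5)

    hist = list(y)
    preds = []
    for _ in range(horizon):
        x_new = np.r_[1.0, hist[-1:-p - 1:-1]]       # [1, y_T, ..., y_{T-p+1}]
        y_hat = float(x_new @ fit.params)
        preds.append(y_hat)
        hist.append(y_hat)
    return np.array(preds)

# toy usage: AR(2) with heavy-tailed noise
rng = np.random.default_rng(4)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 0.5 * y[t - 1] - 0.2 * y[t - 2] + rng.standard_t(df=3)
print(median_ar_forecast(y, p=2, horizon=3))
```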
arXiv link: http://arxiv.org/abs/2001.01116v2
Logical Differencing in Dyadic Network Formation Models with Nontransferable Utilities
nontransferable utilities (NTU). NTU arises frequently in real-world social
interactions that require bilateral consent, but by its nature induces additive
non-separability. We show how unobserved individual heterogeneity in our model
can be canceled out without additive separability, using a novel method we call
logical differencing. The key idea is to construct events involving the
intersection of two mutually exclusive restrictions on the unobserved
heterogeneity, based on multivariate monotonicity. We provide a consistent
estimator and analyze its performance via simulation, and apply our method to
the Nyakatoke risk-sharing networks.
arXiv link: http://arxiv.org/abs/2001.00691v4
Prediction in locally stationary time series
locally stationary process with a smoothly varying trend and use this statistic
to derive consistent predictors in non-stationary time series. In contrast to
the currently available methods for this problem, the predictor developed here
does not rely on fitting an autoregressive model and does not require a
vanishing trend. The finite sample properties of the new methodology are
illustrated by means of a simulation study and a financial indices study.
arXiv link: http://arxiv.org/abs/2001.00419v2
Recovering Latent Variables by Matching
estimate linear models with independent latent variables. The method consists
in generating pseudo-observations from the latent variables, so that the
Euclidean distance between the model's predictions and their matched
counterparts in the data is minimized. We show that our nonparametric estimator
is consistent, and we document that it performs well in simulated data. We
apply this method to study the cyclicality of permanent and transitory income
shocks in the Panel Study of Income Dynamics. We find that the dispersion of
income shocks is approximately acyclical, whereas the skewness of permanent
shocks is procyclical. By comparison, we find that the dispersion and skewness
of shocks to hourly wages vary little with the business cycle.
arXiv link: http://arxiv.org/abs/1912.13081v1
Adaptive Discrete Smoothing for High-Dimensional and Nonlinear Panel Data
high-dimensional and non-linear panel data models. We allow for individual
specific (non-linear) functions and estimation with econometric or machine
learning methods by using weighted observations from other individuals. The
weights are determined in a data-driven way and depend on the similarity
between the corresponding functions, measured from initial estimates. The key
feature of such a procedure is that it clusters individuals based on the
distance / similarity between them, estimated in a first stage. Our estimation
method can be combined with various statistical estimation procedures, in
particular modern machine learning methods, which are especially fruitful in
the high-dimensional case and with complex, heterogeneous data. The approach
can be interpreted as "soft clustering", in contrast to traditional "hard
clustering" that assigns each individual to exactly one group. We conduct a
simulation study which shows that the prediction can be greatly improved by
using our estimator. Finally, we analyze a big data set from didichuxing.com, a
leading company in the transportation industry, to analyze and predict the gap
between supply and
demand based on a large set of covariates. Our estimator clearly performs much
better in out-of-sample prediction compared to existing linear panel data
estimators.
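A hedged sketch of the two-stage idea: unit-specific first-stage estimates, then kernel weights built from the distances between those estimates, used to re-estimate each unit with weighted observations from all units. Plain OLS/WLS stands in for the general (machine learning) estimators allowed by the paper, and the Gaussian kernel and bandwidth are illustrative choices.
```python
import numpy as np

def adaptive_discrete_smoothing(X_by_unit, y_by_unit, h=1.0):
    """Two-stage 'soft clustering' sketch: (1) unit-specific coefficient
    estimates, (2) weighted re-estimation of each unit, weighting other
    units' observations by a kernel of the distance between first-stage
    estimates."""
    n_units = len(X_by_unit)
    stage1 = [np.linalg.lstsq(X, y, rcond=None)[0]
              for X, y in zip(X_by_unit, y_by_unit)]

    stage2 = []
    for i in range(n_units):
        dists = np.array([np.linalg.norm(stage1[i] - stage1[j])
                          for j in range(n_units)])
        unit_w = np.exp(-(dists / h) ** 2)            # Gaussian kernel weights
        Xw = np.vstack([np.sqrt(unit_w[j]) * X_by_unit[j] for j in range(n_units)])
        yw = np.concatenate([np.sqrt(unit_w[j]) * y_by_unit[j] for j in range(n_units)])
        stage2.append(np.linalg.lstsq(Xw, yw, rcond=None)[0])   # WLS second stage
    return np.array(stage2)

# toy usage: two latent groups of units
rng = np.random.default_rng(5)
betas = [np.array([1.0, -1.0])] * 5 + [np.array([-1.0, 1.0])] * 5
X_by_unit = [rng.normal(size=(30, 2)) for _ in betas]
y_by_unit = [X @ b + 0.5 * rng.normal(size=30) for X, b in zip(X_by_unit, betas)]
print(np.round(adaptive_discrete_smoothing(X_by_unit, y_by_unit, h=0.7), 2))
```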
arXiv link: http://arxiv.org/abs/1912.12867v2
Priority to unemployed immigrants? A causal machine learning evaluation of training in Belgium
market effects of three training programmes at various aggregation levels using
Modified Causal Forests, a causal machine learning estimator. While all
programmes have positive effects after the lock-in period, we find substantial
heterogeneity across programmes and across the unemployed. Simulations show that
'black-box' rules that reassign the unemployed to programmes that maximise
estimated individual gains can considerably improve effectiveness: up to 20
percent more (less) time spent in (un)employment within a 30-month window. A
shallow policy tree delivers a simple rule that realizes about 70 percent of
this gain.
arXiv link: http://arxiv.org/abs/1912.12864v4
Credit Risk: Simple Closed Form Approximate Maximum Likelihood Estimator
models for conditional default probabilities for corporate loans where we
develop simple closed form approximations to the maximum likelihood estimator
(MLE) when the underlying covariates follow a stationary Gaussian process. In a
practically reasonable asymptotic regime where the default probabilities are
small, say 1-3% annually, and the number of firms and the time period of data
available are reasonably large, we rigorously show that the proposed estimator
behaves similarly to, or slightly worse than, the MLE when the underlying model is
correctly specified. For the more realistic case of model misspecification, both
estimators are seen to be equally good, or equally bad. Further, beyond a
point, both are more or less insensitive to increases in data. These conclusions
are validated on empirical and simulated data. The proposed approximations
should also have applications outside finance, where logit-type models are used
and probabilities of interest are small.
arXiv link: http://arxiv.org/abs/1912.12611v1
Bayesian estimation of large dimensional time varying VARs using copulas
estimation of large multivariate VARs with time variation in the conditional
mean equations and/or in the covariance structure. With our new methodology,
the original multivariate, n-dimensional model is treated as a set of n
univariate estimation problems, and cross-dependence is handled through the use
of a copula. Thus, only univariate distribution functions are needed when
estimating the individual equations, which are often available in closed form,
and easy to handle with MCMC (or other techniques). Estimation is carried out
in parallel for the individual equations. Thereafter, the individual posteriors
are combined with the copula, thereby obtaining a joint posterior which can be
easily resampled. We illustrate our approach by applying it to a large
time-varying parameter VAR with 25 macroeconomic variables.
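The decomposition into univariate problems plus a copula can be illustrated, in a static and non-Bayesian way, by fitting each series with its own autoregression and estimating a Gaussian-copula correlation matrix from the probability-integral-transformed residuals. The sketch below is only that illustration, not the paper's posterior resampling scheme, and the AR(p) equations are a simplifying stand-in.
```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

def univariate_plus_copula(Y, p=1):
    """Fit each series with a univariate AR(p), transform residuals to
    uniforms via their empirical CDF, and estimate a Gaussian-copula
    correlation matrix from the normal scores."""
    T, n = Y.shape
    resids = np.empty((T - p, n))
    for i in range(n):
        y = Y[:, i]
        X = sm.add_constant(np.column_stack([y[p - j:-j] for j in range(1, p + 1)]))
        resids[:, i] = sm.OLS(y[p:], X).fit().resid

    ranks = np.argsort(np.argsort(resids, axis=0), axis=0) + 1
    u = ranks / (resids.shape[0] + 1.0)          # empirical PIT
    z = stats.norm.ppf(u)                        # Gaussian scores
    return np.corrcoef(z, rowvar=False)          # copula correlation matrix

# toy usage: three correlated AR(1) series
rng = np.random.default_rng(6)
T, n = 400, 3
eps = rng.multivariate_normal(np.zeros(n),
                              [[1, .6, .3], [.6, 1, .2], [.3, .2, 1]], size=T)
Y = np.zeros((T, n))
for t in range(1, T):
    Y[t] = 0.5 * Y[t - 1] + eps[t]
print(np.round(univariate_plus_copula(Y, p=1), 2))
```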
arXiv link: http://arxiv.org/abs/1912.12527v1
Minimax Semiparametric Learning With Approximate Sparsity
in statistics. In high-dimensional contexts, this estimation is often performed
under the assumption of exact model sparsity, meaning that only a small number
of parameters are precisely non-zero. This excludes models where linear
formulations only approximate the underlying data distribution, such as
nonparametric regression methods that use basis expansions such as splines,
kernel methods or polynomial regressions. Many recent methods for root-$n$
estimation have been proposed, but the implications of exact model sparsity
remain largely unexplored. In particular, minimax optimality for models that
are not exactly sparse has not yet been developed. This paper formalizes the
concept of approximate sparsity through classical semi-parametric theory. We
derive minimax rates under this formulation for a regression slope and an
average derivative, finding these bounds to be substantially larger than those
in low-dimensional, semi-parametric settings. We identify several new
phenomena. We discover new regimes where rate double robustness does not hold,
yet root-$n$ estimation is still possible. In these settings, we propose an
estimator that achieves minimax optimal rates. Our findings further reveal
distinct optimality boundaries for ordered versus unordered nonparametric
regression estimation.
arXiv link: http://arxiv.org/abs/1912.12213v7
Pareto models for risk management
formulas can be derived for financial downside risk measures (Value-at-Risk,
Expected Shortfall) or reinsurance premiums and related quantities (Large Claim
Index, Return Period). Nevertheless, in practice, distributions are (strictly)
Pareto only in the tails, above a (possibly very) large threshold. Therefore, it
could be interesting to take into account second-order behavior to provide a
better fit. In this article, we present how to go from a strict Pareto model to
Pareto-type distributions. We discuss inference, and derive formulas for
various measures and indices, and finally provide applications on insurance
losses and financial risks.
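For reference, the strict-Pareto starting point admits simple closed forms: with survival function $S(x) = (x/u)^{-\alpha}$ for $x \ge u$, $VaR_p = u(1-p)^{-1/\alpha}$ and, for $\alpha > 1$, $ES_p = \frac{\alpha}{\alpha-1}\,VaR_p$. These are the textbook formulas that the paper extends to Pareto-type (second-order) tails; a short numerical check:
```python
def pareto_var_es(u, alpha, p):
    """Closed-form Value-at-Risk and Expected Shortfall for a strict Pareto
    tail with survival function S(x) = (x/u)^(-alpha), x >= u
    (alpha > 1 needed for a finite Expected Shortfall)."""
    var = u * (1.0 - p) ** (-1.0 / alpha)
    es = alpha / (alpha - 1.0) * var
    return var, es

# toy usage: threshold 1.0, tail index 2.5, 99% level
print(pareto_var_es(u=1.0, alpha=2.5, p=0.99))
```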
arXiv link: http://arxiv.org/abs/1912.11736v1
Probability Assessments of an Ice-Free Arctic: Comparing Statistical and Climate Model Projections
environmental and economic consequences including important effects on the pace
and intensity of global climate change. Based on several decades of satellite
data, we provide statistical forecasts of Arctic sea ice extent during the rest
of this century. The best fitting statistical model indicates that overall sea
ice coverage is declining at an increasing rate. By contrast, average
projections from the CMIP5 global climate models foresee a gradual slowing of
Arctic sea ice loss even in scenarios with high carbon emissions. Our
long-range statistical projections also deliver probability assessments of the
timing of an ice-free Arctic. These results indicate almost a 60 percent chance
of an effectively ice-free Arctic Ocean sometime during the 2030s -- much
earlier than the average projection from the global climate models.
arXiv link: http://arxiv.org/abs/1912.10774v4
Improved Central Limit Theorem and bootstrap approximations in high dimensions
distribution of the max statistic in high dimensions. This statistic takes the
form of the maximum over components of the sum of independent random vectors
and its distribution plays a key role in many high-dimensional econometric
problems. Using a novel iterative randomized Lindeberg method, the paper
derives new bounds for the distributional approximation errors. These new
bounds substantially improve upon existing ones and simultaneously allow for a
larger class of bootstrap methods.
arXiv link: http://arxiv.org/abs/1912.10529v2
Building and Testing Yield Curve Generators for P&C Insurance
companies tend to be highly leveraged, with bond holdings much greater than
capital. For GAAP capital, bonds are marked to market but liabilities are not,
so shifts in the yield curve can have a significant impact on capital.
Yield-curve scenario generators are one approach to quantifying this risk. They
produce many future simulated evolutions of the yield curve, which can be used
to quantify the probabilities of bond-value changes that would result from
various maturity-mix strategies. Some of these generators are provided as
black-box models where the user gets only the projected scenarios. One focus of
this paper is to provide methods for testing generated scenarios from such
models by comparing to known distributional properties of yield curves.
P&C insurers hold bonds to maturity and manage cash-flow risk by matching
asset and liability flows. Derivative pricing and stochastic volatility are of
little concern over the relevant time frames. This requires different models
and model testing than what is common in the broader financial markets.
To complicate things further, interest rates for the last decade have not
been following the patterns established in the sixty years following WWII. We
are now coming out of the period of very low rates, yet are still not returning
to what had been thought of as normal before that. Modeling and model testing
are in an evolving state while new patterns emerge.
Our analysis starts with a review of the literature on interest-rate model
testing, with a P&C focus, and an update of the tests for current market
behavior. We then discuss models, and use them to illustrate the fitting and
testing methods. The testing discussion does not require the model-building
section.
arXiv link: http://arxiv.org/abs/1912.10526v1
Efficient and Convergent Sequential Pseudo-Likelihood Estimation of Dynamic Discrete Games
dynamic discrete choice games of incomplete information. k-EPL considers the
joint behavior of multiple players simultaneously, as opposed to individual
responses to other agents' equilibrium play. This, in addition to reframing the
problem from conditional choice probability (CCP) space to value function
space, yields a computationally tractable, stable, and efficient estimator. We
show that each iteration in the k-EPL sequence is consistent and asymptotically
efficient, so the first-order asymptotic properties do not vary across
iterations. Furthermore, we show the sequence achieves higher-order equivalence
to the finite-sample maximum likelihood estimator with iteration and that the
sequence of estimators converges almost surely to the maximum likelihood
estimator at a nearly-superlinear rate when the data are generated by any
regular Markov perfect equilibrium, including equilibria that lead to
inconsistency of other sequential estimators. When utility is linear in
parameters, k-EPL iterations are computationally simple, only requiring that
the researcher solve linear systems of equations to generate pseudo-regressors
which are used in a static logit/probit regression. Monte Carlo simulations
demonstrate the theoretical results and show k-EPL's good performance in finite
samples in both small- and large-scale games, even when the game admits
spurious equilibria in addition to the one that generated the data. We apply the
estimator to study the role of competition in the U.S. wholesale club industry.
arXiv link: http://arxiv.org/abs/1912.10488v6
ResLogit: A residual neural network logit model for data-driven choice modelling
model. Our proposed Residual Logit (ResLogit) model formulation seamlessly
integrates a Deep Neural Network (DNN) architecture into a multinomial logit
model. Recently, DNN models such as the Multi-layer Perceptron (MLP) and the
Recurrent Neural Network (RNN) have shown remarkable success in modelling
complex and noisy behavioural data. However, econometric studies have argued
that machine learning techniques are a `black-box' and difficult to interpret
for use in the choice analysis. We develop a data-driven choice model that
extends the systematic utility function to incorporate non-linear cross-effects
using a series of residual layers and using skipped connections to handle model
identifiability in estimating a large number of parameters. The model structure
accounts for cross-effects and choice heterogeneity arising from substitution,
interactions with non-chosen alternatives and other effects in a non-linear
manner. We describe the formulation, model estimation, interpretability and
examine the relative performance and econometric implications of our proposed
model. We present an illustrative example of the model on a classic red/blue bus
choice scenario example. For a real-world application, we use a travel mode
choice dataset to analyze the model characteristics compared to traditional
neural networks and Logit formulations. Our findings show that our ResLogit
approach significantly outperforms MLP models while providing similar
interpretability as a Multinomial Logit model.
arXiv link: http://arxiv.org/abs/1912.10058v2
Optimal Dynamic Treatment Regimes and Partial Welfare Ordering
individuals. The optimal dynamic treatment regime is a regime that maximizes
counterfactual welfare. We introduce a framework in which we can partially
learn the optimal dynamic regime from observational data, relaxing the
sequential randomization assumption commonly employed in the literature but
instead using (binary) instrumental variables. We propose the notion of sharp
partial ordering of counterfactual welfares with respect to dynamic regimes and
establish mapping from data to partial ordering via a set of linear programs.
We then characterize the identified set of the optimal regime as the set of
maximal elements associated with the partial ordering. We relate the notion of
partial ordering with a more conventional notion of partial identification
using topological sorts. Practically, topological sorts can serve as a
policy benchmark for a policymaker. We apply our method to understand returns
to schooling and post-school training as a sequence of treatments by combining
data from multiple sources. The framework of this paper can be used beyond the
current context, e.g., in establishing rankings of multiple treatments or
policies across different counterfactual scenarios.
arXiv link: http://arxiv.org/abs/1912.10014v4
Robust Product-line Pricing under Generalized Extreme Value Models
according to a generalized extreme value (GEV) choice model, and the choice
parameters are not known exactly but lie in an uncertainty set. We show that,
when the robust problem is unconstrained and the price sensitivity parameters
are homogeneous, the robust optimal prices have a constant markup over
products, and we provide formulas that allow one to compute this constant markup by
bisection. We further show that, in the case that the price sensitivity
parameters are only homogeneous in each partition of the products, under the
assumption that the choice probability generating function and the uncertainty
set are partition-wise separable, a robust solution will have a constant markup
in each subset, and this constant-markup vector can be found efficiently by
convex optimization. We provide numerical results to illustrate the advantages
of our robust approach in protecting from bad scenarios. Our results hold for
convex and bounded uncertainty sets, and for any arbitrary GEV model,
including the multinomial logit, nested or cross-nested logit.
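The constant-markup structure can be illustrated by brute force under simplifying assumptions: an MNL model, homogeneous price sensitivity, and an uncertainty set represented by a finite sample of quality vectors. The grid search below only illustrates the form of the result; the paper computes the markup by bisection for general convex uncertainty sets and GEV models.
```python
import numpy as np

def robust_constant_markup(costs, a_scenarios, beta,
                           markups=np.linspace(0.1, 10, 200)):
    """Search over a single markup m applied to every product (p_j = c_j + m)
    to maximize worst-case MNL expected profit across sampled quality
    scenarios (a finite stand-in for the uncertainty set)."""
    def worst_case_profit(m):
        profits = []
        for a in a_scenarios:
            u = a - beta * (costs + m)                 # deterministic utilities
            q = np.exp(u) / (1.0 + np.exp(u).sum())    # MNL choice probabilities
            profits.append(m * q.sum())                # margin is m for every product
        return min(profits)
    best = max(markups, key=worst_case_profit)
    return float(best), worst_case_profit(best)

# toy usage: three products with uncertain qualities
rng = np.random.default_rng(9)
costs = np.array([1.0, 1.5, 2.0])
a_scenarios = [np.array([2.0, 2.5, 3.0]) + rng.normal(scale=0.3, size=3)
               for _ in range(50)]
print(robust_constant_markup(costs, a_scenarios, beta=1.0))
```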
arXiv link: http://arxiv.org/abs/1912.09552v2
Temporal-Difference estimation of dynamic discrete choice models
structural parameters in dynamic discrete choice models. Our algorithms are
based on the conditional choice probability approach but use functional
approximations to estimate various terms in the pseudo-likelihood function. We
suggest two approaches: The first - linear semi-gradient - provides
approximations to the recursive terms using basis functions. The second -
Approximate Value Iteration - builds a sequence of approximations to the
recursive terms by solving non-parametric estimation problems. Our approaches
are fast and naturally allow for continuous and/or high-dimensional state
spaces. Furthermore, they do not require specification of transition densities.
In dynamic games, they avoid integrating over other players' actions, further
heightening the computational advantage. Our proposals can be paired with
popular existing methods such as pseudo-maximum-likelihood, and we propose
locally robust corrections for the latter to achieve parametric rates of
convergence. Monte Carlo simulations confirm the properties of our algorithms
in practice.
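The generic reinforcement-learning building block behind the first approach is the linear semi-gradient TD(0) update; the sketch below shows it on a toy evaluation problem, with plain rewards standing in for the recursive pseudo-likelihood terms of the dynamic discrete choice model.
```python
import numpy as np

def linear_semi_gradient_td(transitions, phi, beta=0.95, lr=0.05, sweeps=50):
    """Linear semi-gradient TD(0): approximate V(s) = phi(s)' w from observed
    transitions (s, r, s').  This is the standard RL recursion; the paper
    adapts the idea to choice-specific value terms rather than plain rewards."""
    d = len(phi(transitions[0][0]))
    w = np.zeros(d)
    for _ in range(sweeps):
        for s, r, s_next in transitions:
            td_error = r + beta * phi(s_next) @ w - phi(s) @ w
            w += lr * td_error * phi(s)          # semi-gradient update
    return w

# toy usage: deterministic two-state chain
phi = lambda s: np.eye(2)[s]                     # tabular features
transitions = [(0, 1.0, 1), (1, 0.0, 0)] * 100
w = linear_semi_gradient_td(transitions, phi, beta=0.9)
print(np.round(w, 2))                            # approx. [5.26, 4.74]
```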
arXiv link: http://arxiv.org/abs/1912.09509v2
Causal Inference and Data Fusion in Econometrics
econometrics. In practice, the validity of these causal inferences is
contingent on a number of critical assumptions regarding the type of data that
has been collected and the substantive knowledge that is available. For
instance, unobserved confounding factors threaten the internal validity of
estimates, data availability is often limited to non-random, selection-biased
samples, causal effects need to be learned from surrogate experiments with
imperfect compliance, and causal knowledge has to be extrapolated across
structurally heterogeneous populations. A powerful causal inference framework
is required to tackle these challenges, which plague most data analysis to
varying degrees. Building on the structural approach to causality introduced by
Haavelmo (1943) and the graph-theoretic framework proposed by Pearl (1995), the
artificial intelligence (AI) literature has developed a wide array of
techniques for causal learning that allow researchers to leverage information from various
imperfect, heterogeneous, and biased data sources (Bareinboim and Pearl, 2016).
In this paper, we discuss recent advances in this literature that have the
potential to contribute to econometric methodology along three dimensions.
First, they provide a unified and comprehensive framework for causal inference,
in which the aforementioned problems can be addressed in full generality.
Second, due to their origin in AI, they come together with sound, efficient,
and complete algorithmic criteria for automatization of the corresponding
identification task. And third, because of the nonparametric description of
structural models that graph-theoretic approaches build on, they combine the
strengths of both structural econometrics as well as the potential outcomes
framework, and thus offer an effective middle ground between these two
literature streams.
arXiv link: http://arxiv.org/abs/1912.09104v4
Regularized Estimation of High-Dimensional Vector AutoRegressions with Weakly Dependent Innovations
regularization procedures in high-dimensional models. In the time series context,
it is mostly restricted to Gaussian autoregressions or mixing sequences. We
study oracle properties of LASSO estimation of weakly sparse
vector-autoregressive models with heavy tailed, weakly dependent innovations
with virtually no assumption on the conditional heteroskedasticity. In contrast
to the current literature, our innovation process satisfies an $L^1$ mixingale-type
condition on the centered conditional covariance matrices. This condition
covers $L^1$-NED sequences and strong ($\alpha$-) mixing sequences as
particular examples.
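Computationally, the estimator under study is plain equation-by-equation LASSO on lagged regressors; the sketch below shows that step with scikit-learn under heavy-tailed innovations, leaving aside penalty selection and the paper's theoretical conditions.
```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_var(Y, p=1, alpha=0.1):
    """Equation-by-equation LASSO estimation of a VAR(p): regress each series
    on the lagged values of all series with an l1 penalty."""
    T, n = Y.shape
    X = np.hstack([Y[p - j:T - j] for j in range(1, p + 1)])   # lagged regressors
    B = np.zeros((n, n * p))
    for i in range(n):
        fit = Lasso(alpha=alpha, fit_intercept=True, max_iter=10000)
        fit.fit(X, Y[p:, i])
        B[i] = fit.coef_
    return B                                                   # row i: equation for series i

# toy usage: sparse 5-variable VAR(1) with Student-t innovations
rng = np.random.default_rng(7)
A = np.diag([0.6, 0.5, 0.4, 0.3, 0.2]); A[0, 3] = 0.3
T, n = 500, 5
Y = np.zeros((T, n))
for t in range(1, T):
    Y[t] = Y[t - 1] @ A.T + rng.standard_t(df=5, size=n)
print(np.round(lasso_var(Y, p=1, alpha=0.05), 2))
```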
arXiv link: http://arxiv.org/abs/1912.09002v3
Variable-lag Granger Causality for Time Series Analysis
series data, commonly used in the social and biological sciences. Typical
operationalizations of Granger causality make a strong assumption that every
time point of the effect time series is influenced by a combination of other
time series with a fixed time delay. However, the assumption of the fixed time
delay does not hold in many applications, such as collective behavior,
financial markets, and many natural phenomena. To address this issue, we
develop variable-lag Granger causality, a generalization of Granger causality
that relaxes the assumption of the fixed time delay and allows causes to
influence effects with arbitrary time delays. In addition, we propose a method
for inferring variable-lag Granger causality relations. We demonstrate our
approach on an application for studying coordinated collective behavior and
show that it performs better than several existing methods in both simulated
and real-world datasets. Our approach can be applied in any domain of time
series analysis.
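For context, the fixed-lag benchmark that variable-lag Granger causality generalizes can be run directly with statsmodels; the sketch below tests whether lagged x predicts y at a fixed delay, whereas the paper's method additionally aligns the two series so that the delay may vary over time.
```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

# Fixed-lag Granger causality baseline: does past x help predict y?
rng = np.random.default_rng(8)
T = 400
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(3, T):
    y[t] = 0.6 * x[t - 3] + 0.3 * y[t - 1] + rng.normal(scale=0.5)

data = np.column_stack([y, x])        # column 0: effect, column 1: candidate cause
res = grangercausalitytests(data, maxlag=4)   # prints a test summary per lag
# p-values of the SSR F-test at each fixed lag
print({lag: round(out[0]["ssr_ftest"][1], 4) for lag, out in res.items()})
```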
arXiv link: http://arxiv.org/abs/1912.10829v1
Assessing Inference Methods
assess whether their inference methods reliably control false-positive rates.
We show that different assessments involve trade-offs, varying in the types of
problems they may detect, finite-sample performance, susceptibility to
sequential-testing distortions, susceptibility to cherry-picking, and
implementation complexity. We also show that a commonly used simulation to
assess inference methods in shift-share designs can lead to misleading
conclusions and propose alternatives. Overall, we provide novel insights and
recommendations for applied researchers on how to choose, implement, and
interpret inference assessments in their empirical applications.
arXiv link: http://arxiv.org/abs/1912.08772v14
Econometrics For Decision Making: Building Foundations Sketched By Haavelmo And Wald
aiming to make econometrics useful for decision making. His fundamental
contribution has become thoroughly embedded in subsequent econometric research,
yet it could not answer all the deep issues that the author raised. Notably,
Haavelmo struggled to formalize the implications for decision making of the
fact that models can at most approximate actuality. In the same period, Wald
(1939, 1945) initiated his own seminal development of statistical decision
theory. Haavelmo favorably cited Wald, but econometrics did not embrace
statistical decision theory. Instead, it focused on study of identification,
estimation, and statistical inference. This paper proposes statistical decision
theory as a framework for evaluation of the performance of models in decision
making. I particularly consider the common practice of as-if optimization:
specification of a model, point estimation of its parameters, and use of the
point estimate to make a decision that would be optimal if the estimate were
accurate. A central theme is that one should evaluate as-if optimization or any
other model-based decision rule by its performance across the state space,
listing all states of nature that one believes feasible, not across the model
space. I apply the theme to prediction and treatment choice. Statistical
decision theory is conceptually simple, but application is often challenging.
Advancement of computation is the primary task to continue building the
foundations sketched by Haavelmo and Wald.
arXiv link: http://arxiv.org/abs/1912.08726v4
Estimation of Auction Models with Shape Restrictions
in auction models to estimate various objects of interest, including the
distribution of a bidder's valuations, the bidder's ex ante expected surplus,
and the seller's counterfactual revenue. The basic approach applies broadly in
that (unlike most of the literature) it works for a wide range of auction
formats and allows for asymmetric bidders. Though our approach is not
restrictive, we focus our analysis on first-price, sealed-bid auctions with
independent private valuations. We highlight two nonparametric estimation
strategies, one based on a least squares criterion and the other on a maximum
likelihood criterion. We also provide the first direct estimator of the
strategy function. We establish several theoretical properties of our methods
to guide empirical analysis and inference. In addition to providing the
asymptotic distributions of our estimators, we identify ways in which
methodological choices should be tailored to the objects of interest. For
objects like the bidders' ex ante surplus and the seller's counterfactual
expected revenue with an additional symmetric bidder, we show that our
input-parameter-free estimators achieve the semiparametric efficiency bound.
For objects like the bidders' inverse strategy function, we provide an easily
implementable boundary-corrected kernel smoothing and transformation method in
order to ensure the squared error is integrable over the entire support of the
valuations. An extensive simulation study illustrates our analytical results
and demonstrates the respective advantages of our least-squares and maximum
likelihood estimators in finite samples. Compared to estimation strategies
based on kernel density estimation, the simulations indicate that the smoothed
versions of our estimators enjoy a large degree of robustness to the choice of
an input parameter.
arXiv link: http://arxiv.org/abs/1912.07466v1
Analysis of Regression Discontinuity Designs with Multiple Cutoffs or Multiple Scores
which includes three commands (rdmc, rdmcplot, rdms)
for analyzing Regression Discontinuity (RD) designs with multiple cutoffs or
multiple scores. The command rdmc applies to non-cumulative and
cumulative multi-cutoff RD settings. It calculates pooled and cutoff-specific
RD treatment effects, and provides robust bias-corrected inference procedures.
Post-estimation and inference are allowed. The command rdmcplot offers
RD plots for multi-cutoff settings. Finally, the command rdms concerns
multi-score settings, covering in particular cumulative cutoffs and two running
variables contexts. It also calculates pooled and cutoff-specific RD treatment
effects, provides robust bias-corrected inference procedures, and allows for
post-estimation and inference. These commands employ the
Stata (and R) package rdrobust for plotting,
estimation, and inference. Companion R functions with the same syntax
and capabilities are provided.
arXiv link: http://arxiv.org/abs/1912.07346v2
Prediction Intervals for Synthetic Control Methods
interpretation of synthetic control (SC) methods. We develop conditional
prediction intervals in the SC framework, and provide conditions under which
these intervals offer finite-sample probability guarantees. Our method allows
for covariate adjustment and non-stationary data. The construction begins by
noting that the statistical uncertainty of the SC prediction is governed by two
distinct sources of randomness: one coming from the construction of the (likely
misspecified) SC weights in the pre-treatment period, and the other coming from
the unobservable stochastic error in the post-treatment period when the
treatment effect is analyzed. Accordingly, our proposed prediction intervals
are constructed taking into account both sources of randomness. For
implementation, we propose a simulation-based approach along with
finite-sample-based probability bound arguments, naturally leading to
principled sensitivity analysis methods. We illustrate the numerical
performance of our methods using empirical applications and a small simulation
study. Python, R and Stata software packages
implementing our methodology are available.
arXiv link: http://arxiv.org/abs/1912.07120v4
Network Data
(often) rivalrous relationships connecting them to one another. Input sourcing
by firms, interbank lending, scientific research, and job search are four
examples, among many, of networked economic activities. Motivated by the
premise that networks' structures are consequential, this chapter describes
econometric methods for analyzing them. I emphasize (i) dyadic regression
analysis incorporating unobserved agent-specific heterogeneity and supporting
causal inference, (ii) techniques for estimating, and conducting inference on,
summary network parameters (e.g., the degree distribution or transitivity
index); and (iii) empirical models of strategic network formation admitting
interdependencies in preferences. Current research challenges and open
questions are also discussed.
arXiv link: http://arxiv.org/abs/1912.06346v1
Synthetic Control Inference for Staggered Adoption: Estimating the Dynamic Effects of Board Gender Diversity Policies
adoption. Many policies, such as the board gender quota, are replicated by
other policy setters at different time frames. Our method estimates the dynamic
average treatment effects on the treated using variation introduced by the
staggered adoption of policies. Our method gives asymptotically unbiased
estimators of many interesting quantities and delivers asymptotically valid
inference. By using the proposed method and national labor data in Europe, we
find evidence that quota regulation on board diversity leads to a decrease in
part-time employment, and an increase in full-time employment for female
professionals.
arXiv link: http://arxiv.org/abs/1912.06320v1
High-Dimensional Granger Causality Tests with an Application to VIX and News
regularized regressions. To perform proper inference, we rely on
heteroskedasticity and autocorrelation consistent (HAC) estimation of the
asymptotic variance and develop the inferential theory in the high-dimensional
setting. To recognize the time series data structures we focus on the
sparse-group LASSO estimator, which includes the LASSO and the group LASSO as
special cases. We establish the debiased central limit theorem for low
dimensional groups of regression coefficients and study the HAC estimator of
the long-run variance based on the sparse-group LASSO residuals. This leads to
valid time series inference for individual regression coefficients as well as
groups, including Granger causality tests. The treatment relies on a new
Fuk-Nagaev inequality for a class of $\tau$-mixing processes with heavier than
Gaussian tails, which is of independent interest. In an empirical application,
we study the Granger causal relationship between the VIX and financial news.
arXiv link: http://arxiv.org/abs/1912.06307v4
A Regularized Factor-augmented Vector Autoregressive Model
that allows for sparsity in the factor loadings. In this framework, factors may
only load on a subset of variables which simplifies the factor identification
and their economic interpretation. We identify the factors in a data-driven
manner without imposing specific relations between the unobserved factors and
the underlying time series. Using our approach, the effects of structural
shocks can be investigated on economically meaningful factors and on all
observed time series included in the FAVAR model. We prove consistency for the
estimators of the factor loadings, the covariance matrix of the idiosyncratic
component, the factors, as well as the autoregressive parameters in the dynamic
model. In an empirical application, we investigate the effects of a monetary
policy shock on a broad range of economically relevant variables. We identify
this shock using a joint identification of the factor model and the structural
innovations in the VAR model. We find impulse response functions which are in
line with economic rationale, both on the factor aggregates and observed time
series level.
arXiv link: http://arxiv.org/abs/1912.06049v1
Adaptive Dynamic Model Averaging with an Application to House Price Forecasting
dynamic linear models (DLMs) to predict the future value of a time series. The
performance of DMA critically depends on the appropriate choice of two
forgetting factors. The first of these controls the speed of adaptation of the
coefficient vector of each DLM, while the second enables time variation in the
model averaging stage. In this paper we develop a novel, adaptive dynamic model
averaging (ADMA) methodology. The proposed methodology employs a stochastic
optimisation algorithm that sequentially updates the forgetting factor of each
DLM, and uses a state-of-the-art non-parametric model combination algorithm
from the prediction with expert advice literature, which offers finite-time
performance guarantees. An empirical application to quarterly UK house price
data suggests that ADMA produces more accurate forecasts than the benchmark
autoregressive model, as well as competing DMA specifications.
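To fix ideas, here is a stylized sketch of the standard DMA weight recursion with a fixed forgetting factor in the averaging stage; the paper's ADMA method instead updates forgetting factors adaptively and combines models with an expert-advice rule, neither of which is reproduced here. The "models" and all numbers below are toy placeholders.

```python
# Stylized dynamic model averaging (DMA): model weights are flattened toward
# uniform via a forgetting factor, then reweighted by each model's predictive
# likelihood.  This is the baseline recursion that ADMA builds on.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
T, K, alpha = 300, 3, 0.95
y = np.cumsum(rng.standard_normal(T)) * 0.1 + rng.standard_normal(T)

# Three simple "models": rolling means over different windows.
def forecast(y_hist, window):
    return y_hist[-window:].mean() if len(y_hist) >= window else 0.0

windows = [1, 5, 20]
weights = np.full(K, 1.0 / K)
combined = np.zeros(T)
for t in range(1, T):
    preds = np.array([forecast(y[:t], w) for w in windows])
    # prediction step: apply the forgetting factor to last period's weights
    w_pred = weights ** alpha
    w_pred /= w_pred.sum()
    combined[t] = w_pred @ preds
    # update step: reweight by each model's Gaussian predictive likelihood
    lik = norm.pdf(y[t], loc=preds, scale=1.0)
    weights = w_pred * lik
    weights /= weights.sum()
print("final model weights:", np.round(weights, 3))
```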
arXiv link: http://arxiv.org/abs/1912.04661v1
Market Price of Trading Liquidity Risk and Market Depth
analyses. We introduce a framework to analyze the market price of liquidity
risk, which allows us to derive an inhomogeneous Bernoulli ordinary
differential equation. We obtain two closed form solutions, one of which
reproduces the linear function of the order flow in Kyle (1985) for informed
traders. However, when traders are not as asymmetrically informed, an S-shape
function of the order flow is obtained. We perform an empirical intra-day
analysis on Nikkei futures to quantify the price impact of order flow and
compare our results with industry's heuristic price impact functions. Our model
of order flow yields a rich framework not only for estimating the liquidity
risk parameters, but also for providing a plausible cause of why volatility and
correlation are stochastic in nature. Finally, we find that the market depth
encapsulates the market price of liquidity risk.
arXiv link: http://arxiv.org/abs/1912.04565v1
Regularized Estimation of High-dimensional Factor-Augmented Vector Autoregressive (FAVAR) Models
equation that captures lead-lag correlations amongst a set of observed
variables $X$ and latent factors $F$, and a calibration equation that relates
another set of observed variables $Y$ with $F$ and $X$. The latter equation is
used to estimate the factors that are subsequently used in estimating the
parameters of the VAR system. The FAVAR model has become popular in applied
economic research, since it can summarize a large number of variables of
interest as a few factors through the calibration equation and subsequently
examine their influence on core variables of primary interest through the VAR
equation. However, there is increasing need for examining lead-lag
relationships between a large number of time series, while incorporating
information from another high-dimensional set of variables. Hence, in this
paper we investigate the FAVAR model under high-dimensional scaling. We
introduce an appropriate identification constraint for the model parameters,
which when incorporated into the formulated optimization problem yields
estimates with good statistical properties. Further, we address a number of
technical challenges introduced by the fact that estimates of the VAR system
model parameters are based on estimated rather than directly observed
quantities. The performance of the proposed estimators is evaluated on
synthetic data. Further, the model is applied to commodity prices and reveals
interesting and interpretable relationships between the prices and the factors
extracted from a set of global macroeconomic indicators.
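For orientation, the sketch below shows only the conventional two-step FAVAR recipe (principal-component factors, then a VAR on factors and core variables), not the regularized joint estimation with identification constraints developed in the paper; all data and dimensions are simulated placeholders.

```python
# Conventional two-step FAVAR sketch: extract factors from a large panel X by
# PCA, then fit a VAR on the estimated factors together with the core
# observed variables Y.
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(2)
T, N, r = 250, 60, 2                       # time, panel width, number of factors
F = rng.standard_normal((T, r))            # latent factors (toy data)
X = F @ rng.standard_normal((r, N)) + 0.5 * rng.standard_normal((T, N))
Y = 0.8 * F[:, :1] + 0.3 * rng.standard_normal((T, 1))   # core variable(s)

# Step 1: principal-component factor estimates from the standardized panel
Xs = (X - X.mean(0)) / X.std(0)
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
F_hat = Xs @ Vt[:r].T / np.sqrt(N)

# Step 2: VAR on estimated factors and core variables
data = np.hstack([F_hat, Y])
res = VAR(data).fit(maxlags=2)
print(res.coefs.shape)   # (lags, k, k) autoregressive coefficient matrices
```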
arXiv link: http://arxiv.org/abs/1912.04146v2
Approximate Factor Models with Strongly Correlated Idiosyncratic Errors
where strong serial and cross-sectional correlations amongst the idiosyncratic
component are present. This setting comes up naturally in many applications,
but existing approaches in the literature rely on the assumption that such
correlations are weak, leading to mis-specification of the number of factors
selected and consequently inaccurate inference. In this paper, we explicitly
incorporate the dependent structure present in the idiosyncratic component
through lagged values of the observed multivariate time series. We formulate a
constrained optimization problem to estimate the factor space and the
transition matrices of the lagged values {\em simultaneously}, wherein the
constraints reflect the low rank nature of the common factors and the sparsity
of the transition matrices. We establish theoretical properties of the obtained
estimates, and introduce an easy-to-implement computational procedure for
empirical work. The performance of the model and the implementation procedure
is evaluated on synthetic data and compared with competing approaches, and
further illustrated on a data set involving weekly log-returns of 75 US large
financial institutions for the 2001-2016 period.
arXiv link: http://arxiv.org/abs/1912.04123v1
Energy Scenario Exploration with Modeling to Generate Alternatives (MGA)
way to uncover knife-edge solutions, explore alternative system configurations,
and suggest different ways to achieve policy objectives under conditions of
deep uncertainty. In this paper, we do so by employing an existing optimization
technique called modeling to generate alternatives (MGA), which involves a
change in the model structure in order to systematically explore the
near-optimal decision space. The MGA capability is incorporated into Tools for
Energy Model Optimization and Analysis (Temoa), an open source framework that
also includes a technology rich, bottom up ESOM. In this analysis, Temoa is
used to explore alternative energy futures in a simplified single region energy
system that represents the U.S. electric sector and a portion of the light duty
transport sector. Given the dataset limitations, we place greater emphasis on
the methodological approach rather than specific results.
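The core MGA idea can be sketched on a toy linear program: solve for least cost, add a constraint keeping cost within a slack of the optimum, and then optimize alternative objectives to map out the near-optimal decision space. This is not the Temoa implementation; the toy technologies, costs, and slack below are invented for illustration.

```python
# Minimal modeling-to-generate-alternatives (MGA) loop on a toy linear program.
import numpy as np
from scipy.optimize import linprog

cost = np.array([1.0, 1.2, 2.0])            # cost per unit of three technologies
A_ub = np.array([[-1.0, -1.0, -1.0],        # meet demand: x1 + x2 + x3 >= 10
                 [1.0, 0.0, 0.0]])          # capacity limit on technology 1
b_ub = np.array([-10.0, 6.0])

base = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
slack = 0.10                                 # allow costs up to 10% above optimum
A_mga = np.vstack([A_ub, cost])              # add the near-optimal cost constraint
b_mga = np.append(b_ub, (1 + slack) * base.fun)

alternatives = []
for j in range(3):                           # minimize use of each technology in turn
    alt_obj = np.eye(3)[j]
    res = linprog(alt_obj, A_ub=A_mga, b_ub=b_mga, bounds=[(0, None)] * 3)
    alternatives.append(res.x)
print("least-cost mix:", np.round(base.x, 2))
print("near-optimal alternatives:", np.round(alternatives, 2))
```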
arXiv link: http://arxiv.org/abs/1912.03788v1
Synthetic Controls with Staggered Adoption
promising opportunities for observational causal inference. Estimation remains
challenging, however, and common regression methods can give misleading
results. A promising alternative is the synthetic control method (SCM), which
finds a weighted average of control units that closely balances the treated
unit's pre-treatment outcomes. In this paper, we generalize SCM, originally
designed to study a single treated unit, to the staggered adoption setting. We
first bound the error for the average effect and show that it depends on both
the imbalance for each treated unit separately and the imbalance for the
average of the treated units. We then propose "partially pooled" SCM weights to
minimize a weighted combination of these measures; approaches that focus only
on balancing one of the two components can lead to bias. We extend this
approach to incorporate unit-level intercept shifts and auxiliary covariates.
We assess the performance of the proposed method via extensive simulations and
apply our results to the question of whether teacher collective bargaining
leads to higher school spending, finding minimal impacts. We implement the
proposed method in the augsynth R package.
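As a baseline for the partially pooled weights, the sketch below solves only the basic single-unit SCM problem (simplex weights on controls that match pre-treatment outcomes); the pooling across treated units, intercept shifts, and covariates from the paper are not shown, and the data are simulated placeholders.

```python
# Basic synthetic-control weight problem for a single treated unit: find
# simplex weights on control units that best match the treated unit's
# pre-treatment outcomes.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
T0, J = 20, 8                                  # pre-periods, control units
Y0 = rng.standard_normal((T0, J)).cumsum(0)    # control outcomes (toy data)
y_treated = Y0[:, :3].mean(1) + 0.1 * rng.standard_normal(T0)

def imbalance(w):
    return np.sum((y_treated - Y0 @ w) ** 2)

w0 = np.full(J, 1.0 / J)
res = minimize(imbalance, w0, method="SLSQP",
               bounds=[(0.0, 1.0)] * J,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
weights = res.x
print("SC weights:", np.round(weights, 3))
print("pre-treatment RMSE:", np.sqrt(imbalance(weights) / T0).round(3))
```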
arXiv link: http://arxiv.org/abs/1912.03290v2
High-frequency and heteroskedasticity identification in multicountry models: Revisiting spillovers of monetary shocks
information shocks originating from the United States and the euro area.
Employing a panel vector autoregression, we use macroeconomic and financial
variables across several major economies to address both static and dynamic
spillovers. To identify structural shocks, we introduce a novel approach that
combines external instruments with heteroskedasticity-based identification and
sign restrictions. Our results suggest significant spillovers from European
Central Bank and Federal Reserve policies to each other's economies, global
aggregates, and other countries. These effects are more pronounced for central
bank information shocks than for pure monetary policy shocks, and the dominance
of the US in the global economy is reflected in our findings.
arXiv link: http://arxiv.org/abs/1912.03158v2
Triple the gamma -- A unifying shrinkage prior for variance and variable selection in sparse state space and TVP models
changes in the effect of a predictor on the outcome variable. However, in
particular when the number of predictors is large, there is a known risk of
overfitting and poor predictive performance, since the effect of some
predictors is constant over time. We propose a prior for variance shrinkage in
TVP models, called triple gamma. The triple gamma prior encompasses a number of
priors that have been suggested previously, such as the Bayesian lasso, the
double gamma prior and the Horseshoe prior. We present the desirable properties
of such a prior and its relationship to Bayesian Model Averaging for variance
selection. The features of the triple gamma prior are then illustrated in the
context of time varying parameter vector autoregressive models, both for
simulated datasets and for a series of macroeconomic variables in the Euro
Area.
arXiv link: http://arxiv.org/abs/1912.03100v1
Estimating Large Mixed-Frequency Bayesian VAR Models
models with stochastic volatility in real-time situations where data are
sampled at different frequencies. In the case of a large VAR with stochastic
volatility, the mixed-frequency data warrant an additional step in the already
computationally challenging Markov Chain Monte Carlo algorithm used to sample
from the posterior distribution of the parameters. We suggest the use of a
factor stochastic volatility model to capture a time-varying error covariance
structure. Because the factor stochastic volatility model renders the equations
of the VAR conditionally independent, settling for this particular stochastic
volatility model comes with major computational benefits. First, we are able to
improve upon the mixed-frequency simulation smoothing step by leveraging a
univariate and adaptive filtering algorithm. Second, the regression parameters
can be sampled equation-by-equation in parallel. These computational features
of the model alleviate the computational burden and make it possible to move
the mixed-frequency VAR to the high-dimensional regime. We illustrate the model
by an application to US data using our mixed-frequency VAR with 20, 34 and 119
variables.
arXiv link: http://arxiv.org/abs/1912.02231v1
High Dimensional Latent Panel Quantile Regression with an Application to Asset Pricing
accommodate both sparse and dense parts: sparse means that, while the number of
available covariates is large, potentially only a much smaller number of them
have a nonzero impact on each conditional quantile of the response variable;
the dense part is represented by a low-rank matrix that
can be approximated by latent factors and their loadings. Such a structure
poses problems for traditional sparse estimators, such as the
$\ell_1$-penalised Quantile Regression, and for traditional latent factor
estimators, such as PCA. We propose a new estimation procedure, based on the
ADMM algorithm, that combines the quantile loss function with $\ell_1$
and nuclear norm regularization. We show, under general conditions,
that our estimator can consistently estimate both the nonzero coefficients of
the covariates and the latent low-rank matrix.
Our proposed model has a "Characteristics + Latent Factors" Asset Pricing
Model interpretation: we apply our model and estimator with a large-dimensional
panel of financial data and find that (i) characteristics have sparser
predictive power once latent factors are controlled for, and (ii) the factors and
coefficients at upper and lower quantiles are different from the median.
arXiv link: http://arxiv.org/abs/1912.02151v2
Bilinear form test statistics for extremum estimation
context of the extremum estimation framework with particular interest in
nonlinear hypothesis. We show that the proposed statistic converges to a
conventional chi-square limit. A Monte Carlo experiment suggests that the test
statistic works well in finite samples.
arXiv link: http://arxiv.org/abs/1912.01410v1
Mean-shift least squares model averaging
least squares estimates obtained from a set of models. Our proposed estimator
builds on the Mallows model average (MMA) estimator of Hansen (2007), but,
unlike MMA, simultaneously controls for location bias and regression error
through a common constant. We show that our proposed estimator -- the mean-shift
Mallows model average (MSA) estimator -- is asymptotically optimal relative to the
original MMA estimator in terms of mean squared error. A simulation study is
presented, where we show that our proposed estimator uniformly outperforms the
MMA estimator.
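For reference, the sketch below implements only the standard Mallows model averaging weight problem of Hansen (2007) over nested OLS models; the mean-shift adjustment (the common constant controlling location bias) proposed in the paper is not included, and the simulated design is a placeholder.

```python
# Standard Mallows model averaging: choose simplex weights over nested OLS
# fits to minimize the Mallows criterion.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(14)
n, p = 200, 6
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([1.0, 0.5, 0.25]) + rng.standard_normal(n)

# Nested candidate models using the first k regressors, k = 1..p
fits, ks = [], []
for k in range(1, p + 1):
    Xk = X[:, :k]
    beta = np.linalg.lstsq(Xk, y, rcond=None)[0]
    fits.append(Xk @ beta)
    ks.append(k)
fits, ks = np.column_stack(fits), np.array(ks)

sigma2 = np.sum((y - fits[:, -1]) ** 2) / (n - p)      # from the largest model

def mallows(w):
    resid = y - fits @ w
    return resid @ resid + 2.0 * sigma2 * (ks @ w)

M = fits.shape[1]
res = minimize(mallows, np.full(M, 1.0 / M), method="SLSQP",
               bounds=[(0.0, 1.0)] * M,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
print("MMA weights:", np.round(res.x, 3))
```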
arXiv link: http://arxiv.org/abs/1912.01194v1
Stylized Facts and Agent-Based Modeling
studies. In the past decade the modeling of financial markets by agent-based
computational economic market models has become a frequently used modeling
approach. The main purpose of these models is to replicate stylized facts and
to identify sufficient conditions for their creation. In this paper we
introduce the most prominent examples of stylized facts, focusing in particular
on stylized facts of financial data. Furthermore, we give an introduction to
agent-based modeling. Here, we not only provide an overview of this topic but
introduce the idea of universal building blocks for agent-based economic market
models.
arXiv link: http://arxiv.org/abs/1912.02684v1
Clustering and External Validity in Randomized Controlled Trials
(RCTs) assumes that units' potential outcomes are deterministic. This
assumption is unlikely to hold, as stochastic shocks may take place during the
experiment. In this paper, we consider the case of an RCT with individual-level
treatment assignment, and we allow for individual-level and cluster-level (e.g.
village-level) shocks. We show that one can draw inference on the ATE
conditional on the realizations of the cluster-level shocks, using
heteroskedasticity-robust standard errors, or on the ATE netted out of those
shocks, using cluster-robust standard errors.
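The practical contrast is easy to see in code: with individual-level assignment and village-level shocks, heteroskedasticity-robust standard errors target the ATE conditional on the realized cluster shocks, while cluster-robust standard errors target the ATE netted out of those shocks. The toy data below are simulated; only the two statsmodels covariance options are the point.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n_villages, n_per = 50, 20
village = np.repeat(np.arange(n_villages), n_per)
shock = rng.standard_normal(n_villages)[village]        # cluster-level shock
d = rng.integers(0, 2, size=n_villages * n_per)          # individual assignment
y = 1.0 + 0.5 * d + shock + rng.standard_normal(d.size)

X = sm.add_constant(d.astype(float))
fit_hc = sm.OLS(y, X).fit(cov_type="HC1")
fit_cl = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": village})
print("HC1 SE for ATE:    ", fit_hc.bse[1].round(3))
print("cluster SE for ATE:", fit_cl.bse[1].round(3))
```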
arXiv link: http://arxiv.org/abs/1912.01052v7
A multifactor regime-switching model for inter-trade durations in the limit order market
finds that inter-trade durations in ultra-high frequency have two modes. One
mode is on the order of approximately $10^{-4}$ seconds, and the other is on the
order of 1 second. This phenomenon and other empirical evidence suggest that
there are two regimes associated with the dynamics of inter-trade durations,
and the regime switchings are driven by the changes of high-frequency traders
(HFTs) between providing and taking liquidity. To find how the two modes depend
on information in the limit order book (LOB), we propose a two-state
multifactor regime-switching (MF-RSD) model for inter-trade durations, in which
the transition probability matrices are time-varying and depend on some
lagged LOB factors. The MF-RSD model has good in-sample fit and superior
out-of-sample performance compared with some benchmark duration
models. Our findings of the effects of LOB factors on the inter-trade durations
help to understand more about the high-frequency market microstructure.
arXiv link: http://arxiv.org/abs/1912.00764v1
Semiparametric Quantile Models for Ascending Auctions with Asymmetric Bidders
regression specification for asymmetric bidders within the independent private
value framework. Asymmetry is parameterized using powers of a parent private
value distribution, which is generated by a quantile regression specification.
As noted in Cantillon (2008), this covers and extends models used for
efficient collusion, joint bidding and mergers among homogeneous bidders. The
specification can be estimated for ascending auctions using the winning bids
and the winner's identity. The estimation proceeds in two stages. The asymmetry
parameters are estimated from the winner's identity using a simple maximum
likelihood procedure. The parent quantile regression specification can be
estimated using simple modifications of Gimenes (2017). Specification testing
procedures are also considered. A timber application reveals that weaker
bidders have $30\%$ lower chances of winning the auction than stronger ones. It is
also found that increasing participation in an asymmetric ascending auction may
not be as beneficial as using an optimal reserve price, as would have been
expected from a result of Bulow and Klemperer (1996) valid under symmetry.
arXiv link: http://arxiv.org/abs/1911.13063v2
Inference under random limit bootstrap measures
distribution of a bootstrap statistic, conditional on the data, for the
unconditional limit distribution of a statistic of interest. From this
perspective, randomness of the limit bootstrap measure is regarded as a failure
of the bootstrap. We show that such limiting randomness does not necessarily
invalidate bootstrap inference if validity is understood as control over the
frequency of correct inferences in large samples. We first establish sufficient
conditions for asymptotic bootstrap validity in cases where the unconditional
limit distribution of a statistic can be obtained by averaging a (random)
limiting bootstrap distribution. Further, we provide results ensuring the
asymptotic validity of the bootstrap as a tool for conditional inference, the
leading case being that where a bootstrap distribution estimates consistently a
conditional (and thus, random) limit distribution of a statistic. We apply our
framework to several inference problems in econometrics, including linear
models with possibly non-stationary regressors, functional CUSUM statistics,
conditional Kolmogorov-Smirnov specification tests, the `parameter on the
boundary' problem and tests for constancy of parameters in dynamic econometric
models.
arXiv link: http://arxiv.org/abs/1911.12779v2
An Integrated Early Warning System for Stock Market Turbulence
identifies and predicts stock market turbulence. Based on switching ARCH
(SWARCH) filtering probabilities of the high volatility regime, the proposed
EWS first classifies stock market crises according to an indicator function
with thresholds dynamically selected by the two-peak method. A hybrid algorithm
is then developed in the framework of a long short-term memory (LSTM) network
to make daily predictions that signal turmoil. In the empirical evaluation
based on ten years of Chinese stock data, the proposed EWS yields satisfactory
results, with a test-set accuracy of $96.6\%$ and an average forewarning period
of $2.4$ days. The model's stability and practical value in real-time
decision-making are also demonstrated by cross-validation and back-testing.
arXiv link: http://arxiv.org/abs/1911.12596v1
Predicting crashes in oil prices during the COVID-19 pandemic with mixed causal-noncausal models
series can substantially impact predictions of mixed causal-noncausal (MAR)
models, namely dynamic processes that depend not only on their lags but also on
their leads. MAR models have been successfully implemented on commodity prices
as they allow the generation of nonlinear features such as locally explosive episodes
(denoted here as bubbles) in a strictly stationary setting. We consider
multiple detrending methods and investigate, using Monte Carlo simulations, to
what extent they preserve the bubble patterns observed in the raw data. MAR
models rely on the dynamics observed in the series alone and do not require an
economic background to construct a structural model, which can sometimes be
intricate to specify or which may lack parsimony. We investigate oil prices and
estimate probabilities of crashes before and during the first 2020 wave of the
COVID-19 pandemic. We consider three different mechanical detrending methods
and compare them to a detrending performed using the level of strategic
petroleum reserves.
arXiv link: http://arxiv.org/abs/1911.10916v3
High-Dimensional Forecasting in the Presence of Unit Roots and Cointegration
affects forecasting with Big Data. As most macroeconomic time series are very
persistent and may contain unit roots, a proper handling of unit roots and
cointegration is of paramount importance for macroeconomic forecasting. The
high-dimensional nature of Big Data complicates the analysis of unit roots and
cointegration in two ways. First, transformations to stationarity require
performing many unit root tests, increasing room for errors in the
classification. Second, modelling unit roots and cointegration directly is more
difficult, as standard high-dimensional techniques such as factor models and
penalized regression are not directly applicable to (co)integrated data and
need to be adapted. We provide an overview of both issues and review methods
proposed to address these issues. These methods are also illustrated with two
empirical applications.
arXiv link: http://arxiv.org/abs/1911.10552v1
Topologically Mapping the Macroeconomy
necessitates representations of data that can inform policy, deepen
understanding and guide future research. Topological Data Analysis offers a set
of tools which deliver on all three calls. Abstract two-dimensional snapshots
of multi-dimensional space readily capture non-monotonic relationships, inform
on the similarity between points of interest in parameter space, and map these
to outcomes. Specific examples show how some, but not all, countries have
returned to Great Depression levels, and reappraise the links between real
private capital growth and the performance of the economy. Theoretical and
empirical expositions alike highlight the dangers of assuming monotonic
relationships and of discounting combinations of factors as determinants of
outcomes; Topological Data Analysis addresses both dangers. Policy-makers can
look at outcomes and target areas of the input space where outcomes are
unsatisfactory, academics may
additionally find evidence to motivate theoretical development, and
practitioners can gain a rapid and robust base for decision making.
arXiv link: http://arxiv.org/abs/1911.10476v1
A singular stochastic control approach for optimal pairs trading with proportional transaction costs
try to find either optimal shares of stocks by assuming no transaction costs or
optimal timing of trading fixed numbers of shares of stocks with transaction
costs. To find optimal strategies which determine optimally both trade times
and number of shares in pairs trading process, we use a singular stochastic
control approach to study an optimal pairs trading problem with proportional
transaction costs. Assuming a cointegrated relationship for a pair of stock
log-prices, we consider a portfolio optimization problem which involves dynamic
trading strategies with proportional transaction costs. We show that the value
function of the control problem is the unique viscosity solution of a nonlinear
quasi-variational inequality, which is equivalent to a free boundary problem
for the singular stochastic control value function. We then develop a discrete
time dynamic programming algorithm to compute the transaction regions, and show
the convergence of the discretization scheme. We illustrate our approach with
numerical examples and discuss the impact of different parameters on
transaction regions. We study the out-of-sample performance in an empirical
study that consists of six pairs of U.S. stocks selected from different
industry sectors, and demonstrate the efficiency of the optimal strategy.
arXiv link: http://arxiv.org/abs/1911.10450v1
Uniform inference for value functions
function, that is, the function that results from optimizing an objective
function marginally over one of its arguments. Marginal optimization is not
Hadamard differentiable (that is, compactly differentiable) as a map between
the spaces of objective and value functions, which is problematic because
standard inference methods for nonlinear maps usually rely on Hadamard
differentiability. However, we show that the map from objective function to an
$L_p$ functional of a value function, for $1 \leq p \leq \infty$, is Hadamard
directionally differentiable. As a result, we establish consistency and weak
convergence of nonparametric plug-in estimates of Cram\'er-von Mises and
Kolmogorov-Smirnov test statistics applied to value functions. For practical
inference, we develop detailed resampling techniques that combine a bootstrap
procedure with estimates of the directional derivatives. In addition, we
establish local size control of tests which use the resampling procedure. Monte
Carlo simulations assess the finite-sample properties of the proposed methods
and show accurate empirical size and nontrivial power of the procedures.
Finally, we apply our methods to the evaluation of a job training program using
bounds for the distribution function of treatment effects.
arXiv link: http://arxiv.org/abs/1911.10215v7
A Practical Introduction to Regression Discontinuity Designs: Foundations
Idrobo, and Rocio Titiunik provide an accessible and practical guide for the
analysis and interpretation of Regression Discontinuity (RD) designs that
encourages the use of a common set of practices and facilitates the
accumulation of RD-based empirical evidence. In this Element, the authors
discuss the foundations of the canonical Sharp RD design, which has the
following features: (i) the score is continuously distributed and has only one
dimension, (ii) there is only one cutoff, and (iii) compliance with the
treatment assignment is perfect. In the accompanying Element, the authors
discuss practical and conceptual extensions to the basic RD setup.
arXiv link: http://arxiv.org/abs/1911.09511v1
Hybrid quantile estimation for asymmetric power GARCH models
moments of financial returns, while their quantile estimation has been rarely
investigated. This paper introduces a simple monotonic transformation on its
conditional quantile function to make the quantile regression tractable. The
asymptotic normality of the resulting quantile estimators is established under
either stationarity or non-stationarity. Moreover, based on the estimation
procedure, new tests for strict stationarity and asymmetry are also
constructed. This is the first try of the quantile estimation for
non-stationary ARCH-type models in the literature. The usefulness of the
proposed methodology is illustrated by simulation results and real data
analysis.
arXiv link: http://arxiv.org/abs/1911.09343v1
Regression Discontinuity Design under Self-selection
distributions of covariates on two sides of the policy intervention, which
essentially violates the continuity of potential outcome assumption. The
standard RD estimand becomes difficult to interpret due to the existence of
an indirect effect, i.e., the effect due to self-selection. We show that the
direct causal effect of interest can still be recovered under a class of
estimands. Specifically, we consider a class of weighted average treatment
effects tailored for potentially different target populations. We show that a
special case of our estimands can recover the average treatment effect under
the conditional independence assumption per Angrist and Rokkanen (2015), and
another example is the estimand recently proposed in Fr\"olich and Huber
(2018). We propose a set of estimators through a weighted local linear
regression framework and prove the consistency and asymptotic normality of the
estimators. Our approach can be further extended to the fuzzy RD case. In
simulation exercises, we compare the performance of our estimator with the
standard RD estimator. Finally, we apply our method to two empirical data sets:
the U.S. House elections data in Lee (2008) and a novel data set from Microsoft
Bing on Generalized Second Price (GSP) auction.
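As a point of reference for the weighted local linear framework, the sketch below computes only the standard sharp-RD baseline: local linear regressions on each side of the cutoff with triangular kernel weights. The paper's re-weighting toward different target populations is not shown, and the running variable, bandwidth, and outcomes are simulated placeholders.

```python
# Minimal sharp-RD baseline: local linear regression on each side of the
# cutoff with triangular kernel weights.
import numpy as np

rng = np.random.default_rng(5)
n, cutoff, h = 2000, 0.0, 0.3
x = rng.uniform(-1, 1, n)                       # running variable
y = 0.5 * x + 1.0 * (x >= cutoff) + rng.standard_normal(n) * 0.3

def local_linear_at_cutoff(x, y, side, cutoff, h):
    """Weighted least squares intercept at the cutoff on one side."""
    mask = (x >= cutoff) if side == "right" else (x < cutoff)
    u = x[mask] - cutoff
    k = np.clip(1 - np.abs(u) / h, 0, None)     # triangular kernel weights
    X = np.column_stack([np.ones_like(u), u])
    beta = np.linalg.solve(X.T @ (k[:, None] * X), X.T @ (k * y[mask]))
    return beta[0]

tau_hat = (local_linear_at_cutoff(x, y, "right", cutoff, h)
           - local_linear_at_cutoff(x, y, "left", cutoff, h))
print("estimated RD effect:", round(tau_hat, 3))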
arXiv link: http://arxiv.org/abs/1911.09248v1
A Flexible Mixed-Frequency Vector Autoregression with a Steady-State Prior
data. Our model is based on the mean-adjusted parametrization of the VAR and
allows for an explicit prior on the 'steady states' (unconditional means) of
the included variables. Based on recent developments in the literature, we
discuss extensions of the model that improve the flexibility of the modeling
approach. These extensions include a hierarchical shrinkage prior for the
steady-state parameters, and the use of stochastic volatility to model
heteroskedasticity. We put the proposed model to use in a forecast evaluation
using US data consisting of 10 monthly and 3 quarterly variables. The results
show that the predictive ability typically benefits from using mixed-frequency
data, and that improvements can be obtained for both monthly and quarterly
variables. We also find that the steady-state prior generally enhances the
accuracy of the forecasts, and that accounting for heteroskedasticity by means
of stochastic volatility usually provides additional improvements, although not
for all variables.
arXiv link: http://arxiv.org/abs/1911.09151v1
A Scrambled Method of Moments
Monte-Carlo (MC) integration. Under certain conditions, they can approximate
the desired integral at a faster rate than the usual Central Limit Theorem,
resulting in more accurate estimates. This paper explores these methods in a
simulation-based estimation setting with an emphasis on the scramble of Owen
(1995). For cross-sections and short-panels, the resulting Scrambled Method of
Moments simply replaces the random number generator with the scramble
(available in most software packages) to reduce simulation noise. Scrambled Indirect
Inference estimation is also considered. For time series, qMC may not apply
directly because of a curse of dimensionality on the time dimension. A simple
algorithm and a class of moments which circumvent this issue are described.
Asymptotic results are given for each algorithm. Monte-Carlo examples
illustrate these results in finite samples, including an income process with
"lots of heterogeneity."
arXiv link: http://arxiv.org/abs/1911.09128v1
Competition of noise and collectivity in global cryptocurrency trading: route to a self-contained market
basket of the 100 highest-capitalization cryptocurrencies over the period
October 1, 2015, through March 31, 2019, are studied. The corresponding
dynamics predominantly involve one leading eigenvalue of the correlation
matrix, while the others largely coincide with those of Wishart random
matrices. However, the magnitude of the principal eigenvalue, and thus the
degree of collectivity, strongly depends on which cryptocurrency is used as a
base. It is largest when the base is the most peripheral cryptocurrency; when
more significant ones are taken into consideration, its magnitude
systematically decreases, nevertheless preserving a sizable gap with respect to
the random bulk, which in turn indicates that the organization of correlations
becomes more heterogeneous. This finding provides a criterion for recognizing
which currencies or cryptocurrencies play a dominant role in the global
crypto-market. The present study shows that over the period under
consideration, Bitcoin (BTC) predominates, with exchange rate dynamics at
least as influential as those of the US dollar. BTC started dominating
around 2017, while further cryptocurrencies, like Ethereum (ETH)
and even Ripple (XRP), followed similar trends. At the same time, the USD, an
original value determinant for the cryptocurrency market, became increasingly
disconnected, its related characteristics eventually approaching those of a
fictitious currency. These results are strong indicators of incipient
independence of the global cryptocurrency market, delineating a self-contained
trade resembling the Forex.
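The eigenvalue diagnostic behind this analysis can be sketched in a few lines: compare the leading eigenvalue of a returns correlation matrix with the Marchenko-Pastur upper edge expected for a purely random (Wishart) bulk. The simulated returns with a single common mode below merely stand in for the cryptocurrency data.

```python
import numpy as np

rng = np.random.default_rng(8)
T, N = 1000, 100                          # observations, assets
market = rng.standard_normal(T)
returns = 0.4 * market[:, None] + rng.standard_normal((T, N))

corr = np.corrcoef(returns, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]

q = N / T
mp_upper = (1 + np.sqrt(q)) ** 2          # Marchenko-Pastur upper edge
print("leading eigenvalue:    ", round(eigvals[0], 2))
print("Marchenko-Pastur edge: ", round(mp_upper, 2))
print("eigenvalues above edge:", int(np.sum(eigvals > mp_upper)))
```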
arXiv link: http://arxiv.org/abs/1911.08944v2
Statistical Inference on Partially Linear Panel Model under Unobserved Linearity
identify the linear components in the panel data model with fixed effects.
Under some mild assumptions, the proposed procedure is shown to consistently
estimate the underlying regression function, correctly select the linear
components, and effectively conduct the statistical inference. When compared to
existing methods for detection of linearity in the panel model, our approach is
demonstrated to be theoretically justified as well as practically convenient.
We provide a computational algorithm that implements the proposed procedure
along with a path-based solution method for linearity detection, which avoids
the burden of selecting the tuning parameter for the penalty term. Monte Carlo
simulations are conducted to examine the finite sample performance of our
proposed procedure with detailed findings that confirm our theoretical results
in the paper. Applications to Aggregate Production and Environmental Kuznets
Curve data also illustrate the necessity for detecting linearity in the
partially linear panel model.
arXiv link: http://arxiv.org/abs/1911.08830v1
Equivariant online predictions of non-stationary time series
non-stationary time series under model misspecification. To analyze the
theoretical predictive properties of statistical methods under this setting, we
first define the Kullback-Leibler risk, in order to place the problem within a
decision theoretic framework. Under this framework, we show that a specific
class of dynamic models -- random walk dynamic linear models -- produce exact
minimax predictive densities. We first show this result under Gaussian
assumptions, then relax this assumption using semi-martingale processes. This
result provides a theoretical baseline, under both non-stationary and
stationary time series data, against which other models can be compared. We
extend the result to the synthesis of multiple predictive densities. Three
topical applications in epidemiology, climatology, and economics, confirm and
highlight our theoretical results.
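For concreteness, the model class at the center of the result can be sketched with a textbook filter: a random-walk (local level) dynamic linear model whose Kalman recursion yields one-step-ahead Gaussian predictive densities. This is only the standard filter; the paper's minimax and semi-martingale results are not reproduced, and all variances below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(9)
T, W, V = 200, 0.1, 1.0               # state innovation and observation variances
level = np.cumsum(np.sqrt(W) * rng.standard_normal(T))
y = level + np.sqrt(V) * rng.standard_normal(T)

m, C = 0.0, 10.0                      # prior mean and variance for the level
pred_mean, pred_var = np.zeros(T), np.zeros(T)
for t in range(T):
    # predictive distribution for y_t given y_1..y_{t-1}
    R = C + W
    pred_mean[t], pred_var[t] = m, R + V
    # posterior update after observing y_t
    K = R / (R + V)                   # Kalman gain
    m = m + K * (y[t] - m)
    C = (1 - K) * R

rmse = np.sqrt(np.mean((y - pred_mean) ** 2))
print("one-step predictive RMSE:", round(rmse, 3))
```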
arXiv link: http://arxiv.org/abs/1911.08662v5
Robust Inference on Infinite and Growing Dimensional Time Series Regression
regression with growing dimension, infinite-order autoregression and
nonparametric sieve regression. Examples include the Chow test and general
linear restriction tests of growing rank $p$. Employing such increasing $p$
asymptotics, we introduce a new scale correction to conventional test
statistics which accounts for a high-order long-run variance (HLV) that emerges
as $ p $ grows with sample size. We also propose a bias correction via a
null-imposed bootstrap to alleviate finite sample bias without sacrificing
power unduly. A simulation study shows the importance of robustifying testing
procedures against the HLV even when $ p $ is moderate. The tests are
illustrated with an application to the oil regressions in Hamilton (2003).
arXiv link: http://arxiv.org/abs/1911.08637v4
Synthetic Controls with Imperfect Pre-Treatment Fit
estimators when the pre-treatment fit is imperfect. In this framework, we show
that these estimators are generally biased if treatment assignment is
correlated with unobserved confounders, even when the number of pre-treatment
periods goes to infinity. Still, we show that a demeaned version of the SC
method can substantially improve in terms of bias and variance relative to the
difference-in-difference estimator. We also derive a specification test for the
demeaned SC estimator in this setting with imperfect pre-treatment fit. Given
our theoretical results, we provide practical guidance for applied researchers
on how to justify the use of such estimators in empirical applications.
arXiv link: http://arxiv.org/abs/1911.08521v2
Inference in Models of Discrete Choice with Social Interactions Using Network Data
interactions when the data consists of a single large network. We provide
theoretical justification for the use of spatial and network HAC variance
estimators in applied work, the latter constructed by using network path
distance in place of spatial distance. Toward this end, we prove new central
limit theorems for network moments in a large class of social interactions
models. The results are applicable to discrete games on networks and dynamic
models where social interactions enter through lagged dependent variables. We
illustrate our results in an empirical application and simulation study.
arXiv link: http://arxiv.org/abs/1911.07106v1
Causal Inference Under Approximate Neighborhood Interference
interference. Commonly used models of interference posit that treatments
assigned to alters beyond a certain network distance from the ego have no
effect on the ego's response. However, this assumption is violated in common
models of social interactions. We propose a substantially weaker model of
"approximate neighborhood interference" (ANI) under which treatments assigned
to alters further from the ego have a smaller, but potentially nonzero, effect
on the ego's response. We formally verify that ANI holds for well-known models
of social interactions. Under ANI, restrictions on the network topology, and
asymptotics under which the network size increases, we prove that standard
inverse-probability weighting estimators consistently estimate useful exposure
effects and are approximately normal. For inference, we consider a network HAC
variance estimator. Under a finite population model, we show that the estimator
is biased but that the bias can be interpreted as the variance of unit-level
exposure effects. This generalizes Neyman's well-known result on conservative
variance estimation to settings with interference.
arXiv link: http://arxiv.org/abs/1911.07085v4
Semiparametric Estimation of Correlated Random Coefficient Models without Instrumental Variables
correlated with some continuous covariates. Such a model specification may
occur in empirical research, for instance, when quantifying the effect of a
continuous treatment observed at two time periods. We show that one can carry out
identification and estimation without instruments. We propose a semiparametric
estimator of average partial effects and of average treatment effects on the
treated. We showcase the small sample properties of our estimator in an
extensive simulation study. Among other things, we reveal that it compares
favorably with a control function estimator. We conclude with an application to
the effect of malaria eradication on economic development in Colombia.
arXiv link: http://arxiv.org/abs/1911.06857v1
Bayesian state-space modeling for analyzing heterogeneous network effects of US monetary policy
of crucial importance for effectively implementing policy measures. We extend
the empirical econometric literature on the role of production networks in the
propagation of shocks along two dimensions. First, we allow for
industry-specific responses that vary over time, reflecting non-linearities and
cross-sectional heterogeneities in direct transmission channels. Second, we
allow for time-varying network structures and dependence. This feature captures
both variation in the structure of the production network and differences
in cross-industry demand elasticities. We find that impacts vary substantially
over time and the cross-section. Higher-order effects appear to be particularly
important in periods of economic and financial uncertainty, often coinciding
with tight credit market conditions and financial stress. Differentials in
industry-specific responses can be explained by how close the respective
industries are to end-consumers.
arXiv link: http://arxiv.org/abs/1911.06206v3
Randomization tests of copula symmetry
proposed. The novel aspect of the tests is a resampling procedure that exploits
group invariance conditions associated with the relevant symmetry hypothesis.
They may be viewed as feasible versions of randomization tests of symmetry, the
latter being inapplicable due to the unobservability of margins. Our tests are
simple to compute, control size asymptotically, consistently detect arbitrary
forms of asymmetry, and do not require the specification of a tuning parameter.
Simulations indicate excellent small sample properties compared to existing
procedures involving the multiplier bootstrap.
arXiv link: http://arxiv.org/abs/1911.05307v1
Combinatorial Models of Cross-Country Dual Meets: What is a Big Victory?
The first model assumes that all runners are equally likely to finish in any
possible order. The second model assumes that each team is selected from a
large identically distributed population of potential runners, with each
potential runner's ranking determined by the initial draw from the combined
population.
arXiv link: http://arxiv.org/abs/1911.05044v1
A Simple Estimator for Quantile Panel Data Models Using Smoothed Quantile Regressions
simple intuition and low computational cost, has been widely used in empirical
studies in recent years. In this paper, we revisit the estimator of Canay
(2011) and point out that in his asymptotic analysis the bias of his estimator
due to the estimation of the fixed effects is mistakenly omitted, and that such
omission will lead to invalid inference on the coefficients. To solve this
problem, we propose a similar easy-to-implement estimator based on smoothed
quantile regressions. The asymptotic distribution of the new estimator is
established and the analytical expression of its asymptotic bias is derived.
Based on these results, we show how to make asymptotically valid inference
based on both analytical and split-panel jackknife bias corrections. Finally,
finite sample simulations are used to support our theoretical analysis and to
illustrate the importance of bias correction in quantile regressions for panel
data.
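For context, the sketch below implements only the two-step recipe in the spirit of Canay (2011) that the paper revisits: estimate the fixed effects from a within (mean) regression, remove them from the outcome, and run a pooled quantile regression. The smoothed quantile regression and the analytical or split-panel jackknife bias corrections proposed in the paper are not shown; the panel is simulated.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(13)
N, T = 200, 20
alpha = rng.standard_normal(N)                       # individual fixed effects
x = rng.standard_normal((N, T))
y = 1.0 * x + alpha[:, None] + rng.standard_normal((N, T))

# Step 1: within estimator of the slope, then fixed effects from unit means
x_dm, y_dm = x - x.mean(1, keepdims=True), y - y.mean(1, keepdims=True)
beta_within = (x_dm * y_dm).sum() / (x_dm ** 2).sum()
alpha_hat = (y - beta_within * x).mean(1)

# Step 2: pooled quantile regression on the outcome net of the fixed effects
y_tilde = (y - alpha_hat[:, None]).ravel()
X = sm.add_constant(x.ravel())
fit = QuantReg(y_tilde, X).fit(q=0.75)
print("0.75-quantile slope estimate:", round(fit.params[1], 3))
```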
arXiv link: http://arxiv.org/abs/1911.04729v1
Extended MinP Tests for Global and Multiple testing
with researchers aiming to assess both the collective and individual evidence
against these propositions or hypotheses. To rigorously assess this evidence,
practitioners frequently employ tests with quadratic test statistics, such as
$F$-tests and Wald tests, or tests based on minimum/maximum type test
statistics. This paper introduces a combination test that merges these two
classes of tests using the minimum $p$-value principle. The proposed test
capitalizes on the global power advantages of both constituent tests while
retaining the benefits of the stepdown procedure from minimum/maximum type
tests.
arXiv link: http://arxiv.org/abs/1911.04696v2
Identification in discrete choice models with imperfect information
models where decision makers may be imperfectly informed about the state of the
world. We leverage the notion of one-player Bayes Correlated Equilibrium by
Bergemann and Morris (2016) to provide a tractable characterization of the
sharp identified set. We develop a procedure to practically construct the sharp
identified set following a sieve approach, and provide sharp bounds on
counterfactual outcomes of interest. We use our methodology and data on the
2017 UK general election to estimate a spatial voting model under weak
assumptions on agents' information about the returns to voting. Counterfactual
exercises quantify the consequences of imperfect information on the well-being
of voters and parties.
arXiv link: http://arxiv.org/abs/1911.04529v5
An Asymptotically F-Distributed Chow Test in the Presence of Heteroscedasticity and Autocorrelation
heteroscedasticity and autocorrelation. The test is based on a series
heteroscedasticity and autocorrelation robust variance estimator with
judiciously crafted basis functions. Like the Chow test in a classical normal
linear regression, the proposed test employs the standard F distribution as the
reference distribution, which is justified under fixed-smoothing asymptotics.
Monte Carlo simulations show that the null rejection probability of the
asymptotic F test is closer to the nominal level than that of the chi-square
test.
arXiv link: http://arxiv.org/abs/1911.03771v1
Optimal Experimental Design for Staggered Rollouts
set of units over multiple time periods where the starting time of the
treatment may vary by unit. The design problem involves selecting an initial
treatment time for each unit in order to most precisely estimate both the
instantaneous and cumulative effects of the treatment. We first consider
non-adaptive experiments, where all treatment assignment decisions are made
prior to the start of the experiment. For this case, we show that the
optimization problem is generally NP-hard, and we propose a near-optimal
solution. Under this solution, the fraction entering treatment each period is
initially low, then high, and finally low again. Next, we study an adaptive
experimental design problem, where both the decision to continue the experiment
and treatment assignment decisions are updated after each period's data is
collected. For the adaptive case, we propose a new algorithm, the
Precision-Guided Adaptive Experiment (PGAE) algorithm, that addresses the
challenges at both the design stage and at the stage of estimating treatment
effects, ensuring valid post-experiment inference accounting for the adaptive
nature of the design. Using realistic settings, we demonstrate that our
proposed solutions can reduce the opportunity cost of the experiments by over
50%, compared to static design benchmarks.
arXiv link: http://arxiv.org/abs/1911.03764v6
Group Average Treatment Effects for Observational Studies
effects sorted by impact groups (GATES) for non-randomised experiments. The
groups can be understood as a broader aggregation of the conditional average
treatment effect (CATE) where the number of groups is set in advance. In
economics, this approach is similar to pre-analysis plans. Observational
studies are standard in policy evaluation from labour markets, educational
surveys and other empirical studies. To control for a potential selection-bias,
we implement a doubly-robust estimator in the first stage. We use machine
learning methods to learn the conditional mean functions as well as the
propensity score. The group average treatment effect is then estimated via a
linear projection model. The linear model is easy to interpret, provides
p-values and confidence intervals, and limits the danger of finding spurious
heterogeneity due to small subgroups in the CATE. To control for confounding in
the linear model, we use Neyman-orthogonal moments to partial out the effect
that covariates have on both the treatment assignment and the outcome. The
result is a best linear predictor for effect heterogeneity based on impact
groups. We find that our proposed method has lower absolute errors as well as
smaller bias than the benchmark doubly-robust estimator. We further introduce a
bagging type averaging for the CATE function for each observation to avoid
biases through sample splitting. The advantage of the proposed method is a
robust linear estimation of heterogeneous group treatment effects in
observational studies.
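A condensed sketch of the pipeline, under simplifying assumptions: learn the propensity score and outcome regressions with off-the-shelf sklearn learners, build a doubly-robust (AIPW) score, sort observations into impact groups by a plug-in CATE, and project the scores on group dummies by OLS. The cross-fitting / sample splitting that the paper uses to avoid overfitting bias is omitted, and the data-generating process is invented.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(10)
n, p = 2000, 5
X = rng.standard_normal((n, p))
e = 1 / (1 + np.exp(-X[:, 0]))                     # true propensity
D = rng.binomial(1, e)
tau = 1.0 + X[:, 1]                                # heterogeneous effect
Y = X[:, 0] + tau * D + rng.standard_normal(n)

ps = LogisticRegression().fit(X, D).predict_proba(X)[:, 1].clip(0.05, 0.95)
m1 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[D == 1], Y[D == 1])
m0 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[D == 0], Y[D == 0])
mu1, mu0 = m1.predict(X), m0.predict(X)

# AIPW score: its conditional mean given X equals the CATE
score = (mu1 - mu0
         + D * (Y - mu1) / ps
         - (1 - D) * (Y - mu0) / (1 - ps))

groups = np.digitize(mu1 - mu0, np.quantile(mu1 - mu0, [0.25, 0.5, 0.75]))
G = np.column_stack([(groups == g).astype(float) for g in range(4)])
gates = sm.OLS(score, G).fit(cov_type="HC1")
print(gates.params.round(2))                       # group average treatment effects
```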
arXiv link: http://arxiv.org/abs/1911.02688v5
Quantile Factor Models
high-dimensional panel data. Unlike Approximate Factor Models (AFM), where only
location-shifting factors can be extracted, QFM also allow the recovery of
unobserved factors shifting other relevant parts of the distributions of
observed variables. A quantile regression approach, labeled Quantile Factor
Analysis (QFA), is proposed to consistently estimate all the quantile-dependent
factors and loadings. Their asymptotic distribution is then derived using a
kernel-smoothed version of the QFA estimators. Two consistent model selection
criteria, based on information criteria and rank minimization, are developed to
determine the number of factors at each quantile. Moreover, in contrast to the
conditions required for the use of Principal Components Analysis in AFM, QFA
estimation remains valid even when the idiosyncratic errors have heavy-tailed
distributions. Three empirical applications (regarding macroeconomic, climate
and finance panel data) provide evidence that extra factors shifting the
quantiles other than the means could be relevant in practice.
arXiv link: http://arxiv.org/abs/1911.02173v2
Nonparametric Quantile Regressions for Panel Data Models with Large T
dependent variables are additively separable as unknown functions of the
regressors and the individual effects. We propose two estimators of the
quantile partial effects while controlling for the individual heterogeneity.
The first estimator is based on local linear quantile regressions, and the
second is based on local linear smoothed quantile regressions, both of which
are easy to compute in practice. Within the large T framework, we provide
sufficient conditions under which the two estimators are shown to be
asymptotically normally distributed. In particular, for the first estimator, it
is shown that $N \ll T^{2/(d+4)}$ is needed to ignore the incidental parameter
biases, where $d$ is the dimension of the regressors. For the second estimator,
we are able to derive the analytical expression of the asymptotic biases under
the assumption that $N\approx Th^{d}$, where $h$ is the bandwidth parameter in
local linear approximations. Our theoretical results provide the basis of using
split-panel jackknife for bias corrections. A Monte Carlo simulation shows that
the proposed estimators and the bias-correction method perform well in finite
samples.
arXiv link: http://arxiv.org/abs/1911.01824v3
Cheating with (Recursive) Models
correlations? We study an "analyst" who utilizes models that take the form of a
recursive system of linear regression equations. The analyst fits each equation
to minimize the sum of squared errors against an arbitrarily large sample. We
characterize the maximal pairwise correlation that the analyst can predict
given a generic objective covariance matrix, subject to the constraint that the
estimated model does not distort the mean and variance of individual variables.
We show that as the number of variables in the model grows, the false pairwise
correlation can become arbitrarily close to one, regardless of the true
correlation.
arXiv link: http://arxiv.org/abs/1911.01251v1
Model Specification Test with Unlabeled Data: Approach from Covariate Shift
using unlabeled test data. In many cases, we have conducted statistical
inferences based on the assumption that we can correctly specify a model.
However, it is difficult to confirm whether a model is correctly specified. To
overcome this problem, existing works have devised statistical tests for model
specification. Existing works have defined a correctly specified model in
regression as a model with zero conditional mean of the error term over the training
data only. Extending the definition in conventional statistical tests, we
define a correctly specified model as a model with zero conditional mean of the
error term over any distribution of the explanatory variable. This definition
is a natural consequence of the orthogonality of the explanatory variable and
the error term. If a model does not satisfy this condition, the model might
lack robustness with regard to distribution shift. The proposed method
would enable us to reject a misspecified model under our definition. By
applying the proposed method, we can obtain a model that predicts the label for
the unlabeled test data well without losing the interpretability of the model.
In experiments, we show how the proposed method works for synthetic and
real-world datasets.
arXiv link: http://arxiv.org/abs/1911.00688v2
A two-dimensional propensity score matching method for longitudinal quasi-experimental studies: A focus on travel behavior and the built environment
environment and travel behavior has been widely discussed in the literature.
This paper discusses how standard propensity score matching estimators can be
extended to enable such studies by pairing observations across two dimensions:
longitudinal and cross-sectional. Researchers mimic randomized controlled
trials (RCTs) and match observations in both dimensions, to find synthetic
control groups that are similar to the treatment group and to match subjects
synthetically across before-treatment and after-treatment time periods. We call
this a two-dimensional propensity score matching (2DPSM). This method
demonstrates superior performance for estimating treatment effects based on
Monte Carlo evidence. A near-term opportunity for such matching is identifying
the impact of transportation infrastructure on travel behavior.
arXiv link: http://arxiv.org/abs/1911.00667v2
Explaining black box decisions by Shapley cohort refinement
individual input variables to a black box function. Our measure is based on the
Shapley value from cooperative game theory. Many measures of variable
importance operate by changing some predictor values with others held fixed,
potentially creating unlikely or even logically impossible combinations. Our
cohort Shapley measure uses only observed data points. Instead of changing the
value of a predictor we include or exclude subjects similar to the target
subject on that predictor to form a similarity cohort. Then we apply Shapley
value to the cohort averages. We connect variable importance measures from
explainable AI to function decompositions from global sensitivity analysis. We
introduce a squared cohort Shapley value that splits previously studied Shapley
effects over subjects, consistent with a Shapley axiom.
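A direct, exponential-time sketch of cohort Shapley for a small number of predictors follows; "similarity" on a coordinate is defined here simply as being within a tolerance of the target subject's value, and the value of a coalition is the average output over subjects similar on every coordinate in it. This is illustrative only, not an optimized or definitive implementation.

```python
import numpy as np
from itertools import combinations
from math import factorial

rng = np.random.default_rng(11)
n, d = 500, 3
X = rng.standard_normal((n, d))
y = X[:, 0] + 2 * X[:, 1] ** 2 + rng.standard_normal(n) * 0.1   # "black box" output

target = 0                       # explain the output for subject 0
tol = 0.5

def cohort_value(S):
    """Average output over subjects similar to the target on coordinates in S."""
    mask = np.ones(n, dtype=bool)
    for j in S:
        mask &= np.abs(X[:, j] - X[target, j]) <= tol
    return y[mask].mean()

phi = np.zeros(d)
features = range(d)
for j in features:
    others = [k for k in features if k != j]
    for size in range(d):
        for S in combinations(others, size):
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            phi[j] += weight * (cohort_value(S + (j,)) - cohort_value(S))

full = cohort_value(tuple(range(d)))
print("cohort Shapley values:", np.round(phi, 3))
print("efficiency check:", round(phi.sum(), 3), "=", round(full - y.mean(), 3))
```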
arXiv link: http://arxiv.org/abs/1911.00467v2
Regularized Quantile Regression with Interactive Fixed Effects
models with interactive fixed effects. We propose a nuclear norm penalized
estimator of the coefficients on the covariates and the low-rank matrix formed
by the fixed effects. The estimator solves a convex minimization problem, not
requiring pre-estimation of the (number of the) fixed effects. It also allows
the number of covariates to grow slowly with $N$ and $T$. We derive an error
bound on the estimator that holds uniformly in quantile level. The order of the
bound implies uniform consistency of the estimator and is nearly optimal for
the low-rank component. Given the error bound, we also propose a consistent
estimator of the number of fixed effects at any quantile level. To derive the
error bound, we develop new theoretical arguments under primitive assumptions
and new results on random matrices that may be of independent interest. We
demonstrate the performance of the estimator via Monte Carlo simulations.
arXiv link: http://arxiv.org/abs/1911.00166v4
Analyzing China's Consumer Price Index Comparatively with that of United States
predictability of China's Consumer Price Index (CPI-CN), with a comparison to
those of the United States. Despite the differences in the two leading
economies, both series can be well modeled by a class of Seasonal
Autoregressive Integrated Moving Average Model with Covariates (S-ARIMAX). The
CPI-CN series possess regular patterns of dynamics with stable annual cycles
and strong Spring Festival effects, with fitting and forecasting errors largely
comparable to their US counterparts. Finally, for the CPI-CN, the diffusion
index (DI) approach offers better predictions than the S-ARIMAX models.
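The S-ARIMAX workflow can be sketched on simulated monthly data with statsmodels; the orders and the "festival" dummy below are hypothetical placeholders, not the specifications selected in the paper.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(12)
T = 180                                              # 15 years of monthly data
season = 1.5 * np.sin(2 * np.pi * np.arange(T) / 12)
festival = (np.arange(T) % 12 == 1).astype(float)    # stand-in seasonal dummy
y = 100 + 0.05 * np.arange(T) + season + 0.8 * festival + rng.standard_normal(T) * 0.3

model = SARIMAX(y, exog=festival,
                order=(1, 1, 1),
                seasonal_order=(1, 0, 1, 12))
res = model.fit(disp=False)
forecast = res.get_forecast(steps=12, exog=festival[:12].reshape(-1, 1))
print("12-step-ahead forecast (first 3):", np.round(forecast.predicted_mean[:3], 2))
```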
arXiv link: http://arxiv.org/abs/1910.13301v1
Testing Forecast Rationality for Measures of Central Tendency
measure of the central tendency of their (possibly latent) predictive
distribution, for example the mean, median, mode, or any convex combination
thereof. We propose tests of forecast rationality when the measure of central
tendency used by the respondent is unknown. We overcome an identification
problem that arises when the measures of central tendency are equal or in a
local neighborhood of each other, as is the case for (exactly or nearly)
symmetric distributions. As a building block, we also present novel tests for
the rationality of mode forecasts. We apply our tests to income forecasts from
the Federal Reserve Bank of New York's Survey of Consumer Expectations. We find
these forecasts are rationalizable as mode forecasts, but not as mean or median
forecasts. We also find heterogeneity in the measure of centrality used by
respondents when stratifying the sample by past income, age, job stability, and
survey experience.
arXiv link: http://arxiv.org/abs/1910.12545v5
Dual Instrumental Variable Regression
regression, DualIV, which simplifies traditional two-stage methods via a dual
formulation. Inspired by problems in stochastic programming, we show that
two-stage procedures for non-linear IV regression can be reformulated as a
convex-concave saddle-point problem. Our formulation enables us to circumvent
the first-stage regression which is a potential bottleneck in real-world
applications. We develop a simple kernel-based algorithm with an analytic
solution based on this formulation. Empirical results show that we are
competitive to existing, more complicated algorithms for non-linear
instrumental variable regression.
arXiv link: http://arxiv.org/abs/1910.12358v3
Estimating a Large Covariance Matrix in Time-varying Factor Models
estimation. We propose two covariance matrix estimators corresponding with a
time-varying approximate factor model and a time-varying approximate
characteristic-based factor model, respectively. The models allow the factor
loadings, factor covariance matrix, and error covariance matrix to change
smoothly over time. We study the rate of convergence of each estimator. Our
simulation and empirical study indicate that time-varying covariance matrix
estimators generally perform better than time-invariant covariance matrix
estimators. Also, if characteristics are available that genuinely explain true
loadings, the characteristics can be used to estimate loadings more precisely
in finite samples; their helpfulness increases when loadings rapidly change.
arXiv link: http://arxiv.org/abs/1910.11965v1
Fast and Flexible Bayesian Inference in Time-varying Parameter Regression Models
involving K explanatory variables and T observations as a constant coefficient
regression model with KT explanatory variables. In contrast with much of the existing literature, which assumes the coefficients evolve according to a random walk, a hierarchical mixture model on the TVPs is introduced. The resulting
model closely mimics a random coefficients specification which groups the TVPs
into several regimes. These flexible mixtures allow for TVPs that feature a
small, moderate or large number of structural breaks. We develop
computationally efficient Bayesian econometric methods based on the singular
value decomposition of the KT regressors. In artificial data, we find our
methods to be accurate and much faster than standard approaches in terms of
computation time. In an empirical exercise involving inflation forecasting
using a large number of predictors, we find our models to forecast better than
alternative approaches and document different patterns of parameter change than
are found with approaches which assume random walk evolution of parameters.
arXiv link: http://arxiv.org/abs/1910.10779v4
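To make the reformulation in the entry above concrete, the following is a small illustrative sketch (using the usual block-diagonal construction, not necessarily the paper's exact notation or estimator) of how a TVP regression with K covariates and T observations can be written as a constant-coefficient regression with KT regressors, and why the SVD of the expanded regressor matrix is cheap.

```python
# Sketch: a TVP regression y_t = x_t' beta_t, t = 1..T, viewed as a constant
# coefficient regression y = Z theta with K*T regressors, where Z is the
# T x (K*T) block-diagonal matrix holding x_t' in row t. Illustrative only.
import numpy as np

T, K = 200, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((T, K))                               # T observations, K covariates
beta = np.cumsum(0.1 * rng.standard_normal((T, K)), axis=0)   # slowly drifting TVPs
y = np.einsum("tk,tk->t", X, beta) + 0.1 * rng.standard_normal(T)

# Build Z: row t contains x_t in columns t*K .. t*K+K-1 and zeros elsewhere.
Z = np.zeros((T, K * T))
for t in range(T):
    Z[t, t * K:(t + 1) * K] = X[t]

# The singular value decomposition of Z is what makes computation cheap:
# Z has at most T nonzero singular values even though it has K*T columns.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
print(Z.shape, "nonzero singular values:", int(np.sum(s > 1e-10)))
```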
Nonparametric identification of an interdependent value model with buyer covariates from first-price auction bids
and Weber (1982), where the signals are given by an index gathering signal
shifters observed by the econometrician and private ones specific to each
bidder. The model primitives are shown to be nonparametrically identified from first-price auction bids under a testable, mild rank condition. Identification holds for all possible signal values, which makes it possible to consider a wide range of counterfactuals for which this matters, such as the expected revenue in a second-price auction. An estimation procedure is briefly discussed.
arXiv link: http://arxiv.org/abs/1910.10646v1
How well can we learn large factor models without assuming strong factors?
factor structure. The focus is to find what is possible and what is impossible
if the usual strong factor condition is not imposed. We study the minimax rate
and adaptivity issues in two problems: pure factor models and panel regression
with interactive fixed effects. For pure factor models, if the number of
factors is known, we develop adaptive estimation and inference procedures that
attain the minimax rate. However, when the number of factors is not specified a
priori, we show that there is a tradeoff between validity and efficiency: any
confidence interval that has uniform validity for arbitrary factor strength has
to be conservative; in particular its width is bounded away from zero even when
the factors are strong. Conversely, any data-driven confidence interval that
does not require as an input the exact number of factors (including weak ones)
and has shrinking width under strong factors does not have uniform coverage and
the worst-case coverage probability is at most 1/2. For panel regressions with
interactive fixed effects, the tradeoff is much better. We find that the
minimax rate for learning the regression coefficient does not depend on the
factor strength and propose a simple estimator that achieves this rate.
However, when weak factors are allowed, uncertainty in the number of factors
can cause a great loss of efficiency although the rate is not affected. In most
cases, we find that the strong factor condition (and/or exact knowledge of
number of factors) improves efficiency, but this condition needs to be imposed
by faith and cannot be verified in data for inference purposes.
arXiv link: http://arxiv.org/abs/1910.10382v3
Principal Component Analysis: A Generalized Gini Approach
index is proposed (Gini PCA). The Gini PCA generalizes the standard PCA based
on the variance. It is shown, in the Gaussian case, that the standard PCA is
equivalent to the Gini PCA. It is also proven that the dimensionality reduction
based on the generalized Gini correlation matrix, that relies on city-block
distances, is robust to outliers. Monte Carlo simulations and an application on
cars data (with outliers) show the robustness of the Gini PCA and provide
different interpretations of the results compared with the variance PCA.
arXiv link: http://arxiv.org/abs/1910.10133v1
Quasi Maximum Likelihood Estimation of Non-Stationary Large Approximate Dynamic Factor Models
and idiosyncratic trends by means of the Expectation Maximization algorithm,
implemented jointly with the Kalman smoother. We show that, as the
cross-sectional dimension $n$ and the sample size $T$ diverge to infinity, the
common component for a given unit estimated at a given point in time is
$\min(\sqrt n,\sqrt T)$-consistent. The case of local levels and/or local linear trends is also considered. By means of a Monte Carlo simulation
exercise, we compare our approach with estimators based on principal component
analysis.
arXiv link: http://arxiv.org/abs/1910.09841v1
A path-sampling method to partially identify causal effects in instrumental variable models
standard point-identification approaches in general instrumental variable
models. However, this flexibility comes at the cost of a “curse of
cardinality”: the number of restrictions on the identified set grows
exponentially with the number of points in the support of the endogenous
treatment. This article proposes a novel path-sampling approach to this
challenge. It is designed for partially identifying causal effects of interest
in the most complex models with continuous endogenous treatments. A stochastic
process representation makes it possible to seamlessly incorporate assumptions on individual behavior into the model. Some potential applications include
dose-response estimation in randomized trials with imperfect compliance, the
evaluation of social programs, welfare estimation in demand models, and
continuous choice models. As a demonstration, the method provides informative
nonparametric bounds on household expenditures under the assumption that
expenditure is continuous. The mathematical contribution is an approach to
approximately solving infinite dimensional linear programs on path spaces via
sampling.
arXiv link: http://arxiv.org/abs/1910.09502v2
Multi-Stage Compound Real Options Valuation in Residential PV-Battery Investment
uncertain electricity market environment has become increasingly challenging,
because there generally exist multiple interacting options in these
investments, and failing to systematically consider these options can lead to
decisions that undervalue the investment. In our work, a real options valuation
(ROV) framework is proposed to determine the optimal strategy for executing
multiple interacting options within a distribution network investment, to
mitigate the risk of financial losses in the presence of future uncertainties.
To demonstrate the characteristics of the proposed framework, we determine the
optimal strategy to economically justify the investment in residential
PV-battery systems for additional grid supply during peak demand periods. The
options to defer, and then expand, are considered as multi-stage compound
options, since the option to expand is a subsequent option of the former. These
options are valued via the least squares Monte Carlo method, incorporating
uncertainty over growing power demand, varying diesel fuel price, and the
declining cost of PV-battery technology as random variables. Finally, a
sensitivity analysis is performed to demonstrate how the proposed framework
responds to uncertain events. The proposed framework shows that executing the
interacting options at the optimal timing increases the investment value.
arXiv link: http://arxiv.org/abs/1910.09132v1
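For readers unfamiliar with the least squares Monte Carlo valuation mentioned in the entry above, here is a minimal Longstaff-Schwartz-style sketch for a single (non-compound) option to defer an investment; the geometric Brownian motion dynamics, parameters, and polynomial basis are illustrative simplifications, not the paper's multi-stage setup.

```python
# Minimal sketch of least squares Monte Carlo for an option to defer an investment:
# pay cost I to receive a project worth V_t, exercisable at discrete dates.
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, dt = 20_000, 20, 0.25
r, sigma, V0, I = 0.05, 0.30, 100.0, 100.0

# Simulate project value as geometric Brownian motion under the risk-neutral measure.
z = rng.standard_normal((n_paths, n_steps))
V = V0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1))

disc = np.exp(-r * dt)
cash = np.maximum(V[:, -1] - I, 0.0)        # value if exercised at the last date
for t in range(n_steps - 2, -1, -1):
    itm = V[:, t] > I                       # regress only on in-the-money paths
    basis = np.column_stack([np.ones(itm.sum()), V[itm, t], V[itm, t] ** 2])
    coef, *_ = np.linalg.lstsq(basis, disc * cash[itm], rcond=None)
    continuation = basis @ coef
    exercise = V[itm, t] - I
    cash = disc * cash                      # default: continue waiting
    cash[itm] = np.where(exercise > continuation, exercise, cash[itm])

option_value = disc * cash.mean()           # discount from the first decision date to t = 0
print(f"Deferral option value: {option_value:.2f} vs immediate NPV {V0 - I:.2f}")
```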
Feasible Generalized Least Squares for Panel Data with Cross-sectional and Serial Correlations
panel data models. By estimating the large error covariance matrix
consistently, the proposed feasible GLS (FGLS) estimator is more efficient than
the ordinary least squares (OLS) in the presence of heteroskedasticity, serial,
and cross-sectional correlations. To take into account the serial correlations,
we employ the banding method. To take into account the cross-sectional
correlations, we suggest to use the thresholding method. We establish the
limiting distribution of the proposed estimator. A Monte Carlo study is
considered. The proposed method is applied to an empirical application.
arXiv link: http://arxiv.org/abs/1910.09004v3
Large Dimensional Latent Factor Modeling with Missing Observations and Applications to Causal Inference
from large dimensional panel data with missing observations. We propose an
easy-to-use all-purpose estimator for a latent factor model by applying
principal component analysis to an adjusted covariance matrix estimated from
partially observed panel data. We derive the asymptotic distribution for the
estimated factors, loadings and the imputed values under an approximate factor
model and general missing patterns. The key application is to estimate
counterfactual outcomes in causal inference from panel data. The unobserved
control group is modeled as missing values, which are inferred from the latent
factor model. The inferential theory for the imputed values allows us to test
for individual treatment effects at any time under general adoption patterns
where the units can be affected by unobserved factors.
arXiv link: http://arxiv.org/abs/1910.08273v6
Forecasting under Long Memory and Nonstationarity
fact in many time series from economics and finance. The fractionally
integrated process is the workhorse model for the analysis of these time
series. Nevertheless, there is mixed evidence in the literature concerning its
usefulness for forecasting and how forecasting based on it should be
implemented.
Employing pseudo-out-of-sample forecasting on inflation and realized
volatility time series as well as simulations, we show that methods based on fractional integration are clearly superior to alternative methods not accounting for long
memory, including autoregressions and exponential smoothing. Our proposal of
choosing a fixed fractional integration parameter of $d=0.5$ a priori yields
the best results overall, capturing long memory behavior, but overcoming the
deficiencies of methods using an estimated parameter.
Regarding the implementation of forecasting methods based on fractional
integration, we use simulations to compare local and global semiparametric and
parametric estimators of the long memory parameter from the Whittle family and
provide asymptotic theory backed up by simulations to compare different mean
estimators. Both of these analyses lead to new results, which are also of
interest outside the realm of forecasting.
arXiv link: http://arxiv.org/abs/1910.08202v1
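The fixed-$d$ recipe in the entry above can be illustrated with a short sketch: fractionally difference the series with $d=0.5$, fit a short ARMA to the differenced series, forecast it, and invert the fractional filter recursively. The placeholder data and ARMA order are assumptions for illustration, not the authors' implementation.

```python
# Sketch of forecasting with a fixed fractional integration parameter d = 0.5.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def fracdiff_weights(d, n):
    w = np.zeros(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w  # coefficients of (1 - L)^d

d, horizon = 0.5, 12
rng = np.random.default_rng(2)
x = np.cumsum(0.1 * rng.standard_normal(500)) + rng.standard_normal(500)  # placeholder data
n = len(x)

w = fracdiff_weights(d, n + horizon)
z = np.array([np.dot(w[:t + 1], x[t::-1]) for t in range(n)])  # z_t = (1-L)^d x_t

arma = ARIMA(z, order=(1, 0, 1)).fit()
z_fc = arma.forecast(steps=horizon)

# Invert: x_t = z_t - sum_{k>=1} w_k x_{t-k}, applied recursively beyond the sample.
x_ext = list(x)
for h in range(horizon):
    t = n + h
    x_new = z_fc[h] - np.dot(w[1:t + 1], np.array(x_ext)[::-1])
    x_ext.append(x_new)
print(np.round(x_ext[n:], 3))
```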
Econometric Models of Network Formation
econometric models of network formation. The survey starts with a brief
exposition on basic concepts and tools for the statistical description of
networks. I then offer a review of dyadic models, focussing on statistical
models on pairs of nodes and describe several developments of interest to the
econometrics literature. The article also presents a discussion of non-dyadic
models where link formation might be influenced by the presence or absence of
additional links, which themselves are subject to similar influences. This is
related to the statistical literature on conditionally specified models and the
econometrics of game theoretical models. I close with a (non-exhaustive)
discussion of potential areas for further development.
arXiv link: http://arxiv.org/abs/1910.07781v2
A Projection Framework for Testing Shape Restrictions That Form Convex Cones
based on projection for a class of shape restrictions. The key insight we
exploit is that these restrictions form convex cones, a simple and yet elegant
structure that has been barely harnessed in the literature. Based on a
monotonicity property afforded by such a geometric structure, we construct a
bootstrap procedure that, unlike many studies in nonstandard settings,
dispenses with estimation of local parameter spaces, and the critical values
are obtained in a way as simple as computing the test statistic. Moreover, by
appealing to strong approximations, our framework accommodates nonparametric
regression models as well as distributional/density-related and structural
settings. Since the test entails a tuning parameter (due to the nonstandard
nature of the problem), we propose a data-driven choice and prove its validity.
Monte Carlo simulations confirm that our test works well.
arXiv link: http://arxiv.org/abs/1910.07689v4
Asymptotic Theory of $L$-Statistics and Integrable Empirical Processes
functions with respect to random weight functions, which is an extension of
classical $L$-statistics. They appear when sample trimming or Winsorization is
applied to asymptotically linear estimators. The key idea is to consider
empirical processes in the spaces appropriate for integration. First, we
characterize weak convergence of empirical distribution functions and random
weight functions in the space of bounded integrable functions. Second, we
establish the delta method for empirical quantile functions as integrable
functions. Third, we derive the delta method for $L$-statistics. Finally, we
prove weak convergence of their bootstrap processes, showing the validity of the nonparametric bootstrap.
arXiv link: http://arxiv.org/abs/1910.07572v1
Identifying Network Ties from Panel Data: Theory and an Application to Tax Competition
social ties does not exist in most publicly available and widely used datasets.
We present results on the identification of social networks from observational
panel data that contains no information on social ties between agents. In the
context of a canonical social interactions model, we provide sufficient
conditions under which the social interactions matrix, endogenous and exogenous
social effect parameters are all globally identified. While this result is
relevant across different estimation strategies, we then describe how
high-dimensional estimation techniques can be used to estimate the interactions
model based on the Adaptive Elastic Net GMM method. We employ the method to
study tax competition across US states. We find the identified social
interactions matrix implies tax competition differs markedly from the common
assumption of competition between geographically neighboring states, providing
further insights for the long-standing debate on the relative roles of factor
mobility and yardstick competition in driving tax setting behavior across
states. Most broadly, our identification and application show the analysis of
social interactions can be extended to economic realms where no network data
exists.
arXiv link: http://arxiv.org/abs/1910.07452v4
Standard Errors for Panel Data Models with Unknown Clusters
models. The proposed estimator is robust to heteroskedasticity, serial
correlation, and cross-sectional correlation of unknown forms. The serial
correlation is controlled by the Newey-West method. To control for
cross-sectional correlations, we propose to use the thresholding method,
without assuming the clusters to be known. We establish the consistency of the
proposed estimator. Monte Carlo simulations show the method works well. An
empirical application is considered.
arXiv link: http://arxiv.org/abs/1910.07406v2
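As a rough illustration of the thresholding idea used in the entry above for cross-sectional correlation of unknown form, here is a sketch in which small off-diagonal entries of the cross-sectional residual covariance are set to zero; the threshold rule and data are placeholders, and the paper's full estimator additionally handles serial correlation via Newey-West smoothing.

```python
# Illustrative sketch of thresholding a cross-sectional residual covariance matrix.
import numpy as np

def threshold_covariance(resid, c=1.0):
    """resid: T x N matrix of panel regression residuals."""
    T, N = resid.shape
    S = resid.T @ resid / T                             # sample cross-sectional covariance
    cut = c * np.sqrt(np.log(N) / T)                    # simple rate-based threshold
    R = S / np.sqrt(np.outer(np.diag(S), np.diag(S)))   # implied correlations
    keep = np.abs(R) >= cut
    np.fill_diagonal(keep, True)                        # always keep the variances
    return S * keep

rng = np.random.default_rng(3)
resid = rng.standard_normal((200, 50))
S_thr = threshold_covariance(resid)
print("share of off-diagonal entries kept:", ((S_thr != 0).sum() - 50) / (50 * 49))
```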
Multivariate Forecasting Evaluation: On Sensitive and Strictly Proper Scoring Rules
there is a growing need of suitable methods for the evaluation of multivariate
predictions. We analyze the sensitivity of the most common scoring rules,
especially regarding quality of the forecasted dependency structures.
Additionally, we propose scoring rules based on the copula, which uniquely
describes the dependency structure for every probability distribution with
continuous marginal distributions. Efficient estimation of the considered
scoring rules and evaluation methods such as the Diebold-Mariano test are
discussed. In detailed simulation studies, we compare the performance of the
renowned scoring rules and the ones we propose. Besides extended synthetic
studies based on recently published results we also consider a real data
example. We find that the energy score, which is probably the most widely used
multivariate scoring rule, performs comparably well in detecting forecast
errors, also regarding dependencies. This contradicts other studies. The
results also show that a proposed copula score provides very strong distinction
between models with correct and incorrect dependency structure. We close with a
comprehensive discussion on the proposed methodology.
arXiv link: http://arxiv.org/abs/1910.07325v1
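Since the energy score is central to the comparison above, here is a sketch of one common sample version of it for an ensemble forecast, ES = mean ||X_i - y|| - 0.5 * mean ||X_i - X_j|| (lower is better); the ensemble and observation below are placeholders.

```python
# Sample energy score for a multivariate probabilistic forecast given as draws.
import numpy as np

def energy_score(ensemble, y):
    """ensemble: (m, d) array of forecast draws, y: (d,) realized observation."""
    m = ensemble.shape[0]
    term1 = np.mean(np.linalg.norm(ensemble - y, axis=1))
    diffs = ensemble[:, None, :] - ensemble[None, :, :]
    term2 = np.sum(np.linalg.norm(diffs, axis=2)) / (m * m)
    return term1 - 0.5 * term2

rng = np.random.default_rng(4)
forecast_draws = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=1000)
observation = np.array([0.3, -0.2])
print(energy_score(forecast_draws, observation))
```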
Matrix Completion, Counterfactuals, and Factor Analysis of Missing Data
from a tall block along with the re-rotated loadings estimated from a wide
block to impute missing values in a panel of data. Assuming that a strong
factor structure holds for the full panel of data and its sub-blocks, it is
shown that the common component can be consistently estimated at four different
rates of convergence without requiring regularization or iteration. An
asymptotic analysis of the estimation error is obtained. An application of our
analysis is estimation of counterfactuals when potential outcomes have a factor
structure. We study the estimation of average and individual treatment effects
on the treated and establish a normal distribution theory that can be useful
for hypothesis testing.
arXiv link: http://arxiv.org/abs/1910.06677v5
Principled estimation of regression discontinuity designs
effect of election outcomes and policy interventions. In these contexts,
treatment effects are typically estimated with covariates included to improve
efficiency. While including covariates improves precision asymptotically, in
practice, treatment effects are estimated with a small number of observations,
resulting in considerable fluctuations in treatment effect magnitude and
precision depending upon the covariates chosen. This practice thus incentivizes
researchers to select covariates which maximize treatment effect statistical
significance rather than precision. Here, I propose a principled approach for
estimating RDDs which provides a means of improving precision with covariates
while minimizing adverse incentives. This is accomplished by integrating the
adaptive LASSO, a machine learning method, into RDD estimation using an R
package developed for this purpose, adaptiveRDD. Using simulations, I show that
this method significantly improves treatment effect precision, particularly
when estimating treatment effects with fewer than 200 observations.
arXiv link: http://arxiv.org/abs/1910.06381v2
Latent Dirichlet Analysis of Categorical Survey Responses
outcomes, so understanding how they comove and differ across individuals is of
considerable interest. Researchers often rely on surveys that report individual
beliefs as qualitative data. We propose using a Bayesian hierarchical latent
class model to analyze the comovements and observed heterogeneity in
categorical survey responses. We show that the statistical model corresponds to
an economic structural model of information acquisition, which guides
interpretation and estimation of the model parameters. An algorithm based on
stochastic optimization is proposed to estimate a model for repeated surveys
when responses follow a dynamic structure and conjugate priors are not
appropriate. Guidance on selecting the number of belief types is also provided.
Two examples are considered. The first shows that there is information in the
Michigan survey responses beyond the consumer sentiment index that is
officially published. The second shows that belief types constructed from
survey responses can be used in a subsequent analysis to estimate heterogeneous
returns to education.
arXiv link: http://arxiv.org/abs/1910.04883v3
Robust Likelihood Ratio Tests for Incomplete Economic Models
parameters in incomplete models. Such models make set-valued predictions and
hence do not generally yield a unique likelihood function. The model structure,
however, allows us to construct tests based on the least favorable pairs of
likelihoods using the theory of Huber and Strassen (1973). We develop tests
robust to model incompleteness that possess certain optimality properties. We
also show that sharp identifying restrictions play a role in constructing such
tests in a computationally tractable manner. A framework for analyzing the
local asymptotic power of the tests is developed by embedding the least
favorable pairs into a model that allows local approximations under the limits
of experiments argument. Examples of the hypotheses we consider include those
on the presence of strategic interaction effects in discrete games of complete
information. Monte Carlo experiments demonstrate the robust performance of the
proposed tests.
arXiv link: http://arxiv.org/abs/1910.04610v2
Averaging estimation for instrumental variables quantile regression
efficiency of the instrumental variables quantile regression (IVQR) estimation.
First, I apply the averaging GMM framework of Cheng, Liao, and Shi (2019) to the IVQR
model. I propose using the usual quantile regression moments for averaging to
take advantage of cases when endogeneity is not too strong. I also propose
using two-stage least squares slope moments to take advantage of cases when
heterogeneity is not too strong. The empirical optimal weight formula of Cheng
et al. (2019) helps optimize the bias-variance tradeoff, ensuring uniformly
better (asymptotic) risk of the averaging estimator over the standard IVQR
estimator under certain conditions. My implementation involves many
computational considerations and builds on recent developments in the quantile
literature. Second, I propose a bootstrap method that directly averages among
IVQR, quantile regression, and two-stage least squares estimators. More
specifically, I find the optimal weights in the bootstrap world and then apply
the bootstrap-optimal weights to the original sample. The bootstrap method is
simpler to compute and generally performs better in simulations, but it lacks
the formal uniform dominance results of Cheng et al. (2019). Simulation results
demonstrate that in the multiple-regressors/instruments case, both the GMM
averaging and bootstrap estimators have uniformly smaller risk than the IVQR
estimator across data-generating processes (DGPs) with all kinds of
combinations of different endogeneity levels and heterogeneity levels. In DGPs
with a single endogenous regressor and instrument, where averaging estimation
is known to have least opportunity for improvement, the proposed averaging
estimators outperform the IVQR estimator in some cases but not others.
arXiv link: http://arxiv.org/abs/1910.04245v1
Identifiability of Structural Singular Vector Autoregressive Models
autoregressive models (VAR) to the case where the innovation covariance matrix
has reduced rank. Structural singular VAR models appear, for example, as
solutions of rational expectation models where the number of shocks is usually
smaller than the number of endogenous variables, and as an essential building
block in dynamic factor models. We show that order conditions for
identifiability are misleading in the singular case and provide a rank
condition for identifiability of the noise parameters. Since the Yule-Walker
equations may have multiple solutions, we analyze the effect of restrictions on
the system parameters on over- and underidentification in detail and provide
easily verifiable conditions.
arXiv link: http://arxiv.org/abs/1910.04096v2
Identification and Estimation of SVARMA models with Independent and Non-Gaussian Inputs
autoregressive moving average (SVARMA) models driven by independent and
non-Gaussian shocks. It is well known that SVARMA models driven by Gaussian
errors are not identified without imposing further identifying restrictions on
the parameters. Even in reduced form and assuming stability and invertibility,
vector autoregressive moving average models are in general not identified
without requiring certain parameter matrices to be non-singular. Independence
and non-Gaussianity of the shocks are used to show that they are identified up
to permutations and scalings. In this way, typically imposed identifying
restrictions are made testable. Furthermore, we introduce a maximum-likelihood
estimator of the non-Gaussian SVARMA model which is consistent and
asymptotically normally distributed.
arXiv link: http://arxiv.org/abs/1910.04087v1
Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm
Expectation Maximization (EM) algorithm, jointly with the Kalman smoother. We
prove that as both the cross-sectional dimension, $n$, and the sample size,
$T$, diverge to infinity: (i) the estimated loadings are $\sqrt T$-consistent,
asymptotically normal and equivalent to their Quasi Maximum Likelihood
estimates; (ii) the estimated factors are $\sqrt n$-consistent, asymptotically
normal and equivalent to their Weighted Least Squares estimates. Moreover, the
estimated loadings are asymptotically as efficient as those obtained by
Principal Components analysis, while the estimated factors are more efficient
if the idiosyncratic covariance is sparse enough.
arXiv link: http://arxiv.org/abs/1910.03821v5
On the feasibility of parsimonious variable selection for Hotelling's $T^2$-test
one of the triumphs of classical multivariate analysis. It is uniformly most
powerful among invariant tests, and admissible, proper Bayes, and locally and
asymptotically minimax among all tests. Nonetheless, investigators often prefer
non-invariant tests, especially those obtained by selecting only a small subset
of variables from which the $T^2$-statistic is to be calculated, because such
reduced statistics are more easily interpretable for their specific
application. Thus it is relevant to ask the extent to which power is lost when
variable selection is limited to very small subsets of variables, e.g. of size
one (yielding univariate Student-$t^2$ tests) or size two (yielding bivariate
$T^2$-tests). This study presents some evidence, admittedly fragmentary and
incomplete, suggesting that in some cases no power may be lost over a wide
range of alternatives.
arXiv link: http://arxiv.org/abs/1910.03669v1
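To make the comparison in the entry above concrete, the following small worked example computes the full Hotelling $T^2$ statistic alongside the univariate $t^2$ statistics obtained by selecting a single variable; the simulated data, with a shift in only one coordinate, are a placeholder.

```python
# Full Hotelling T^2 versus size-one variable selection (univariate t^2 statistics).
import numpy as np

rng = np.random.default_rng(5)
n, p = 40, 5
mu_true = np.array([0.5, 0.0, 0.0, 0.0, 0.0])    # shift only in the first coordinate
X = rng.standard_normal((n, p)) + mu_true

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)

# Full multivariate statistic for H0: mu = 0.
T2_full = n * xbar @ np.linalg.solve(S, xbar)

# Univariate t^2 statistics from selecting one variable at a time.
t2_uni = n * xbar**2 / np.diag(S)

print(f"Hotelling T^2 (all {p} variables): {T2_full:.2f}")
print("univariate t^2 by variable:", np.round(t2_uni, 2))
```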
Application of Machine Learning in Forecasting International Trade Trends
cross-border exchange of essential goods (e.g. steel, aluminum, soybeans, and
beef). Since trade critically affects employment and wages, predicting future
patterns of trade is a high priority for policy makers around the world. While
traditional economic models aim to be reliable predictors, we consider the
possibility that Machine Learning (ML) techniques allow for better predictions
to inform policy decisions. Open-government data provide the fuel to power the
algorithms that can explain and forecast trade flows to inform policies. Data
collected in this article describe international trade transactions and
commonly associated economic factors. Machine learning (ML) models deployed
include: ARIMA, GBoosting, XGBoosting, and LightGBM for predicting future trade
patterns, and K-Means clustering of countries according to economic factors.
Unlike short-term and subjective (straight-line) projections and medium-term
(aggregated) projections, ML methods provide a range of data-driven and
interpretable projections for individual commodities. Models, their results,
and policies are introduced and evaluated for prediction quality.
arXiv link: http://arxiv.org/abs/1910.03112v1
Boosting High Dimensional Predictive Regressions with Time Varying Parameters
applications. However, the theory is mainly developed assuming that the model
is stationary with time invariant parameters. This is at odds with the
prevalent evidence for parameter instability in economic time series, but
theories for parameter instability are mainly developed for models with a small
number of covariates. In this paper, we present two $L_2$ boosting algorithms
for estimating high dimensional models in which the coefficients are modeled as
functions evolving smoothly over time and the predictors are locally
stationary. The first method uses componentwise local constant estimators as
base learner, while the second relies on componentwise local linear estimators.
We establish consistency of both methods, and address the practical issues of
choosing the bandwidth for the base learners and the number of boosting
iterations. In an extensive application to macroeconomic forecasting with many
potential predictors, we find that the benefits to modeling time variation are
substantial and they increase with the forecast horizon. Furthermore, the
timing of the benefits suggests that the Great Moderation is associated with
substantial instability in the conditional mean of various economic series.
arXiv link: http://arxiv.org/abs/1910.03109v1
A 2-Dimensional Functional Central Limit Theorem for Non-stationary Dependent Random Fields
sheet where the underlying random fields are not necessarily independent or
stationary. Possible applications include unit-root tests for spatial as well
as panel data models.
arXiv link: http://arxiv.org/abs/1910.02577v1
Predicting popularity of EV charging infrastructure from GIS data
adoption of electric vehicles (EV). Charging patterns and the utilization of
infrastructure have consequences not only for energy demand and the load on local power grids, but also for economic returns, parking policies, and the further adoption of EVs. We develop a data-driven approach that exploits
predictors compiled from GIS data describing the urban context and urban
activities near charging infrastructure to explore correlations with a
comprehensive set of indicators measuring the performance of charging
infrastructure. The best fit was identified for the size of the unique group of
visitors (popularity) attracted by the charging infrastructure. Subsequently,
charging infrastructure is ranked by popularity. The question of whether or not
a given charging spot belongs to the top tier is posed as a binary
classification problem and predictive performance of logistic regression
regularized with an l-1 penalty, random forests and gradient boosted regression
trees is evaluated. Obtained results indicate that the collected predictors
contain information that can be used to predict the popularity of charging
infrastructure. The significance of predictors and how they are linked with the
popularity are explored as well. The proposed methodology can be used to inform
charging infrastructure deployment strategies.
arXiv link: http://arxiv.org/abs/1910.02498v1
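The classification step described in the entry above can be sketched with standard scikit-learn models; the simulated feature matrix below stands in for the GIS-derived predictors, and the tuning choices are placeholders.

```python
# Sketch of the binary "top-tier popularity" classification with l1-regularized
# logistic regression, a random forest, and gradient boosted trees.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
X = rng.standard_normal((500, 20))                  # placeholder GIS-based predictors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(500) > 0).astype(int)  # top tier or not

models = {
    "l1 logistic": LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: CV AUC = {auc:.3f}")
```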
Informational Content of Factor Structures in Simultaneous Binary Response Models
triangular systems. Factor structures have been employed in a variety of
settings in cross sectional and panel data models, and in this paper we
formally quantify their identifying power in a bivariate system often employed
in the treatment effects literature. Our main findings are that imposing a
factor structure yields point identification of parameters of interest, such as
the coefficient associated with the endogenous regressor in the outcome
equation, under weaker assumptions than usually required in these models. In
particular, we show that a "non-standard" exclusion restriction that requires
an explanatory variable in the outcome equation to be excluded from the
treatment equation is no longer necessary for identification, even in cases
where all of the regressors from the outcome equation are discrete. We also
establish identification of the coefficient of the endogenous regressor in
models with more general factor structures, in situations where one has access
to at least two continuous measurements of the common factor.
arXiv link: http://arxiv.org/abs/1910.01318v3
An introduction to flexible methods for policy evaluation
the causal effect of a treatment or intervention on an outcome of interest. As
an introduction to causal inference, the discussion starts with the
experimental evaluation of a randomized treatment. It then reviews evaluation
methods based on selection on observables (assuming a quasi-random treatment
given observed covariates), instrumental variables (inducing a quasi-random
shift in the treatment), difference-in-differences and changes-in-changes
(exploiting changes in outcomes over time), as well as regression
discontinuities and kinks (using changes in the treatment assignment at some
threshold of a running variable). The chapter discusses methods particularly
suited for data with many observations for a flexible (i.e. semi- or
nonparametric) modeling of treatment effects, and/or many (i.e. high
dimensional) observed covariates by applying machine learning to select and
control for covariates in a data-driven way. This is not only useful for
tackling confounding by controlling for instance for factors jointly affecting
the treatment and the outcome, but also for learning effect heterogeneities
across subgroups defined upon observable covariates and optimally targeting
those groups for which the treatment is most effective.
arXiv link: http://arxiv.org/abs/1910.00641v1
Usage-Based Vehicle Insurance: Driving Style Factors of Accident Probability and Severity
automotive insurance. We conduct a comparative analysis of different types of
devices that collect information on vehicle utilization and driving style of
its driver, describe advantages and disadvantages of these devices and indicate
the most efficient from the insurer point of view. The possible formats of
telematics data are described and methods of their processing to a format
convenient for modelling are proposed. We also introduce an approach to
classify the accidents strength. Using all the available information, we
estimate accident probability models for different types of accidents and
identify an optimal set of factors for each of the models. We assess the
quality of resulting models using both in-sample and out-of-sample estimates.
arXiv link: http://arxiv.org/abs/1910.00460v2
An econometric analysis of the Italian cultural supply
analysis from both the methodological and the applied side. In this paper, a price index is devised that provides a novel and effective solution for price comparisons over several periods and among several countries, that is, in both a multi-period and a multilateral framework. The reference basket of the devised index is the union of the pairwise intersections of the baskets of all periods/countries. As such, it provides broader coverage than usual indexes. Closed-form expressions and updating formulas for the index are provided and its properties are investigated. Finally, applications with real and simulated data provide evidence of the performance of the proposed index.
arXiv link: http://arxiv.org/abs/1910.00073v3
Monotonicity-Constrained Nonparametric Estimation and Inference for First-Price Auctions
independent private values that imposes the monotonicity constraint on the
estimated inverse bidding strategy. We show that our estimator has a smaller
asymptotic variance than that of Guerre, Perrigne and Vuong's (2000) estimator.
In addition to establishing pointwise asymptotic normality of our estimator, we
provide a bootstrap-based approach to constructing uniform confidence bands for
the density function of latent valuations.
arXiv link: http://arxiv.org/abs/1909.12974v1
Debiased/Double Machine Learning for Instrumental Variable Quantile Regressions
causal parameter in the presence of high-dimensional controls in an
instrumental variable quantile regression. Our proposed econometric procedure
builds on the Neyman-type orthogonal moment conditions of a previous study
Chernozhukov, Hansen and Wuthrich (2018) and is thus relatively insensitive to
the estimation of the nuisance parameters. The Monte Carlo experiments show
that the estimator copes well with high-dimensional controls. We also apply the
procedure to empirically reinvestigate the quantile treatment effect of 401(k)
participation on accumulated wealth.
arXiv link: http://arxiv.org/abs/1909.12592v3
Inference in Nonparametric Series Estimation with Specification Searches for the Number of Series Terms
tuning parameter, i.e., evaluating estimates and confidence intervals with a
different number of series terms. This paper develops pointwise and uniform
inferences for conditional mean functions in nonparametric series estimations
that are uniform in the number of series terms. As a result, this paper
constructs confidence intervals and confidence bands with possibly
data-dependent series terms that have valid asymptotic coverage probabilities.
This paper also considers a partially linear model setup and develops inference
methods for the parametric part uniform in the number of series terms. The
finite sample performance of the proposed methods is investigated in various
simulation setups as well as in an illustrative example, i.e., the
nonparametric estimation of the wage elasticity of the expected labor supply
from Blomquist and Newey (2002).
arXiv link: http://arxiv.org/abs/1909.12162v2
A Peek into the Unobservable: Hidden States and Bayesian Inference for the Bitcoin and Ether Price Series
properties of cryptocurrencies due to the latter's dual nature: their usage as
financial assets on the one side and their tight connection to the underlying
blockchain structure on the other. In an effort to examine both components via
a unified approach, we apply a recently developed Non-Homogeneous Hidden Markov
(NHHM) model with an extended set of financial and blockchain specific
covariates on the Bitcoin (BTC) and Ether (ETH) price data. Based on the
observable series, the NHHM model offers a novel perspective on the underlying
microstructure of the cryptocurrency market and provides insight on
unobservable parameters such as the behavior of investors, traders and miners.
The algorithm identifies two alternating periods (hidden states) of inherently
different activity -- fundamental versus uninformed or noise traders -- in the
Bitcoin ecosystem and unveils differences in both the short/long run dynamics
and in the financial characteristics of the two states, such as significant
explanatory variables, extreme events and varying series autocorrelation. In a
somewhat unexpected result, the Bitcoin and Ether markets are found to be
influenced by markedly distinct indicators despite their perceived correlation.
The current approach backs earlier findings that cryptocurrencies are unlike
any conventional financial asset and makes a first step towards understanding
cryptocurrency markets via a more comprehensive lens.
arXiv link: http://arxiv.org/abs/1909.10957v2
Scalable Fair Division for 'At Most One' Preferences
practical problem. In the case of divisible goods and additive preferences a
convex program can be used to find the solution that maximizes Nash welfare
(MNW). The MNW solution is equivalent to finding the equilibrium of a market
economy (aka. the competitive equilibrium from equal incomes, CEEI) and thus
has good properties such as Pareto optimality, envy-freeness, and incentive
compatibility in the large. Unfortunately, this equivalence (and nice
properties) breaks down for general preference classes. Motivated by real world
problems such as course allocation and recommender systems we study the case of
additive `at most one' (AMO) preferences - individuals want at most 1 of each
item and lotteries are allowed. We show that in this case the MNW solution is
still a convex program and importantly is a CEEI solution when the instance
gets large but has a `low rank' structure. Thus a polynomial time algorithm can
be used to scale CEEI (which is in general PPAD-hard) for AMO preferences. We
examine whether the properties guaranteed in the limit hold approximately in
finite samples using several real datasets.
arXiv link: http://arxiv.org/abs/1909.10925v1
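The max Nash welfare convex program referred to in the entry above can be sketched directly in cvxpy: maximize the sum of log utilities subject to unit supply, with a per-item cap of one unit per agent for the 'at most one' case. The valuations and problem size are placeholders, and the market-scaling machinery of the paper is not reproduced here.

```python
# Sketch of the MNW convex program for divisible goods with AMO-style caps.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(7)
n_agents, n_items = 4, 6
V = rng.uniform(0.1, 1.0, size=(n_agents, n_items))    # additive valuations

X = cp.Variable((n_agents, n_items), nonneg=True)      # fractional allocations / lotteries
utilities = cp.sum(cp.multiply(V, X), axis=1)
constraints = [cp.sum(X, axis=0) <= 1,                 # each item has unit supply
               X <= 1]                                  # at most one unit of each item per agent
problem = cp.Problem(cp.Maximize(cp.sum(cp.log(utilities))), constraints)
problem.solve()
print("MNW utilities:", np.round(utilities.value, 3))
```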
Structural Change Analysis of Active Cryptocurrency Market
arXiv link: http://arxiv.org/abs/1909.10679v1
Goodness-of-Fit Tests based on Series Estimators in Nonparametric Instrumental Regression
nonparametric instrumental regression. Based on series estimators, test
statistics are established that allow for tests of the general model against a
parametric or nonparametric specification as well as a test of exogeneity of
the vector of regressors. The tests' asymptotic distributions under correct
specification are derived and their consistency against any alternative model
is shown. Under a sequence of local alternative hypotheses, the asymptotic distributions of the tests are derived. Moreover, uniform consistency is
established over a class of alternatives whose distance to the null hypothesis
shrinks appropriately as the sample size increases. A Monte Carlo study
examines finite sample performance of the test statistics.
arXiv link: http://arxiv.org/abs/1909.10133v1
Specification Testing in Nonparametric Instrumental Quantile Regression
modeling of a structural disturbance. In a nonseparable model with endogenous
regressors, key conditions are validity of instrumental variables and
monotonicity of the model in a scalar unobservable variable. Under these
conditions the nonseparable model is equivalent to an instrumental quantile
regression model. A failure of the key conditions, however, makes instrumental
quantile regression potentially inconsistent. This paper develops a methodology
for testing the hypothesis whether the instrumental quantile regression model
is correctly specified. Our test statistic is asymptotically normally
distributed under correct specification and consistent against any alternative
model. In addition, test statistics to justify the model simplification are
established. Finite sample properties are examined in a Monte Carlo study and
an empirical illustration is provided.
arXiv link: http://arxiv.org/abs/1909.10129v1
Inference for Linear Conditional Moment Inequalities
have a particular linear conditional structure. We use this structure to
construct uniformly valid confidence sets that remain computationally tractable
even in settings with nuisance parameters. We first introduce least favorable
critical values which deliver non-conservative tests if all moments are
binding. Next, we introduce a novel conditional inference approach which
ensures a strong form of insensitivity to slack moments. Our recommended
approach is a hybrid technique which combines desirable aspects of the least
favorable and conditional methods. The hybrid approach performs well in
simulations calibrated to Wollmann (2018), with favorable power and
computational time comparisons relative to existing alternatives.
arXiv link: http://arxiv.org/abs/1909.10062v5
Meaningful causal decompositions in health equity research: definition, identification, and estimation through a weighting framework
interventions that address health disparities (inequities). They ask how
disparities in outcomes may change under hypothetical intervention. Through
study design and assumptions, they can rule out alternate explanations such as
confounding, selection bias, and measurement error, thereby identifying potential targets for intervention. Unfortunately, the literature on causal decomposition analysis and related methods has largely ignored equity concerns
that actual interventionists would respect, limiting their relevance and
practical value. This paper addresses these concerns by explicitly considering
what covariates the outcome disparity and hypothetical intervention adjust for
(so-called allowable covariates) and the equity value judgements these choices
convey, drawing from the bioethics, biostatistics, epidemiology, and health
services research literatures. From this discussion, we generalize
decomposition estimands and formulae to incorporate allowable covariate sets,
to reflect equity choices, while still allowing for adjustment of non-allowable
covariates needed to satisfy causal assumptions. For these general formulae, we
provide weighting-based estimators based on adaptations of
ratio-of-mediator-probability and inverse-odds-ratio weighting. We discuss when
these estimators reduce to already used estimators under certain equity value
judgements, and a novel adaptation under other judgements.
arXiv link: http://arxiv.org/abs/1909.10060v3
Subspace Clustering for Panel Data with Interactive Effects
factor structures which are correlated with the regressors and the group
membership can be unknown. The factor loadings are assumed to lie in different subspaces, and subspace clustering of the factor loadings is considered. A
method called least squares subspace clustering estimate (LSSC) is proposed to
estimate the model parameters by minimizing the least-square criterion and to
perform the subspace clustering simultaneously. The consistency of the proposed
subspace clustering is proved and the asymptotic properties of the estimation
procedure are studied under certain conditions. A Monte Carlo simulation study
is used to illustrate the advantages of the proposed method. Situations in which the number of subspaces, the dimension of the factors, and the dimension of the subspaces are unknown are also discussed. For illustrative purposes, the proposed method is applied to study
the linkage between income and democracy across countries while subspace
patterns of unobserved factors and factor loadings are allowed.
arXiv link: http://arxiv.org/abs/1909.09928v2
Doubly Robust Identification for Causal Panel Data Models
panel data. Traditionally researchers follow model-based identification
strategies relying on assumptions governing the relation between the potential
outcomes and the observed and unobserved confounders. We focus on a different,
complementary approach to identification where assumptions are made about the
connection between the treatment assignment and the unobserved confounders.
Such strategies are common in cross-section settings but rarely used with panel
data. We introduce different sets of assumptions that follow the two paths to
identification and develop a doubly robust approach. We propose estimation
methods that build on these identification strategies.
arXiv link: http://arxiv.org/abs/1909.09412v3
Discerning Solution Concepts
behavioral restrictions in the form of solution concepts, such as Nash
equilibrium. Choosing the right solution concept is crucial not just for
identification of payoff parameters, but also for the validity and
informativeness of counterfactual exercises and policy implications. We say
that a solution concept is discernible if it is possible to determine whether
it generated the observed data on the players' behavior and covariates. We
propose a set of conditions that make it possible to discern solution concepts.
In particular, our conditions are sufficient to tell whether the players'
choices emerged from Nash equilibria. We can also discern between
rationalizable behavior, maxmin behavior, and collusive behavior. Finally, we
identify the correlation structure of unobserved shocks in our model using a
novel approach.
arXiv link: http://arxiv.org/abs/1909.09320v1
Nonparametric Estimation of the Random Coefficients Model: An Elastic Net Approach
nonparametric random coefficients estimator of Fox, Kim, Ryan, and Bajari
(2011). We show that their estimator is a special case of the nonnegative
LASSO, explaining its sparse nature observed in many applications. Recognizing
this link, we extend the estimator, transforming it to a special case of the
nonnegative elastic net. The extension improves the estimator's recovery of the
true support and allows for more accurate estimates of the random coefficients'
distribution. Our estimator is a generalization of the original estimator and is therefore guaranteed to have a model fit at least as good as the original
one. A theoretical analysis of both estimators' properties shows that, under
conditions, our generalized estimator approximates the true distribution more
accurately. Two Monte Carlo experiments and an application to a travel mode
data set illustrate the improved performance of the generalized estimator.
arXiv link: http://arxiv.org/abs/1909.08434v2
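A rough sketch of the link described in the entry above: the grid-based random coefficients estimator fits nonnegative weights over a fixed grid of candidate coefficient vectors, and the proposed extension replaces nonnegative least squares with a nonnegative elastic net. The choice model, grid, and tuning parameters below are illustrative placeholders, not the paper's specification.

```python
# Nonnegative elastic net over a grid of candidate coefficient vectors.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(8)
n, K, R = 2000, 2, 50
X = rng.standard_normal((n, K))
true_betas = np.array([[1.0, -0.5], [-1.0, 1.5]])      # two-type mixture of logit coefficients
types = rng.integers(0, 2, size=n)
util = np.einsum("nk,nk->n", X, true_betas[types])
y = (util + rng.logistic(size=n) > 0).astype(float)    # binary choices

grid = rng.uniform(-2, 2, size=(R, K))                 # grid of candidate coefficient vectors
Z = 1.0 / (1.0 + np.exp(-X @ grid.T))                  # n x R matrix of grid choice probabilities

# Nonnegative elastic net in place of nonnegative least squares.
enet = ElasticNet(alpha=1e-4, l1_ratio=0.5, positive=True, fit_intercept=False, max_iter=10000)
enet.fit(Z, y)
w = enet.coef_ / enet.coef_.sum()                      # normalize to a distribution over the grid
print("number of grid points with positive weight:", int((w > 0).sum()))
```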
How have German University Tuition Fees Affected Enrollment Rates: Robust Model Selection and Design-based Inference in High-Dimensions
effect of a flat 1000 Euro state-dependent university tuition fee on the
enrollment behavior of students during the years 2006-2014. In particular, we
show how the variation in the introduction scheme across states and times can
be exploited to identify the federal average causal effect of tuition fees by
controlling for a large amount of potentially influencing attributes for state
heterogeneity. We suggest a stability post-double selection methodology to
robustly determine the causal effect across types in the transparently modeled
unknown response components. The proposed stability resampling scheme in the
two LASSO selection steps efficiently mitigates the risk of model
underspecification and thus biased effects when the tuition fee policy decision
also depends on relevant variables for the state enrollment rates. Correct
inference for the full cross-sectional state population in the sample requires adequate design-based rather than sampling-based standard errors. With the
data-driven model selection and explicit control for spatial cross-effects we
detect that tuition fees induce substantial migration effects, with mobility occurring from both fee and non-fee states, which also suggests a general movement toward quality. Overall, we find a significant negative impact of fees on student enrollment of up to 4.5 percentage points. This is in contrast to plain one-step LASSO and to previous empirical studies based on linear panel regressions with full fixed effects, which generally underestimate the size of the effect and find it statistically insignificant.
arXiv link: http://arxiv.org/abs/1909.08299v2
Adjusted QMLE for the spatial autoregressive parameter
parameters on maximum likelihood estimation of a parameter of interest is to
recenter the profile score for that parameter. We apply this general principle
to the quasi-maximum likelihood estimator (QMLE) of the autoregressive
parameter $\lambda$ in a spatial autoregression. The resulting estimator for
$\lambda$ has better finite sample properties compared to the QMLE for
$\lambda$, especially in the presence of a large number of covariates. It can
also solve the incidental parameter problem that arises, for example, in social
interaction models with network fixed effects, or in spatial panel models with
individual or time fixed effects. However, spatial autoregressions present
specific challenges for this type of adjustment, because recentering the
profile score may cause the adjusted estimate to be outside the usual parameter
space for $\lambda$. Conditions for this to happen are given, and implications
are discussed. For inference, we propose confidence intervals based on a
Lugannani--Rice approximation to the distribution of the adjusted QMLE of
$\lambda$. Based on our simulations, the coverage properties of these intervals
are excellent even in models with a large number of covariates.
arXiv link: http://arxiv.org/abs/1909.08141v1
Distributional conformal prediction
intervals based on models for conditional distributions such as quantile and
distribution regression. Our approach can be applied to important prediction
problems including cross-sectional prediction, k-step-ahead forecasts,
synthetic controls and counterfactual prediction, and individual treatment
effects prediction. Our method exploits the probability integral transform and
relies on permuting estimated ranks. Unlike regression residuals, ranks are
independent of the predictors, allowing us to construct conditionally valid
prediction intervals under heteroskedasticity. We establish approximate
conditional validity under consistent estimation and provide approximate
unconditional validity under model misspecification, overfitting, and with time
series data. We also propose a simple "shape" adjustment of our baseline method
that yields optimal prediction intervals.
arXiv link: http://arxiv.org/abs/1909.07889v3
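The mechanics described in the entry above can be sketched as follows: estimate a conditional distribution on a training split, compute probability integral transform (PIT) values on a calibration split, and use a conformal quantile of the calibration scores to cut out a prediction interval. The conditional distribution model (a grid of linear quantile regressions), the score, and the splits below are illustrative choices that may differ from the paper's implementation.

```python
# Sketch of a PIT-based conformal prediction interval with a quantile regression grid.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 1000
x = rng.uniform(0, 2, size=n)
y = 1.0 + x + (0.5 + 0.5 * x) * rng.standard_normal(n)      # heteroskedastic placeholder data
X = sm.add_constant(x)

train, calib = slice(0, 500), range(500, 1000)
taus = np.linspace(0.02, 0.98, 49)
fits = [sm.QuantReg(y[train], X[train]).fit(q=tau) for tau in taus]
qhat = np.column_stack([f.predict(X) for f in fits])         # estimated conditional quantiles

# PIT values and conformity scores on the calibration split.
pit = np.array([np.mean(qhat[i] <= y[i]) for i in calib])
scores = np.abs(pit - 0.5)
alpha = 0.1
k = int(np.ceil((1 - alpha) * (len(scores) + 1)))
cutoff = np.sort(scores)[min(k, len(scores)) - 1]

# Prediction interval at a new point: quantile levels whose PIT stays within the cutoff.
x0 = np.array([[1.0, 1.5]])                                  # constant and x = 1.5
q0 = np.array([f.predict(x0)[0] for f in fits])
inside = np.abs(taus - 0.5) <= cutoff
print("90% prediction interval at x = 1.5:", (q0[inside].min(), q0[inside].max()))
```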
Statistical inference for statistical decisions
with sample data. Wald's concept of a statistical decision function (SDF)
embraces all mappings of the form [data -> decision]. An SDF need not perform
statistical inference; that is, it need not use data to draw conclusions about
the true state of nature. Inference-based SDFs have the sequential form [data
-> inference -> decision]. This paper motivates inference-based SDFs as
practical procedures for decision making that may accomplish some of what Wald
envisioned. The paper first addresses binary choice problems, where all SDFs
may be viewed as hypothesis tests. It next considers as-if optimization, which
uses a point estimate of the true state as if the estimate were accurate. It
then extends this idea to as-if maximin and minimax-regret decisions, which use
point estimates of some features of the true state as if they were accurate.
The paper primarily uses finite-sample maximum regret to evaluate the
performance of inference-based SDFs. To illustrate abstract ideas, it presents
specific findings concerning treatment choice and point prediction with sample
data.
arXiv link: http://arxiv.org/abs/1909.06853v1
Comparing the forecasting of cryptocurrencies by Bayesian time-varying volatility models
This study is about the four most capitalized cryptocurrencies: Bitcoin,
Ethereum, Litecoin and Ripple. Different Bayesian models are compared,
including models with constant and time-varying volatility, such as stochastic
volatility and GARCH. Moreover, some crypto-predictors are included in the
analysis, such as the S&P 500 and the Nikkei 225. The results show that stochastic volatility significantly outperforms the VAR benchmark in both point and density forecasting. When a different distribution is used for the errors of the stochastic volatility model, the Student-t distribution turns out to outperform the standard normal approach.
arXiv link: http://arxiv.org/abs/1909.06599v1
Fast Algorithms for the Quantile Regression Process
existence of fast algorithms. Despite numerous algorithmic improvements, the
computation time is still non-negligible because researchers often estimate
many quantile regressions and use the bootstrap for inference. We suggest two
new fast algorithms for the estimation of a sequence of quantile regressions at
many quantile indexes. The first algorithm applies the preprocessing idea of
Portnoy and Koenker (1997) but exploits a previously estimated quantile
regression to guess the sign of the residuals. This step allows for a reduction
of the effective sample size. The second algorithm starts from a previously
estimated quantile regression at a similar quantile index and updates it using
a single Newton-Raphson iteration. The first algorithm is exact, while the
second is only asymptotically equivalent to the traditional quantile regression
estimator. We also apply the preprocessing idea to the bootstrap by using the
sample estimates to guess the sign of the residuals in the bootstrap sample.
Simulations show that our new algorithms provide very large improvements in
computation time without significant (if any) cost in the quality of the
estimates. For instance, we reduce by a factor of 100 the time required to estimate 99
quantile regressions with 20 regressors and 50,000 observations.
arXiv link: http://arxiv.org/abs/1909.05782v2
A Consistent LM Type Specification Test for Semiparametric Panel Data Models
semiparametric panel data models with fixed effects. The test statistic
resembles the Lagrange Multiplier (LM) test statistic in parametric models and
is based on a quadratic form in the restricted model residuals. The use of
series methods facilitates both estimation of the null model and computation of
the test statistic. The asymptotic distribution of the test statistic is
standard normal, so that appropriate critical values can easily be computed.
The projection property of series estimators allows me to develop a degrees of
freedom correction. This correction makes it possible to account for the
estimation variance and obtain refined asymptotic results. It also
substantially improves the finite sample performance of the test.
arXiv link: http://arxiv.org/abs/1909.05649v1
Estimation and Applications of Quantile Regression for Binary Longitudinal Data
longitudinal data settings. A novel Markov chain Monte Carlo (MCMC) method is
designed to fit the model and its computational efficiency is demonstrated in a
simulation study. The proposed approach is flexible in that it can account for
common and individual-specific parameters, as well as multivariate
heterogeneity associated with several covariates. The methodology is applied to
study female labor force participation and home ownership in the United States.
The results offer new insights at the various quantiles, which are of interest
to policymakers and researchers alike.
arXiv link: http://arxiv.org/abs/1909.05560v1
Quantile regression methods for first-price auctions
auctions with symmetric risk-neutral bidders under the independent
private-value paradigm. It is first shown that a private-value quantile
regression generates a quantile regression for the bids. The private-value
quantile regression can be easily estimated from the bid quantile regression
and its derivative with respect to the quantile level. This also allows testing
various specification or exogeneity null hypotheses using the observed bids
in a simple way. A new local polynomial technique is proposed to estimate the
latter over the whole quantile level interval. Plug-in estimation of
functionals is also considered, as needed for the expected revenue or the case
of CRRA risk-averse bidders, which is amenable to our framework. A
quantile-regression analysis of USFS timber auctions is found to be more
appropriate than the homogenized-bid methodology and illustrates the
contribution of each explanatory variable to the private-value distribution. Linear interactive
sieve extensions are proposed and studied in the Appendices.
arXiv link: http://arxiv.org/abs/1909.05542v2
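
A toy numerical check (not the paper's estimator) of the bid-to-value quantile relation that underlies the approach above: under the symmetric IPV paradigm with I risk-neutral bidders, the private-value quantile function satisfies V(t) = B(t) + t B'(t) / (I - 1), where B is the bid quantile function. Uniform values are simulated so the recovered quantiles can be compared with the truth; a crude numerical derivative replaces the paper's local polynomial estimator, and no covariates are included.

import numpy as np

rng = np.random.default_rng(1)
I, n_auctions = 3, 20000
values = rng.uniform(size=(n_auctions, I))
bids = (I - 1) / I * values                  # symmetric equilibrium bids for U(0,1) values
all_bids = bids.ravel()

taus = np.linspace(0.05, 0.95, 19)
B = np.quantile(all_bids, taus)              # empirical bid quantile function
dB = np.gradient(B, taus)                    # crude derivative with respect to the quantile level
V_hat = B + taus * dB / (I - 1)              # recovered private-value quantiles

# For U(0,1) values the true value quantile at t is t, so the error should be small.
print(np.max(np.abs(V_hat - taus)))
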
Recovering Preferences from Finite Data
sufficient conditions for convergence to a unique underlying "true" preference.
Our conditions are weak, and therefore valid in a wide range of economic
environments. We develop applications to expected utility theory, choice over
consumption bundles, menu choice and intertemporal consumption. Our framework
unifies the revealed preference tradition with models that allow for errors.
arXiv link: http://arxiv.org/abs/1909.05457v4
Validating Weak-form Market Efficiency in United States Stock Markets with Trend Deterministic Price Data and Machine Learning
decades. In particular, weak-form market efficiency -- the notion that past
prices cannot predict future performance -- is strongly supported by
econometric evidence. In contrast, machine learning algorithms implemented to
predict stock price have been touted, to varying degrees, as successful.
Moreover, some data scientists boast the ability to garner above-market returns
using price data alone. This study endeavors to connect existing econometric
research on weak-form efficient markets with data science innovations in
algorithmic trading. First, a traditional exploration of stationarity in stock
index prices over the past decade is conducted with Augmented Dickey-Fuller and
Variance Ratio tests. Then, an algorithmic trading platform is implemented with
the use of five machine learning algorithms. Econometric findings identify
potential stationarity, hinting that technical evaluation may be possible, though
algorithmic trading results find little predictive power in any machine
learning model, even when using trend-specific metrics. Accounting for
transaction costs and risk, no system achieved above-market returns
consistently. Our findings reinforce the validity of weak-form market
efficiency.
arXiv link: http://arxiv.org/abs/1909.05151v1
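
A brief sketch of the stationarity step described above, using the Augmented Dickey-Fuller test from statsmodels; a simulated random walk stands in for the index price series, and the Variance Ratio tests and the machine-learning trading platform are not shown.

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(42)
log_prices = np.cumsum(rng.normal(scale=0.01, size=2500))   # random-walk placeholder series

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(log_prices, autolag="AIC")
print(f"ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}")
# Failure to reject the unit root is consistent with weak-form efficiency in prices.
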
Matching Estimators with Few Treated and Many Control Observations
but many control observations. We show that, under standard assumptions, the
nearest neighbor matching estimator for the average treatment effect on the
treated is asymptotically unbiased in this framework. However, when the number
of treated observations is fixed, the estimator is not consistent, and it is
generally not asymptotically normal. Since standard inference methods are
inadequate, we propose alternative inference methods, based on the theory of
randomization tests under approximate symmetry, that are asymptotically valid
in this framework. We show that these tests are valid under relatively strong
assumptions when the number of treated observations is fixed, and under weaker
assumptions when the number of treated observations increases, but at a lower
rate relative to the number of control observations.
arXiv link: http://arxiv.org/abs/1909.05093v4
Direct and Indirect Effects based on Changes-in-Changes
changes-in-changes assumptions restricting unobserved heterogeneity over time.
This allows disentangling the causal effect of a binary treatment on a
continuous outcome into an indirect effect operating through a binary
intermediate variable (called mediator) and a direct effect running via other
causal mechanisms. We identify average and quantile direct and indirect effects
for various subgroups under the condition that the outcome is monotonic in the
unobserved heterogeneity and that the distribution of the latter does not
change over time conditional on the treatment and the mediator. We also provide
a simulation study and an empirical application to the Jobs II programme.
arXiv link: http://arxiv.org/abs/1909.04981v3
Estimating the volatility of Bitcoin using GARCH models
tGARCH) with Student t-distribution, Generalized Error distribution (GED), and
Normal Inverse Gaussian (NIG) distribution are examined. The new development
allows for the modeling of volatility clustering effects, the leptokurtic and
the skewed distributions in the return series of Bitcoin. Compared with the
other two distributions, the normal inverse Gaussian distribution adequately
captured the fat tails and skewness in all the GARCH-type models. The tGARCH
model was the best model, as it described the asymmetric occurrence of shocks
in the Bitcoin market; that is, the response of investors to the same amount of
good and bad news is distinct. From the empirical results, it can be concluded
that tGARCH-NIG was the best model for estimating the volatility of the Bitcoin
return series. Generally, it would be optimal to use the NIG distribution in
GARCH-type models since the time series of most cryptocurrencies are leptokurtic.
arXiv link: http://arxiv.org/abs/1909.04903v2
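
A hedged sketch of fitting an asymmetric GARCH model to Bitcoin returns with the Python arch package. GJR-GARCH with Student-t errors stands in for the paper's tGARCH-NIG specification (the NIG error distribution is not available in this package), and simulated returns stand in for the actual Bitcoin series.

import numpy as np
from arch import arch_model

rng = np.random.default_rng(7)
returns = 2.0 * rng.standard_t(df=4, size=2000)      # placeholder daily returns in percent

# o=1 adds the asymmetric (leverage) term; dist="t" uses Student-t errors.
model = arch_model(returns, mean="Constant", vol="GARCH", p=1, o=1, q=1, dist="t")
result = model.fit(disp="off")
print(result.summary())
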
Bayesian Inference on Volatility in the Presence of Infinite Jump Activity and Microstructure Noise
measure and control the risk of financial assets. A L\'{e}vy process with
infinite jump activity and microstructure noise is considered one of the
simplest, yet accurate enough, models for financial data at high-frequency.
Utilizing this model, we propose a "purposely misspecified" posterior of the
volatility obtained by ignoring the jump-component of the process. The
misspecified posterior is further corrected by a simple estimate of the
location shift and re-scaling of the log likelihood. Our main result
establishes a Bernstein-von Mises (BvM) theorem, which states that the proposed
adjusted posterior is asymptotically Gaussian, centered at a consistent
estimator, and with variance equal to the inverse of the Fisher information. In
the absence of microstructure noise, our approach can be extended to inferences
of the integrated variance of a general It\^o semimartingale. Simulations are
provided to demonstrate the accuracy of the resulting credible intervals, and
the frequentist properties of the approximate Bayesian inference based on the
adjusted posterior.
arXiv link: http://arxiv.org/abs/1909.04853v1
Regression to the Mean's Impact on the Synthetic Control Method: Bias and Sensitivity Analysis
able to discern true treatment effects from random noise and effects due to
confounding. Difference-in-Difference techniques which match treated units to
control units based on pre-treatment outcomes, such as the synthetic control
approach have been presented as principled methods to account for confounding.
However, we show that use of synthetic controls or other matching procedures
can introduce regression to the mean (RTM) bias into estimates of the average
treatment effect on the treated. Through simulations, we show RTM bias can lead
to inflated type I error rates as well as decreased power in typical policy
evaluation settings. Further, we provide a novel correction for RTM bias which
can reduce bias and attain appropriate type I error rates. This correction can
be used to perform a sensitivity analysis which determines how results may be
affected by RTM. We use our proposed correction and sensitivity analysis to
reanalyze data concerning the effects of California's Proposition 99, a
large-scale tobacco control program, on statewide smoking rates.
arXiv link: http://arxiv.org/abs/1909.04706v1
Double Robustness for Complier Parameters and a Semiparametric Test for Complier Characteristics
instruments induce subpopulations of compliers with the same observable
characteristics on average, and (ii) whether compliers have observable
characteristics that are the same as the full population on average. The test
is a flexible robustness check for the external validity of instruments. We use
it to reinterpret the difference in LATE estimates that Angrist and Evans
(1998) obtain when using different instrumental variables. To justify the test,
we characterize the doubly robust moment for Abadie (2003)'s class of complier
parameters, and we analyze a machine learning update to $\kappa$ weighting.
arXiv link: http://arxiv.org/abs/1909.05244v7
Virtual Historical Simulation for estimating the conditional VaR of large portfolios
strategies can be advocated. A multivariate strategy requires estimating a
dynamic model for the vector of risk factors, which is often challenging, when
at all possible, for large portfolios. A univariate approach based on a dynamic
model for the portfolio's return seems more attractive. However, when the
combination of the individual returns is time varying, the portfolio's return
series is typically nonstationary, which may invalidate statistical inference.
An alternative approach consists in reconstituting a "virtual portfolio", whose
returns are built using the current composition of the portfolio and for which
a stationary dynamic model can be estimated.
This paper establishes the asymptotic properties of this method, that we call
Virtual Historical Simulation. Numerical illustrations on simulated and real
data are provided.
arXiv link: http://arxiv.org/abs/1909.04661v1
Dynamics of reallocation within India's income distribution
Indian income distribution, with a particular focus on the dynamics of the
bottom of the distribution. Specifically, we use a stochastic model of
Geometric Brownian Motion with a reallocation parameter that was constructed to
capture the quantum and direction of composite redistribution implied in the
income distribution. It is well known that inequality has been rising in India
in the recent past, but the assumption has been that while the rich benefit
more than proportionally from economic growth, the poor are also better off
than before. Findings from our model refute this, as we find that since the
early 2000s reallocation has consistently been negative, and that the Indian
income distribution has entered a regime of perverse redistribution of
resources from the poor to the rich. Outcomes from the model indicate not only
that income shares of the bottom decile (1%) and bottom percentile (0.03%)
are at historic lows, but also that real incomes of the bottom decile (-2.5%)
and percentile (-6%) have declined in the 2000s. We validate these findings
using income distribution data and find support for our contention of
persistent negative reallocation in the 2000s. We characterize these findings
in the context of increasing informalization of the workforce in the formal
manufacturing and service sectors, as well as the growing economic insecurity
of the agricultural workforce in India. Significant structural changes will be
required to address this phenomenon.
arXiv link: http://arxiv.org/abs/1909.04452v4
Tree-based Synthetic Control Methods: Consequences of moving the US Embassy
prediction problem and replace its linear regression with a nonparametric model
inspired by machine learning. The proposed method enables us to achieve
accurate counterfactual predictions and we provide theoretical guarantees. We
apply our method to a highly debated policy: the relocation of the US embassy
to Jerusalem. In Israel and Palestine, we find that the average number of
weekly conflicts has increased by roughly 103% over 48 weeks since the
relocation was announced on December 6, 2017. By using conformal inference and
placebo tests, we justify our model and find the increase to be statistically
significant.
arXiv link: http://arxiv.org/abs/1909.03968v3
An Economic Topology of the Brexit vote
Brexit, in the referendum of June 2016, has continued to occupy academics, the
media and politicians. Using the topological data analysis Ball Mapper
algorithm, we extract information from multi-dimensional datasets gathered on
Brexit voting and regional socio-economic characteristics. While we find broad
patterns consistent with extant empirical work, we also find evidence that
support for Leave drew from a far more homogeneous demographic than Remain.
Obtaining votes from
this concise set was more straightforward for Leave campaigners than was
Remain's task of mobilising a diverse group to oppose Brexit.
arXiv link: http://arxiv.org/abs/1909.03490v2
Multiway Cluster Robust Double/Debiased Machine Learning
clustered sampling environments. We propose a novel multiway cross fitting
algorithm and a multiway DML estimator based on this algorithm. We also develop
a multiway cluster robust standard error formula. Simulations indicate that the
proposed procedure has favorable finite sample performance. Applying the
proposed method to market share data for demand analysis, we obtain larger
two-way cluster robust standard errors than non-robust ones.
arXiv link: http://arxiv.org/abs/1909.03489v3
Identifying Different Definitions of Future in the Assessment of Future Economic Conditions: Application of PU Learning and Text Mining
Japanese government, contains assessments of current and future economic
conditions by people from various fields. Although this survey provides
insights regarding economic policy for policymakers, a clear definition of the
word "future" in future economic conditions is not provided. Hence, the
assessments respondents provide in the survey are simply based on their
interpretations of the meaning of "future." This motivated us to reveal the
different interpretations of the future in their judgments of future economic
conditions by applying weakly supervised learning and text mining. In our
research, we separate the assessments of future economic conditions into
economic conditions of the near and distant future using learning from positive
and unlabeled data (PU learning). Because the dataset includes data from
several periods, we devised a new architecture to enable neural networks to
conduct PU learning based on the idea of multi-task learning to efficiently
learn a classifier. Our empirical analysis confirmed that the proposed method
could separate the future economic conditions, and we interpreted the
classification results to obtain intuitions for policymaking.
arXiv link: http://arxiv.org/abs/1909.03348v3
Shrinkage Estimation of Network Spillovers with Factor Structured Errors
interaction that is flexible both in its approach to specifying the network of
connections between cross-sectional units, and in controlling for unobserved
heterogeneity. It is assumed that there are different sources of information
available on a network, which can be represented in the form of multiple
weights matrices. These matrices may reflect observed links, different measures
of connectivity, groupings or other network structures, and the number of
matrices may be increasing with sample size. A penalised quasi-maximum
likelihood estimator is proposed which aims to alleviate the risk of network
misspecification by shrinking the coefficients of irrelevant weights matrices
to exactly zero. Moreover, controlling for unobserved factors in estimation
provides a safeguard against the misspecification that might arise from
unobserved heterogeneity. The asymptotic properties of the estimator are
derived in a framework where the true value of each parameter remains fixed as
the total number of parameters increases. A Monte Carlo simulation is used to
assess finite sample performance, and in an empirical application the method is
applied to study the prevalence of network spillovers in determining growth
rates across countries.
arXiv link: http://arxiv.org/abs/1909.02823v4
Using Wasserstein Generative Adversarial Networks for the Design of Monte Carlo Simulations
compare the performance of the new methods to those of existing methods in
Monte Carlo studies. The credibility of such Monte Carlo studies is often
limited because of the freedom the researcher has in choosing the design. In
recent years a new class of generative models emerged in the machine learning
literature, termed Generative Adversarial Networks (GANs) that can be used to
systematically generate artificial data that closely mimics real economic
datasets, while limiting the degrees of freedom for the researcher and
optionally satisfying privacy guarantees with respect to their training data.
In addition, if an applied researcher is concerned with the performance of a
particular statistical method on a specific data set (beyond its theoretical
properties in large samples), she may wish to assess the performance, e.g., the
coverage rate of confidence intervals or the bias of the estimator, using
simulated data which resembles her setting. To illustrate these methods we
apply Wasserstein GANs (WGANs) to compare a number of different estimators for
average treatment effects under unconfoundedness in three distinct settings
(corresponding to three real data sets) and present a methodology for assessing
the robustness of the results. In this example, we find that (i) there is not
one estimator that outperforms the others in all three settings, so researchers
should tailor their analytic approach to a given setting, and (ii) systematic
simulation studies can be helpful for selecting among competing methods in this
situation.
arXiv link: http://arxiv.org/abs/1909.02210v3
Inference in Difference-in-Differences: How Much Should We Trust in Independent Clusters?
when there is spatial correlation. We present novel theoretical insights and
empirical evidence on the settings in which ignoring spatial correlation should
lead to more or less distortions in DID applications. We show that details such
as the time frame used in the estimation, the choice of the treated and control
groups, and the choice of the estimator, are key determinants of distortions
due to spatial correlation. We also analyze the feasibility and trade-offs
involved in a series of alternatives to take spatial correlation into account.
Given that, we provide relevant recommendations for applied researchers on how
to mitigate and assess the possibility of inference distortions due to spatial
correlation.
arXiv link: http://arxiv.org/abs/1909.01782v7
Testing nonparametric shape restrictions
as constraints on the signs of derivatives, U-(S-)shape, symmetry,
quasi-convexity, log-convexity, $r$-convexity, among others, in a nonparametric
framework using partial sums empirical processes. We show that, after a
suitable transformation, its asymptotic distribution is a functional of the
standard Brownian motion, so that critical values are readily available.
However, because the asymptotic critical values may approximate the
finite-sample ones poorly, we also describe a valid bootstrap algorithm.
arXiv link: http://arxiv.org/abs/1909.01675v2
Bias and Consistency in Three-way Gravity Models
Pseudo-Maximum Likelihood ("PPML") estimator recently recommended for
identifying the effects of trade policies and in other panel data gravity
settings. Despite the number and variety of fixed effects involved, we confirm
PPML is consistent for fixed $T$ and we show it is in fact the only estimator
among a wide range of PML gravity estimators that is generally consistent in
this context when $T$ is fixed. At the same time, asymptotic confidence
intervals in fixed-$T$ panels are not correctly centered at the true point
estimates, and cluster-robust variance estimates used to construct standard
errors are generally biased as well. We characterize each of these biases
analytically and show both numerically and empirically that they are salient
even for real-data settings with a large number of countries. We also offer
practical remedies that can be used to obtain more reliable inferences of the
effects of trade policies and other time-varying gravity variables, which we
make available via an accompanying Stata package called ppml_fe_bias.
arXiv link: http://arxiv.org/abs/1909.01327v6
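
A hedged Python sketch of the baseline PPML estimator whose inference the paper corrects: a Poisson GLM of trade flows on a time-varying policy dummy, here with simple exporter, importer, and year effects and pair-clustered standard errors on a toy panel. The paper's setting uses the richer exporter-time, importer-time, and pair fixed effects, and its bias corrections (shipped as the Stata package ppml_fe_bias) are not implemented here.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
countries, years = list("ABCDEFGH"), range(2000, 2010)
rows = [(o, d, t) for o in countries for d in countries if o != d for t in years]
df = pd.DataFrame(rows, columns=["exp", "imp", "year"])
df["policy"] = rng.integers(0, 2, size=len(df))                             # toy policy dummy
df["trade"] = rng.poisson(lam=np.exp(1.0 + 0.5 * df["policy"].to_numpy()))  # toy trade flows

X = pd.concat(
    [df[["policy"]],
     pd.get_dummies(df["exp"], prefix="exp", drop_first=True),
     pd.get_dummies(df["imp"], prefix="imp", drop_first=True),
     pd.get_dummies(df["year"], prefix="yr", drop_first=True)],
    axis=1,
).astype(float)
X = sm.add_constant(X)

pair_id = pd.factorize(df["exp"] + "_" + df["imp"])[0]
fit = sm.GLM(df["trade"], X, family=sm.families.Poisson()).fit(
    cov_type="cluster", cov_kwds={"groups": pair_id}
)
print(fit.params["policy"], fit.bse["policy"])
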
State Drug Policy Effectiveness: Comparative Policy Analysis of Drug Overdose Mortality
innovations have followed suit in an effort to prevent overdose deaths.
State-level drug law is a set of policies that may reinforce or undermine each
other, and analysts have a limited set of tools for handling the policy
collinearity using statistical methods. This paper uses a machine learning
method called hierarchical clustering to empirically generate "policy bundles"
by grouping states with similar sets of policies in force at a given time
together for analysis in a 50-state, 10-year interrupted time series regression
with drug overdose deaths as the dependent variable. Policy clusters were
generated from 138 binomial variables observed by state and year from the
Prescription Drug Abuse Policy System. Clustering reduced the policies to a set
of 10 bundles. The approach allows for ranking of the relative effect of
different bundles and is a tool to recommend those most likely to succeed. This
study shows that a set of policies balancing Medication Assisted Treatment,
Naloxone Access, Good Samaritan Laws, Prescription Drug Monitoring Programs,
and legalization of medical marijuana
leads to a reduced number of overdose deaths, but not until its second year in
force.
arXiv link: http://arxiv.org/abs/1909.01936v3
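
A minimal sketch of the "policy bundle" step using SciPy's hierarchical clustering: state-years described by binary policy indicators are clustered and cut into 10 bundles. Random binary data stand in for the Prescription Drug Abuse Policy System variables, Ward linkage is an illustrative default rather than the paper's stated choice, and the interrupted time series regression is not shown.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
n_state_years, n_policies = 500, 138                 # 50 states x 10 years, 138 binary policies
policies = rng.integers(0, 2, size=(n_state_years, n_policies)).astype(float)

Z = linkage(policies, method="ward")                 # agglomerative hierarchical clustering
bundles = fcluster(Z, t=10, criterion="maxclust")    # assign each state-year to one of 10 bundles
print(np.bincount(bundles)[1:])                      # bundle sizes
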
Are Bitcoins price predictable? Evidence from machine learning techniques using technical indicators
predict the price of Bitcoin. Accurately predicting the price for Bitcoin is
therefore important for decision-making process of investors and market players
in the cryptocurrency market. Using historical data from 01/01/2012 to
16/08/2019, machine learning techniques (Generalized linear model via penalized
maximum likelihood, random forest, support vector regression with linear
kernel, and stacking ensemble) were used to forecast the price of Bitcoin. The
prediction models employed key and high dimensional technical indicators as the
predictors. The performance of these techniques was evaluated using mean
absolute percentage error (MAPE), root mean square error (RMSE), mean absolute
error (MAE), and the coefficient of determination (R-squared). The performance
metrics revealed that the stacking ensemble model with two base learners (random
forest and generalized linear model via penalized maximum likelihood) and
support vector regression with linear kernel as meta-learner was the optimal
model for forecasting Bitcoin price. The MAPE, RMSE, MAE, and R-squared values
for the stacking ensemble model were 0.0191%, 15.5331 USD, 124.5508 USD, and
0.9967 respectively. These values show a high degree of reliability in
predicting the price of Bitcoin using the stacking ensemble model. Accurately
predicting the future price of Bitcoin will yield significant returns for
investors and market players in the cryptocurrency market.
arXiv link: http://arxiv.org/abs/1909.01268v1
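
A hedged scikit-learn sketch of the winning model class described above: a stacking ensemble with a random forest and a penalized GLM (elastic net) as base learners and a linear-kernel SVR as meta-learner. Random features stand in for the technical indicators, and all hyperparameters are illustrative rather than the paper's.

import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import ElasticNet
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(1500, 30))                          # placeholder technical indicators
y = 50 * X[:, 0] + rng.normal(scale=10, size=1500)       # placeholder price target

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
                ("glmnet", ElasticNet(alpha=0.1))],
    final_estimator=SVR(kernel="linear"),
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
stack.fit(X_tr, y_tr)
print("test MAE:", mean_absolute_error(y_te, stack.predict(X_te)))
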
SortedEffects: Sorted Causal Effects in R
regression models. This method consists of reporting percentiles of the partial
effects in addition to the average commonly used to summarize the heterogeneity
in the partial effects. They also proposed to use the sorted effects to carry
out classification analysis where the observational units are classified as
most and least affected if their causal effects are above or below some tail
sorted effects. The R package SortedEffects implements the estimation and
inference methods therein and provides tools to visualize the results. This
vignette serves as an introduction to the package and displays basic
functionality of the functions within.
arXiv link: http://arxiv.org/abs/1909.00836v3
Fixed-k Inference for Conditional Extremal Quantiles
data to construct asymptotically valid confidence intervals (CIs) for
conditional extremal quantiles from a fixed number $k$ of nearest-neighbor tail
observations. As a by-product, we also construct CIs for extremal quantiles of
coefficients in linear random coefficient models. For any fixed $k$, the CIs
are uniformly valid without parametric assumptions over a set of nonparametric
data generating processes associated with various tail indices. Simulation
studies show that our CIs exhibit better small-sample coverage and length
properties than alternative nonparametric methods based on asymptotic
normality. Applying the proposed method to Natality Vital Statistics, we study
factors of extremely low birth weights. We find that signs of major effects are
the same as those found in preceding studies based on parametric models, but
with different magnitudes.
arXiv link: http://arxiv.org/abs/1909.00294v3
Mapping Firms' Locations in Technological Space: A Topological Analysis of Patent Statistics
technological space is challenging due to its high dimensionality. We propose a
new method to characterize firms' inventive activities via topological data
analysis (TDA) that represents high-dimensional data in a shape graph. Applying
this method to 333 major firms' patents in 1976--2005 reveals substantial
heterogeneity: some firms remain undifferentiated; others develop unique
portfolios. Firms with unique trajectories, which we define and measure
graph-theoretically as "flares" in the Mapper graph, perform better. This
association is statistically and economically significant, and continues to
hold after we control for portfolio size, firm survivorship, industry
classification, and firm fixed effects. By contrast, existing techniques --
such as principal component analysis (PCA) and Jaffe's (1989) clustering method
-- struggle to track these firm-level dynamics.
arXiv link: http://arxiv.org/abs/1909.00257v7
Rethinking travel behavior modeling representations through embeddings
re-representing discrete variables that are typically used in travel demand
modeling, such as mode, trip purpose, education level, family type or
occupation. This re-representation process essentially maps those variables
into a latent space called the embedding space. The benefit of this is
that such spaces allow for richer nuances than the typical transformations used
in categorical variables (e.g. dummy encoding, contrasted encoding, principal
components analysis). While the usage of latent variable representations is not
new per se in travel demand modeling, the idea presented here brings several
innovations: it is an entirely data driven algorithm; it is informative and
consistent, since the latent space can be visualized and interpreted based on
distances between different categories; it preserves interpretability of
coefficients, despite being based on Neural Network principles; and it is
transferrable, in that embeddings learned from one dataset can be reused for
other ones, as long as travel behavior remains consistent between the datasets.
The idea is strongly inspired by natural language processing techniques,
namely the word2vec algorithm, which underlies recent developments such as
automatic translation and next-word prediction. Our method is demonstrated
using a mode choice model, and shows improvements of up to 60% with respect to
the initial likelihood, and up to 20% with respect to the likelihood of the
corresponding traditional model (i.e. using dummy variables) in
out-of-sample evaluation. We provide a new Python package, called PyTre (PYthon
TRavel Embeddings), that others can straightforwardly use to replicate our
results or improve their own models. Our experiments are themselves based on an
open dataset (swissmetro).
arXiv link: http://arxiv.org/abs/1909.00154v1
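
An illustrative PyTorch sketch of the embedding idea (this is not the PyTre package): a categorical travel variable such as trip purpose is mapped into a low-dimensional latent space by an embedding layer trained jointly with a simple choice model. The toy data, dimensions, and model names are assumptions made purely for illustration.

import torch
import torch.nn as nn

n_categories, emb_dim, n_alternatives = 10, 3, 4     # e.g. 10 trip purposes, 4 travel modes

class EmbeddingChoiceModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(n_categories, emb_dim)      # replaces dummy encoding
        self.out = nn.Linear(emb_dim + 1, n_alternatives)   # +1 for a numeric covariate

    def forward(self, purpose, cost):
        z = torch.cat([self.emb(purpose), cost.unsqueeze(1)], dim=1)
        return self.out(z)                                  # utilities; softmax applied in the loss

torch.manual_seed(0)
purpose = torch.randint(0, n_categories, (256,))
cost = torch.randn(256)
choice = torch.randint(0, n_alternatives, (256,))

model = EmbeddingChoiceModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(purpose, cost), choice)
    loss.backward()
    opt.step()

# model.emb.weight now holds the learned embedding vectors; distances between its
# rows can be inspected and, as argued above, reused on other datasets.
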
Systemic Risk Clustering of China Internet Financial Based on t-SNE Machine Learning Algorithm
have shown that Internet financial platforms have different financial systemic
risk characteristics when they are subject to macroeconomic shocks or fragile
internal crisis. From the perspective of regional development of Internet
finance, this paper uses the t-SNE machine learning algorithm to mine China's
Internet finance development index, covering 31 provinces and 335 cities and
regions. The analysis documents peaked and heavy-tailed characteristics and
then proposes a three-class grouping of Internet financial systemic risk,
providing more regionally targeted recommendations for managing the systemic
risk of Internet finance.
arXiv link: http://arxiv.org/abs/1909.03808v1
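
A minimal sketch of the dimension-reduction and grouping step described above: t-SNE applied to a placeholder regional Internet finance development index, followed by a three-group classification. The actual index data, its components, and the paper's preprocessing are not reproduced here.

import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
index_data = rng.normal(size=(335, 12))         # 335 cities/regions, 12 placeholder index components

embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(index_data)
risk_class = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embedding)
print(np.bincount(risk_class))                  # sizes of the three risk groups
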
The economics of minority language use: theory and empirical evidence for a language game model
We study three modern multilingual societies -- the Basque Country, Ireland and
Wales -- which are endowed with two, linguistically distant, official
languages: $A$, spoken by all individuals, and $B$, spoken by a bilingual
minority. In all three cases a decline in the use of the minority language $B$
is observed, a sign of diversity loss. However, according to the Council of
Europe, the key factor for avoiding the shift away from $B$ is its use in all
domains. Thus, we
investigate the language choices of the bilinguals by means of an evolutionary
game theoretic model. We show that the language population dynamics has reached
an evolutionary stable equilibrium where a fraction of bilinguals have shifted
to speak $A$. Thus, this equilibrium captures the decline in the use of $B$. To
test the theory we build empirical models that predict the use of $B$ for each
proportion of bilinguals. We show that model-based predictions fit very well
the observed use of Basque, Irish, and Welsh.
arXiv link: http://arxiv.org/abs/1908.11604v1
Infinitely Stochastic Micro Forecasting
unconventional tool for stochastic prediction of future expenses based on the
individual (micro) developments of recorded events. Consider a firm,
enterprise, institution, or state, which possesses knowledge about particular
historical events. For each event, there is a series of several related
subevents: payments or losses spread over time, which together lead to an
infinitely stochastic process. Nevertheless, the issue is that some events that
have already occurred are not necessarily reported yet. The aim lies in
forecasting future subevent flows coming from already reported, occurred but
not reported, and not yet occurred events. Our methodology is illustrated on
quantitative risk assessment; however, it can be applied to other areas such as
startups, epidemics, war damages, advertising and commercials, digital
payments, or drug prescription as manifested in the paper. As a theoretical
contribution, inference for infinitely stochastic processes is developed. In
particular, a non-homogeneous Poisson process with non-homogeneous Poisson
processes as marks is used, which includes for instance the Cox process as a
special case.
arXiv link: http://arxiv.org/abs/1908.10636v2
Stock Price Forecasting and Hypothesis Testing Using Neural Networks
predict NYSE, NASDAQ and AMEX stock prices from historical data. We experiment
with different architectures and compare data normalization techniques. Then,
we leverage those findings to question the efficient-market hypothesis through
a formal statistical test.
arXiv link: http://arxiv.org/abs/1908.11212v1
Theory of Weak Identification in Semiparametric Models
models and an efficiency concept. Weak identification occurs when a parameter
is weakly regular, i.e., when it is locally homogeneous of degree zero. When
this happens, consistent or equivariant estimation is shown to be impossible.
We then show that there exists an underlying regular parameter that fully
characterizes the weakly regular parameter. While this parameter is not unique,
concepts of sufficiency and minimality help pin down a desirable one. If
estimation of minimal sufficient underlying parameters is inefficient, it
introduces noise in the corresponding estimation of weakly regular parameters,
whence we can improve the estimators by local asymptotic Rao-Blackwellization.
We call an estimator weakly efficient if it does not admit such improvement.
New weakly efficient estimators are presented in linear IV and nonlinear
regression models. Simulation of a linear IV model demonstrates how 2SLS and
optimal IV estimators are improved.
arXiv link: http://arxiv.org/abs/1908.10478v3
A multi-scale symmetry analysis of uninterrupted trends returns of daily financial indices
financial indices, by means of a statistical procedure developed by the authors
based on a symmetry statistic by Einmahl and Mckeague. We applied this
statistical methodology to financial uninterrupted daily trends returns and to
other derived observables. In our opinion, to study distributional symmetry,
trends returns offer more advantages than the commonly used daily financial
returns; the two most important being: 1) Trends returns involve sampling over
different time scales and 2) By construction, this variable time series
contains practically the same number of non-negative and negative entry values.
We also show that these time multi-scale returns display distributional
bi-modality. The daily financial indices analyzed in this work are the Mexican
IPC, the American DJIA, the German DAX, and the Japanese Nikkei index, covering
the period from 11-08-1991 to 06-30-2017. We show that, at the time
scale resolution and significance considered in this paper, it is almost always
feasible to find an interval of possible symmetry points containing one most
plausible symmetry point denoted by C. Finally, we study the temporal evolution
of C showing that this point is seldom zero and responds with sensitivity to
extreme market events.
arXiv link: http://arxiv.org/abs/1908.11204v1
The Ridge Path Estimator for Linear Instrumental Variables
variables (IV) estimator that uses a ridge regression penalty. The
regularization tuning parameter is selected empirically by splitting the
observed data into training and test samples. Conditional on the tuning
parameter, the training sample creates a path from the IV estimator to a prior.
The optimal tuning parameter is the value along this path that minimizes the IV
objective function for the test sample.
The empirically selected regularization tuning parameter becomes an estimated
parameter that jointly converges with the parameters of interest. The
asymptotic distribution of the tuning parameter is a nonstandard mixture
distribution. Monte Carlo simulations show the asymptotic distribution captures
the characteristics of the sampling distributions and when this ridge estimator
performs better than two-stage least squares.
arXiv link: http://arxiv.org/abs/1908.09237v1
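
A numpy sketch of one reading of the procedure described above, under assumptions of my own: a ridge-penalized linear IV estimator that shrinks toward a prior value, with the tuning parameter chosen by minimizing the IV objective on a held-out test sample. The exact path and objective used by the authors may differ; this illustrates the described idea, not their implementation.

import numpy as np

def ridge_iv(X, Z, y, lam, beta0):
    # Ridge-penalized IV estimate shrinking toward the prior beta0.
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    A = X.T @ Pz @ X + lam * np.eye(X.shape[1])
    b = X.T @ Pz @ y + lam * beta0
    return np.linalg.solve(A, b)

def iv_objective(X, Z, y, beta):
    # GMM-style IV objective: projection of the residual onto the instruments.
    u = y - X @ beta
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    return u @ Pz @ u

rng = np.random.default_rng(0)
n, k = 400, 2
Z = rng.normal(size=(n, 4))
X = Z @ rng.normal(size=(4, k)) + rng.normal(size=(n, k))
y = X @ np.array([1.0, -1.0]) + rng.normal(size=n)

train, test = np.arange(300), np.arange(300, n)
beta0 = np.zeros(k)                              # the prior toward which the path shrinks
lams = np.logspace(-3, 3, 25)
betas = [ridge_iv(X[train], Z[train], y[train], lam, beta0) for lam in lams]
scores = [iv_objective(X[test], Z[test], y[test], b) for b in betas]
best = int(np.argmin(scores))
print("selected lambda:", lams[best], "estimate:", betas[best])
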
Welfare Analysis in Dynamic Models
develop estimation and inference for these parameters even in the presence of a
high-dimensional state space. Examples of welfare metrics include average
welfare, average marginal welfare effects, and welfare decompositions into
direct and indirect effects similar to Oaxaca (1973) and Blinder (1973). We
derive dual and doubly robust representations of welfare metrics that
facilitate debiased inference. For average welfare, the value function does not
have to be estimated. In general, debiasing can be applied to any estimator of
the value function, including neural nets, random forests, Lasso, boosting, and
other high-dimensional methods. In particular, we derive Lasso and Neural
Network estimators of the value function and associated dynamic dual
representation and establish associated mean square convergence rates for these
functions. Debiasing is automatic in the sense that it only requires knowledge
of the welfare metric of interest, not the form of bias correction. The
proposed methods are applied to estimate a dynamic behavioral model of teacher
absenteeism in DHR and associated average teacher welfare.
arXiv link: http://arxiv.org/abs/1908.09173v5
Constraint Qualifications in Partial Identification
problems that fulfill constraint qualifications. The literature on estimation
and inference under partial identification frequently restricts the geometry of
identified sets with diverse high-level assumptions. These superficially appear
to be different approaches to closely related problems. We extensively analyze
their relation. Among other things, we show that for partial identification
through pure moment inequalities, numerous assumptions from the literature
essentially coincide with the Mangasarian-Fromowitz constraint qualification.
This clarifies the relation between well-known contributions, including within
econometrics, and elucidates stringency, as well as ease of verification, of
some high-level assumptions in seminal papers.
arXiv link: http://arxiv.org/abs/1908.09103v4
Dyadic Regression
units are of primary interest, arise frequently in social science research.
Regression analyses with such data feature prominently in many research
literatures (e.g., gravity models of trade). The dependence structure
associated with dyadic data raises special estimation and, especially,
inference issues. This chapter reviews currently available methods for
(parametric) dyadic regression analysis and presents guidelines for empirical
researchers.
arXiv link: http://arxiv.org/abs/1908.09029v1
Nonparametric estimation of causal heterogeneity under high-dimensional confounding
estimating heterogeneous average treatment effects that vary with a limited
number of discrete and continuous covariates in a selection-on-observables
framework where the number of possible confounders is very large. We propose a
two-step estimator for which the first step is estimated by machine learning.
We show that this estimator has desirable statistical properties like
consistency, asymptotic normality and rate double robustness. In particular, we
derive the coupled convergence conditions between the nonparametric and the
machine learning steps. We also show that estimating population average
treatment effects by averaging the estimated heterogeneous effects is
semi-parametrically efficient. The new estimator is applied in an empirical
example of the effects of mothers' smoking during pregnancy on birth weight.
arXiv link: http://arxiv.org/abs/1908.08779v1
Heterogeneous Earnings Effects of the Job Corps by Gender Earnings: A Translated Quantile Approach
for males than for females. This effect heterogeneity favouring males contrasts
with the results of the majority of other training programmes' evaluations.
Applying the translated quantile approach of Bitler, Hoynes, and Domina (2014),
I investigate a potential mechanism behind the surprising findings for the Job
Corps. My results provide suggestive evidence that the effect heterogeneity
by gender operates through existing gender earnings inequality rather than Job
Corps trainability differences.
arXiv link: http://arxiv.org/abs/1908.08721v1
Online Causal Inference for Advertising in Real-Time Bidding Auctions
impressions to competing advertisers, continue to enjoy success in digital
advertising. Assessing the effectiveness of such advertising remains a
challenge in research and practice. This paper proposes a new approach to
perform causal inference on advertising bought through such mechanisms.
Leveraging the economic structure of first- and second-price auctions, we first
show that the effects of advertising are identified by the optimal bids. Hence,
since these optimal bids are the only objects that need to be recovered, we
introduce an adapted Thompson sampling (TS) algorithm to solve a multi-armed
bandit problem that succeeds in recovering such bids and, consequently, the
effects of advertising while minimizing the costs of experimentation. We derive
a regret bound for our algorithm which is order optimal and use data from RTB
auctions to show that it outperforms commonly used methods that estimate the
effects of advertising.
arXiv link: http://arxiv.org/abs/1908.08600v4
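
A generic Thompson sampling sketch in the spirit of the approach above, with assumptions of my own: candidate bids are treated as arms in a simulated second-price auction and rewards are the advertiser's surplus. The paper's adapted TS algorithm, its regret bound, and the step recovering advertising effects from optimal bids are not reproduced.

import numpy as np

rng = np.random.default_rng(0)
bids = np.linspace(0.1, 2.0, 20)        # candidate bid arms
value = 1.2                             # advertiser's per-impression value (unknown to the algorithm)

def surplus(bid):
    # One second-price auction round: win if the bid beats the highest competing bid.
    competitor = rng.exponential(scale=0.8)
    return (value - competitor) if bid > competitor else 0.0

# Gaussian Thompson sampling with unit observation noise and N(0, 1) priors per arm.
mu, prec = np.zeros_like(bids), np.ones_like(bids)
for t in range(5000):
    arm = int(np.argmax(rng.normal(mu, 1.0 / np.sqrt(prec))))   # one posterior draw per arm
    r = surplus(bids[arm])
    prec[arm] += 1.0                                            # conjugate normal update
    mu[arm] += (r - mu[arm]) / prec[arm]

print("bid with highest posterior mean surplus:", bids[int(np.argmax(mu))])
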
A Doubly Corrected Robust Variance Estimator for Linear GMM
generalized method of moments (GMM) including the one-step, two-step, and
iterated estimators. Our formula additionally corrects for the
over-identification bias in variance estimation on top of the commonly used
finite sample correction of Windmeijer (2005) which corrects for the bias from
estimating the efficient weight matrix, so is doubly corrected. An important
feature of the proposed double correction is that it automatically provides
robustness to misspecification of the moment condition. In contrast, the
conventional variance estimator and the Windmeijer correction are inconsistent
under misspecification. That is, the proposed double correction formula
provides a convenient way to obtain improved inference under correct
specification and robustness against misspecification at the same time.
arXiv link: http://arxiv.org/abs/1908.07821v2
Analyzing Commodity Futures Using Factor State-Space Models with Wishart Stochastic Volatility
and forecast the term structure of futures contracts on commodities. Our
approach builds upon the dynamic 3-factor Nelson-Siegel model and its 4-factor
Svensson extension and assumes for the latent level, slope and curvature
factors a Gaussian vector autoregression with a multivariate Wishart stochastic
volatility process. Exploiting the conjugacy of the Wishart and the Gaussian
distribution, we develop a computationally fast and easy to implement MCMC
algorithm for the Bayesian posterior analysis. An empirical application to
daily prices for contracts on crude oil with stipulated delivery dates ranging
from one to 24 months ahead shows that the estimated 4-factor Svensson model
with two curvature factors provides a good parsimonious representation of the
serial correlation in the individual prices and their volatility. It also shows
that this model has a good out-of-sample forecast performance.
arXiv link: http://arxiv.org/abs/1908.07798v1
New developments in revealed preference theory: decisions under risk, uncertainty, and intertemporal choice
discusses the testable implications of theories of choice that are germane to
specific economic environments. The focus is on expected utility in risky
environments; subjective expected utility and maxmin expected utility in the
presence of uncertainty; and exponentially discounted utility for intertemporal
choice. The testable implications of these theories for data on choice from
classical linear budget sets are described, and shown to follow a common
thread. The theories all imply an inverse relation between prices and
quantities, with different qualifications depending on the functional forms in
the theory under consideration.
arXiv link: http://arxiv.org/abs/1908.07561v2
Spectral inference for large Stochastic Blockmodels with nodal covariates
between observed and unobserved factors affecting network structure. To this
end, we develop spectral estimators for both unobserved blocks and the effect
of covariates in stochastic blockmodels. On the theoretical side, we establish
asymptotic normality of our estimators for the subsequent purpose of performing
inference. On the applied side, we show that computing our estimator is much
faster than standard variational expectation--maximization algorithms and
scales well for large networks. Monte Carlo experiments suggest that the
estimator performs well under different data generating processes. Our
application to Facebook data shows evidence of homophily in gender, role and
campus-residence, while allowing us to discover unobserved communities. The
results in this paper provide a foundation for spectral estimation of the
effect of observed covariates as well as unobserved latent community structure
on the probability of link formation in networks.
arXiv link: http://arxiv.org/abs/1908.06438v2
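
A sketch of the spectral step only: latent blocks of a stochastic blockmodel are recovered from the leading eigenvectors of the adjacency matrix and clustered with k-means. The paper's estimator additionally handles nodal covariates and comes with asymptotic normality results; neither is reproduced in this toy example.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n, K = 300, 2
blocks = rng.integers(0, K, size=n)
B = np.array([[0.20, 0.05],
              [0.05, 0.20]])                        # within/between block link probabilities
P = B[blocks][:, blocks]
A = (rng.uniform(size=(n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                         # symmetric adjacency, no self-loops

vals, vecs = np.linalg.eigh(A)
embedding = vecs[:, np.argsort(np.abs(vals))[-K:]]  # leading eigenvectors by magnitude
labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(embedding)
agreement = max((labels == blocks).mean(), (labels != blocks).mean())
print("block recovery rate:", agreement)
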
Measuring international uncertainty using global vector autoregressions with drifting parameters
macroeconomic uncertainty shocks. We use a global vector autoregressive
specification with drifting coefficients and factor stochastic volatility in
the errors to model six economies jointly. The measure of uncertainty is
constructed endogenously by estimating a scalar driving the innovation
variances of the latent factors, which is also included in the mean of the
process. To achieve regularization, we use Bayesian techniques for estimation,
and introduce a set of hierarchical global-local priors. The adopted priors
center the model on a constant parameter specification with homoscedastic
errors, but allow for time-variation if suggested by likelihood information.
Moreover, we assume coefficients across economies to be similar, but provide
sufficient flexibility via the hierarchical prior for country-specific
idiosyncrasies. The results point towards pronounced real and financial effects
of uncertainty shocks in all countries, with differences across economies and
over time.
arXiv link: http://arxiv.org/abs/1908.06325v2
A model of discrete choice based on reinforcement learning under short-term memory
statistical averaging of choices made by a subject in a reinforcement learning
process, where the subject has short, k-term memory span. The choice
probabilities in these models combine in a non-trivial, non-linear way the
initial learning bias and the experience gained through learning. The
properties of such models are discussed and, in particular, it is shown that
probabilities deviate from Luce's Choice Axiom, even if the initial bias
adheres to it. Moreover, we show that the latter property is recovered as the
memory span becomes large.
Two applications in utility theory are considered. In the first, we use the
discrete choice model to generate a binary preference relation on simple
lotteries. We show that the preferences violate transitivity and independence
axioms of expected utility theory. Furthermore, we establish the dependence of
the preferences on frames, with risk aversion for gains, and risk seeking for
losses. Based on these findings, we next propose a parametric model of choice
based on the probability maximization principle as a model for deviations from
the expected utility principle. To illustrate the approach we apply it to the
classical problem of demand for insurance.
arXiv link: http://arxiv.org/abs/1908.06133v1
Forward-Selected Panel Data Approach for Program Evaluation
work with observational data in view of limited opportunities to carry out
controlled experiments. In the potential outcome framework, the panel data
approach (Hsiao, Ching and Wan, 2012) constructs the counterfactual by
exploiting the correlation between cross-sectional units in panel data. The
choice of cross-sectional control units, a key step in its implementation, is
nevertheless unresolved in data-rich environments where many possible controls
are at the researcher's disposal. We propose the forward selection method to
choose control units, and establish validity of the post-selection inference.
Our asymptotic framework allows the number of possible controls to grow much
faster than the time dimension. The easy-to-implement algorithms and their
theoretical guarantee extend the panel data approach to big data settings.
arXiv link: http://arxiv.org/abs/1908.05894v3
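
A minimal numpy sketch of the forward selection idea: control units are added greedily, one at a time, by how much they reduce the pre-treatment sum of squared residuals when fitting the treated unit's outcome. The stopping rule and the post-selection inference studied in the paper are not implemented, and the data are simulated placeholders.

import numpy as np

def forward_select(y_pre, X_pre, max_controls):
    # Greedy forward selection of columns of X_pre to fit y_pre by OLS.
    selected, remaining = [], list(range(X_pre.shape[1]))
    for _ in range(max_controls):
        best_j, best_sse = None, np.inf
        for j in remaining:
            cols = selected + [j]
            Z = np.column_stack([np.ones(len(y_pre)), X_pre[:, cols]])
            beta, *_ = np.linalg.lstsq(Z, y_pre, rcond=None)
            sse = np.sum((y_pre - Z @ beta) ** 2)
            if sse < best_sse:
                best_j, best_sse = j, sse
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

rng = np.random.default_rng(0)
T0, N = 40, 200                                     # pre-treatment periods, candidate controls
X_pre = rng.normal(size=(T0, N))
y_pre = X_pre[:, :3] @ np.array([0.5, 0.3, 0.2]) + 0.1 * rng.normal(size=T0)
print("selected controls:", forward_select(y_pre, X_pre, max_controls=5))
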
Testing the Drift-Diffusion Model
diffusion (Brownian) signals, where the decision maker accumulates evidence
until the process hits a stopping boundary, and then stops and chooses the
alternative that corresponds to that boundary. This model has been widely used
in psychology, neuroeconomics, and neuroscience to explain the observed
patterns of choice and response times in a range of binary choice decision
problems. This paper provides a statistical test for DDM's with general
boundaries. We first prove a characterization theorem: we find a condition on
choice probabilities that is satisfied if and only if the choice probabilities
are generated by some DDM. Moreover, we show that the drift and the boundary
are uniquely identified. We then use our condition to nonparametrically
estimate the drift and the boundary and construct a test statistic.
arXiv link: http://arxiv.org/abs/1908.05824v1
Counting Defiers
"defiers," individuals whose treatment always runs counter to the instrument,
in the terminology of Balke and Pearl (1993) and Angrist et al. (1996). I allow
for defiers in a model with a binary instrument and a binary treatment. The
model is explicit about the randomization process that gives rise to the
instrument. I use the model to develop estimators of the counts of defiers,
always takers, compliers, and never takers. I propose separate versions of the
estimators for contexts in which the parameter of the randomization process is
unspecified, which I intend for use with natural experiments with virtual
random assignment. I present an empirical application that revisits Angrist and
Evans (1998), which examines the impact of virtual random assignment of the sex
of the first two children on subsequent fertility. I find that subsequent
fertility is much more responsive to the sex mix of the first two children when
defiers are allowed.
arXiv link: http://arxiv.org/abs/1908.05811v2
A Model of a Randomized Experiment with an Application to the PROWESS Clinical Trial
binary outcome. Potential outcomes in the intervention and control groups give
rise to four types of participants. Fixing ideas such that the outcome is
mortality, some participants would live regardless, others would be saved,
others would be killed, and others would die regardless. These potential
outcome types are not observable. However, I use the model to develop
estimators of the number of participants of each type. The model relies on the
randomization within the experiment and on deductive reasoning. I apply the
model to an important clinical trial, the PROWESS trial, and I perform a Monte
Carlo simulation calibrated to estimates from the trial. The reduced form from
the trial shows a reduction in mortality, which provided a rationale for FDA
approval. However, I find that the intervention killed two participants for
every three it saved.
arXiv link: http://arxiv.org/abs/1908.05810v2
Isotonic Regression Discontinuity Designs
at the boundary point, an object that is particularly interesting and required
in the analysis of monotone regression discontinuity designs. We show that the
isotonic regression is inconsistent in this setting and derive the asymptotic
distributions of boundary corrected estimators. Interestingly, the boundary
corrected estimators can be bootstrapped without subsampling or additional
nonparametric smoothing which is not the case for the interior point. The Monte
Carlo experiments indicate that shape restrictions can improve dramatically the
finite-sample performance of unrestricted estimators. Lastly, we apply the
isotonic regression discontinuity designs to estimate the causal effect of
incumbency in the U.S. House elections.
arXiv link: http://arxiv.org/abs/1908.05752v6
Injectivity and the Law of Demand
variety of methodologies. When a version of the law of demand holds, global
injectivity can be checked by seeing whether the demand mapping is constant
over any line segments. When we add the assumption of differentiability, we
obtain necessary and sufficient conditions for injectivity that generalize
the classical Gale and Nikaido (1965) conditions for quasi-definite Jacobians.
arXiv link: http://arxiv.org/abs/1908.05714v1
Nonparametric Identification of First-Price Auction with Unobserved Competition: A Density Discontinuity Framework
first-price auction models, in which the analyst only observes winning bids.
Our benchmark model assumes an exogenous number of bidders $N$. We show that,
if the bidders observe $N$, the resulting discontinuities in the winning bid
density can be used to identify the distribution of $N$. The private value
distribution can be nonparametrically identified in a second step. This
extends, under testable identification conditions, to the case where $N$ is a
number of potential buyers, who bid with some unknown probability.
Identification also holds in presence of additive unobserved heterogeneity
drawn from some parametric distributions. A parametric Bayesian estimation
procedure is proposed. An application to Shanghai Government IT procurements
finds that the imposed three-bidder participation rule is not effective. This
generates losses as large as 10% of the appraisal budget for
small IT contracts.
arXiv link: http://arxiv.org/abs/1908.05476v3
On rank estimators in increasing dimensions
1987) as a notable example, has been widely exploited in studying regression
problems. For these estimators, although the linear index is introduced for
alleviating the impact of dimensionality, the effect of large dimension on
inference is rarely studied. This paper fills this gap via studying the
statistical properties of a larger family of M-estimators, whose objective
functions are formulated as U-processes and may be discontinuous in increasing
dimension set-up where the number of parameters, $p_{n}$, in the model is
allowed to increase with the sample size, $n$. First, we find that often in
estimation, as $p_{n}/n\rightarrow 0$, the $(p_{n}/n)^{1/2}$ rate of convergence is
obtainable. Second, we establish Bahadur-type bounds and study the validity of
normal approximation, which we find often requires a much stronger scaling
requirement than $p_{n}^{2}/n\rightarrow 0.$ Third, we state conditions under
which the numerical derivative estimator of asymptotic covariance matrix is
consistent, and show that the step size in implementing the covariance
estimator has to be adjusted with respect to $p_{n}$. All theoretical results
are further backed up by simulation studies.
arXiv link: http://arxiv.org/abs/1908.05255v1
Forecast Encompassing Tests for the Expected Shortfall
Shortfall (ES). The ES currently receives much attention through its
introduction into the Basel III Accords, which stipulate its use as the primary
market risk measure for the international banking regulation. We utilize joint
loss functions for the pair ES and Value at Risk to set up three ES
encompassing test variants. The tests are built on misspecification robust
asymptotic theory and we investigate the finite sample properties of the tests
in an extensive simulation study. We use the encompassing tests to illustrate
the potential of forecast combination methods for different financial assets.
arXiv link: http://arxiv.org/abs/1908.04569v3
Zero Black-Derman-Toy interest rate model
rate tree model, which includes the possibility of a jump with small
probability at each step to a practically zero interest rate. The corresponding
BDT algorithms are consequently modified to calibrate the tree containing the
zero interest rate scenarios. This modification is motivated by the recent
2008-2009 crisis in the United States and it quantifies the risk of a future
crisis in bond prices and derivatives. The proposed model is useful to price
derivatives. This exercise also provides a tool to calibrate the probability of
this event. A comparison of option prices and implied volatilities on US
Treasury bonds computed with both the proposed and the classical tree model is
provided, in six different scenarios along the different periods comprising the
years 2002-2017.
arXiv link: http://arxiv.org/abs/1908.04401v2
Maximum Approximated Likelihood Estimation
in cases where the likelihood function is analytically intractable. Most of the
theoretical literature focuses on maximum simulated likelihood (MSL)
estimators, while empirical and simulation analyses often find that alternative
approximation methods such as quasi-Monte Carlo simulation, Gaussian
quadrature, and integration on sparse grids behave considerably better
numerically. This paper generalizes the theoretical results widely known for
MSL estimators to a general set of maximum approximated likelihood (MAL)
estimators. We provide general conditions for both the model and the
approximation approach to ensure consistency and asymptotic normality. We also
show specific examples and finite-sample simulation results.
arXiv link: http://arxiv.org/abs/1908.04110v1
Privacy-Aware Distributed Mobility Choice Modelling over Blockchain
where participants do not share personal raw data, while all computations are
done locally. Participants use the Blockchain-based Smart Mobility Data-market
(BSMD), where all transactions are secure and private. Nodes in the blockchain
can exchange information with other participants as long as both parties agree to
the transaction rules issued by the owner of the data. A case study is
presented where a mode choice model is distributed and estimated over BSMD. As
an example, the parameter estimation problem is solved on a distributed version
of simulated annealing. It is demonstrated that the estimated model parameters
are consistent and reproducible.
arXiv link: http://arxiv.org/abs/1908.03446v2
Analysis of Networks via the Sparse $β$-Model
areas, yet statistical models allowing for parameter estimates with desirable
statistical properties for sparse networks remain scarce. To address this, we
propose the Sparse $\beta$-Model (S$\beta$M), a new network model that
interpolates between the celebrated Erdos-R\'enyi model and the $\beta$-model,
which assigns a separate parameter to each node. By a novel reparameterization of
the $\beta$-model to distinguish global and local parameters, our S$\beta$M can
drastically reduce the dimensionality of the $\beta$-model by requiring some of
the local parameters to be zero. We derive the asymptotic distribution of the
maximum likelihood estimator of the S$\beta$M when the support of the parameter
vector is known. When the support is unknown, we formulate a penalized
likelihood approach with the $\ell_0$-penalty. Remarkably, we show via a
monotonicity lemma that the seemingly combinatorial computational problem due
to the $\ell_0$-penalty can be overcome by assigning nonzero parameters to
those nodes with the largest degrees. We further show that a $\beta$-min
condition guarantees our method to identify the true model and provide excess
risk bounds for the estimated parameters. The estimation procedure enjoys good
finite sample properties as shown by simulation studies. The usefulness of the
S$\beta$M is further illustrated via the analysis of a microfinance take-up
example.
arXiv link: http://arxiv.org/abs/1908.03152v3
Efficient Estimation by Fully Modified GLS with an Application to the Environmental Kuznets Curve
Least Squares estimator for multivariate cointegrating polynomial regressions.
Such regressions allow for deterministic trends, stochastic trends and integer
powers of stochastic trends to enter the cointegrating relations. Our fully
modified estimator incorporates: (1) the direct estimation of the inverse
autocovariance matrix of the multidimensional errors, and (2) second order bias
corrections. The resulting estimator has the intuitive interpretation of
applying a weighted least squares objective function to filtered data series.
Moreover, the required second order bias corrections are convenient byproducts
of our approach and lead to standard asymptotic inference. We also study
several multivariate KPSS-type tests for the null of cointegration. A
comprehensive simulation study shows good performance of the FM-GLS estimator
and the related tests. As a practical illustration, we reinvestigate the
Environmental Kuznets Curve (EKC) hypothesis for six early industrialized
countries as in Wagner et al. (2020).
arXiv link: http://arxiv.org/abs/1908.02552v2
Estimation of Conditional Average Treatment Effects with High-Dimensional Data
estimators for the reduced dimensional conditional average treatment effect
(CATE) function. In the first stage, the nuisance functions necessary for
identifying CATE are estimated by machine learning methods, allowing the number
of covariates to be comparable to or larger than the sample size. The second
stage consists of a low-dimensional local linear regression, reducing CATE to a
function of the covariate(s) of interest. We consider two variants of the
estimator depending on whether the nuisance functions are estimated over the
full sample or over a hold-out sample. Building on Belloni et al. (2017) and
Chernozhukov et al. (2018), we derive functional limit theory for the
estimators and provide an easy-to-implement procedure for uniform inference
based on the multiplier bootstrap. The empirical application revisits the
effect of maternal smoking on a baby's birth weight as a function of the
mother's age.
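A minimal sketch of the two-stage structure under simplifying assumptions (random-forest nuisance fits without the paper's cross-fitting, a doubly robust pseudo-outcome, and a hand-rolled local linear smoother over the covariate of interest); it conveys the idea only and is not the paper's estimator or its multiplier-bootstrap inference.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

    # Illustrative simulated data: CATE varies with one covariate ("age").
    rng = np.random.default_rng(2)
    n = 2000
    age = rng.uniform(18, 40, n)                    # covariate of interest
    X = np.column_stack([age, rng.normal(size=(n, 5))])
    e = 1 / (1 + np.exp(-0.03 * (age - 29)))        # true propensity score
    D = rng.binomial(1, e)
    tau = -0.1 * (age - 29)                         # true CATE along age
    Y = X[:, 1] + D * tau + rng.normal(size=n)

    # Stage 1: nuisance functions by machine learning (cross-fitting omitted for brevity).
    mu1 = RandomForestRegressor(n_estimators=200).fit(X[D == 1], Y[D == 1]).predict(X)
    mu0 = RandomForestRegressor(n_estimators=200).fit(X[D == 0], Y[D == 0]).predict(X)
    ps = RandomForestClassifier(n_estimators=200).fit(X, D).predict_proba(X)[:, 1].clip(0.05, 0.95)

    # Doubly robust pseudo-outcome whose conditional mean given age is the CATE.
    psi = mu1 - mu0 + D * (Y - mu1) / ps - (1 - D) * (Y - mu0) / (1 - ps)

    # Stage 2: local linear regression of the pseudo-outcome on age.
    def local_linear(x0, x, y, h):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)      # Gaussian kernel weights
        Z = np.column_stack([np.ones_like(x), x - x0])
        beta = np.linalg.lstsq(Z * w[:, None] ** 0.5, y * w ** 0.5, rcond=None)[0]
        return beta[0]                               # intercept = CATE estimate at x0

    grid = np.linspace(20, 38, 10)
    cate_hat = [local_linear(g, age, psi, h=2.0) for g in grid]
    print(np.round(cate_hat, 2))
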
arXiv link: http://arxiv.org/abs/1908.02399v5
Semiparametric Wavelet-based JPEG IV Estimator for endogenously truncated data
in a sequence of irregular noisy data points which also accommodates a
reference-free criterion function. Our main contribution is by formulating
analytically (instead of approximating) the inverse of the transpose of the
JPEG wavelet transform without involving matrices, which are computationally
cumbersome. The algorithm is suitable for the widespread situations where
the original data distribution is unobservable such as in cases where there is
deficient representation of the entire population in the training data (in
machine learning) and thus the covariate shift assumption is violated. The
proposed estimator corrects for both biases, the one generated by endogenous
truncation and the one generated by endogenous covariates. Results from
utilizing 2,000,000 different distribution functions verify the applicability
and high accuracy of our procedure to cases in which the disturbances are
neither jointly nor marginally normally distributed.
arXiv link: http://arxiv.org/abs/1908.02166v1
Analysing Global Fixed Income Markets with Tensors
that is, they naturally reside on multi-dimensional data structures referred to
as tensors. In contrast to standard "flat-view" multivariate models that are
agnostic to data structure and only describe linear pairwise relationships, we
introduce a tensor-valued approach to model the global risks shared by multiple
interest rate curves. In this way, the estimated risk factors can be
analytically decomposed into maturity-domain and country-domain constituents,
which allows the investor to devise rigorous and tractable global portfolio
management and hedging strategies tailored to each risk domain. An empirical
analysis confirms the existence of global risk factors shared by eight
developed economies, and demonstrates their ability to compactly describe the
global macroeconomic environment.
arXiv link: http://arxiv.org/abs/1908.02101v4
Discovery of Bias and Strategic Behavior in Crowdsourced Performance Assessment
to flatter management structure, crowdsourced performance assessment gained
mainstream popularity. One fundamental challenge of crowdsourced performance
assessment is the risk that personal interests can introduce distortions of
fact, especially when the system is used to determine merit pay or promotion.
In this paper, we developed a method to identify bias and strategic behavior in
crowdsourced performance assessment, using a rich dataset collected from a
professional service firm in China. We find a pattern of "discriminatory
generosity" on the part of peer evaluation, where raters downgrade their peer
coworkers who have passed objective promotion requirements while overrating
their peer coworkers who have not yet passed. This introduces two types of
biases: the first aimed against more competent competitors, and the other
favoring less eligible peers, which can serve as a mask for the first bias. This
paper also aims to bring angles of fairness-aware data mining to talent and
management computing. Historical decision records, such as performance ratings,
often contain subjective judgment which is prone to bias and strategic
behavior. For practitioners of predictive talent analytics, it is important to
investigate potential bias and strategic behavior underlying historical
decision records.
arXiv link: http://arxiv.org/abs/1908.01718v2
Uncertainty in the Hot Hand Fallacy: Detecting Streaky Alternatives to Random Bernoulli Sequences
Bernoulli sequences and their application to analyses of the human tendency to
perceive streaks of consecutive successes as overly representative of positive
dependence - the hot hand fallacy. In particular, we study permutation tests of
the null hypothesis of randomness (i.e., that trials are i.i.d.) based on test
statistics that compare the proportion of successes that directly follow k
consecutive successes with either the overall proportion of successes or the
proportion of successes that directly follow k consecutive failures. We
characterize the asymptotic distributions of these test statistics and their
permutation distributions under randomness, under a set of general stationary
processes, and under a class of Markov chain alternatives, which allow us to
derive their local asymptotic power. The results are applied to evaluate the
empirical support for the hot hand fallacy provided by four controlled
basketball shooting experiments. We establish that substantially larger data
sets are required to derive an informative measurement of the deviation from
randomness in basketball shooting. In one experiment, for which we were able to
obtain data, multiple testing procedures reveal that one shooter exhibits a
shooting pattern significantly inconsistent with randomness - supplying strong
evidence that basketball shooting is not random for all shooters all of the
time. However, we find that the evidence against randomness in this experiment
is limited to this shooter. Our results provide a mathematical and statistical
foundation for the design and validation of experiments that directly compare
deviations from randomness with human beliefs about deviations from randomness,
and thereby constitute a direct test of the hot hand fallacy.
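A stripped-down sketch of one such permutation test: the statistic compares the success rate immediately after k consecutive successes with the overall success rate, and its null distribution is approximated by permuting the sequence. The streak length, the simulated shot sequence, and the number of permutations are illustrative assumptions.

    import numpy as np

    def streak_stat(x, k=3):
        """P(success | k prior successes) minus the overall success rate; nan if no streaks occur."""
        x = np.asarray(x)
        after = [x[t] for t in range(k, len(x)) if x[t - k:t].all()]
        return np.nan if not after else np.mean(after) - x.mean()

    def permutation_pvalue(x, k=3, n_perm=5000, seed=0):
        rng = np.random.default_rng(seed)
        obs = streak_stat(x, k)
        perm = np.array([streak_stat(rng.permutation(x), k) for _ in range(n_perm)])
        perm = perm[~np.isnan(perm)]
        return obs, np.mean(perm >= obs)             # one-sided p-value against positive dependence

    # Illustrative i.i.d. shooter (the null of randomness is true by construction).
    rng = np.random.default_rng(3)
    shots = rng.binomial(1, 0.5, size=100)
    print(permutation_pvalue(shots, k=3))
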
arXiv link: http://arxiv.org/abs/1908.01406v6
Estimating Unobserved Individual Heterogeneity Using Pairwise Comparisons
heterogeneity. Based on model-implied pairwise inequalities, the method
classifies individuals in the sample into groups defined by discrete unobserved
heterogeneity with unknown support. We establish conditions under which the
groups are identified and consistently estimated through our method. We show
that the method performs well in finite samples through Monte Carlo simulation.
We then apply the method to estimate a model of lowest-price procurement
auctions with unobserved bidder heterogeneity, using data from the California
highway procurement market.
arXiv link: http://arxiv.org/abs/1908.01272v3
The Use of Binary Choice Forests to Model and Estimate Discrete Choices
used to capture the choice behavior of customers when offered an assortment of
products. When estimating DCMs using transaction data, flexible models (such as
machine learning models or nonparametric models) are typically not
interpretable and hard to estimate, while tractable models (such as the
multinomial logit model) tend to misspecify the complex behavior represented in
the data. Methodology/results. In this study, we use a forest of binary
decision trees to represent DCMs. This approach is based on random forests, a
popular machine learning algorithm. The resulting model is interpretable: the
decision trees can explain the decision-making process of customers during the
purchase. We show that our approach can predict the choice probability of any
DCM consistently and thus never suffers from misspecification. Moreover, our
algorithm predicts assortments unseen in the training data. The mechanism and
errors can be theoretically analyzed. We also prove that the random forest can
recover preference rankings of customers thanks to the splitting criterion such
as the Gini index and information gain ratio. Managerial implications. The
framework has unique practical advantages. It can capture customers' behavioral
patterns such as irrationality or sequential searches when purchasing a
product. It handles nonstandard formats of training data that result from
aggregation. It can measure product importance based on how frequently a random
customer would make decisions depending on the presence of the product. It can
also incorporate price information and customer features. Our numerical
experiments using synthetic and real data show that using random forests to
estimate customer choices can outperform existing methods.
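A toy sketch of the general idea of training a forest on assortment indicators to predict choices, using an off-the-shelf random forest rather than the authors' binary choice forest; the simulated multinomial-logit customers, the feature encoding, and the hyperparameters are assumptions made for illustration.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(4)
    J, n = 6, 5000                                    # products 1..J; label 0 = no purchase
    true_util = rng.normal(size=J)                    # illustrative latent utilities

    rows, labels = [], []
    for _ in range(n):
        offered = rng.binomial(1, 0.6, size=J)        # random assortment (availability vector)
        u = np.where(offered == 1, true_util + rng.gumbel(size=J), -np.inf)
        u0 = rng.gumbel()                             # outside option
        choice = 0 if u0 > u.max() else int(u.argmax()) + 1
        rows.append(offered)
        labels.append(choice)

    forest = RandomForestClassifier(n_estimators=300, min_samples_leaf=20)
    forest.fit(np.array(rows), np.array(labels))

    # Predicted choice probabilities for an assortment pattern, possibly unseen in training.
    # (A production version would also zero out probabilities of unavailable products.)
    new_assortment = np.array([[1, 1, 0, 0, 1, 0]])
    print(dict(zip(forest.classes_, forest.predict_proba(new_assortment)[0].round(3))))
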
arXiv link: http://arxiv.org/abs/1908.01109v6
Heterogeneous Endogenous Effects in Networks
network. Prior works use spatial autoregression models (SARs) which implicitly
assume that each individual in the network has the same peer effects on others.
Mechanically, they conclude that the key player in the network is the one with
the highest centrality. However, when some individuals are more influential
than others, centrality may fail to be a good measure. I develop a model that
allows for individual-specific endogenous effects and propose a two-stage LASSO
procedure to identify influential individuals in a network. Under a sparsity
assumption, namely that only a subset of individuals (whose number can increase
with the sample size n) is influential, I show that my 2SLSS estimator for
individual-specific
endogenous effects is consistent and achieves asymptotic normality. I also
develop robust inference including uniformly valid confidence intervals. These
results also carry through to scenarios where the influential individuals are
not sparse. I extend the analysis to allow for multiple types of connections
(multiple networks), and I show how to use the sparse group LASSO to detect
which of the multiple connection types is more influential. Simulation evidence
shows that my estimator has good finite sample performance. I further apply my
method to the data in Banerjee et al. (2013) and my proposed procedure is able
to identify leaders and effective networks.
arXiv link: http://arxiv.org/abs/1908.00663v1
Testing for Externalities in Network Formation Using Simulation
and Graham (2019): testing for interdependencies in preferences over links
among N (possibly heterogeneous) agents in a network. We describe an exact test
which conditions on a sufficient statistic for the nuisance parameter
characterizing any agent-level heterogeneity. Employing an algorithm due to
Blitzstein and Diaconis (2011), we show how to simulate the null distribution
of the test statistic in order to estimate critical values and/or p-values. We
illustrate our methods using the Nyakatoke risk-sharing network. We find that
the transitivity of the Nyakatoke network far exceeds what can be explained by
degree heterogeneity across households alone.
arXiv link: http://arxiv.org/abs/1908.00099v1
Kernel Density Estimation for Undirected Dyadic Data
random variables (i.e., random variables defined for all $n \equiv
\binom{N}{2}$ unordered pairs of agents/nodes in a weighted network of order
$N$). These random variables satisfy a local dependence
property: any random variables in the network that share one or two indices may
be dependent, while those sharing no indices in common are independent. In this
setting, we show that density functions may be estimated by an application of
the kernel estimation method of Rosenblatt (1956) and Parzen (1962). We suggest
an estimate of their asymptotic variances inspired by a combination of (i)
Newey's (1994) method of variance estimation for kernel estimators in the
"monadic" setting and (ii) a variance estimator for the (estimated) density of
a simple network first suggested by Holland and Leinhardt (1976). More unusual
are the rates of convergence and asymptotic (normal) distributions of our
dyadic density estimates. Specifically, we show that they converge at the same
rate as the (unconditional) dyadic sample mean: the square root of the number,
N, of nodes. This differs from the results for nonparametric estimation of
densities and regression functions for monadic data, which generally have a
slower rate of convergence than their corresponding sample mean.
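A minimal numerical sketch of the dyadic kernel density estimate described above, averaging a Gaussian kernel over all unordered pairs; the bandwidth and the data-generating process with node-level shocks are illustrative assumptions, and no variance estimator is included.

    import numpy as np
    from itertools import combinations

    def dyadic_kde(W, grid, h):
        """Kernel density estimate from dyadic outcomes W[i, j], i < j (Gaussian kernel)."""
        N = W.shape[0]
        vals = np.array([W[i, j] for i, j in combinations(range(N), 2)])  # n = N(N-1)/2 dyads
        z = (grid[:, None] - vals[None, :]) / h
        return np.exp(-0.5 * z ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

    # Illustrative data: node-specific shocks induce the local dependence across dyads.
    rng = np.random.default_rng(5)
    N = 60
    a = rng.normal(size=N)
    W = a[:, None] + a[None, :] + rng.normal(size=(N, N))
    W = np.triu(W, 1) + np.triu(W, 1).T               # symmetrize: undirected dyadic outcomes

    grid = np.linspace(-5, 5, 11)
    print(np.round(dyadic_kde(W, grid, h=0.5), 3))
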
arXiv link: http://arxiv.org/abs/1907.13630v1
Detecting Identification Failure in Moment Condition Models
condition models. This is achieved by introducing a quasi-Jacobian matrix
computed as the slope of a linear approximation of the moments on an estimate
of the identified set. It is asymptotically singular when local and/or global
identification fails, and equivalent to the usual Jacobian matrix which has
full rank when the model is point and locally identified. Building on this
property, a simple test with chi-squared critical values is introduced to
conduct subvector inferences allowing for strong, semi-strong, and weak
identification without a priori knowledge about the underlying
identification structure. Monte-Carlo simulations and an empirical application
to the Long-Run Risks model illustrate the results.
arXiv link: http://arxiv.org/abs/1907.13093v5
Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions
and machine learning methods in a credit scoring application. In order to do
so, the models' performance is evaluated over four different data sets in
combination with five data sampling strategies to tackle existing class
imbalances in the data. Six different performance measures are used to cover
different aspects of predictive performance. The results indicate a strong
superiority of ensemble methods and show that simple sampling strategies
deliver better results than more sophisticated ones.
arXiv link: http://arxiv.org/abs/1907.12996v1
A Comparison of First-Difference and Forward Orthogonal Deviations GMM
two-step generalized method of moments (GMM) based on the forward orthogonal
deviations transformation is numerically equivalent to two-step GMM based on
the first-difference transformation. The condition also tells us when system
GMM, based on differencing, can be computed using forward orthogonal
deviations. Additionally, it tells us when forward orthogonal deviations and
differencing do not lead to the same GMM estimator. When estimators based on
these two transformations differ, Monte Carlo simulations indicate that
estimators based on forward orthogonal deviations have better finite sample
properties than estimators based on differencing.
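For reference, the sketch below applies the two transformations being compared, first differencing and forward orthogonal deviations, to a short series; the scaling convention follows the standard Arellano-Bover definition, which is assumed to match the one used in the paper.

    import numpy as np

    def first_difference(x):
        return np.diff(x)

    def forward_orthogonal_deviations(x):
        """x*_t = c_t * (x_t - mean of future values), with c_t = sqrt((T-t)/(T-t+1)),
        defined for t = 1, ..., T-1 (one observation is lost, as with differencing)."""
        x = np.asarray(x, dtype=float)
        T = len(x)
        out = np.empty(T - 1)
        for t in range(T - 1):
            remaining = T - 1 - t                      # number of future observations
            c = np.sqrt(remaining / (remaining + 1.0))
            out[t] = c * (x[t] - x[t + 1:].mean())
        return out

    # Illustrative short series for a single panel unit.
    x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
    print(first_difference(x))
    print(forward_orthogonal_deviations(x))
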
arXiv link: http://arxiv.org/abs/1907.12880v1
Robust tests for ARCH in the presence of the misspecified conditional mean: A comparison of nonparametric approaches
the presence of the misspecified conditional mean. The approaches employed in
this study are based on two nonparametric regressions for the conditional mean.
First is the ARCH test using Nadaraya-Watson kernel regression. Second is the
ARCH test using the polynomial approximation regression. The two approaches do
not require specification of the conditional mean and can adapt to various
nonlinear models, which are unknown a priori. Accordingly, they are robust to
misspecified conditional mean models. Simulation results show that ARCH tests
based on the polynomial approximation regression approach have better
statistical properties than ARCH tests using the Nadaraya-Watson kernel regression
approach for various nonlinear models.
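A rough sketch of the first variant under simplifying assumptions: a Nadaraya-Watson fit of the conditional mean on one lag, followed by a standard LM-type ARCH test on the resulting residuals. The bandwidth, the lag order, and the simulated nonlinear AR model are illustrative choices, not the authors' exact test.

    import numpy as np
    from scipy.stats import chi2

    def nw_fit(x, y, h):
        """Nadaraya-Watson regression of y on x with a Gaussian kernel."""
        w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
        return (w @ y) / w.sum(axis=1)

    def arch_lm_test(e, q=4):
        """LM test: regress e_t^2 on q lags of e^2; statistic = n * R^2 ~ chi2(q) under no ARCH."""
        e2 = e ** 2
        Z = np.column_stack([np.ones(len(e2) - q)] +
                            [e2[q - j - 1:len(e2) - j - 1] for j in range(q)])
        y = e2[q:]
        beta = np.linalg.lstsq(Z, y, rcond=None)[0]
        r2 = 1 - np.sum((y - Z @ beta) ** 2) / np.sum((y - y.mean()) ** 2)
        stat = len(y) * r2
        return stat, chi2.sf(stat, q)

    # Illustrative nonlinear conditional mean with no ARCH in the errors.
    rng = np.random.default_rng(6)
    n = 1000
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 0.8 * np.sin(y[t - 1]) + rng.normal()

    m_hat = nw_fit(y[:-1], y[1:], h=0.4)               # nonparametric fit of E[y_t | y_{t-1}]
    resid = y[1:] - m_hat
    print(arch_lm_test(resid, q=4))                    # should not reject no-ARCH at usual levels
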
arXiv link: http://arxiv.org/abs/1907.12752v2
Testing for time-varying properties under misspecified conditional mean and variance
properties under misspecified conditional mean and variance. When we test for
time-varying properties of the conditional mean in the case in which data have
no time-varying mean but have time-varying variance, asymptotic tests have size
distortions. This is improved by the use of a bootstrap method. Similarly, when
we test for time-varying properties of the conditional variance in the case in
which data have time-varying mean but no time-varying variance, asymptotic
tests have large size distortions. This is not improved even by the use of
bootstrap methods. We show that tests for time-varying properties of the
conditional mean by the bootstrap are robust regardless of the time-varying
variance model, whereas tests for time-varying properties of the conditional
variance do not perform well in the presence of misspecified time-varying mean.
arXiv link: http://arxiv.org/abs/1907.12107v2
X-model: further development and possible modifications
Steinert (2016) has neither been widely studied nor further developed. And yet,
yet, the possibilities to improve the model are as numerous as the fields it
can be applied to. The present paper takes advantage of a technique proposed by
Coulon et al. (2014) to enhance the X-model. Instead of using the wholesale
supply and demand curves as inputs for the model, we rely on the transformed
versions of these curves with a perfectly inelastic demand. As a result,
computational requirements of our X-model reduce and its forecasting power
increases substantially. Moreover, our X-model becomes more robust towards
outliers present in the initial auction curves data.
arXiv link: http://arxiv.org/abs/1907.09206v1
On the simulation of the Hawkes process via Lambert-W functions
The oldest approach is inverse transform sampling (ITS), suggested in Ozaki
(1979) but rapidly abandoned in favor of more efficient alternatives. This
manuscript shows that the ITS approach can be conveniently discussed in terms
of Lambert-W functions. An optimized and efficient implementation suggests that
this approach is computationally more efficient than more recent alternatives
available for the simulation of the Hawkes process.
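To make the connection concrete, here is a small sketch of an inverse-transform sampler for a Hawkes process with an exponential kernel, in which each inter-event waiting time solves the compensator equation in closed form via the principal branch of the Lambert-W function; the parameter values are arbitrary and the rearrangement is a plausible reconstruction rather than the paper's implementation.

    import numpy as np
    from scipy.special import lambertw

    def simulate_hawkes_its(mu, alpha, beta, t_max, seed=0):
        """Hawkes process with intensity mu + alpha * sum_i exp(-beta * (t - t_i)),
        simulated by inverse transform sampling of each inter-event time.
        The compensator equation mu*tau + (A/beta)*(1 - exp(-beta*tau)) = E,
        with A the excitation just after the last event and E ~ Exp(1),
        is solved in closed form with the Lambert-W function."""
        rng = np.random.default_rng(seed)
        t, A, events = 0.0, 0.0, []
        while True:
            E = rng.exponential()
            b = (E * beta - A) / mu
            x = b + lambertw((A / mu) * np.exp(-b)).real   # x = beta * tau
            t += x / beta
            if t > t_max:
                return np.array(events)
            events.append(t)
            A = A * np.exp(-x) + alpha                     # update excitation at the new event

    # Illustrative parameters: branching ratio alpha/beta = 0.25.
    ev = simulate_hawkes_its(mu=1.0, alpha=0.5, beta=2.0, t_max=200.0)
    print(len(ev), ev[:5].round(3))
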
arXiv link: http://arxiv.org/abs/1907.09162v1
Rebuttal of "On Nonparametric Identification of Treatment Effects in Duration Models"
result (Proposition 3) in Abbring and Van den Berg (2003b) does not hold. We
show that their claim is incorrect. At a certain point within their line of
reasoning, they make a rather basic error while transforming one random
variable into another random variable, and this leads them to draw incorrect
conclusions. As a result, their paper can be discarded.
arXiv link: http://arxiv.org/abs/1907.09886v1
A Vine-copula extension for the HAR model
distribution of the four partial-volatility terms involved therein, namely
today's, yesterday's, last week's, and last month's volatility components. The
joint distribution relies on a (C-) Vine copula construction, allowing one to
conveniently extract volatility forecasts based on the conditional expectation
of today's volatility given its past terms. The proposed empirical application
involves more than seven years of high-frequency transaction prices for ten
stocks and evaluates the in-sample, out-of-sample and one-step-ahead forecast
performance of our model for daily realized-kernel measures. The model proposed
in this paper is shown to outperform the HAR counterpart under different models
for marginal distributions, copula construction methods, and forecasting
settings.
arXiv link: http://arxiv.org/abs/1907.08522v1
Product Aesthetic Design: A Machine Learning Augmentation
industry, an improved aesthetic design can boost sales by 30% or more. Firms
invest heavily in designing and testing aesthetics. A single automotive "theme
clinic" can cost over $100,000, and hundreds are conducted annually. We propose
a model to augment the commonly-used aesthetic design process by predicting
aesthetic scores and automatically generating innovative and appealing product
designs. The model combines a probabilistic variational autoencoder (VAE) with
adversarial components from generative adversarial networks (GAN) and a
supervised learning component. We train and evaluate the model with data from
an automotive partner: images of 203 SUVs evaluated by targeted consumers and
180,000 high-quality unrated images. Our model predicts the appeal of new
aesthetic designs well, achieving a 43.5% improvement relative to a uniform baseline and
substantial improvement over conventional machine learning models and
pretrained deep neural networks. New automotive designs are generated in a
controllable manner for use by design teams. We empirically verify that
automatically generated designs are (1) appealing to consumers and (2) resemble
designs which were introduced to the market five years after our data were
collected. We provide an additional proof-of-concept application using
open-source images of dining room chairs.
arXiv link: http://arxiv.org/abs/1907.07786v2
Testing for Unobserved Heterogeneity via k-means Clustering
applications. This paper proposes a formal testing procedure to determine
whether a null hypothesis of a single cluster, indicating homogeneity of the
data, can be rejected in favor of multiple clusters. The test is simple to
implement, valid under relatively mild conditions (including non-normality, and
heterogeneity of the data in aspects beyond those in the clustering analysis),
and applicable in a range of contexts (including clustering when the time
series dimension is small, or clustering on parameters other than the mean). We
verify that the test has good size control in finite samples, and we illustrate
the test in applications to clustering vehicle manufacturers and U.S. mutual
funds.
arXiv link: http://arxiv.org/abs/1907.07582v1
Testing for Quantile Sample Selection
conditional quantile functions. The first test is an omitted predictor test
with the propensity score as the omitted variable. As with any omnibus test, in
the case of rejection we cannot distinguish between rejection due to genuine
selection or to misspecification. Thus, we suggest a second test to provide
supporting evidence whether the cause for rejection at the first stage was
solely due to selection or not. Using only individuals with propensity score
close to one, this second test relies on an `identification at infinity'
argument, but accommodates cases of irregular identification. Importantly,
neither of the two tests requires parametric assumptions on the selection
equation nor a continuous exclusion restriction. Data-driven bandwidth
procedures are proposed, and Monte Carlo evidence suggests a good finite sample
performance in particular of the first test. Finally, we also derive an
extension of the first test to nonparametric conditional mean functions, and
apply our procedure to test for selection in log hourly wages using UK Family
Expenditure Survey data, as in AB2017.
arXiv link: http://arxiv.org/abs/1907.07412v5
On the inconsistency of matching without replacement
produces estimators that generally are inconsistent for the average treatment
effect of the treated. To achieve consistency, practitioners must either assume
that no units exist with propensity scores greater than one-half or assume that
there is no confounding among such units. The result is not driven by the use
of propensity scores, and similar artifacts arise when matching on other scores
as long as it is without replacement.
arXiv link: http://arxiv.org/abs/1907.07288v2
Shrinkage in the Time-Varying Parameter Model Framework Using the R Package shrinkTVP
to flexibly deal with processes which gradually change over time. However, the
risk of overfitting in TVP models is well known. This issue can be dealt with
using appropriate global-local shrinkage priors, which pull time-varying
parameters towards static ones. In this paper, we introduce the R package
shrinkTVP (Knaus, Bitto-Nemling, Cadonna, and Fr\"uhwirth-Schnatter 2019),
which provides a fully Bayesian implementation of shrinkage priors for TVP
models, taking advantage of recent developments in the literature, in
particular that of Bitto and Fr\"uhwirth-Schnatter (2019). The package
shrinkTVP allows for posterior simulation of the parameters through an
efficient Markov Chain Monte Carlo (MCMC) scheme. Moreover, summary and
visualization methods, as well as the possibility of assessing predictive
performance through log predictive density scores (LPDSs), are provided. The
computationally intensive tasks have been implemented in C++ and interfaced
with R. The paper includes a brief overview of the models and shrinkage priors
implemented in the package. Furthermore, core functionalities are illustrated,
both with simulated and real data.
arXiv link: http://arxiv.org/abs/1907.07065v3
Information processing constraints in travel behaviour modelling: A generative learning approach
processing constraints. These behavioural conditions can be characterized by a
generative learning process. We propose a data-driven generative model version
of rational inattention theory to emulate these behavioural representations. We
outline the methodology of the generative model and the associated learning
process as well as provide an intuitive explanation of how this process
captures the value of prior information in the choice utility specification. We
demonstrate the effects of information heterogeneity on a travel choice,
analyze the econometric interpretation, and explore the properties of our
generative model. Our findings indicate a strong correlation with rational
inattention behaviour theory, which suggests that individuals may ignore certain
exogenous variables and rely on prior information for evaluating decisions
under uncertainty. Finally, the principles demonstrated in this study can be
formulated as a generalized entropy and utility based multinomial logit model.
arXiv link: http://arxiv.org/abs/1907.07036v2
Audits as Evidence: Experiments, Ensembles, and Enforcement
discrimination by individual employers. Employers violate US employment law if
their propensity to contact applicants depends on protected characteristics
such as race or sex. We establish identification of higher moments of the
causal effects of protected characteristics on callback rates as a function of
the number of fictitious applications sent to each job ad. These moments are
used to bound the fraction of jobs that illegally discriminate. Applying our
results to three experimental datasets, we find evidence of significant
employer heterogeneity in discriminatory behavior, with the standard deviation
of gaps in job-specific callback probabilities across protected groups
averaging roughly twice the mean gap. In a recent experiment manipulating
racially distinctive names, we estimate that at least 85% of jobs that contact
both of two white applications and neither of two black applications are
engaged in illegal discrimination. To assess the tradeoff between type I and II
errors presented by these patterns, we consider the performance of a series of
decision rules for investigating suspicious callback behavior under a simple
two-type model that rationalizes the experimental data. Though, in our
preferred specification, only 17% of employers are estimated to discriminate on
the basis of race, we find that an experiment sending 10 applications to each
job would enable accurate detection of 7-10% of discriminators while falsely
accusing fewer than 0.2% of non-discriminators. A minimax decision rule
acknowledging partial identification of the joint distribution of callback
rates yields higher error rates but more investigations than our baseline
two-type model. Our results suggest illegal labor market discrimination can be
reliably monitored with relatively small modifications to existing audit
designs.
arXiv link: http://arxiv.org/abs/1907.06622v2
Simple Adaptive Size-Exact Testing for Full-Vector and Subvector Inference in Moment Inequality Models
normal models with known variance and has uniformly asymptotically exact size
more generally. The test compares the quasi-likelihood ratio statistic to a
chi-squared critical value, where the degree of freedom is the rank of the
inequalities that are active in finite samples. The test requires no simulation
and thus is computationally fast and especially suitable for constructing
confidence sets for parameters by test inversion. It uses no tuning parameter
for moment selection and yet still adapts to the slackness of the moment
inequalities. Furthermore, we show how the test can be easily adapted for
inference on subvectors for the common empirical setting of conditional moment
inequalities with nuisance parameters entering linearly.
arXiv link: http://arxiv.org/abs/1907.06317v2
On the Evolution of U.S. Temperature Dynamics
particular, have important implications for urbanization, agriculture, health,
productivity, and poverty, among other things. While much research has
documented rising mean temperature levels, we also examine range-based
measures of daily temperature volatility. Specifically, using data for
select U.S. cities over the past half-century, we compare the evolving time
series dynamics of the average temperature level, AVG, and the diurnal
temperature range, DTR (the difference between the daily maximum and minimum
temperatures). We characterize trend and seasonality in these two series using
linear models with time-varying coefficients. These straightforward yet
flexible approximations provide evidence of evolving DTR seasonality and stable
AVG seasonality.
arXiv link: http://arxiv.org/abs/1907.06303v3
On the residues vectors of a rational class of complex functions. Application to autoregressive processes
their characteristics are of extensive interest to other sciences. This work
begins with a particular class of rational functions of a complex variable;
from this class, two elementary properties concerning the residues are deduced,
and a result establishing a lower bound for the p-norm of the residues vector
is proposed. Applications to autoregressive processes are presented, with
examples drawn from historical data on electricity generation and econometric
series.
arXiv link: http://arxiv.org/abs/1907.05949v1
Identification and Estimation of Discrete Choice Models with Unobserved Choice Sets
discrete choice models with unobserved choice sets. We recover the joint
distribution of choice sets and preferences from a panel dataset on choices. We
assume that either the latent choice sets are sparse or that the panel is
sufficiently long. Sparsity requires the number of possible choice sets to be
relatively small. It is satisfied, for instance, when the choice sets are
nested, or when they form a partition. Our estimation procedure is
computationally fast and uses mixed-integer optimization to recover the sparse
support of choice sets. Analyzing the ready-to-eat cereal industry using a
household scanner dataset, we find that ignoring the unobservability of choice
sets can lead to biased estimates of preferences due to significant latent
heterogeneity in choice sets.
arXiv link: http://arxiv.org/abs/1907.04853v3
Adaptive inference for a semiparametric generalized autoregressive conditional heteroskedasticity model
heteroskedasticity (S-GARCH) model. For this model, we first estimate the
time-varying long run component for unconditional variance by the kernel
estimator, and then estimate the non-time-varying parameters in GARCH-type
short run component by the quasi maximum likelihood estimator (QMLE). We show
that the QMLE is asymptotically normal with the parametric convergence rate.
Next, we construct a Lagrange multiplier test for a linear parameter constraint
and a portmanteau test for model checking, and obtain their asymptotic null
distributions. Our entire statistical inference procedure works for the
non-stationary data with two important features: first, our QMLE and two tests
are adaptive to the unknown form of the long run component; second, our QMLE
and two tests share the same efficiency and testing power as those in the
variance targeting method when the S-GARCH model is stationary.
arXiv link: http://arxiv.org/abs/1907.04147v4
Competing Models
have different models: they predict using different explanatory variables. We
study which agent believes they have the best predictive ability -- as measured
by the smallest subjective posterior mean squared prediction error -- and show
how it depends on the sample size. With small samples, we present results
suggesting it is an agent using a low-dimensional model. With large samples, it
is generally an agent with a high-dimensional model, possibly including
irrelevant variables, but never excluding relevant ones. We apply our results
to characterize the winning model in an auction of productive assets, to argue
that entrepreneurs and investors with simple models will be over-represented in
new sectors, and to understand the proliferation of "factors" that explain the
cross-sectional variation of expected stock returns in the asset-pricing
literature.
arXiv link: http://arxiv.org/abs/1907.03809v5
Artificial Intelligence Alter Egos: Who benefits from Robo-investing?
daily lives. Financial decision-making is no exception to this. We introduce
the notion of AI Alter Egos, which are shadow robo-investors, and use a unique
data set covering brokerage accounts for a large cross-section of investors
over a sample from January 2003 to March 2012, which includes the 2008
financial crisis, to assess the benefits of robo-investing. We have detailed
investor characteristics and records of all trades. Our data set consists of
investors typically targeted for robo-advising. We explore robo-investing
strategies commonly used in the industry, including some involving advanced
machine learning methods. The man versus machine comparison allows us to shed
light on potential benefits the emerging robo-advising industry may provide to
certain segments of the population, such as low income and/or high risk averse
investors.
arXiv link: http://arxiv.org/abs/1907.03370v1
Random Forest Estimation of the Ordered Choice Model
models based on the random forest. The proposed Ordered Forest flexibly
estimates the conditional choice probabilities while taking the ordering
information explicitly into account. In addition to common machine learning
estimators, it enables the estimation of marginal effects as well as conducting
inference and thus provides the same output as classical econometric
estimators. An extensive simulation study reveals a good predictive
performance, particularly in settings with non-linearities and
near-multicollinearity. An empirical application contrasts the estimation of
marginal effects and their standard errors with an ordered logit model. A
software implementation of the Ordered Forest is provided both in R and Python
in the package orf available on CRAN and PyPI, respectively.
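The reference implementation is the orf package itself; the snippet below is only a hand-rolled sketch of the underlying idea, using scikit-learn regression forests to estimate the cumulative probabilities P(Y <= k) and differencing them into ordered choice probabilities, with clipping as an ad hoc monotonicity fix. The simulated data, hyperparameters, and post-processing are illustrative assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # Illustrative ordered outcome with K categories 1..K generated from a latent index.
    rng = np.random.default_rng(7)
    n, K = 3000, 4
    X = rng.normal(size=(n, 5))
    latent = X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.logistic(size=n)
    y = np.digitize(latent, bins=[-1.5, 0.0, 1.5]) + 1

    # One regression forest per cumulative indicator 1{Y <= k}, k = 1, ..., K-1.
    cum_forests = [RandomForestRegressor(n_estimators=300, min_samples_leaf=20)
                   .fit(X, (y <= k).astype(float)) for k in range(1, K)]

    def ordered_forest_probs(X_new):
        cum = np.column_stack(
            [np.zeros(len(X_new))]
            + [f.predict(X_new) for f in cum_forests]
            + [np.ones(len(X_new))]
        )
        cum = np.maximum.accumulate(np.clip(cum, 0, 1), axis=1)  # enforce monotone cumulative probs
        probs = np.diff(cum, axis=1)
        return probs / probs.sum(axis=1, keepdims=True)

    print(ordered_forest_probs(X[:3]).round(3))
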
arXiv link: http://arxiv.org/abs/1907.02436v3
Heterogeneous Choice Sets and Preferences
sets are unobserved. Our core model assumes nothing about agents' choice sets
apart from their minimum size. Importantly, it leaves unrestricted the
dependence, conditional on observables, between choice sets and preferences. We
first characterize the sharp identification region of the model's parameters by
a finite set of conditional moment inequalities. We then apply our theoretical
findings to learn about households' risk preferences and choice sets from data
on their deductible choices in auto collision insurance. We find that the data
can be explained by expected utility theory with low levels of risk aversion
and heterogeneous non-singleton choice sets, and that more than three in four
households require limited choice sets to explain their deductible choices. We
also provide simulation evidence on the computational tractability of our
method in applications with larger feasible sets or higher-dimensional
unobserved heterogeneity.
arXiv link: http://arxiv.org/abs/1907.02337v2
Optimal transport on large networks, a practitioner's guide
problem in a large geographic market and gives examples of applications. In our
settings, the market is described by a network that maps the cost of travel
between each pair of adjacent locations. Two types of agents are located at the
nodes of this network. The buyers choose the most competitive sellers depending
on their prices and the cost to reach them. Their utility is assumed additive
in both these quantities. Each seller, taking as given other sellers' prices,
sets her own price so that demand equals the one we observed. We give a
linear programming formulation for the equilibrium conditions. After formally
introducing our model we apply it on two examples: prices offered by petrol
stations and quality of services provided by maternity wards. These examples
illustrate the applicability of our model to aggregate demand, rank prices and
estimate cost structure over the network. We insist on the possibility of
applications to large scale data sets using modern linear programming solvers
such as Gurobi. In addition to this paper we released a R toolbox to implement
our results and an online tutorial (http://optimalnetwork.github.io)
arXiv link: http://arxiv.org/abs/1907.02320v2
Heterogeneous Regression Models for Clusters of Spatial Dependent Data
characteristics, and economic models on such regions tend to have similar
covariate effects. In this paper, we propose a Bayesian clustered regression
for spatially dependent data in order to detect clusters in the covariate
effects. Our proposed method is based on the Dirichlet process which provides a
probabilistic framework for simultaneous inference of the number of clusters
and the clustering configurations. The usage of our method is illustrated in
both simulation studies and an application to a housing cost dataset for
Georgia.
arXiv link: http://arxiv.org/abs/1907.02212v4
The Informativeness of Estimation Moments
precision of parameter estimates in GMM settings. For example, one of the
measures asks what would happen to the variance of the parameter estimates if a
particular moment was dropped from the estimation. The measures are all easy to
compute. We illustrate the usefulness of the measures through two simple
examples as well as an application to a model of joint retirement planning of
couples. We estimate the model using the UK-BHPS, and we find evidence of
complementarities in leisure. Our sensitivity measures illustrate that the
estimate of the complementarity is primarily informed by the distribution of
differences in planned retirement dates. The estimated econometric model can be
interpreted as a bivariate ordered choice model that allows for simultaneity.
This makes the model potentially useful in other applications.
arXiv link: http://arxiv.org/abs/1907.02101v2
An Econometric Perspective on Algorithmic Subsampling
bottlenecks often frustrate a complete analysis of the data. While more data
are better than less, diminishing returns suggest that we may not need
terabytes of data to estimate a parameter or test a hypothesis. But which rows
of data should we analyze, and might an arbitrary subset of rows preserve the
features of the original data? This paper reviews a line of work that is
grounded in theoretical computer science and numerical linear algebra, and
which finds that an algorithmically desirable sketch, which is a randomly
chosen subset of the data, must preserve the eigenstructure of the data, a
property known as a subspace embedding. Building on this work, we study how
prediction and inference can be affected by data sketching within a linear
regression setup. We show that the sketching error is small compared to the
sample size effect which a researcher can control. As a sketch size that is
algorithmically optimal may not be suitable for prediction and inference, we
use statistical arguments to provide 'inference conscious' guides to the sketch
size. When appropriately implemented, an estimator that pools over different
sketches can be nearly as efficient as the infeasible one using the full
sample.
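A tiny numerical sketch of the basic point, comparing OLS on a uniformly sampled sketch of the rows with OLS on the full sample; uniform row sampling is only one of the sketching schemes the paper reviews, and the simulated data and sketch size are illustrative assumptions.

    import numpy as np

    # Illustrative simulated regression with a large number of rows.
    rng = np.random.default_rng(8)
    n, p, m = 1_000_000, 5, 20_000                    # full sample size, regressors, sketch size
    X = rng.normal(size=(n, p))
    beta = np.array([1.0, -0.5, 0.25, 0.0, 2.0])
    y = X @ beta + rng.normal(size=n)

    beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

    idx = rng.choice(n, size=m, replace=False)        # uniform row-sampling sketch
    beta_sketch = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]

    print(np.round(beta_full, 4))
    print(np.round(beta_sketch, 4))                   # close to the full-sample estimate
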
arXiv link: http://arxiv.org/abs/1907.01954v4
Adaptive Pricing in Insurance: Generalized Linear Models and Gaussian Process Regression Approaches
online revenue management problem where the insurance company looks to set
prices to optimize the long-run revenue from selling a new insurance product.
We develop two pricing models: an adaptive Generalized Linear Model (GLM) and
an adaptive Gaussian Process (GP) regression model. Both balance between
exploration, where we choose prices in order to learn the distribution of
demands & claims for the insurance product, and exploitation, where we
myopically choose the best price from the information gathered so far. The
performance of the pricing policies is measured in terms of regret: the
expected revenue loss caused by not using the optimal price. As is commonplace
in insurance, we model demand and claims by GLMs. In our adaptive GLM design,
we use the maximum quasi-likelihood estimation (MQLE) to estimate the unknown
parameters. We show that, if prices are chosen with suitably decreasing
variability, the MQLE parameters eventually exist and converge to the correct
values, which in turn implies that the sequence of chosen prices will also
converge to the optimal price. In the adaptive GP regression model, we sample
demand and claims from Gaussian Processes and then choose selling prices by the
upper confidence bound rule. We also analyze these GLM and GP pricing
algorithms with delayed claims. Although similar results exist in other
domains, this is among the first works to consider dynamic pricing problems in
the field of insurance. We also believe this is the first work to consider
Gaussian Process regression in the context of insurance pricing. These initial
findings suggest that online machine learning algorithms could be a fruitful
area of future investigation and application in insurance.
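A compact sketch of the exploration-exploitation loop in the GP variant, using scikit-learn Gaussian process regression and an upper-confidence-bound price rule over a grid; the demand model, kernel, and UCB constant are illustrative assumptions, and no claims process or delay mechanism is modelled.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(9)
    prices = np.linspace(1.0, 10.0, 50).reshape(-1, 1)

    def observed_revenue(p):                           # unknown to the seller: demand falls in price
        demand = np.exp(1.0 - 0.35 * p) + 0.05 * rng.normal()
        return p * max(demand, 0.0)

    X_hist, y_hist = [], []
    for t in range(1, 51):
        if len(X_hist) < 3:                            # a few purely exploratory prices to start
            p = float(rng.choice(prices.ravel()))
        else:
            gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
            gp.fit(np.array(X_hist).reshape(-1, 1), np.array(y_hist))
            mean, std = gp.predict(prices, return_std=True)
            ucb = mean + 2.0 * std                     # upper confidence bound rule
            p = float(prices[np.argmax(ucb), 0])
        X_hist.append(p)
        y_hist.append(observed_revenue(p))

    print(round(X_hist[-1], 2), round(np.mean(y_hist[-10:]), 3))  # price settles near the optimum
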
arXiv link: http://arxiv.org/abs/1907.05381v1
Large Volatility Matrix Prediction with High-Frequency Data
high-frequency data by applying eigen-decomposition to daily realized
volatility matrix estimators and capturing eigenvalue dynamics with ARMA
models. Given a sequence of daily volatility matrix estimators, we compute the
aggregated eigenvectors and obtain the corresponding eigenvalues. Eigenvalues
in the same relative magnitude form a time series and the ARMA models are
further employed to model the dynamics within each eigenvalue time series to
produce a predictor. We predict the future large volatility matrix based on the
predicted eigenvalues and the aggregated eigenvectors, and demonstrate the
advantages of the proposed method in volatility prediction and portfolio
allocation problems.
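A simplified sketch of the pipeline: eigen-decompose an aggregated realized volatility matrix to fix the eigenvectors, project each daily matrix onto them to obtain eigenvalue series, fit a low-order ARMA model to each series, and rebuild a one-day-ahead matrix prediction. The simulated matrices, the aggregation by simple averaging, and the ARMA order are illustrative assumptions.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    # Illustrative simulated daily volatility matrices for d assets over T days.
    rng = np.random.default_rng(10)
    d, T = 5, 250
    true_vecs = np.linalg.qr(rng.normal(size=(d, d)))[0]
    drift = np.abs(np.cumsum(rng.normal(scale=0.1, size=(T, d)), axis=0))
    eig_series = 0.5 + 0.3 * drift + np.arange(d, 0, -1)   # persistent positive eigenvalue paths
    daily_cov = np.array([true_vecs @ np.diag(eig_series[t]) @ true_vecs.T for t in range(T)])

    # Aggregated eigenvectors from the average realized volatility matrix.
    vecs = np.linalg.eigh(daily_cov.mean(axis=0))[1]

    # Project each daily matrix onto the aggregated eigenvectors to get eigenvalue time series.
    lam = np.array([np.diag(vecs.T @ daily_cov[t] @ vecs) for t in range(T)])

    # Fit a low-order ARMA to each eigenvalue series and forecast one step ahead.
    lam_fc = np.array([ARIMA(lam[:, j], order=(1, 0, 1)).fit().forecast(1)[0] for j in range(d)])

    sigma_fc = vecs @ np.diag(np.maximum(lam_fc, 1e-8)) @ vecs.T   # predicted volatility matrix
    print(np.round(sigma_fc, 3))
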
arXiv link: http://arxiv.org/abs/1907.01196v2
Simulation smoothing for nowcasting with large mixed-frequency VARs
(VAR) models. VARs are popular tools for macroeconomic forecasting and use of
larger models has been demonstrated to often improve the forecasting ability
compared to more traditional small-scale models. Mixed-frequency VARs deal with
data sampled at different frequencies while remaining within the realms of
VARs. Estimation of mixed-frequency VARs makes use of simulation smoothing, but
using the standard procedure these models quickly become prohibitive in
nowcasting situations as the size of the model grows. We propose two algorithms
that improve the computational efficiency of the simulation smoothing
algorithm. Our preferred choice is an adaptive algorithm, which augments the
state vector as necessary to sample also monthly variables that are missing at
the end of the sample. For large VARs, we find considerable improvements in
speed using our adaptive algorithm. The algorithm therefore provides a crucial
building block for bringing the mixed-frequency VARs to the high-dimensional
regime.
arXiv link: http://arxiv.org/abs/1907.01075v1
Permutation inference with a finite number of heterogeneous clusters
hypotheses about the effect of a binary treatment in the presence of a finite
number of large, heterogeneous clusters when the treatment effect is identified
by comparisons across clusters. The procedure asymptotically controls size by
applying a level-adjusted permutation test to a suitable statistic. The
adjustments needed for most empirically relevant situations are tabulated in
the paper. The adjusted permutation test is easy to implement in practice and
performs well at conventional levels of significance with at least four treated
clusters and a similar number of control clusters. It is particularly robust to
situations where some clusters are much more variable than others. Examples and
an empirical application are provided.
arXiv link: http://arxiv.org/abs/1907.01049v2
Bounding Causes of Effects with Mediators
knowledge of the distribution of Y, given application of X. From this we know
the average causal effect of X on Y. We are now interested in assessing, for a
case that was exposed and exhibited a positive outcome, whether it was the
exposure that caused the outcome. The relevant "probability of causation", PC,
typically is not identified by the distribution of Y given X, but bounds can be
placed on it, and these bounds can be improved if we have further information
about the causal process. Here we consider cases where we know the
probabilistic structure for a sequence of complete mediators between X and Y.
We derive a general formula for calculating bounds on PC for any pattern of
data on the mediators (including the case with no data). We show that the
largest and smallest upper and lower bounds that can result from any complete
mediation process can be obtained in processes with at most two steps. We also
consider homogeneous processes with many mediators. PC can sometimes be
identified as 0 with negative data, but it cannot be identified at 1 even with
positive data on an infinite set of mediators. The results have implications
for learning about causation from knowledge of general processes and of data on
cases.
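For orientation, the classical no-mediator case already gives simple bounds on PC from the interventional distribution alone (Tian-Pearl-type bounds under exogeneity); the sketch below computes only this baseline and does not attempt the paper's mediator-based tightening.

    def pc_bounds_no_mediator(p_y1_given_x1, p_y1_given_x0):
        """Bounds on the probability of causation for an exposed case with a positive
        outcome, when X is exogenous and no mediator data are available."""
        lower = max(0.0, (p_y1_given_x1 - p_y1_given_x0) / p_y1_given_x1)
        upper = min(1.0, (1.0 - p_y1_given_x0) / p_y1_given_x1)
        return lower, upper

    # Illustrative example: exposure raises the outcome probability from 0.3 to 0.7.
    print(pc_bounds_no_mediator(0.7, 0.3))   # (0.571..., 1.0)
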
arXiv link: http://arxiv.org/abs/1907.00399v1
Relaxing the Exclusion Restriction in Shift-Share Instrumental Variable Estimation
Often, all shares need to fulfil an exclusion restriction, making the
identifying assumption strict. This paper proposes to use methods that relax
the exclusion restriction by selecting invalid shares. I apply the methods in
two empirical examples: the effect of immigration on wages and of Chinese
import exposure on employment. In the first application, the coefficient
becomes lower and often changes sign, but this is reconcilable with arguments
made in the literature. In the second application, the findings are mostly
robust to the use of the new methods.
arXiv link: http://arxiv.org/abs/1907.00222v4
Dealing with Stochastic Volatility in Time Series Using the R Package stochvol
heteroskedasticity modeling within the framework of stochastic volatility. It
utilizes Markov chain Monte Carlo (MCMC) samplers to conduct inference by
obtaining draws from the posterior distribution of parameters and latent
variables which can then be used for predicting future volatilities. The
package can straightforwardly be employed as a stand-alone tool; moreover, it
allows for easy incorporation into other MCMC samplers. The main focus of this
paper is to show the functionality of stochvol. In addition, it provides a
brief mathematical description of the model, an overview of the sampling
schemes used, and several illustrative examples using exchange rate data.
arXiv link: http://arxiv.org/abs/1906.12134v1
Modeling Univariate and Multivariate Stochastic Volatility in R with stochvol and factorstochvol
increasing popularity for fitting and predicting heteroskedastic time series.
However, due to the large number of latent quantities, their efficient
estimation is non-trivial and software that allows one to easily fit SV models to
data is rare. We aim to alleviate this issue by presenting novel
implementations of four SV models delivered in two R packages. Several unique
features are included and documented. As opposed to previous versions, stochvol
is now capable of handling linear mean models, heavy-tailed SV, and SV with
leverage. Moreover, we newly introduce factorstochvol which caters for
multivariate SV. Both packages offer a user-friendly interface through the
conventional R generics and a range of tailor-made methods. Computational
efficiency is achieved via interfacing R to C++ and doing the heavy work in the
latter. In the paper at hand, we provide a detailed discussion on Bayesian SV
estimation and showcase the use of the new software through various examples.
arXiv link: http://arxiv.org/abs/1906.12123v3
Estimation of the size of informal employment based on administrative records with non-ignorable selection mechanism
Labour Inspectorate and The Polish Social Insurance Institution in order to
estimate the prevalence of informal employment in Poland. Since the selection
mechanism is non-ignorable, we employed a generalization of Heckman's sample
selection model assuming non-Gaussian correlation of errors and clustering by
incorporation of random effects. We found that 5.7% (4.6%, 7.1%; 95% CI) of
registered enterprises in Poland, to some extent, take advantage of the
informal labour force. Our study exemplifies a new approach to measuring
informal employment, which can be implemented in other countries. It also
contributes to the existing literature by providing, to the best of our
knowledge, the first estimates of informal employment at the level of companies
based solely on administrative data.
arXiv link: http://arxiv.org/abs/1906.10957v1
Understanding the explosive trend in EU ETS prices -- fundamentals or speculation?
experienced a run-up from persistently low levels in previous years. Regulators
attribute this to a comprehensive reform in the same year, and are confident
the new price level reflects an anticipated tighter supply of allowances. We
ask if this is indeed the case, or if it is an overreaction of the market
driven by speculation. We combine several econometric methods - time-varying
coefficient regression, formal bubble detection as well as time stamping and
crash odds prediction - to juxtapose the regulators' claim versus the
concurrent explanation. We find evidence of a long period of explosive
behaviour in allowance prices, starting in March 2018 when the reform was
adopted. Our results suggest that the reform triggered market participants into
speculation, and question regulators' confidence in its long-term outcome. This
has implications for both the further development of the EU ETS, and the long
lasting debate about taxes versus emission trading schemes.
arXiv link: http://arxiv.org/abs/1906.10572v5
Forecasting the Remittances of the Overseas Filipino Workers in the Philippines
remittance in the Philippines. Forecasts of OFW's remittance for the years 2018
and 2019 will be generated using the appropriate time series model. The data
were retrieved from the official website of Bangko Sentral ng Pilipinas. There
are 108 observations, 96 of which were used in model building and the remaining
12 observations were used in forecast evaluation. ACF and PACF were used to
examine the stationarity of the series. The Augmented Dickey-Fuller test was
used to confirm the stationarity of the series. The data were found to have a
seasonal component; thus, seasonality has been considered in the final model,
which is SARIMA(2,1,0)x(0,0,2)_12. There are no significant spikes in the ACF
and PACF of the residuals of the final model, and the Ljung-Box Q* test confirms
further that the residuals of the model are uncorrelated. Also, based on the
result of the Shapiro-Wilk test for the forecast errors, the forecast errors
can be considered a Gaussian white noise. Considering the results of diagnostic
checking and forecast evaluation, SARIMA (2,1,0)x(0,0,2)_12 is an appropriate
model for the series. All necessary computations were done using the R
statistical software.
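The same specification can be reproduced in Python with statsmodels (the abstract's computations were done in R); the series below is simulated seasonal data standing in for the remittance figures, so only the model call mirrors the SARIMA(2,1,0)x(0,0,2)_12 specification reported above.

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Illustrative monthly series with trend and annual seasonality (108 observations, as in the abstract).
    rng = np.random.default_rng(11)
    T = 108
    t = np.arange(T)
    y = 10 + 0.05 * t + 1.5 * np.sin(2 * np.pi * t / 12) + np.cumsum(rng.normal(scale=0.3, size=T))

    train, test = y[:96], y[96:]                       # 96 for fitting, 12 for forecast evaluation
    model = SARIMAX(train, order=(2, 1, 0), seasonal_order=(0, 0, 2, 12))
    fit = model.fit(disp=False)
    forecast = fit.forecast(steps=12)

    mape = np.mean(np.abs((test - forecast) / test)) * 100
    print(fit.aic, round(mape, 2))
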
arXiv link: http://arxiv.org/abs/1906.10422v1
Policy Targeting under Network Interference
presence of spillover effects, using information from a (quasi-)experiment. I
introduce a method that maximizes the sample analog of average social welfare
when spillovers occur. I construct semi-parametric welfare estimators with
known and unknown propensity scores and cast the optimization problem into a
mixed-integer linear program, which can be solved using off-the-shelf
algorithms. I derive a strong set of guarantees on regret, i.e., the difference
between the maximum attainable welfare and the welfare evaluated at the
estimated policy. The proposed method presents attractive features for
applications: (i) it does not require network information of the target
population; (ii) it exploits heterogeneity in treatment effects for targeting
individuals; (iii) it does not rely on the correct specification of a
particular structural model; and (iv) it accommodates constraints on the policy
function. An application for targeting information on social networks
illustrates the advantages of the method.
arXiv link: http://arxiv.org/abs/1906.10258v14
Empirical Process Results for Exchangeable Arrays
between units of a sample. Jointly exchangeable arrays are well suited to
dyadic data, where observed random variables are indexed by two units from the
same population. Examples include trade flows between countries or
relationships in a network. Separately exchangeable arrays are well suited to
multiway clustering, where units sharing the same cluster (e.g. geographical
areas or sectors of activity when considering individual wages) may be
dependent in an unrestricted way. We prove uniform laws of large numbers and
central limit theorems for such exchangeable arrays. We obtain these results
under the same moment restrictions and conditions on the class of functions as
those typically assumed with i.i.d. data. We also show the convergence of
bootstrap processes adapted to such arrays.
arXiv link: http://arxiv.org/abs/1906.11293v4
Semi-parametric Realized Nonlinear Conditional Autoregressive Expectile and Expected Shortfall
is proposed. The framework is extended by incorporating a measurement
equation which models the contemporaneous dependence between the realized
measures and the latent conditional expectile. A nonlinear threshold
specification is further incorporated into the proposed framework. A Bayesian
Markov Chain Monte Carlo method is adapted for estimation, whose properties are
assessed and compared with maximum likelihood via a simulation study.
One-day-ahead VaR and ES forecasting studies, with seven market indices,
provide empirical support to the proposed models.
arXiv link: http://arxiv.org/abs/1906.09961v1
On the probability of a causal inference is robust for internal validity
this study, we define the counterfactuals as the unobserved sample and intend
to quantify its relationship with the null hypothesis statistical testing
(NHST). We propose the probability that a causal inference is robust for internal
validity, i.e., the PIV, as a robustness index of causal inference. Formally,
the PIV is the probability of rejecting the null hypothesis again based on both
the observed sample and the counterfactuals, provided the same null hypothesis
has already been rejected based on the observed sample. Under either
frequentist or Bayesian framework, one can bound the PIV of an inference based
on one's bounded belief about the counterfactuals, which is often needed when the
unconfoundedness assumption is dubious. The PIV is equivalent to statistical
power when the NHST is thought to be based on both the observed sample and the
counterfactuals. We summarize the process of evaluating internal validity with
the PIV into an eight-step procedure and illustrate it with an empirical
example (i.e., Hong and Raudenbush (2005)).
arXiv link: http://arxiv.org/abs/1906.08726v1
From Local to Global: External Validity in a Fertility Natural Experiment
100 replications of the Angrist and Evans (1998) natural experiment on the
effects of sibling sex composition on fertility and labor supply. The
replications are based on census data from around the world going back to 1960.
We decompose sources of error in predicting treatment effects in external
contexts in terms of macro and micro sources of variation. In our empirical
setting, we find that macro covariates dominate over micro covariates for
reducing errors in predicting treatment effects, an issue that past studies of
external validity have been unable to evaluate. We develop methods for two
applications to evidence-based decision-making, including determining where to
locate an experiment and whether policy-makers should commission new
experiments or rely on an existing evidence base for making a policy decision.
arXiv link: http://arxiv.org/abs/1906.08096v1
Sparse structures with LASSO through Principal Components: forecasting GDP components in the short-run
the chain-linked volume sense, expenditure components of the US and EU GDP in
the short-run sooner than the national institutions of statistics officially
release the data. We estimate current quarter nowcasts along with 1- and
2-quarter forecasts by bridging quarterly data with available monthly
information announced with a much smaller delay. We solve the
high-dimensionality problem of the monthly dataset by assuming sparse
structures of leading indicators, capable of adequately explaining the dynamics
of the analyzed data. For variable selection and estimation of the forecasts, we
use sparse methods, namely LASSO together with its recent modifications. We
propose an adjustment that combines LASSO with principal component
analysis, which is found to improve the forecasting performance. We evaluate
forecasting performance conducting pseudo-real-time experiments for gross fixed
capital formation, private consumption, imports and exports over the sample of
2005-2019, compared with benchmark ARMA and factor models. The main results
suggest that sparse methods can outperform the benchmarks and identify
reasonable subsets of explanatory variables. The proposed LASSO-PC modification
shows further improvement in forecast accuracy.
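The bridging idea can be sketched in Python with scikit-learn as follows: monthly indicators (simulated here and assumed already aggregated to quarterly frequency) are augmented with their principal components, and cross-validated LASSO selects among both; the exact LASSO-PC combination in the paper may differ from this simplified version.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LassoCV
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(1)
    T, p = 60, 120                                   # quarters, (aggregated) monthly indicators
    X = rng.standard_normal((T, p))
    y = X[:, :3] @ np.array([0.5, -0.3, 0.2]) + 0.1 * rng.standard_normal(T)  # e.g. GFCF growth

    Xs = StandardScaler().fit_transform(X)
    pcs = PCA(n_components=5).fit_transform(Xs)      # principal components of the indicators
    Z = np.hstack([Xs, pcs])                         # LASSO may pick indicators and/or PCs

    model = LassoCV(cv=5).fit(Z, y)
    print("number of selected predictors:", np.flatnonzero(model.coef_).size)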
arXiv link: http://arxiv.org/abs/1906.07992v2
Signatures of crypto-currency market decoupling from the Forex
and professional trading platform that aims to bring Bitcoin and other
cryptocurrencies into the mainstream, the multiscale cross-correlations
involving the Bitcoin (BTC), Ethereum (ETH), Euro (EUR) and US dollar (USD) are
studied over the period between July 1, 2016 and December 31, 2018. It is shown
that the multiscaling characteristics of the exchange rate fluctuations related
to the cryptocurrency market approach those of the Forex. This, in particular,
applies to the BTC/ETH exchange rate, whose Hurst exponent by the end of 2018
started approaching the value of 0.5, which is characteristic of the mature
world markets. Furthermore, the BTC/ETH direct exchange rate has already
developed multifractality, which manifests itself via broad singularity
spectra. A particularly significant result is that the measures applied for
detecting cross-correlations between the dynamics of the BTC/ETH and EUR/USD
exchange rates do not show any noticeable relationships. This may be taken as
an indication that the cryptocurrency market has begun decoupling itself from
the Forex.
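For orientation, the sketch below estimates a single (monofractal) Hurst exponent by detrended fluctuation analysis on a simulated return series; the paper relies on multifractal and cross-correlation extensions of this idea, so this is only the simplest building block, with illustrative scale choices.

    import numpy as np

    def dfa_hurst(x, scales):
        # Crude monofractal DFA estimate of the Hurst exponent.
        y = np.cumsum(x - x.mean())                 # integrated profile
        flucts = []
        for s in scales:
            f2 = []
            for i in range(len(y) // s):
                seg = y[i * s:(i + 1) * s]
                t = np.arange(s)
                coef = np.polyfit(t, seg, 1)        # local linear detrending
                f2.append(np.mean((seg - np.polyval(coef, t)) ** 2))
            flucts.append(np.sqrt(np.mean(f2)))
        slope, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
        return slope                                # slope of log F(s) vs log s

    rng = np.random.default_rng(0)
    returns = rng.standard_normal(5000)             # stand-in for BTC/ETH log-returns
    print(dfa_hurst(returns, scales=[16, 32, 64, 128, 256]))   # ~0.5 for white noise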
arXiv link: http://arxiv.org/abs/1906.07834v2
Nonparametric estimation in a regression model with additive and multiplicative noise
general nonparametric regression model with the feature of having both
multiplicative and additive noise. We propose two new wavelet estimators in this
general context. We prove that they achieve fast convergence rates under the
mean integrated square error over Besov spaces. The obtained rates have the
particularity of being established under weak conditions on the model. A
numerical study in a context comparable to stochastic frontier estimation (with
the difference that the boundary is not necessarily a production function)
supports the theory.
arXiv link: http://arxiv.org/abs/1906.07695v2
Shape Matters: Evidence from Machine Learning on Body Shape-Income Relationship
a novel dataset of 3-dimensional body scans to mitigate the issue of
reporting errors and measurement errors observed in most previous studies. We
apply machine learning to obtain intrinsic features of the human body
and take into account the possible issue of endogenous body shapes. The
estimation results show that there is a significant relationship between
physical appearance and family income and that the associations differ across
genders. This supports the hypothesis of a physical attractiveness premium
and its heterogeneity across genders.
arXiv link: http://arxiv.org/abs/1906.06747v1
Detecting p-hacking
distributions of $p$-values across multiple studies. We provide general results
for when such distributions have testable restrictions (are non-increasing)
under the null of no $p$-hacking. We find novel additional testable
restrictions for $p$-values based on $t$-tests. Specifically, the shape of the
power functions results in both complete monotonicity as well as bounds on the
distribution of $p$-values. These testable restrictions result in more powerful
tests for the null hypothesis of no $p$-hacking. When there is also publication
bias, our tests are joint tests for $p$-hacking and publication bias. A
reanalysis of two prominent datasets shows the usefulness of our new tests.
arXiv link: http://arxiv.org/abs/1906.06711v5
On the Properties of the Synthetic Control Estimator with Many Periods and Many Controls
when both the number of pre-treatment periods and control units are large. If
potential outcomes follow a linear factor model, we provide conditions under
which the factor loadings of the SC unit converge in probability to the factor
loadings of the treated unit. This happens when there are weights diluted among
an increasing number of control units such that a weighted average of the
factor loadings of the control units asymptotically reconstructs the factor
loadings of the treated unit. In this case, the SC estimator is asymptotically
unbiased even when treatment assignment is correlated with time-varying
unobservables. This result can be valid even when the number of control units
is larger than the number of pre-treatment periods.
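The weight construction referred to above is the standard synthetic control fit, which can be sketched as constrained least squares of the treated unit's pre-treatment path on the control units, with non-negative weights summing to one. The simulated data below are generic and do not reproduce the paper's asymptotic analysis.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    T0, J = 40, 30                                    # pre-treatment periods, control units
    Y0 = rng.standard_normal((T0, J))                 # control-unit outcomes
    y1 = Y0 @ np.full(J, 1 / J) + 0.1 * rng.standard_normal(T0)   # treated unit's pre-period path

    res = minimize(
        lambda w: np.sum((y1 - Y0 @ w) ** 2),         # pre-treatment fit
        x0=np.full(J, 1 / J),
        method="SLSQP",
        bounds=[(0, 1)] * J,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}],
    )
    w_hat = res.x
    print("donors receiving weight above 1%:", int((w_hat > 0.01).sum()))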
arXiv link: http://arxiv.org/abs/1906.06665v5
lpdensity: Local Polynomial Density Estimation and Inference
When the underlying distribution has compact support, conventional kernel-based
density estimators are no longer consistent near or at the boundary because of
their well-known boundary bias. Alternative smoothing methods are available to
handle boundary points in density estimation, but they all require additional
tuning parameter choices or other typically ad hoc modifications depending on
the evaluation point and/or approach considered. This article discusses the R
and Stata package lpdensity implementing a novel local polynomial density
estimator proposed and studied in Cattaneo, Jansson, and Ma (2020, 2021), which
is boundary adaptive and involves only one tuning parameter. The methods
implemented also cover local polynomial estimation of the cumulative
distribution function and density derivatives. In addition to point estimation
and graphical procedures, the package offers consistent variance estimators,
mean squared error optimal bandwidth selection, robust bias-corrected
inference, and confidence bands construction, among other features. A
comparison with other density estimation packages available in R using a Monte
Carlo experiment is provided.
arXiv link: http://arxiv.org/abs/1906.06529v3
Proxy expenditure weights for Consumer Price Index: Audit sampling inference for big data statistics
expenditure on items that are the most troublesome to collect in the
traditional expenditure survey. Due to the sheer amount of proxy data, the bias
due to coverage and selection errors completely dominates the variance. We
develop tests for bias based on audit sampling, which makes use of available
survey data that cannot be linked to the proxy data source at the individual
level. However, audit sampling fails to yield a meaningful mean squared error
estimate, because the sampling variance is too large compared to the bias of
the big data estimate. We propose a novel accuracy measure that is applicable
in such situations. This can provide a necessary part of the statistical
argument for the uptake of a big data source in place of traditional
survey sampling. An application to a disaggregated food price index is used to
demonstrate the proposed approach.
arXiv link: http://arxiv.org/abs/1906.11208v1
Posterior Average Effects
distributions of unobservables, such as moments of individual fixed-effects, or
average partial effects in discrete choice models. For such quantities, we
propose and study posterior average effects (PAE), where the average is
computed conditional on the sample, in the spirit of empirical Bayes and
shrinkage methods. While the usefulness of shrinkage for prediction is
well-understood, a justification of posterior conditioning to estimate
population averages is currently lacking. We show that PAE have minimum
worst-case specification error under various forms of misspecification of the
parametric distribution of unobservables. In addition, we introduce a measure
of informativeness of the posterior conditioning, which quantifies the
worst-case specification error of PAE relative to parametric model-based
estimators. As illustrations, we report PAE estimates of distributions of
neighborhood effects in the US, and of permanent and transitory components in a
model of income dynamics.
arXiv link: http://arxiv.org/abs/1906.06360v6
Sparse Approximate Factor Estimation for High-Dimensional Covariance Matrices
$l_1$-regularized approximate factor model. Our sparse approximate factor (SAF)
covariance estimator allows for the existence of weak factors and hence relaxes
the pervasiveness assumption generally adopted for the standard approximate
factor model. We prove consistency of the covariance matrix estimator under the
Frobenius norm as well as the consistency of the factor loadings and the
factors.
Our Monte Carlo simulations reveal that the SAF covariance estimator has
superior properties in finite samples for low and high dimensions and different
designs of the covariance matrix. Moreover, in an out-of-sample portfolio
forecasting application the estimator uniformly outperforms alternative
portfolio strategies based on alternative covariance estimation approaches and
modeling strategies including the $1/N$-strategy.
arXiv link: http://arxiv.org/abs/1906.05545v1
Nonparametric Identification and Estimation with Independent, Discrete Instruments
conventional moment independence assumption towards full statistical
independence between instrument and error term. This allows us to prove
identification results and develop estimators for a structural function of
interest when the instrument is discrete, and in particular binary. When the
regressor of interest is also discrete with more mass points than the
instrument, we state straightforward conditions under which the structural
function is partially identified, and give modified assumptions which imply
point identification. These stronger assumptions are shown to hold outside of a
small set of conditional moments of the error term. Estimators for the
identified set are given when the structural function is either partially or
point identified. When the regressor is continuously distributed, we prove that
if the instrument induces a sufficiently rich variation in the joint
distribution of the regressor and error term then point identification of the
structural function is still possible. This approach is relatively tractable,
and under some standard conditions we demonstrate that our point identifying
assumption holds on a topologically generic set of density functions for the
joint distribution of regressor, error, and instrument. Our method also applies
to a well-known nonparametric quantile regression framework, and we are able to
state analogous point identification results in that context.
arXiv link: http://arxiv.org/abs/1906.05231v1
Generalized Beta Prime Distribution: Stochastic Model of Economic Exchange and Properties of Inequality Indices
distribution is a Generalized Beta Prime (also known as GB2), and some unique
properties of the latter are the reason for GB2's success in describing
wealth/income distributions. We use housing sale prices as a proxy for the
wealth/income distribution to numerically illustrate this point. We also
explore parametric limits of the distribution to do so analytically. We discuss
parametric properties of the inequality indices -- Gini, Hoover, Theil T and
Theil L -- vis-a-vis those of GB2 and introduce a new inequality index, which
serves a similar purpose. We argue that Hoover and Theil L are more appropriate
measures for distributions with power-law dependencies, especially fat tails,
such as GB2.
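The inequality indices discussed above have simple sample counterparts; the sketch below evaluates Gini, Hoover, Theil T and Theil L on a heavy-tailed draw used as a stand-in for housing sale prices.

    import numpy as np

    def gini(x):
        x = np.sort(x)
        n = len(x)
        return (2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum())

    def hoover(x):
        return 0.5 * np.abs(x - x.mean()).sum() / x.sum()

    def theil_t(x):
        r = x / x.mean()
        return np.mean(r * np.log(r))

    def theil_l(x):
        return -np.mean(np.log(x / x.mean()))

    rng = np.random.default_rng(0)
    prices = rng.pareto(2.5, 10_000) + 1.0            # fat-tailed stand-in for sale prices
    for name, fn in [("Gini", gini), ("Hoover", hoover), ("Theil T", theil_t), ("Theil L", theil_l)]:
        print(name, round(fn(prices), 3))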
arXiv link: http://arxiv.org/abs/1906.04822v1
Bias-Aware Inference in Fuzzy Regression Discontinuity Designs
parameter in fuzzy designs. Our CSs are based on local linear regression, and
are bias-aware, in the sense that they take possible bias explicitly into
account. Their construction shares similarities with that of Anderson-Rubin CSs
in exactly identified instrumental variable models, and thereby avoids issues
with "delta method" approximations that underlie most commonly used existing
inference methods for fuzzy regression discontinuity analysis. Our CSs are
asymptotically equivalent to existing procedures in canonical settings with
strong identification and a continuous running variable. However, due to their
particular construction they are also valid under a wide range of empirically
relevant conditions in which existing methods can fail, such as setups with
discrete running variables, donut designs, and weak identification.
arXiv link: http://arxiv.org/abs/1906.04631v4
Regional economic convergence and spatial quantile regression
to be analyzed. In this paper, we adopt a quantile regression approach in
analyzing economic convergence. While previous work has performed quantile
regression at the national level, we focus on 187 European NUTS2 regions for
the period 1981-2009 and use spatial quantile regression to account for spatial
dependence.
arXiv link: http://arxiv.org/abs/1906.04613v1
The Regression Discontinuity Design
discontinuity design, covering identification, estimation, inference, and
falsification methods.
arXiv link: http://arxiv.org/abs/1906.04242v2
Efficient Bayesian estimation for GARCH-type models via Sequential Monte Carlo
parameter estimation and model selection methods for GARCH (Generalized
AutoRegressive Conditional Heteroskedasticity) style models. It provides an
alternative method for quantifying estimation uncertainty relative to classical
inference. Even with long time series, it is demonstrated that the posterior
distributions of model parameters are non-normal, highlighting the need for a
Bayesian approach and an efficient posterior sampling method. Efficient
approaches for both constructing the sequence of distributions in SMC, and
leave-one-out cross-validation, for long time series data are also proposed.
Finally, an unbiased estimator of the likelihood is developed for the Bad
Environment-Good Environment model, a complex GARCH-type model, which permits
exact Bayesian inference not previously available in the literature.
arXiv link: http://arxiv.org/abs/1906.03828v2
A Statistical Recurrent Stochastic Volatility Model for Stock Markets
financial sector while recurrent neural network (RNN) models are successfully
used in many large-scale industrial applications of Deep Learning. Our article
combines these two methods in a non-trivial way and proposes a model, which we
call the Statistical Recurrent Stochastic Volatility (SR-SV) model, to capture
the dynamics of stochastic volatility. The proposed model is able to capture
complex volatility effects (e.g., non-linearity and long-memory
auto-dependence) overlooked by the conventional SV models, is statistically
interpretable and has an impressive out-of-sample forecast performance. These
properties are carefully discussed and illustrated through extensive simulation
studies and applications to five international stock index datasets: The German
stock index DAX30, the Hong Kong stock index HSI50, the France market index
CAC40, the US stock market index SP500 and the Canada market index TSX250. A
user-friendly software package together with the examples reported in the paper
are available at https://github.com/vbayeslab.
arXiv link: http://arxiv.org/abs/1906.02884v3
Counterfactual Inference for Consumer Choice Across Many Product Categories
discrete choices, where the consumer chooses at most one product in a category,
but selects from multiple categories in parallel. The consumer's utility is
additive in the different categories. Her preferences about product attributes
as well as her price sensitivity vary across products and are in general
correlated across products. We build on techniques from the machine learning
literature on probabilistic models of matrix factorization, extending the
methods to account for time-varying product attributes and products going out
of stock. We evaluate the performance of the model using held-out data from
weeks with price changes or out of stock products. We show that our model
improves over traditional modeling approaches that consider each category in
isolation. One source of the improvement is the ability of the model to
accurately estimate heterogeneity in preferences (by pooling information across
categories); another source of improvement is its ability to estimate the
preferences of consumers who have rarely or never made a purchase in a given
category in the training data. Using held-out data, we show that our model can
accurately distinguish which consumers are most price sensitive to a given
product. We consider counterfactuals such as personally targeted price
discounts, showing that using a richer model such as the one we propose
substantially increases the benefits of personalization in discounts.
arXiv link: http://arxiv.org/abs/1906.02635v2
Indirect Inference for Locally Stationary Models
complex locally stationary models. We develop a local indirect inference
algorithm and establish the asymptotic properties of the proposed estimator.
Due to the nonparametric nature of locally stationary models, the resulting
indirect inference estimator exhibits nonparametric rates of convergence. We
validate our methodology with simulation studies in the confines of a locally
stationary moving average model and a new locally stationary multiplicative
stochastic volatility model. Using this indirect inference methodology and the
new locally stationary volatility model, we obtain evidence of non-linear,
time-varying volatility trends for monthly returns on several Fama-French
portfolios.
arXiv link: http://arxiv.org/abs/1906.01768v2
Assessing Disparate Impacts of Personalized Interventions: Identifiability and Bounds
leverage individual-level causal effect predictions in order to give the best
treatment to each individual or to prioritize program interventions for the
individuals most likely to benefit. While the sensitivity of these domains
compels us to evaluate the fairness of such policies, we show that actually
auditing their disparate impacts per standard observational metrics, such as
true positive rates, is impossible since ground truths are unknown. Whether our
data is experimental or observational, an individual's actual outcome under an
intervention different than that received can never be known, only predicted
based on features. We prove how we can nonetheless point-identify these
quantities under the additional assumption of monotone treatment response,
which may be reasonable in many applications. We further provide a sensitivity
analysis for this assumption by means of sharp partial-identification bounds
under violations of monotonicity of varying strengths. We show how to use our
results to audit personalized interventions using partially-identified ROC and
xROC curves and demonstrate this in a case study of a French job training
dataset.
arXiv link: http://arxiv.org/abs/1906.01552v1
The Laws of Motion of the Broker Call Rate in the United States
margin loan pricing, we analyze $1,367$ monthly observations of the U.S. broker
call money rate, which is the interest rate at which stock brokers can borrow
to fund their margin loans to retail clients. We describe the basic features
and mean-reverting behavior of this series and juxtapose the
empirically-derived laws of motion with the author's prior theories of margin
loan pricing (Garivaltis 2019a-b). This allows us to derive stochastic
differential equations that govern the evolution of the margin loan interest
rate and the leverage ratios of sophisticated brokerage clients (namely,
continuous time Kelly gamblers). Finally, we apply Merton's (1974) arbitrage
theory of corporate liability pricing to study theoretical constraints on the
risk premia that could be generated in the market for call money. Apparently,
if there is no arbitrage in the U.S. financial markets, the implication is that
the total volume of call loans must constitute north of 70% of the value of
all leveraged portfolios.
arXiv link: http://arxiv.org/abs/1906.00946v2
Stress Testing Network Reconstruction via Graphical Causal Model
plausible hypotheses about the multiple interconnections between the
macroeconomic variables and the risk parameters. In this paper, we propose a
graphical model for the reconstruction of the causal structure that links the
multiple macroeconomic variables and the assessed risk parameters; it is this
structure that we call the Stress Testing Network (STN). In this model, the
relationships between the macroeconomic variables and the risk parameters define
a "relational graph" among their time series, where related time series are
connected by an edge. Our proposal is based on temporal causal models but,
unlike them, incorporates specific conditions in the structure that correspond to
intrinsic characteristics of this type of network. Using the proposed model and
given the high-dimensional nature of the problem, we used regularization
methods to efficiently detect causality in the time-series and reconstruct the
underlying causal structure. In addition, we illustrate the use of the model on
credit risk data for a portfolio. Finally, we discuss its uses and practical
benefits in stress testing.
arXiv link: http://arxiv.org/abs/1906.01468v3
Bayesian nonparametric graphical models for time-varying parameters VAR
statistical methods for analysing high-dimensional data and complex non-linear
relationships. A common approach for addressing dimensionality issues relies on
the use of static graphical structures for extracting the most significant
dependence interrelationships between the variables of interest. Recently,
Bayesian nonparametric techniques have become popular for modelling complex
phenomena in a flexible and efficient manner, but only a few attempts have been
made in econometrics. In this paper, we provide an innovative Bayesian
nonparametric (BNP) time-varying graphical framework for making inference in
high-dimensional time series. We include a Bayesian nonparametric dependent
prior specification on the matrix of coefficients and the covariance matrix by
means of a Time-Series DPP as in Nieto-Barajas et al. (2012). Following Billio
et al. (2019), our hierarchical prior overcomes over-parametrization and
over-fitting issues by clustering the vector autoregressive (VAR) coefficients
into groups and by shrinking the coefficients of each group toward a common
location. Our BNP time-varying VAR model is based on a spike-and-slab
construction coupled with a dependent Dirichlet Process prior (DPP) and allows
us to: (i) infer time-varying Granger causality networks from time series; (ii)
flexibly model and cluster non-zero time-varying coefficients; and (iii)
accommodate potential non-linearities. In order to assess the performance
of the model, we study the merits of our approach by considering a well-known
macroeconomic dataset. Moreover, we check the robustness of the method by
comparing two alternative specifications, with Dirac and diffuse spike prior
distributions.
arXiv link: http://arxiv.org/abs/1906.02140v1
The Age-Period-Cohort-Interaction Model for Describing and Investigating Inter-Cohort Deviations and Intra-Cohort Life-Course Dynamics
of age, period, and cohort, but disaggregation of the three dimensions is
difficult because cohort = period - age. We argue that this technical
difficulty reflects a disconnection between how cohort effect is conceptualized
and how it is modeled in the traditional age-period-cohort framework. We
propose a new method, called the age-period-cohort-interaction (APC-I) model,
that is qualitatively different from previous methods in that it represents
Ryder's (1965) theoretical account about the conditions under which cohort
differentiation may arise. This APC-I model does not require problematic
statistical assumptions and the interpretation is straightforward. It
quantifies inter-cohort deviations from the age and period main effects and
also permits hypothesis testing about intra-cohort life-course dynamics. We
demonstrate how this new model can be used to examine age, period, and cohort
patterns in women's labor force participation.
arXiv link: http://arxiv.org/abs/1906.08357v1
The Theory of Weak Revealed Preference
preference (WGARP) for both finite and infinite data sets of consumer choice.
We call it maximin rationalization, in which each pairwise choice is associated
with a "local" utility function. We develop its associated weak
revealed-preference theory. We show that preference recoverability and welfare
analysis \`a la Varian (1982) may not be informative enough when the weak
axiom holds but consumers are not utility maximizers. We clarify the
reasons for this failure and provide new informative bounds for the consumer's
true preferences.
arXiv link: http://arxiv.org/abs/1906.00296v1
At What Level Should One Cluster Standard Errors in Paired and Small-Strata Experiments?
assigned to treatment, to estimate treatment effects, researchers often regress
their outcome on a treatment indicator and pair fixed effects, clustering
standard errors at the unit-of-randomization level. We show that even if the
treatment has no effect, a 5%-level t-test based on this regression will
wrongly conclude that the treatment has an effect up to 16.5% of the time. To
fix this problem, researchers should instead cluster standard errors at the
pair level. Using simulations, we show that similar results apply to clustered
experiments with small strata.
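The mechanics of the recommendation can be illustrated with a small, purely hypothetical simulation in which treatment effects vary across pairs: the same pair-fixed-effects regression is fit with unit-level and with pair-level clustering, and the pair-level standard error is typically the larger (and, per the paper, the appropriate) one. This does not reproduce the 16.5% rejection rate reported above.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    P, m = 100, 20                               # pairs, individuals per unit
    rows = []
    for p in range(P):
        treated_unit = rng.integers(2)           # which unit of the pair is treated
        tau_p = rng.normal(0, 1)                 # pair-specific effect, mean zero
        for u in range(2):
            d = int(u == treated_unit)
            unit_shock = rng.normal()
            for _ in range(m):
                rows.append({"y": tau_p * d + unit_shock + rng.normal(),
                             "d": d, "pair": p, "unit": 2 * p + u})
    df = pd.DataFrame(rows)

    model = smf.ols("y ~ d + C(pair)", data=df)
    se_unit = model.fit(cov_type="cluster", cov_kwds={"groups": df["unit"]}).bse["d"]
    se_pair = model.fit(cov_type="cluster", cov_kwds={"groups": df["pair"]}).bse["d"]
    print("SE clustered by unit:", round(se_unit, 3), "| SE clustered by pair:", round(se_pair, 3))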
arXiv link: http://arxiv.org/abs/1906.00288v10
Kernel Instrumental Variable Regression
relationships in observational data. If measurements of input X and output Y
are confounded, the causal relationship can nonetheless be identified if an
instrumental variable Z is available that influences X directly, but is
conditionally independent of Y given X and the unmeasured confounder. The
classic two-stage least squares algorithm (2SLS) simplifies the estimation
problem by modeling all relationships as linear functions. We propose kernel
instrumental variable regression (KIV), a nonparametric generalization of 2SLS,
modeling relations among X, Y, and Z as nonlinear functions in reproducing
kernel Hilbert spaces (RKHSs). We prove the consistency of KIV under mild
assumptions, and derive conditions under which convergence occurs at the
minimax optimal rate for unconfounded, single-stage RKHS regression. In doing
so, we obtain an efficient ratio between training sample sizes used in the
algorithm's first and second stages. In experiments, KIV outperforms state of
the art alternatives for nonparametric IV regression.
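A rough sketch of a two-stage kernel ridge procedure in the spirit of KIV follows: stage 1 learns weights mapping instrument values to conditional mean embeddings of the regressor features, and stage 2 performs kernel ridge regression of the outcome on those embeddings. Kernel bandwidths, regularization scalings and the sample split are ad hoc, and the closed form is a simplified derivation rather than the paper's tuned algorithm.

    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    rng = np.random.default_rng(0)
    n = 400
    z = rng.uniform(-3, 3, (n, 1))                          # instrument
    u = rng.normal(0, 1, (n, 1))                            # unobserved confounder
    x = z + u + 0.3 * rng.normal(0, 1, (n, 1))              # endogenous regressor
    y = np.sin(x) + u + 0.3 * rng.normal(0, 1, (n, 1))      # structural function: sin

    x1, z1 = x[:200], z[:200]                               # stage-1 sample
    z2, y2 = z[200:], y[200:]                               # stage-2 sample
    lam, xi = 1e-2, 1e-2

    Kxx = rbf_kernel(x1, x1)
    Kzz = rbf_kernel(z1, z1)
    Kz1z2 = rbf_kernel(z1, z2)

    # Stage 1: conditional mean embeddings of phi(X) given Z, evaluated at stage-2 instruments
    W = Kxx @ np.linalg.solve(Kzz + len(z1) * lam * np.eye(len(z1)), Kz1z2)

    # Stage 2: kernel ridge of Y on the embeddings; h(x) = sum_i alpha_i k(x1_i, x)
    alpha = np.linalg.solve(W @ W.T + len(y2) * xi * Kxx, W @ y2)

    x_grid = np.linspace(-3, 3, 7).reshape(-1, 1)
    h_hat = rbf_kernel(x_grid, x1) @ alpha
    print(np.column_stack([x_grid, h_hat, np.sin(x_grid)]).round(2))   # compare to sin(x)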
arXiv link: http://arxiv.org/abs/1906.00232v6
lspartition: Partitioning-Based Least Squares Regression
tool in empirical work. Common examples include regressions based on splines,
wavelets, and piecewise polynomials. This article discusses the main
methodological and numerical features of the R software package lspartition,
which implements modern estimation and inference results for partitioning-based
least squares (series) regression estimation from Cattaneo and Farrell (2013)
and Cattaneo, Farrell, and Feng (2019). These results cover the multivariate
regression function as well as its derivatives. First, the package provides
data-driven methods to choose the number of partition knots optimally,
according to integrated mean squared error, yielding optimal point estimation.
Second, robust bias correction is implemented to combine this point estimator
with valid inference. Third, the package provides estimates and inference for
the unknown function both pointwise and uniformly in the conditioning
variables. In particular, valid confidence bands are provided. Finally, an
extension to two-sample analysis is developed, which can be used in
treatment-control comparisons and related problems.
arXiv link: http://arxiv.org/abs/1906.00202v2
nprobust: Nonparametric Kernel-Based Estimation and Robust Bias-Corrected Inference
very popular in Statistics, Economics, and many other disciplines. They are
routinely employed in applied work, either as part of the main empirical
analysis or as a preliminary ingredient entering some other estimation or
inference procedure. This article describes the main methodological and
numerical features of the software package nprobust, which offers an array of
estimation and inference procedures for nonparametric kernel-based density and
local polynomial regression methods, implemented in both the R and Stata
statistical platforms. The package includes not only classical bandwidth
selection, estimation, and inference methods (Wand and Jones, 1995; Fan and
Gijbels, 1996), but also other recent developments in the statistics and
econometrics literatures such as robust bias-corrected inference and coverage
error optimal bandwidth selection (Calonico, Cattaneo and Farrell, 2018, 2019).
Furthermore, this article also proposes a simple way of estimating optimal
bandwidths in practice that always delivers the optimal mean square error
convergence rate regardless of the specific evaluation point, that is, no
matter whether it is implemented at a boundary or interior point. Numerical
performance is illustrated using an empirical application and simulated data,
where a detailed numerical comparison with other R packages is given.
arXiv link: http://arxiv.org/abs/1906.00198v1
Counterfactual Analysis under Partial Identification Using Locally Robust Refinement
models with multiple equilibria, pose challenges in practice, especially when
parameters are set-identified and the identified set is large. In such cases,
researchers often choose to focus on a particular subset of equilibria for
counterfactual analysis, but this choice can be hard to justify. This paper
shows that some parameter values can be more "desirable" than others for
counterfactual analysis, even if they are empirically equivalent given the
data. In particular, within the identified set, some counterfactual predictions
can exhibit more robustness than others, against local perturbations of the
reduced forms (e.g. the equilibrium selection rule). We provide a
representation of this subset which can be used to simplify the implementation.
We illustrate our message using moment inequality models, and provide an
empirical application based on a model with top-coded data.
arXiv link: http://arxiv.org/abs/1906.00003v3
On Policy Evaluation with Aggregate Time-Series Shocks
endogenous and researchers have access to aggregate instruments. Our method
addresses the critical identification challenge -- unobserved confounding,
which renders conventional estimators invalid. Our proposal relies on a new
data-driven aggregation scheme that eliminates the unobserved confounders. We
illustrate the advantages of our algorithm using data from Nakamura and
Steinsson's (2014) study of local fiscal multipliers. We introduce a finite
population model with aggregate uncertainty to analyze our estimator. We
establish conditions for consistency and asymptotic normality and show how to
use our estimator to conduct valid inference.
arXiv link: http://arxiv.org/abs/1905.13660v8
Learned Sectors: A fundamentals-driven sector reclassification project
modern Global economy. We analyze existing sectorization heuristics, and
observe that the most popular - the GICS (which informs the S&P 500), and the
NAICS (published by the U.S. Government) - are not entirely quantitatively
driven, but rather appear to be highly subjective and rooted in dogma. Building
on inferences from analysis of the capital structure irrelevance principle and
the Modigliani-Miller theoretic universe conditions, we postulate that
corporation fundamentals - particularly those components specific to the
Modigliani-Miller universe conditions - would be optimal descriptors of the
true economic domain of operation of a company. We generate a set of potential
candidate learned sector universes by varying the linkage method of a
hierarchical clustering algorithm, and the number of resulting sectors derived
from the model (ranging from 5 to 19), resulting in a total of 60 candidate
learned sector universes. We then introduce reIndexer, a backtest-driven sector
universe evaluation research tool, to rank the candidate sector universes
produced by our learned sector classification heuristic. This rank was utilized
to identify the risk-adjusted return optimal learned sector universe as being
the universe generated under CLINK (i.e. complete linkage), with 17 sectors.
The optimal learned sector universe was tested against the benchmark GICS
classification universe with reIndexer, outperforming on both absolute
portfolio value, and risk-adjusted return over the backtest period. We conclude
that our fundamentals-driven Learned Sector classification heuristic provides a
superior risk-diversification profile relative to the status quo classification
heuristic.
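The clustering step itself is standard and can be sketched as below: firms described by (hypothetical) standardized fundamentals are grouped by complete-linkage hierarchical clustering and the dendrogram is cut at 17 sectors. The reIndexer backtest that ranks candidate universes is not reproduced here.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.stats import zscore

    rng = np.random.default_rng(0)
    n_firms = 500
    fundamentals = rng.standard_normal((n_firms, 6))    # e.g. leverage, margins, turnover, ...

    Z = linkage(zscore(fundamentals, axis=0), method="complete")   # CLINK
    sectors = fcluster(Z, t=17, criterion="maxclust")              # 17 learned sectors
    print("sector sizes:", np.bincount(sectors)[1:])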
arXiv link: http://arxiv.org/abs/1906.03935v1
Threshold Regression with Nonparametric Sample Splitting
relationship between two variables nonparametrically determines the threshold.
We allow the observations to be cross-sectionally dependent so that the model
can be applied to determine an unknown spatial border for sample splitting over
a random field. We derive the uniform rate of convergence and the nonstandard
limiting distribution of the nonparametric threshold estimator. We also obtain
the root-n consistency and the asymptotic normality of the regression
coefficient estimator. Our model has broad empirical relevance as illustrated
by estimating the tipping point in social segregation problems as a function of
demographic characteristics; and determining metropolitan area boundaries using
nighttime light intensity collected from satellite imagery. We find that the
new empirical results are substantially different from those in the existing
studies.
arXiv link: http://arxiv.org/abs/1905.13140v3
Heterogeneity in demand and optimal price conditioning for local rail transport
LLC "Perm Local Rail Company". In this study we propose a regression tree based
approach for estimation of demand function for local rail tickets considering
high degree of demand heterogeneity by various trip directions and the goals of
travel. Employing detailed data on ticket sales over 5 years, we estimate the
parameters of the demand function and reveal significant variation in the price
elasticity of demand. While demand is price elastic on average, nearly a
quarter of trips are characterized by weakly elastic demand. Lower elasticity of
demand is correlated with a lower degree of competition with other transport
modes and with inflexible travel frequency.
arXiv link: http://arxiv.org/abs/1905.12859v1
Deep Generalized Method of Moments for Instrumental Variable Analysis
effects when randomization or full control of confounders is not possible. The
application of standard methods such as 2SLS, GMM, and more recent variants are
significantly impeded when the causal effects are complex, the instruments are
high-dimensional, and/or the treatment is high-dimensional. In this paper, we
propose the DeepGMM algorithm to overcome this. Our algorithm is based on a new
variational reformulation of GMM with optimal inverse-covariance weighting that
allows us to efficiently control very many moment conditions. We further
develop practical techniques for optimization and model selection that make it
particularly successful in practice. Our algorithm is also computationally
tractable and can handle large-scale datasets. Numerical results show our
algorithm matches the performance of the best tuned methods in standard
settings and continues to work in high-dimensional settings where even recent
methods break.
arXiv link: http://arxiv.org/abs/1905.12495v2
Centered and non-centered variance inflation factor
linear regression from auxiliary centered regressions (with intercept) and
non-centered regressions (without intercept). From these auxiliary regressions,
the centered and non-centered Variance Inflation Factors are calculated,
respectively. We also present an expression that relates the two.
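A small numerical illustration of the two auxiliary regressions: the centered VIF uses the R-squared from regressing one regressor on the others with an intercept, while the non-centered VIF uses the uncentered R-squared from the same regression without an intercept.

    import numpy as np

    def vif(xj, X_others, intercept=True):
        n = len(xj)
        X = np.column_stack([np.ones(n), X_others]) if intercept else np.asarray(X_others)
        beta, *_ = np.linalg.lstsq(X, xj, rcond=None)
        resid = xj - X @ beta
        tss = np.sum((xj - xj.mean()) ** 2) if intercept else np.sum(xj ** 2)
        return 1.0 / (resid @ resid / tss)          # 1 / (1 - R^2)

    rng = np.random.default_rng(0)
    n = 500
    x1 = rng.normal(5, 1, n)
    x2 = 0.8 * x1 + rng.normal(0, 1, n)             # collinear with x1
    print("centered VIF of x1:    ", round(vif(x1, x2.reshape(-1, 1), intercept=True), 2))
    print("non-centered VIF of x1:", round(vif(x1, x2.reshape(-1, 1), intercept=False), 2))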
arXiv link: http://arxiv.org/abs/1905.12293v1
The Income Fluctuation Problem and the Evolution of Wealth
on assets, non-financial income and impatience are all state dependent and
fluctuate over time. All three processes can be serially correlated and
mutually dependent. Rewards can be bounded or unbounded and wealth can be
arbitrarily large. Extending classic results from an earlier literature, we
determine conditions under which (a) solutions exist, are unique and are
globally computable, (b) the resulting wealth dynamics are stationary, ergodic
and geometrically mixing, and (c) the wealth distribution has a Pareto tail. We
show how these results can be used to extend recent studies of the wealth
distribution. Our conditions have natural economic interpretations in terms of
asymptotic growth rates for discounting and return on savings.
arXiv link: http://arxiv.org/abs/1905.13045v3
Matching on What Matters: A Pseudo-Metric Learning Approach to Matching Estimation in High Dimensions
each unit with maximally similar peers that had an alternative treatment
status--essentially replicating a randomized block design. However, as one
considers a growing number of continuous features, a curse of dimensionality
applies making asymptotically valid inference impossible (Abadie and Imbens,
2006). The alternative of ignoring plausibly relevant features is certainly no
better, and the resulting trade-off substantially limits the application of
matching methods to "wide" datasets. Instead, Li and Fu (2017) recasts the
problem of matching in a metric learning framework that maps features to a
low-dimensional space that facilitates "closer matches" while still capturing
important aspects of unit-level heterogeneity. However, that method lacks key
theoretical guarantees and can produce inconsistent estimates in cases of
heterogeneous treatment effects. Motivated by a straightforward extension of
existing results in the matching literature, we present alternative techniques
that learn latent matching features through either MLPs or through siamese
neural networks trained on a carefully selected loss function. We benchmark the
resulting alternative methods in simulations as well as against two
experimental data sets--including the canonical NSW worker training program
data set--and find superior performance of the neural-net-based methods.
arXiv link: http://arxiv.org/abs/1905.12020v1
Graph-based era segmentation of international financial integration
in macroeconometrics, often addressed by visual inspections searching for data
patterns. Econophysics literature enables us to build complementary,
data-driven measures of financial integration using graphs. The present
contribution investigates the potential and interests of a novel 3-step
approach that combines several state-of-the-art procedures to i) compute
graph-based representations of the multivariate dependence structure of asset
prices time series representing the financial states of 32 countries world-wide
(1955-2015); ii) compute time series of 5 graph-based indices that characterize
the time evolution of the topologies of the graph; iii) segment these time
evolutions in piece-wise constant eras, using an optimization framework
constructed on a multivariate multi-norm total variation penalized functional.
The method shows first that it is possible to find endogenous stable eras of
world-wide financial integration. Then, our results suggest that the most
relevant globalization eras would be based on the historical patterns of global
capital flows, while the major regulatory events of the 1970s would only appear
as a cause of sub-segmentation.
arXiv link: http://arxiv.org/abs/1905.11842v1
Local Asymptotic Equivalence of the Bai and Ng (2004) and Moon and Perron (2004) Frameworks for Panel Unit Root Testing
panels with cross-sectional dependence generated by unobserved factors. We
reconsider the two prevalent approaches in the literature, that of Moon and
Perron (2004) and the PANIC setup proposed in Bai and Ng (2004). While these
have been considered as completely different setups, we show that, in case of
Gaussian innovations, the frameworks are asymptotically equivalent in the sense
that both experiments are locally asymptotically normal (LAN) with the same
central sequence. Using Le Cam's theory of statistical experiments we determine
the local asymptotic power envelope and derive an optimal test jointly in both
setups. We show that the popular Moon and Perron (2004) and Bai and Ng (2010)
tests only attain the power envelope in case there is no heterogeneity in the
long-run variance of the idiosyncratic components. The new test is
asymptotically uniformly most powerful irrespective of possible heterogeneity.
Moreover, it turns out that for any test, satisfying a mild regularity
condition, the size and local asymptotic power are the same under both data
generating processes. Thus, applied researchers do not need to decide on one of
the two frameworks to conduct unit root tests. Monte-Carlo simulations
corroborate our asymptotic results and document significant gains in
finite-sample power if the variances of the idiosyncratic shocks differ
substantially among the cross sectional units.
arXiv link: http://arxiv.org/abs/1905.11184v1
Score-Driven Exponential Random Graphs: A New Class of Time-Varying Parameter Models for Dynamical Networks
that exhibit dynamical features, we propose an extension of the Exponential
Random Graph Models (ERGMs) that accommodates the time variation of its
parameters. Inspired by the fast-growing literature on Dynamic Conditional
Score models, each parameter evolves according to an updating rule driven by
the score of the ERGM distribution. We demonstrate the flexibility of
score-driven ERGMs (SD-ERGMs) as data-generating processes and filters and show
the advantages of the dynamic version over the static one. We discuss two
applications to temporal networks from financial and political systems. First,
we consider the prediction of future links in the Italian interbank credit
network. Second, we show that the SD-ERGM allows discriminating between static
and time-varying parameters when used to model the U.S. Congress co-voting
network dynamics.
arXiv link: http://arxiv.org/abs/1905.10806v3
Inducing Sparsity and Shrinkage in Time-Varying Parameter Models
over-parameterized, particularly when the number of variables in the model is
large. Global-local priors are increasingly used to induce shrinkage in such
models. But the estimates produced by these priors can still have appreciable
uncertainty. Sparsification has the potential to reduce this uncertainty and
improve forecasts. In this paper, we develop computationally simple methods
which both shrink and sparsify TVP models. In a simulated data exercise we show
the benefits of our shrink-then-sparsify approach in a variety of sparse and
dense TVP regressions. In a macroeconomic forecasting exercise, we find our
approach to substantially improve forecast performance relative to shrinkage
alone.
arXiv link: http://arxiv.org/abs/1905.10787v2
Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments
machine learning methods in the presence of unobserved confounders with the aid
of a valid instrument. Such settings arise in A/B tests with an intent-to-treat
structure, where the experimenter randomizes over which user will receive a
recommendation to take an action, and we are interested in the effect of the
downstream action. We develop a statistical learning approach to the estimation
of heterogeneous effects, reducing the problem to the minimization of an
appropriate loss function that depends on a set of auxiliary models (each
corresponding to a separate prediction task). The reduction enables the use of
all recent algorithmic advances (e.g. neural nets, forests). We show that the
estimated effect model is robust to estimation errors in the auxiliary models,
by showing that the loss satisfies a Neyman orthogonality criterion. Our
approach can be used to estimate projections of the true effect model on
simpler hypothesis spaces. When these spaces are parametric, then the parameter
estimates are asymptotically normal, which enables construction of confidence
sets. We applied our method to estimate the effect of membership on downstream
webpage engagement on TripAdvisor, using as an instrument an intent-to-treat
A/B test among 4 million TripAdvisor users, where some users received an easier
membership sign-up process. We also validate our method on synthetic data and
on public datasets for the effects of schooling on income.
arXiv link: http://arxiv.org/abs/1905.10176v3
Semi-Parametric Efficient Policy Learning with Continuous Actions
spaces. We focus on observational data where the data collection policy is
unknown and needs to be estimated. We take a semi-parametric approach where the
value function takes a known parametric form in the treatment, but we are
agnostic on how it depends on the observed contexts. We propose a doubly robust
off-policy estimate for this setting and show that off-policy optimization
based on this estimate is robust to estimation errors of the policy function or
the regression model. Our results also apply if the model does not satisfy our
semi-parametric form, but rather we measure regret in terms of the best
projection of the true value function to this functional space. Our work
extends prior approaches of policy optimization from observational data that
only considered discrete actions. We provide an experimental evaluation of our
method in a synthetic data example motivated by optimal personalized pricing
and costly resource allocation.
arXiv link: http://arxiv.org/abs/1905.10116v2
Smoothing quantile regressions
check function, in a linear quantile regression context. Not only does the
resulting smoothed quantile regression estimator yield a lower mean squared
error and a more accurate Bahadur-Kiefer representation than the standard
estimator, but it is also asymptotically differentiable. We exploit the latter
to propose a quantile density estimator that does not suffer from the curse of
dimensionality. This means estimating the conditional density function without
worrying about the dimension of the covariate vector. It also allows for
two-stage efficient quantile regression estimation. Our asymptotic theory holds
uniformly with respect to the bandwidth and quantile level. Finally, we propose
a rule of thumb for choosing the smoothing bandwidth that should approximate
well the optimal bandwidth. Simulations confirm that our smoothed quantile
regression estimator indeed performs very well in finite samples.
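One simple way to operationalize the smoothing idea is to replace the indicator in the quantile-regression first-order condition with a kernel CDF of the scaled residual and solve the resulting smooth estimating equation. The sketch below uses a Gaussian kernel and an arbitrary bandwidth, so it is only loosely related to the convolution-based estimator and the bandwidth rule of thumb studied in the paper.

    import numpy as np
    from scipy.optimize import root
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n, tau, h = 1000, 0.5, 0.5
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)

    def smoothed_foc(beta):
        u = X @ beta - y
        # smoothed version of the subgradient X'(1{y <= X beta} - tau) / n
        return X.T @ (norm.cdf(u / h) - tau) / n

    beta_hat = root(smoothed_foc, x0=np.zeros(2)).x
    print("smoothed median regression estimate:", beta_hat.round(3))   # roughly (1, 2)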
arXiv link: http://arxiv.org/abs/1905.08535v3
Demand forecasting techniques for build-to-order lean manufacturing supply chains
such as electronics, automotive and fashion. They enable building products
based on individual requirements with a short lead time and minimum inventory
and production costs. Due to their nature, they differ significantly from
traditional supply chains. However, there have not been studies dedicated to
demand forecasting methods for this type of setting. This work makes two
contributions. First, it presents a new and unique data set from a manufacturer
in the BTO sector. Second, it proposes a novel data transformation technique
for demand forecasting of BTO products. Results from thirteen forecasting
methods show that the approach compares well to the state-of-the-art while
being easy to implement and to explain to decision-makers.
arXiv link: http://arxiv.org/abs/1905.07902v1
Conformal Prediction Interval Estimations with an Application to Day-Ahead and Intraday Power Markets
While initially stemming from the world of machine learning, it was never
applied or analyzed in the context of short-term electricity price forecasting.
Therefore, we elaborate on the aspects that render Conformal Prediction worthwhile
to know and explain why its simple yet very efficient idea has worked in other
fields of application and why its characteristics are promising for short-term
power applications as well. We compare its performance with different
state-of-the-art electricity price forecasting models such as quantile
regression averaging (QRA) in an empirical out-of-sample study for three
short-term electricity time series. We combine Conformal Prediction with
various underlying point forecast models to demonstrate its versatility and
behavior under changing conditions. Our findings suggest that Conformal
Prediction yields sharp and reliable prediction intervals in short-term power
markets. We further inspect the effect each of Conformal Prediction's model
components has and provide a path-based guideline on how to find the best CP
model for each market.
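The core split-conformal recipe that the study builds on is short enough to sketch: hold out a calibration window, compute absolute residuals of any point forecaster there, and use their (1-alpha) quantile as a symmetric interval width. The naive persistence forecast and the synthetic price path below are placeholders for the QRA-type point models and the market data used in the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    prices = np.cumsum(rng.normal(0, 2, 1200)) + 50        # synthetic hourly price series

    point_fcst = prices[:-1]                               # naive one-step-ahead forecast
    actual = prices[1:]
    cal_resid = np.abs(actual[:800] - point_fcst[:800])    # calibration residuals

    alpha = 0.1
    level = np.ceil((len(cal_resid) + 1) * (1 - alpha)) / len(cal_resid)
    q = np.quantile(cal_resid, level)                      # conformal interval half-width

    covered = np.abs(actual[800:] - point_fcst[800:]) <= q
    print(f"empirical coverage of the {1 - alpha:.0%} interval: {covered.mean():.3f}")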
arXiv link: http://arxiv.org/abs/1905.07886v2
Time Series Analysis and Forecasting of the US Housing Starts using Econometric and Machine Learning Model
the monthly value of housing starts for the year 2019 using several econometric
methods - ARIMA(X), VARX, (G)ARCH and machine learning algorithms - artificial
neural networks, ridge regression, K-Nearest Neighbors, and support vector
regression, and created an ensemble model. The ensemble model stacks the
predictions from various individual models, and gives a weighted average of all
predictions. The analyses suggest that the ensemble model has performed the
best among all the models as the prediction errors are the lowest, while the
econometric models have higher error rates.
arXiv link: http://arxiv.org/abs/1905.07848v1
Iterative Estimation of Nonparametric Regressions with Continuous Endogenous Variables and Discrete Instruments
independent variables when only discrete instruments are available that are
independent of the error term. Although this framework is very relevant for
applied research, its implementation is challenging, as the regression function
becomes the solution to a nonlinear integral equation. We propose a simple
iterative procedure to estimate such models and showcase some of its asymptotic
properties. In a simulation experiment, we detail its implementation in the
case when the instrumental variable is binary. We conclude with an empirical
application to returns to education.
arXiv link: http://arxiv.org/abs/1905.07812v3
Cointegration in high frequency data
when two asset prices are generated by a driftless It\^{o}-semimartingale
featuring jumps with infinite activity, observed regularly and synchronously at
high frequency. We develop a regression based estimation of the cointegrated
relations method and show the related consistency and central limit theory when
there is cointegration within that framework. We also provide a Dickey-Fuller
type residual based test for the null of no cointegration against the
alternative of cointegration, along with its limit theory. Under no
cointegration, the asymptotic limit is the same as that of the original
Dickey-Fuller residual based test, so that critical values can be easily
tabulated in the same way. Finite-sample evidence indicates adequate size and good power
properties in a variety of realistic configurations, outperforming original
Dickey-Fuller and Phillips-Perron type residual based tests, whose sizes are
distorted by non-ergodic time-varying variance and whose power is altered by price
jumps. Two empirical examples consolidate the Monte-Carlo evidence that the
adapted tests can reject while the original tests do not, and vice versa.
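For orientation, the residual-based testing logic can be sketched in its textbook (Engle-Granger style) form: regress one log-price on the other and apply an augmented Dickey-Fuller test to the residuals. The high-frequency limit theory, jump robustness and adapted critical values developed in the paper are not captured by this toy simulation.

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(0)
    n = 5000
    common = np.cumsum(rng.normal(0, 0.01, n))             # shared stochastic trend
    p1 = common + rng.normal(0, 0.002, n)                  # log-price of asset 1
    p2 = 0.8 * common + rng.normal(0, 0.002, n)            # cointegrated log-price of asset 2

    slope, intercept = np.polyfit(p1, p2, 1)               # cointegrating regression
    resid = p2 - (slope * p1 + intercept)
    stat, pval, *_ = adfuller(resid)
    print(f"ADF statistic on residuals: {stat:.2f}, p-value: {pval:.3f}")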
arXiv link: http://arxiv.org/abs/1905.07081v2
A Comment on "Estimating Dynamic Discrete Choice Models with Hyperbolic Discounting" by Hanming Fang and Yang Wang
identification of time preferences in dynamic discrete choice under exclusion
restrictions (e.g. Yao et al., 2012; Lee, 2013; Ching et al., 2013; Norets and
Tang, 2014; Dub\'e et al., 2014; Gordon and Sun, 2015; Bajari et al., 2016;
Chan, 2017; Gayle et al., 2018). Fang and Wang's Proposition 2 claims generic
identification of a dynamic discrete choice model with hyperbolic discounting.
This claim uses a definition of "generic" that does not preclude the
possibility that a generically identified model is nowhere identified. To
illustrate this point, we provide two simple examples of models that are
generically identified in Fang and Wang's sense, but that are, respectively,
everywhere and nowhere identified. We conclude that Proposition 2 is void: It
has no implications for identification of the dynamic discrete choice model. We
show that its proof is incorrect and incomplete and suggest alternative
approaches to identification.
arXiv link: http://arxiv.org/abs/1905.07048v2
The Empirical Saddlepoint Estimator
(ESP) approximation of the distribution of solutions to empirical moment
conditions. We call it the ESP estimator. We prove its existence, consistency
and asymptotic normality, and we propose novel test statistics. We also show
that the ESP estimator corresponds to the MM (method of moments) estimator
shrunk toward parameter values with lower estimated variance, so it reduces the
documented instability of existing moment-based estimators. In the case of
just-identified moment conditions, which is the case we focus on, the ESP
estimator is different from the MM estimator, unlike the recently proposed
alternatives, such as the empirical-likelihood-type estimators.
arXiv link: http://arxiv.org/abs/1905.06977v1
Inference in a class of optimization problems: Confidence regions and finite sample bounds on errors in coverage probabilities
on partially identified parameters that are solutions to a class of
optimization problems. Applications in which the optimization problems arise
include estimation under shape restrictions, estimation of models of discrete
games, and estimation based on grouped data. The partially identified
parameters are characterized by restrictions that involve the unknown
population means of observed random variables in addition to structural
parameters. Inference consists of finding confidence intervals for functions of
the structural parameters. Our theory provides finite-sample lower bounds on
the coverage probabilities of the confidence intervals under three sets of
assumptions of increasing strength. With the moderate sample sizes found in
most economics applications, the bounds become tighter as the assumptions
strengthen. We discuss estimation of population parameters that the bounds
depend on and contrast our methods with alternative methods for obtaining
confidence intervals for partially identified parameters. The results of Monte
Carlo experiments and empirical examples illustrate the usefulness of our
method.
arXiv link: http://arxiv.org/abs/1905.06491v6
mRSC: Multi-dimensional Robust Synthetic Control
possible to conduct a randomized control trial. In settings where only
observational data is available, Synthetic Control (SC) methods provide a
popular data-driven approach to estimate a "synthetic" control by combining
measurements of "similar" units (donors). Recently, Robust SC (RSC) was
proposed as a generalization of SC to overcome the challenges of missing data
and high levels of noise, while removing the reliance on domain knowledge for
selecting donors. However, SC, RSC, and their variants, suffer from poor
estimation when the pre-intervention period is too short. As the main
contribution, we propose a generalization of unidimensional RSC to
multi-dimensional RSC, mRSC. Our proposed mechanism incorporates multiple
metrics to estimate a synthetic control, thus overcoming the challenge of poor
inference from limited pre-intervention data. We show that the mRSC algorithm
with $K$ metrics leads to a consistent estimator of the synthetic control for
the target unit under any metric. Our finite-sample analysis suggests that the
prediction error decays to zero at a rate faster than the RSC algorithm by a
factor of $K$ and $\sqrt{K}$ for the training and testing periods (pre- and
post-intervention), respectively. Additionally, we provide a diagnostic test
that evaluates the utility of including additional metrics. Moreover, we
introduce a mechanism to validate the performance of mRSC: time series
prediction. That is, we propose a method to predict the future evolution of a
time series based on limited data when the notion of time is relative and not
absolute, i.e., we have access to a donor pool that has undergone the desired
future evolution. Finally, we conduct experimentation to establish the efficacy
of mRSC on synthetic data and two real-world case studies (retail and Cricket).
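For intuition, the core robust-synthetic-control step that mRSC generalizes can be sketched in a few lines of Python: denoise the donor matrix with a truncated SVD, learn donor weights on the pre-intervention window by least squares, and project the counterfactual path. The rank, toy data and function name are illustrative assumptions, not the authors' implementation; the mRSC extension would additionally stack the donor matrices of the $K$ metrics before the low-rank step.

import numpy as np

def robust_synthetic_control(donors, target_pre, rank=3):
    # donors: (T, J) outcomes of donor units over all T periods
    # target_pre: (T0,) pre-intervention outcomes of the treated unit
    T0 = len(target_pre)
    # 1) Denoise the donor matrix with a low-rank (truncated SVD) approximation.
    U, s, Vt = np.linalg.svd(donors, full_matrices=False)
    donors_hat = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]
    # 2) Learn donor weights by least squares on the pre-intervention window.
    w, *_ = np.linalg.lstsq(donors_hat[:T0], target_pre, rcond=None)
    # 3) Counterfactual path of the treated unit over the full horizon.
    return donors_hat @ w

# Toy usage: 40 periods, 8 donors from a 2-factor model, intervention at t = 25.
rng = np.random.default_rng(0)
T, J, T0 = 40, 8, 25
factors = rng.normal(size=(T, 2))
donors = factors @ rng.normal(size=(2, J)) + 0.1 * rng.normal(size=(T, J))
target = factors @ rng.normal(size=2) + 0.1 * rng.normal(size=T)
counterfactual = robust_synthetic_control(donors, target[:T0], rank=2)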
arXiv link: http://arxiv.org/abs/1905.06400v3
Analyzing Subjective Well-Being Data with Misclassification
non-classical measurement error in reported life satisfaction (LS) and study
the potential effects from ignoring it. Our dataset comes from Wave 3 of the UK
Understanding Society survey, which covers 35,000 British households. Our test
finds evidence of measurement error in reported LS for the entire dataset as
well as for 26 out of 32 socioeconomic subgroups in the sample. We estimate the
joint distribution of reported and latent LS nonparametrically in order to
understand the mis-reporting behavior. We show this distribution can then be
used to estimate parametric models of latent LS. We find measurement error bias
is not severe enough to distort the main drivers of LS. But there is an
important difference that is policy relevant. We find women tend to over-report
their latent LS relative to men. This may help explain the gender puzzle of why
women report being happier than men despite being worse off in objective
outcomes such as income and employment.
arXiv link: http://arxiv.org/abs/1905.06037v1
Sustainable Investing and the Cross-Section of Returns and Maximum Drawdown
of returns and maximum drawdown for stocks in the US equity market. Our data
run from January 1970 to December 2019 and our analysis includes ordinary least
squares, penalized linear regressions, tree-based models, and neural networks.
We find that the most important predictors tended to be consistent across
models, and that non-linear models had better predictive power than linear
models. Predictive power was higher in calm periods than in stressed periods.
Environmental, social, and governance indicators marginally impacted the
predictive power of non-linear models in our data, despite their negative
correlation with maximum drawdown and positive correlation with returns. Upon
exploring whether ESG variables are captured by some models, we find that ESG
data contribute to the prediction nonetheless.
arXiv link: http://arxiv.org/abs/1905.05237v2
Regression Discontinuity Design with Multiple Groups for Heterogeneous Causal Effect Estimation
utilizes a regression discontinuity (RD) design for multiple datasets with
different thresholds. The standard RD design is frequently used in applied
research, but its results are limited in that the average treatment effect is
estimable only at the threshold on the running variable. In applied studies,
thresholds often differ across datasets from different regions or firms; for
example, scholarship thresholds vary across states. The proposed estimator,
based on the augmented inverse probability weighted local linear estimator, can
estimate the average effects at an arbitrary point on the running variable
between the thresholds under mild conditions, while adjusting for differences
in the covariate distributions across datasets. We perform simulations to
investigate the performance of the proposed estimator in finite samples.
arXiv link: http://arxiv.org/abs/1905.04443v1
Demand and Welfare Analysis in Discrete Choice Models with Social Interactions
causing targeted policies to have spillover effects. This paper develops novel
empirical tools for analyzing the demand and welfare effects of policy
interventions in binary choice settings with social interactions. Examples
include subsidies for health-product adoption and vouchers for attending a
high-achieving school. We establish the connection between the econometrics of
large games and Brock-Durlauf-type interaction models, under both i.i.d. and
spatially correlated unobservables. We develop new convergence results for
associated beliefs and estimates of preference parameters under
increasing-domain spatial asymptotics. Next, we show that even with fully
parametric specifications and a unique equilibrium, choice data that are
sufficient for counterfactual demand prediction under interactions are
insufficient for welfare calculations. This is because distinct underlying
mechanisms producing the same interaction coefficient can imply different
welfare effects and deadweight loss from a policy intervention. Standard index
restrictions imply distribution-free bounds on welfare. We illustrate our
results using experimental data on mosquito-net adoption in rural Kenya.
arXiv link: http://arxiv.org/abs/1905.04028v2
Identifying Present-Bias from the Timing of Choices
report, or complete a task at work. We ask whether time preferences can be
inferred when only task completion is observed. To answer this
question, we analyze the following model: each period a decision maker faces
the choice whether to complete the task today or to postpone it to later. Cost
and benefits of task completion cannot be directly observed by the analyst, but
the analyst knows that net benefits are drawn independently between periods
from a time-invariant distribution and that the agent has time-separable
utility. Furthermore, we suppose the analyst can observe the agent's exact
stopping probability. We establish that for any agent with quasi-hyperbolic
$(\beta,\delta)$-preferences and a given level of partial naivete $\hat{\beta}$,
the probability of completing the task conditional on not having done it
earlier increases towards the deadline. Conversely, for any given preference
parameters $(\beta,\delta)$ and any (weakly increasing) profile of task
completion probabilities, there exists a stationary payoff distribution that
rationalizes her behavior as long as the agent is either sophisticated or fully
naive. An immediate corollary is that, without parametric assumptions, it is
impossible to rule out time-consistency even when imposing an a priori
assumption on the permissible long-run discount factor. We also provide an
exact partial identification result when the analyst can, in addition to the
stopping probability, observe the agent's continuation value.
arXiv link: http://arxiv.org/abs/1905.03959v1
The Likelihood of Mixed Hitting Times
model that specifies durations as the first time a latent L\'evy process
crosses a heterogeneous threshold. This likelihood is not generally known in
closed form, but its Laplace transform is. Our approach to its computation
relies on numerical methods for inverting Laplace transforms that exploit
special properties of the first passage times of L\'evy processes. We use our
method to implement a maximum likelihood estimator of the mixed hitting-time
model in MATLAB. We illustrate the application of this estimator with an
analysis of Kennan's (1985) strike data.
arXiv link: http://arxiv.org/abs/1905.03463v2
Lasso under Multi-way Clustering: Estimation and Post-selection Inference
sampled under multi-way clustering. First, we establish convergence rates for
the lasso and post-lasso estimators. Second, we propose a novel inference
method based on a post-double-selection procedure and show its asymptotic
validity. Our procedure can be easily implemented with existing statistical
packages. Simulation results demonstrate that the proposed procedure works well
in finite samples. We illustrate the proposed method with empirical
applications to development and growth economics.
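For readers unfamiliar with the procedure, the post-double-selection step that the paper adapts to multi-way clustered sampling can be sketched as follows. The data-generating process and tuning choices are illustrative, and the paper's multi-way cluster-robust variance estimator is not implemented here; the final OLS step would use it in place of the default covariance.

import numpy as np
from sklearn.linear_model import LassoCV
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, p = 500, 50
X = rng.normal(size=(n, p))                        # high-dimensional controls
d = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)   # treatment depends on a few controls
y = 1.0 * d + X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=n)

# Step 1: lasso of the outcome on the controls, keep those with nonzero coefficients.
sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)
# Step 2: lasso of the treatment on the controls, keep those with nonzero coefficients.
sel_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)
# Step 3: OLS of y on d and the union of the selected controls.
keep = sorted(set(sel_y) | set(sel_d))
Z = sm.add_constant(np.column_stack([d, X[:, keep]]))
fit = sm.OLS(y, Z).fit()   # with clustered data, use a multi-way cluster-robust cov here
print("treatment effect estimate:", fit.params[1])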
arXiv link: http://arxiv.org/abs/1905.02107v3
Estimation of high-dimensional factor models and its application in power data analysis
reducing dimensions and extracting relevant information. The spectrum of
covariance matrices from power data exhibits two aspects: 1) bulk, which arises
from random noise or fluctuations and 2) spikes, which represents factors
caused by anomaly events. In this paper, we propose a new approach to the
estimation of high-dimensional factor models that minimizes the distance between
the empirical spectral density (ESD) of the covariance matrices of the residuals
of the power data, obtained by subtracting principal components, and the
limiting spectral density (LSD) implied by a multiplicative covariance structure
model. Free probability theory (FPT) is used to derive the spectral density
of the multiplicative covariance model, which efficiently solves the
computational difficulties. The proposed approach connects the estimation of
the number of factors to the LSD of covariance matrices of the residuals, which
provides estimators of the number of factors and the correlation structure
information in the residuals. Because the power data contain substantial
measurement noise and the residuals have a complex correlation structure, the
approach approximates the ESD of the residual covariance matrices through a
multiplicative covariance model, which avoids making crude assumptions or
simplifications about the complex structure of the data. Theoretical studies
show the proposed approach is robust against noise and sensitive to the
presence of weak factors. Synthetic data from the IEEE 118-bus power system are
used to validate the effectiveness of the approach.
Furthermore, the application to the analysis of the real-world online
monitoring data in a power grid shows that the estimators in the approach can
be used to indicate the system behavior.
arXiv link: http://arxiv.org/abs/1905.02061v2
Non-standard inference for augmented double autoregressive models with null volatility coefficients
allows null volatility coefficients to circumvent the over-parameterization
problem in the DAR model. Since the volatility coefficients might be on the
boundary, the statistical inference methods based on the Gaussian quasi-maximum
likelihood estimation (GQMLE) become non-standard, and their asymptotics
require the data to have a finite sixth moment, which narrows the applicable
scope for studying heavy-tailed data. To overcome this deficiency, this paper develops
a systematic statistical inference procedure based on the self-weighted GQMLE
for the augmented DAR model. Except for the Lagrange multiplier test statistic,
the Wald, quasi-likelihood ratio and portmanteau test statistics are all shown
to have non-standard asymptotics. The entire procedure is valid as long as the
data is stationary, and its usefulness is illustrated by simulation studies and
one real example.
arXiv link: http://arxiv.org/abs/1905.01798v1
A Uniform Bound on the Operator Norm of Sub-Gaussian Random Matrices and Its Applications
sub-Gaussian entries $x_{it}(\beta)$ that may depend on a possibly
infinite-dimensional parameter $\beta\in B$, we obtain a uniform bound
on its operator norm of the form
$E \sup_{\beta \in B} \|X(\beta)\| \leq CK \left(\sqrt{\max(N,T)} + \gamma_2(B,d_B)\right)$,
where $C$ is an absolute constant, $K$ controls the tail behavior of (the
increments of) $x_{it}(\cdot)$, and $\gamma_2(B,d_B)$ is Talagrand's functional,
a measure of the multi-scale complexity of the metric space $(B,d_B)$. We
illustrate how this result may be used for estimation that seeks to minimize
the operator norm of moment conditions as well as for estimation of the maximal
number of factors with functional data.
arXiv link: http://arxiv.org/abs/1905.01096v4
Sparsity Double Robust Inference of Average Treatment Effects
under high-dimensional confounding require strong "ultra-sparsity" assumptions
that may be difficult to validate in practice. To alleviate this difficulty, we
here study a new method for average treatment effect estimation that yields
asymptotically exact confidence intervals assuming that either the conditional
response surface or the conditional probability of treatment allows for an
ultra-sparse representation (but not necessarily both). This guarantee allows
us to provide valid inference for the average treatment effect in high dimensions
under considerably more generality than available baselines. In addition, we
showcase that our results are semi-parametrically efficient.
arXiv link: http://arxiv.org/abs/1905.00744v1
Variational Bayesian Inference for Mixed Logit Models with Unobserved Inter- and Intra-Individual Heterogeneity
fast and scalable estimation of complex probabilistic models. Thus far,
applications of VB in discrete choice analysis have been limited to mixed logit
models with unobserved inter-individual taste heterogeneity. However, such a
model formulation may be too restrictive in panel data settings, since tastes
may vary both between individuals as well as across choice tasks encountered by
the same individual. In this paper, we derive a VB method for posterior
inference in mixed logit models with unobserved inter- and intra-individual
heterogeneity. In a simulation study, we benchmark the performance of the
proposed VB method against maximum simulated likelihood (MSL) and Markov chain
Monte Carlo (MCMC) methods in terms of parameter recovery, predictive accuracy
and computational efficiency. The simulation study shows that VB can be a fast,
scalable and accurate alternative to MSL and MCMC estimation, especially in
applications in which fast predictions are paramount. VB is observed to be
between 2.8 and 17.7 times faster than the two competing methods, while
affording comparable or superior accuracy. In addition, the simulation study
demonstrates that a parallelised implementation of the MSL estimator with
analytical gradients is a viable alternative to MCMC in terms of both
estimation accuracy and computational efficiency, as the MSL estimator is
observed to be between 0.9 and 2.1 times faster than MCMC.
arXiv link: http://arxiv.org/abs/1905.00419v3
Boosting: Why You Can Use the HP Filter
methods in applied macroeconomic research. Like all nonparametric methods, the
HP filter depends critically on a tuning parameter that controls the degree of
smoothing. Yet in contrast to modern nonparametric methods and applied work
with these procedures, empirical practice with the HP filter almost universally
relies on standard settings for the tuning parameter that have been suggested
largely by experimentation with macroeconomic data and heuristic reasoning. As
recent research (Phillips and Jin, 2015) has shown, standard settings may not
be adequate in removing trends, particularly stochastic trends, in economic
data.
This paper proposes an easy-to-implement practical procedure of iterating the
HP smoother that is intended to make the filter a smarter smoothing device for
trend estimation and trend elimination. We call this iterated HP technique the
boosted HP filter in view of its connection to $L_{2}$-boosting in machine
learning. The paper develops limit theory to show that the boosted HP (bHP)
filter asymptotically recovers trend mechanisms that involve unit root
processes, deterministic polynomial drifts, and polynomial drifts with
structural breaks. A stopping criterion is used to automate the iterative HP
algorithm, making it a data-determined method that is ready for modern
data-rich environments in economic research. The methodology is illustrated
using three real data examples that highlight the differences between simple HP
filtering, the data-determined boosted filter, and an alternative
autoregressive approach. These examples show that the bHP filter is helpful in
analyzing a large collection of heterogeneous macroeconomic time series that
manifest various degrees of persistence, trend behavior, and volatility.
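The boosting mechanics are simple enough to sketch directly: each pass applies the HP smoother to the residual left by the previous pass and adds the newly fitted trend to the running total. The fixed iteration count below stands in for the paper's data-determined stopping criterion, and statsmodels' hpfilter is assumed as the base smoother.

import numpy as np
from statsmodels.tsa.filters.hp_filter import hpfilter

def boosted_hp(x, lamb=1600, iterations=4):
    # L2-boosting with the HP filter as the base learner (illustrative sketch).
    x = np.asarray(x, dtype=float)
    trend = np.zeros_like(x)
    residual = x.copy()
    for _ in range(iterations):
        cycle, fitted = hpfilter(residual, lamb=lamb)
        trend += fitted          # accumulate the trend extracted in this pass
        residual = cycle         # keep boosting on what the filter left behind
    return trend, residual

# Toy usage: a random-walk trend plus noise.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200)) + rng.normal(scale=0.5, size=200)
bhp_trend, bhp_cycle = boosted_hp(series)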
arXiv link: http://arxiv.org/abs/1905.00175v3
A Factor-Augmented Markov Switching (FAMS) Model
context of Markov switching models with time varying transition probabilities.
Markov switching models are commonly employed in empirical macroeconomic
research and policy work. However, the information used to model the switching
process is usually limited drastically to ensure stability of the model.
Increasing the number of included variables to enlarge the information set
might even result in decreasing precision of the model. Moreover, it is often
not clear a priori which variables are actually relevant when it comes to
informing the switching behavior. Building strongly on recent contributions in
the field of factor analysis, we introduce a general type of Markov switching
autoregressive models for non-linear time series analysis. Large numbers of
time series are allowed to inform the switching process through a factor
structure. This factor-augmented Markov switching (FAMS) model overcomes
estimation issues that are likely to arise in previous assessments of the
modeling framework, yielding more accurate estimates of the switching behavior
as well as improved model fit. The performance of the FAMS model is illustrated
in a simulated data example as well as in a US business cycle application.
arXiv link: http://arxiv.org/abs/1904.13194v2
Fast Mesh Refinement in Pseudospectral Optimal Control
--- simply increase the order $N$ of the Lagrange interpolating polynomial and
the mathematics of convergence automates the distribution of the grid points.
Unfortunately, as $N$ increases, the condition number of the resulting linear
algebra increases as $N^2$; hence, spectral efficiency and accuracy are lost in
practice. In this paper, we advance Birkhoff interpolation concepts over an
arbitrary grid to generate well-conditioned PS optimal control discretizations.
We show that the condition number increases only as $N$ in general, but
is independent of $N$ for the special case of one of the boundary points being
fixed. Hence, spectral accuracy and efficiency are maintained as $N$ increases.
The effectiveness of the resulting fast mesh refinement strategy is
demonstrated by using polynomials of order greater than one thousand to
solve a low-thrust, long-duration orbit transfer problem.
arXiv link: http://arxiv.org/abs/1904.12992v1
Exact Testing of Many Moment Inequalities Against Multiple Violations
the number of moment inequalities ($p$) is possibly larger than the sample size
($n$). Chernozhukov et al. (2019) proposed asymptotic tests for this problem
using the maximum $t$ statistic. We observe that such tests can have low power
if multiple inequalities are violated. As an alternative, we propose novel
randomization tests based on a maximum non-negatively weighted combination of
$t$ statistics. We provide a condition guaranteeing size control in large
samples. Simulations show that the tests control size in small samples ($n =
30$, $p = 1000$) and often have substantially higher power against alternatives
with multiple violations than tests based on the maximum $t$ statistic.
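As a rough illustration of the kind of statistic involved, the sketch below computes both the usual max-$t$ statistic and one simple member of the non-negatively weighted family (the Euclidean norm of the positive parts of the $t$ statistics, which equals the maximum of $w't$ over non-negative weights with $\|w\|_2 \le 1$), and obtains a randomization p-value from sign flips. The sign-flip scheme is exact only under symmetry at the least-favourable null $E[X_j]=0$; the authors' exact construction and validity conditions differ.

import numpy as np

def inequality_stats(X):
    # Max-t statistic and a non-negatively weighted combination of t statistics.
    n = X.shape[0]
    t = np.sqrt(n) * X.mean(axis=0) / X.std(axis=0, ddof=1)
    t_plus = np.clip(t, 0.0, None)
    return t.max(), np.linalg.norm(t_plus)

def signflip_pvalues(X, draws=999, seed=0):
    rng = np.random.default_rng(seed)
    stat_max, stat_comb = inequality_stats(X)
    exceed = np.zeros(2)
    for _ in range(draws):
        flips = rng.choice([-1.0, 1.0], size=(X.shape[0], 1))
        s_max, s_comb = inequality_stats(flips * X)
        exceed += [s_max >= stat_max, s_comb >= stat_comb]
    return (exceed + 1) / (draws + 1)      # randomization p-values

# Toy example: n = 30 observations, p = 200 moments, ten inequalities violated.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 200))
X[:, :10] += 0.8
print("p-values (max-t, weighted combination):", signflip_pvalues(X))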
arXiv link: http://arxiv.org/abs/1904.12775v3
Working women and caste in India: A study of social disadvantage using feature attribution
historically been engaged in labour-intensive, blue-collar work. We study
whether there has been any change in the ability to predict a woman's
work-status and work-type based on her caste by interpreting machine learning
models using feature attribution. We find that caste is now a less important
determinant of work for the younger generation of women compared to the older
generation. Moreover, younger women from disadvantaged castes are now more
likely to be working in white-collar jobs.
arXiv link: http://arxiv.org/abs/1905.03092v2
Nonparametric Estimation and Inference in Economic and Psychological Experiments
and inference in psychological and economic experiments. We consider an
experimental framework in which each of $n$ subjects provides $T$ responses to a
vector of $T$ stimuli. We propose to estimate the unknown function $f$ linking
stimuli to responses through a nonparametric sieve estimator. We give
conditions for consistency when either $n$ or $T$ or both diverge. The rate of
convergence depends upon the error covariance structure, which is allowed to
differ across subjects. With these results we derive the optimal divergence
rate of the dimension of the sieve basis with both $n$ and $T$. We provide
guidance about the optimal balance between the number of subjects and questions
in a laboratory experiment and argue that a large $n$ is often better than a
large $T$. We derive conditions for asymptotic normality of functionals of the
estimator of $f$ and apply them to obtain the asymptotic distribution of the
Wald test when the number of constraints under the null is finite and when it
diverges along with other asymptotic parameters. Lastly, we investigate the
previous properties when the conditional covariance matrix is replaced by an
estimator.
arXiv link: http://arxiv.org/abs/1904.11156v3
Forecasting in Big Data Environments: an Adaptable and Automated Shrinkage Estimation of Neural Networks (AAShNet)
settings, with high-dimension predictors ("big data" environments). To overcome
the curse of dimensionality and manage data and model complexity, we examine
shrinkage estimation of a back-propagation algorithm of a deep neural net with
skip-layer connections. We expressly include both linear and nonlinear
components. This is a high-dimensional learning approach including both
sparsity ($L_1$) and smoothness ($L_2$) penalties, allowing high-dimensionality and
nonlinearity to be accommodated in one step. This approach selects significant
predictors as well as the topology of the neural network. We estimate optimal
values of shrinkage hyperparameters by incorporating a gradient-based
optimization technique resulting in robust predictions with improved
reproducibility. The latter has been an issue in some approaches. This is
statistically interpretable and unravels some network structure, commonly left
to a black box. An additional advantage is that the nonlinear part tends to get
pruned if the underlying process is linear. In an application to forecasting
equity returns, the proposed approach captures nonlinear dynamics between
equities to enhance forecast performance. It offers an appreciable improvement
over current univariate and multivariate models in terms of RMSE and actual
portfolio performance.
arXiv link: http://arxiv.org/abs/1904.11145v1
Identification of Regression Models with a Misclassified and Endogenous Binary Regressor
misclassified and endogenous binary regressor when an instrument is correlated
with misclassification error. We show that the regression function is
nonparametrically identified if one binary instrument variable and one binary
covariate satisfy the following conditions. The instrumental variable corrects
endogeneity; the instrumental variable must be correlated with the unobserved
true underlying binary variable, must be uncorrelated with the error term in
the outcome equation, but is allowed to be correlated with the
misclassification error. The covariate corrects misclassification; this
variable can be one of the regressors in the outcome equation, must be
correlated with the unobserved true underlying binary variable, and must be
uncorrelated with the misclassification error. We also propose a mixture-based
framework for modeling unobserved heterogeneous treatment effects with a
misclassified and endogenous binary regressor and show that treatment effects
can be identified if the true treatment effect is related to an observed
regressor and another observable variable.
arXiv link: http://arxiv.org/abs/1904.11143v3
Normal Approximation in Large Network Models
interactions and homophilous agents. Since data often consists of observations
on a single large network, we consider an asymptotic framework in which the
network size diverges. We argue that a modification of “stabilization”
conditions from the literature on geometric graphs provides a useful high-level
formulation of weak dependence which we utilize to establish an abstract
central limit theorem. Using results in branching process theory, we derive
interpretable primitive conditions for stabilization. The main conditions
restrict the strength of strategic interactions and equilibrium selection
mechanism. We discuss practical inference procedures justified by our results.
arXiv link: http://arxiv.org/abs/1904.11060v7
Average Density Estimators: Efficiency and Bootstrap Consistency
bootstrap consistency in the context of a canonical semiparametric estimation
problem, namely the problem of estimating the average density. It is shown that
although simple plug-in estimators suffer from bias problems preventing them
from achieving semiparametric efficiency under minimal smoothness conditions,
the nonparametric bootstrap automatically corrects for this bias and that, as a
result, these seemingly inferior estimators achieve bootstrap consistency under
minimal smoothness conditions. In contrast, several "debiased" estimators that
achieve semiparametric efficiency under minimal smoothness conditions do not
achieve bootstrap consistency under those same conditions.
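To fix ideas, the canonical average-density problem can be written as $\theta = E[f(X)] = \int f(x)^2\,dx$, and the simple plug-in estimator discussed above is the leave-one-out kernel average below; the nonparametric bootstrap interval is of the type whose consistency the paper studies. Bandwidth and kernel are arbitrary illustrative choices.

import numpy as np

def average_density(x, h):
    # Leave-one-out Gaussian-kernel plug-in estimator of E[f(X)].
    n = len(x)
    diffs = (x[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(K, 0.0)               # drop the i == j terms
    return K.sum() / (n * (n - 1) * h)

def bootstrap_ci(x, h, level=0.95, draws=999, seed=0):
    rng = np.random.default_rng(seed)
    stats = [average_density(rng.choice(x, size=len(x), replace=True), h)
             for _ in range(draws)]
    return tuple(np.quantile(stats, [(1 - level) / 2, 1 - (1 - level) / 2]))

x = np.random.default_rng(2).normal(size=400)
h = 1.06 * x.std() * len(x) ** (-1 / 5)     # rule-of-thumb bandwidth
print("theta_hat:", average_density(x, h))  # for N(0,1) data the target is 1/(2*sqrt(pi)) ~ 0.282
print("bootstrap CI:", bootstrap_ci(x, h))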
arXiv link: http://arxiv.org/abs/1904.09372v2
Location-Sector Analysis of International Profit Shifting on a Multilayer Ownership-Tax Network
utilize their own tax revenues and carry out their own development for solving
poverty in their countries. However, developing countries cannot earn tax
revenues like developed countries partly because they do not have effective
countermeasures against international tax avoidance. Our analysis focuses on
treaty shopping among various ways to conduct international tax avoidance
because tax revenues of developing countries have been heavily damaged through
treaty shopping. To analyze the location and sector of conduit firms likely to
be used for treaty shopping, we constructed a multilayer ownership-tax network
and proposed a multilayer centrality measure. Because multilayer centrality can
consider not only the value flowing in the ownership network but also the
withholding tax rate, it is expected to capture precisely the locations and
sectors of conduit firms established for the purpose of treaty shopping. Our
analysis shows that firms in sectors such as Finance & Insurance and Wholesale &
Retail Trade are involved in treaty shopping. We suggest that developing
countries include a clause focusing on these sectors in the tax treaties they
conclude.
arXiv link: http://arxiv.org/abs/1904.09165v1
Ridge regularization for Mean Squared Error Reduction in Regression with Weak Instruments
are highly unstable with weak instruments. I propose a ridge estimator (ridge
IV) and show that it is asymptotically normal even with weak instruments,
whereas 2SLS is severely distorted and unbounded. I motivate the ridge IV
estimator as a convex optimization problem with a GMM objective function and an
L2 penalty. I show that ridge IV leads to sizable mean squared error reductions
theoretically and validate these results in a simulation study inspired by data
designs of papers published in the American Economic Review.
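The estimator described above has a simple closed form once the GMM objective and the $L_2$ penalty are written out: minimizing $(y - X\beta)'ZWZ'(y - X\beta) + \lambda\|\beta\|^2$ gives $\hat\beta = (X'ZWZ'X + \lambda I)^{-1}X'ZWZ'y$. The sketch below uses $W = (Z'Z)^{-1}$ and an illustrative weak-instrument design; the penalty level and design are assumptions, not the paper's choices.

import numpy as np

def ridge_iv(y, X, Z, lam):
    # Minimize (y - Xb)' Z W Z' (y - Xb) + lam * ||b||^2 with W = (Z'Z)^{-1}.
    W = np.linalg.inv(Z.T @ Z)
    A = X.T @ Z @ W @ Z.T @ X
    b = X.T @ Z @ W @ Z.T @ y
    return np.linalg.solve(A + lam * np.eye(X.shape[1]), b)

# Weak-instrument toy design: the instrument barely moves the endogenous regressor.
rng = np.random.default_rng(3)
n = 200
z = rng.normal(size=(n, 1))
u = rng.normal(size=n)
x = 0.05 * z[:, 0] + u + rng.normal(size=n)   # weak first stage, endogenous regressor
y = 1.0 * x + u + rng.normal(size=n)
X = x[:, None]
print("2SLS (lam = 0):", ridge_iv(y, X, z, lam=0.0))
print("ridge IV      :", ridge_iv(y, X, z, lam=1.0))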
arXiv link: http://arxiv.org/abs/1904.08580v1
Sharp Bounds for the Marginal Treatment Effect with Sample Selection
into the treatment group and into the observed sample. As a theoretical
contribution, I propose pointwise sharp bounds for the marginal treatment
effect (MTE) of interest within the always-observed subpopulation under
monotonicity assumptions. Moreover, I impose an extra mean dominance assumption
to tighten the previous bounds. I further discuss how to identify those bounds
when the support of the propensity score is either continuous or discrete.
Using these results, I estimate bounds for the MTE of the Job Corps Training
Program on hourly wages for the always-employed subpopulation and find that it
is decreasing in the likelihood of attending the program within the
Non-Hispanic group. For example, the Average Treatment Effect on the Treated is
between $.33 and $.99 while the Average Treatment Effect on the Untreated is
between $.71 and $3.00.
arXiv link: http://arxiv.org/abs/1904.08522v1
A Generalized Continuous-Multinomial Response Model with a t-distributed Error Kernel
utility are generally modeled using Gumbel or normal distributions. This study
makes a strong case to substitute these thin-tailed distributions with a
t-distribution. First, we demonstrate that a model with a t-distributed error
kernel better estimates and predicts preferences, especially in
class-imbalanced datasets. Our proposed specification also implicitly accounts
for decision-uncertainty behavior, i.e. the degree of certainty that
decision-makers hold in their choices relative to the variation in the indirect
utility of any alternative. Second, after applying a t-distributed error kernel
in a multinomial response model for the first time, we extend this
specification to a generalized continuous-multinomial (GCM) model and derive
its full-information maximum likelihood estimator. The likelihood involves an
open-form expression of the cumulative distribution function of the multivariate
t-distribution, which we propose to compute using a combination of the
composite marginal likelihood method and the separation-of-variables approach.
Third, we establish finite sample properties of the GCM model with a
t-distributed error kernel (GCM-t) and highlight its superiority over the GCM
model with a normally-distributed error kernel (GCM-N) in a Monte Carlo study.
Finally, we compare GCM-t and GCM-N in an empirical setting related to
preferences for electric vehicles (EVs). We observe that accounting for
decision-uncertainty behavior in GCM-t results in lower elasticity estimates
and a higher willingness to pay for improving the EV attributes than those of
the GCM-N model. These differences are relevant in making policies to expedite
the adoption of EVs.
arXiv link: http://arxiv.org/abs/1904.08332v3
Subgeometric ergodicity and $β$-mixing
$\beta$-mixing (absolutely regular) with geometrically decaying mixing
coefficients. Furthermore, for initial distributions other than the stationary
one, geometric ergodicity implies $\beta$-mixing under suitable moment
assumptions. In this note we show that similar results hold also for
subgeometrically ergodic Markov chains. In particular, for both stationary and
other initial distributions, subgeometric ergodicity implies $\beta$-mixing
with subgeometrically decaying mixing coefficients. Although this result is
simple, it should prove very useful in obtaining rates of mixing in situations
where geometric ergodicity cannot be established. To illustrate our results we
derive new subgeometric ergodicity and $\beta$-mixing results for the
self-exciting threshold autoregressive model.
arXiv link: http://arxiv.org/abs/1904.07103v2
Subgeometrically ergodic autoregressions
chain theory can be exploited to study stationarity and ergodicity of nonlinear
time series models. Subgeometric ergodicity means that the transition
probability measures converge to the stationary measure at a rate slower than
geometric. Specifically, we consider suitably defined higher-order nonlinear
autoregressions that behave similarly to a unit root process for large values
of the observed series but we place almost no restrictions on their dynamics
for moderate values of the observed series. Results on the subgeometric
ergodicity of nonlinear autoregressions have previously appeared only in the
first-order case. We provide an extension to the higher-order case and show
that the autoregressions we consider are, under appropriate conditions,
subgeometrically ergodic. As useful implications we also obtain stationarity
and $\beta$-mixing with subgeometrically decaying mixing coefficients.
arXiv link: http://arxiv.org/abs/1904.07089v3
Estimation of Cross-Sectional Dependence in Large Panels
data analysis is paramount to further statistical analysis on the data under
study. Grouping more data with weak relations (cross{sectional dependence)
together often results in less efficient dimension reduction and worse
forecasting. This paper describes cross-sectional dependence among a large
number of objects (time series) via a factor model and parameterizes its extent
in terms of strength of factor loadings. A new joint estimation method,
benefiting from unique feature of dimension reduction for high dimensional time
series, is proposed for the parameter representing the extent and some other
parameters involved in the estimation procedure. Moreover, a joint asymptotic
distribution for a pair of estimators is established. Simulations illustrate
the effectiveness of the proposed estimation method in the finite sample
performance. Applications in cross-country macro-variables and stock returns
from S&P 500 are studied.
arXiv link: http://arxiv.org/abs/1904.06843v1
Peer Effects in Random Consideration Sets
into random consideration sets. We characterize the equilibrium behavior and
study the empirical content of the model. In our setup, changes in the choices
of friends affect the distribution of the consideration sets. We exploit this
variation to recover the ranking of preferences, attention mechanisms, and
network connections. These nonparametric identification results allow
unrestricted heterogeneity across people and do not rely on the variation of
either covariates or the set of available options. Our methodology leads to a
maximum-likelihood estimator that performs well in simulations. We apply our
results to an experimental dataset that has been designed to study the visual
focus of attention.
arXiv link: http://arxiv.org/abs/1904.06742v3
Complex Network Construction of Internet Financial risk
payment, capital borrowing and lending and transaction processing. In order to
study the internal risks, this paper uses the Internet financial risk elements
as the network node to construct the complex network of Internet financial risk
system. Different from the study of macroeconomic shocks and financial
institution data, this paper mainly adopts the perspective of complex system to
analyze the systematic risk of Internet finance. By dividing the entire
financial system into Internet financial subnet, regulatory subnet and
traditional financial subnet, the paper discusses the contagion relationships
among different risk factors and concludes that risks are transmitted
externally through the internal circulation of Internet finance, thereby
revealing potential hidden dangers of systemic risk. The results show that the
nodes around the center of the whole system are the main objects of financial
risk contagion in the Internet financial network. In addition, macro-prudential
regulation plays a decisive role in controlling the Internet financial system,
and the paper points out why current regulatory measures are still limited.
This paper summarizes a research model
which is still in its infancy, hoping to open up new prospects and directions
for us to understand the cascading behaviors of Internet financial risks.
arXiv link: http://arxiv.org/abs/1904.06640v3
Pólygamma Data Augmentation to address Non-conjugacy in the Bayesian Estimation of Mixed Multinomial Logit Models
sampling from conditional densities of utility parameters using the
Metropolis-Hastings (MH) algorithm due to the unavailability of a conjugate
prior for the logit kernel. To address this non-conjugacy concern, we propose
the application of the P\'olya-Gamma data augmentation (PG-DA) technique for
MMNL estimation. The
posterior estimates of the augmented and the default Gibbs sampler are similar
for two-alternative scenario (binary choice), but we encounter empirical
identification issues in the case of more alternatives ($J \geq 3$).
arXiv link: http://arxiv.org/abs/1904.07688v1
Distribution Regression in Duration Analysis: an Application to Unemployment Spells
in duration analysis using randomly right-censored data. This generalizes
classical duration models by allowing situations where explanatory variables'
marginal effects freely vary with duration time. The article discusses
applications to testing uniform restrictions on the varying coefficients,
inferences on average marginal effects, and others involving conditional
distribution estimates. Finite sample properties of the proposed method are
studied by means of Monte Carlo experiments. Finally, we apply our proposal to
study the effects of unemployment benefits on unemployment duration.
arXiv link: http://arxiv.org/abs/1904.06185v2
Identification of Noncausal Models by Quantile Autoregressions
noncausal models in the framework of quantile autoregressions (QAR). We also
present asymptotics for the i.i.d. case with regularly varying distributed
innovations in QAR. This new modelling perspective is appealing for
investigating the presence of bubbles in economic and financial time series,
and is an alternative to approximate maximum likelihood methods. We illustrate
our analysis using hyperinflation episodes in Latin American countries.
arXiv link: http://arxiv.org/abs/1904.05952v1
On the construction of confidence intervals for ratios of expectations
expectations. The main approach to construct confidence intervals for such
parameters is the delta method. However, this asymptotic procedure yields
intervals that may not be relevant for small sample sizes or, more generally,
in a sequence-of-model framework that allows the expectation in the denominator
to decrease to $0$ with the sample size. In this setting, we prove a
generalization of the delta method for ratios of expectations and the
consistency of the nonparametric percentile bootstrap. We also investigate
finite-sample inference and show a partial impossibility result: nonasymptotic
uniform confidence intervals can be built for ratios of expectations but not at
every level. Based on this, we propose an easy-to-compute index to appraise the
reliability of the intervals based on the delta method. Simulations and an
application illustrate our results and the practical usefulness of our rule of
thumb.
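The two constructions being compared are easy to state concretely for $\theta = E[X]/E[Y]$: the delta-method interval uses the gradient of the ratio at the sample means, and the percentile bootstrap resamples the data and takes quantiles of the resampled ratios. The data-generating process below is an illustrative assumption.

import numpy as np
from scipy import stats

def ratio_cis(x, y, level=0.95, draws=2000, seed=0):
    # Delta-method and percentile-bootstrap CIs for E[X]/E[Y].
    n = len(x)
    theta = x.mean() / y.mean()
    grad = np.array([1.0 / y.mean(), -x.mean() / y.mean() ** 2])   # gradient of mx/my
    Sigma = np.cov(x, y)                                           # 2x2 sample covariance
    se = np.sqrt(grad @ Sigma @ grad / n)
    z = stats.norm.ppf(0.5 + level / 2)
    delta_ci = (theta - z * se, theta + z * se)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, n, size=(draws, n))                      # bootstrap indices
    boot = x[idx].mean(axis=1) / y[idx].mean(axis=1)
    boot_ci = tuple(np.quantile(boot, [(1 - level) / 2, 0.5 + level / 2]))
    return theta, delta_ci, boot_ci

rng = np.random.default_rng(4)
x = rng.exponential(2.0, size=200)
y = 0.4 + rng.exponential(0.3, size=200)   # denominator mean bounded away from zero
print(ratio_cis(x, y))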
arXiv link: http://arxiv.org/abs/1904.07111v1
Solving Dynamic Discrete Choice Models Using Smoothing and Sieve Methods
solve for either the integrated or expected value function in a general class
of dynamic discrete choice (DDC) models. We use importance sampling to
approximate the Bellman operators defining the two functions. The random
Bellman operators, and therefore also the corresponding solutions, are
generally non-smooth which is undesirable. To circumvent this issue, we
introduce a smoothed version of the random Bellman operator and solve for the
corresponding smoothed value function using sieve methods. We show that one can
avoid using sieves by generalizing and adapting the `self-approximating' method
of Rust (1997) to our setting. We provide an asymptotic theory for the
approximate solutions and show that they converge at a root-$N$ rate, where $N$
is the number of Monte Carlo draws, towards Gaussian processes. We examine their
performance in practice through a set of numerical experiments and find that
both methods perform well with the sieve method being particularly attractive
in terms of computational speed and accuracy.
arXiv link: http://arxiv.org/abs/1904.05232v2
Local Polynomial Estimation of Time-Varying Parameters in Nonlinear Models
of time-varying parameters in a broad class of nonlinear time series models. We
show the proposed estimators are consistent and follow normal distributions in
large samples under weak conditions. We also provide a precise characterisation
of the leading bias term due to smoothing, which has not been done before. We
demonstrate the usefulness of our general results by establishing primitive
conditions for local (quasi-)maximum-likelihood estimators of time-varying
models, including threshold autoregressions, ARCH models and Poisson
autoregressions with exogenous covariates, to be normally distributed in large
samples, and by characterising their leading biases. An empirical study of US corporate default
counts demonstrates the applicability of the proposed local linear estimator
for Poisson autoregression, shedding new light on the dynamic properties of US
corporate defaults.
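The weighting scheme at the heart of the estimator is easy to illustrate in the simplest special case, a linear regression with a smoothly time-varying slope, $y_t = \beta(t/T)x_t + \varepsilon_t$. The paper's results cover general nonlinear likelihoods (threshold autoregressions, ARCH, Poisson autoregressions); the local-linear least-squares sketch below, with a Gaussian kernel and an arbitrary bandwidth, is only meant to convey the local-polynomial idea.

import numpy as np

def local_linear_beta(y, x, grid, h):
    # Local-linear estimate of beta(u) in y_t = beta(t/T) * x_t + e_t.
    T = len(y)
    u_t = np.arange(T) / T
    estimates = []
    for u in grid:
        w = np.exp(-0.5 * ((u_t - u) / h) ** 2)      # Gaussian kernel weights
        # Regressors for the expansion beta(u) + beta'(u) * (u_t - u).
        R = np.column_stack([x, x * (u_t - u)])
        WR = R * w[:, None]
        coef = np.linalg.solve(R.T @ WR, WR.T @ y)
        estimates.append(coef[0])                    # coef[0] is the local level beta(u)
    return np.array(estimates)

# Toy data with a smoothly varying slope.
rng = np.random.default_rng(5)
T = 500
x = rng.normal(size=T)
beta_true = 1.0 + np.sin(2 * np.pi * np.arange(T) / T)
y = beta_true * x + 0.5 * rng.normal(size=T)
beta_hat = local_linear_beta(y, x, np.linspace(0.05, 0.95, 19), h=0.1)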
arXiv link: http://arxiv.org/abs/1904.05209v3
Fixed Effects Binary Choice Models: Estimation and Inference with Long Panels
binary choice models mainly for two reasons: the incidental parameter problem
and the computational challenge even in moderately large panels. Using the
example of binary choice models with individual and time fixed effects, we show
how both issues can be alleviated by combining asymptotic bias corrections with
computational advances. Because unbalancedness is often encountered in applied
work, we investigate its consequences on the finite sample properties of
various (bias corrected) estimators. In simulation experiments we find that
analytical bias corrections perform particularly well, whereas split-panel
jackknife estimators can be severely biased in unbalanced panels.
arXiv link: http://arxiv.org/abs/1904.04217v3
Bayesian Estimation of Mixed Multinomial Logit Models: Advances and Simulation-Based Evaluations
computationally-efficient alternative to Markov chain Monte Carlo (MCMC)
methods for scalable Bayesian estimation of mixed multinomial logit (MMNL)
models. It has been established that VB is substantially faster than MCMC at
practically no compromises in predictive accuracy. In this paper, we address
two critical gaps concerning the usage and understanding of VB for MMNL. First,
extant VB methods are limited to utility specifications involving only
individual-specific taste parameters. Second, the finite-sample properties of
VB estimators and the relative performance of VB, MCMC and maximum simulated
likelihood estimation (MSLE) are not known. To address the former, this study
extends several VB methods for MMNL to admit utility specifications including
both fixed and random utility parameters. To address the latter, we conduct an
extensive simulation-based evaluation to benchmark the extended VB methods
against MCMC and MSLE in terms of estimation times, parameter recovery and
predictive accuracy. The results suggest that all VB variants, with the
exception of the ones relying on an alternative variational lower bound
constructed with the help of the modified Jensen's inequality, perform as well
as MCMC and MSLE at prediction and parameter recovery. In particular, VB with
nonconjugate variational message passing and the delta-method (VB-NCVMP-Delta)
is up to 16 times faster than MCMC and MSLE. Thus, VB-NCVMP-Delta can be an
attractive alternative to MCMC and MSLE for fast, scalable and accurate
estimation of MMNL models.
arXiv link: http://arxiv.org/abs/1904.03647v4
Second-order Inductive Inference: an axiomatic approach
instance a search engine ranking webpages given past searches. Resampling past
cases leads to different rankings and the extraction of deeper information. Yet
a rich database, with sufficiently diverse rankings, is often beyond reach.
Inexperience demands either "on the fly" learning-by-doing or prudence: the
arrival of a novel case does not force (i) a revision of current rankings, (ii)
dogmatism towards new rankings, or (iii) intransitivity. For this higher-order
framework of inductive inference, we derive a suitably unique numerical
representation of these rankings via a matrix on eventualities $\times$ cases and
describe a robust test of prudence. Applications include: the success/failure
of startups; the veracity of fake news; and novel conditions for the existence
of a yield curve that is robustly arbitrage-free.
arXiv link: http://arxiv.org/abs/1904.02934v5
Synthetic learner: model-free inference on treatments over time
many areas of interest, ranging from political economics, marketing to
healthcare. In this paper, we develop a non-parametric algorithm for detecting
the effects of treatment over time in the context of Synthetic Controls. The
method builds on counterfactual predictions from many algorithms without
necessarily assuming that the algorithms correctly capture the model. We
introduce an inferential procedure for detecting treatment effects and show
that the testing procedure is asymptotically valid for stationary, beta mixing
processes without imposing any restriction on the set of base algorithms under
consideration. We discuss consistency guarantees for average treatment effect
estimates and derive regret bounds for the proposed methodology. The class of
algorithms may include Random Forest, Lasso, or any other machine-learning
estimator. Numerical studies and an application illustrate the advantages of
the method.
arXiv link: http://arxiv.org/abs/1904.01490v2
Matching Points: Supplementing Instruments with Covariates in Triangular Models
the instrument takes on too few values. This paper presents a new method that
matches pairs of covariates and instruments to restore point identification in
this scenario in a triangular model. The model consists of a structural
function for a continuous outcome and a selection model for the discrete
endogenous variable. The structural outcome function must be continuous and
monotonic in a scalar disturbance, but it can be nonseparable. The selection
model allows for unrestricted heterogeneity. Global identification is obtained
under weak conditions. The paper also provides estimators of the structural
outcome function. Two empirical examples of the return to education and
selection into Head Start illustrate the value and limitations of the method.
arXiv link: http://arxiv.org/abs/1904.01159v3
Dynamically Optimal Treatment Allocation
evidence from randomized control trials can be utilized to guide personalized
decisions in challenging dynamic environments with budget and capacity
constraints. Recent advances in reinforcement learning now enable the solution
of many complex, real-world problems for the first time. We allow for
restricted classes of policy functions and prove that their regret decays at
rate $n^{-1/2}$, the same as in the static case. Applying our methods to job
training, we find that by exploiting the problem's dynamic structure, we
achieve significantly higher welfare compared to static approaches.
arXiv link: http://arxiv.org/abs/1904.01047v5
Counterfactual Sensitivity and Robustness
parametric assumptions about the distribution of latent variables in structural
models. In particular, we derive bounds on counterfactuals as the distribution
of latent variables spans nonparametric neighborhoods of a given parametric
specification while other "structural" features of the model are maintained.
Our approach recasts the infinite-dimensional problem of optimizing the
counterfactual with respect to the distribution of latent variables (subject to
model constraints) as a finite-dimensional convex program. We also develop an
MPEC version of our method to further simplify computation in models with
endogenous parameters (e.g., value functions) defined by equilibrium
constraints. We propose plug-in estimators of the bounds and two methods for
inference. We also show that our bounds converge to the sharp nonparametric
bounds on counterfactuals as the neighborhood size becomes large. To illustrate
the broad applicability of our procedure, we present empirical applications to
matching models with transferable utility and dynamic discrete choice models.
arXiv link: http://arxiv.org/abs/1904.00989v4
Post-Selection Inference in Three-Dimensional Panel Data
Researchers use various combinations of fixed effects for three-dimensional
panels. When one imposes a parsimonious model and the true model is rich, then
it incurs mis-specification biases. When one employs a rich model and the true
model is parsimonious, then it incurs larger standard errors than necessary. It
is therefore useful for researchers to know correct models. In this light, Lu,
Miao, and Su (2018) propose methods of model selection. We advance this
literature by proposing a method of post-selection inference for regression
parameters. Despite our use of the lasso technique as means of model selection,
our assumptions allow for many and even all fixed effects to be nonzero.
Simulation studies demonstrate that the proposed method is more precise than
under-fitting fixed effect estimators, is more efficient than over-fitting
fixed effect estimators, and allows for as accurate inference as the oracle
estimator.
arXiv link: http://arxiv.org/abs/1904.00211v2
Simple subvector inference on sharp identified set in affine models
components of the parameter vector in the case in which the identified set is a
polygon. The proposed regularized estimator has three important properties: (i)
it has a uniform asymptotic Gaussian limit in the presence of flat faces in the
absence of redundant (or overidentifying) constraints (or vice versa); (ii) the
bias from regularization does not enter the first-order limiting distribution;
(iii) the estimator remains consistent for the sharp (non-enlarged) identified
set for the individual components even in the non-regular case. These
properties are used to construct uniformly valid confidence sets for an element
$\theta_{1}$ of a parameter vector $\theta\in\mathbb{R}^{d}$ that is partially
identified by affine moment equality and inequality conditions. The proposed
confidence sets can be computed as a solution to a small number of linear and
convex quadratic programs, leading to a substantial decrease in computation
time and guaranteeing a global optimum. As a result, the method provides
uniformly valid inference in applications in which the dimension of the
parameter space, $d$, and the number of inequalities, $k$, were previously
computationally infeasible ($d,k=100$). The proposed approach can be extended
to construct confidence sets for intersection bounds, to construct joint
polygon-shaped confidence sets for multiple components of $\theta$, and to find
the set of solutions to a linear program. Inference for coefficients in the
linear IV regression model with an interval outcome is used as an illustrative
example.
arXiv link: http://arxiv.org/abs/1904.00111v3
Testing for Differences in Stochastic Network Structure
introduction of a social program or trade shock, alters agents' incentives to
form links in a network? This paper proposes analogues of a two-sample
Kolmogorov-Smirnov test, widely used in the literature to test the null
hypothesis of "no treatment effects", for network data. It first specifies a
testing problem in which the null hypothesis is that two networks are drawn
from the same random graph model. It then describes two randomization tests
based on the magnitude of the difference between the networks' adjacency
matrices as measured by the $2\to2$ and $\infty\to1$ operator norms. Power
properties of the tests are examined analytically, in simulation, and through
two real-world applications. A key finding is that the test based on the
$\infty\to1$ norm can be substantially more powerful than that based on the
$2\to2$ norm for the kinds of sparse and degree-heterogeneous networks common
in economics.
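The flavour of these tests can be conveyed with a small sketch that uses the $2\to2$ (spectral) norm of the difference of adjacency matrices; the $\infty\to1$ norm is harder to compute exactly, and the dyad-swap randomization below is valid only under the stronger illustrative assumption that dyads are independent and identically distributed across the two networks under the null, so this is not the paper's procedure.

import numpy as np

def spectral_stat(A, B):
    # 2->2 operator norm (largest singular value) of the adjacency difference.
    return np.linalg.norm(A - B, ord=2)

def dyad_swap_test(A, B, draws=499, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    observed = spectral_stat(A, B)
    iu = np.triu_indices(n, k=1)
    count = 0
    for _ in range(draws):
        swap = rng.random(len(iu[0])) < 0.5          # which dyads to exchange
        a, b = A[iu].copy(), B[iu].copy()
        a[swap], b[swap] = B[iu][swap], A[iu][swap]
        A2, B2 = A.copy(), B.copy()
        A2[iu], B2[iu] = a, b
        A2.T[iu], B2.T[iu] = a, b                    # keep the matrices symmetric
        count += spectral_stat(A2, B2) >= observed
    return (count + 1) / (draws + 1)

# Toy example: two Erdos-Renyi graphs with different edge probabilities.
rng = np.random.default_rng(6)
n = 60
A = np.triu((rng.random((n, n)) < 0.10).astype(float), 1); A = A + A.T
B = np.triu((rng.random((n, n)) < 0.20).astype(float), 1); B = B + B.T
print("randomization p-value:", dyad_swap_test(A, B))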
arXiv link: http://arxiv.org/abs/1903.11117v5
On the Effect of Imputation on the 2SLS Variance
investigate how both jointly affect inference on causal parameters.
Conventional methods to estimate the variance, which treat the imputed data as
if it was observed in the first place, are not reliable. We derive the
asymptotic variance and propose a heteroskedasticity robust variance estimator
for two-stage least squares which accounts for the imputation. Monte Carlo
simulations support our theoretical findings.
arXiv link: http://arxiv.org/abs/1903.11004v1
Time series models for realized covariance matrices based on the matrix-F distribution
realized covariance (RCOV) matrices. This CBF model is capable of capturing
heavy-tailed RCOV, which is an important stylized fact but could not be handled
adequately by the Wishart-based models. To further mimic the long memory
feature of the RCOV, a special CBF model with the conditional heterogeneous
autoregressive (HAR) structure is introduced. Moreover, we give a systematic
study of the probabilistic properties and statistical inference of the CBF
model, including exploring its stationarity, establishing the asymptotics of
its maximum likelihood estimator, and giving some new inner-product-based tests
for its model checking. In order to handle a large dimensional RCOV matrix, we
construct two reduced CBF models -- the variance-target CBF model (for moderate
but fixed dimensional RCOV matrix) and the factor CBF model (for high
dimensional RCOV matrix). For both reduced models, the asymptotic theory of the
estimated parameters is derived. The importance of our entire methodology is
illustrated by simulation results and two real examples.
arXiv link: http://arxiv.org/abs/1903.12077v2
Ensemble Methods for Causal Effects in Panel Data Settings
effects of an intervention by predicting the counterfactual values of outcomes
for treated units, had they not received the treatment. Several approaches have
been proposed for this problem, including regression methods, synthetic control
methods and matrix completion methods. This paper considers an ensemble
approach, and shows that it performs better than any of the individual methods
in several economic datasets. Matrix completion methods are often given the
most weight by the ensemble, but this clearly depends on the setting. We argue
that ensemble methods present a fruitful direction for further research in the
causal panel data setting.
arXiv link: http://arxiv.org/abs/1903.10079v1
Machine Learning Methods Economists Should Know About
economics and econometrics. First we discuss the differences in goals, methods
and settings between the ML literature and the traditional econometrics and
statistics literatures. Then we discuss some specific methods from the machine
learning literature that we view as important for empirical researchers in
economics. These include supervised learning methods for regression and
classification, unsupervised learning methods, as well as matrix completion
methods. Finally, we highlight newly developed methods at the intersection of
ML and econometrics, methods that typically perform better than either
off-the-shelf ML or more traditional econometric methods when applied to
particular classes of problems, problems that include causal inference for
average treatment effects, optimal policy estimation, and estimation of the
counterfactual effect of price changes in consumer choice models.
arXiv link: http://arxiv.org/abs/1903.10075v1
Identification and Estimation of a Partially Linear Regression Model using Network Data
latent driver of link formation in a network. Rather than specify and fit a
parametric network formation model, I introduce a new method based on matching
pairs of agents with similar columns of the squared adjacency matrix, the ijth
entry of which contains the number of other agents linked to both agents i and
j. The intuition behind this approach is that for a large class of network
formation models the columns of the squared adjacency matrix characterize all
of the identifiable information about individual linking behavior. In this
paper, I describe the model, formalize this intuition, and provide consistent
estimators for the parameters of the regression model. Auerbach (2021)
considers inference and an application to network peer effects.
arXiv link: http://arxiv.org/abs/1903.09679v3
Feature quantization for parsimonious and interpretable predictive models
widely used. To improve prediction accuracy and interpretability, a
preprocessing step quantizing both continuous and categorical data is usually
performed: continuous features are discretized and, if numerous, levels of
categorical features are grouped. An even better predictive accuracy can be
reached by embedding this quantization estimation step directly into the
predictive estimation step itself. But in doing so, the predictive loss has to be
optimized over a huge set. To overcome this difficulty, we introduce a specific
two-step optimization strategy: first, the optimization problem is relaxed by
approximating discontinuous quantization functions by smooth functions; second,
the resulting relaxed optimization problem is solved via a particular neural
network. The good performance of this approach, which we call glmdisc, is
illustrated on simulated and real data from the UCI library and Crédit
Agricole Consumer Finance (a major European historic player in the consumer
credit market).
arXiv link: http://arxiv.org/abs/1903.08920v1
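As a rough, hypothetical illustration of the preprocessing step this abstract describes (not the glmdisc algorithm itself, which learns the quantization jointly with the predictive model), the Python snippet below simply bins a continuous feature at its quantiles and fits a logistic regression on the binned design.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import KBinsDiscretizer

    rng = np.random.default_rng(0)
    x = rng.standard_normal((500, 1))
    y = (np.sin(2 * x[:, 0]) + 0.5 * rng.standard_normal(500) > 0).astype(int)

    # quantile-based discretization of the continuous feature, one-hot encoded
    X_binned = KBinsDiscretizer(n_bins=5, encode="onehot-dense",
                                strategy="quantile").fit_transform(x)
    clf = LogisticRegression(max_iter=1000).fit(X_binned, y)
    print("in-sample accuracy:", clf.score(X_binned, y))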
Omitted variable bias of Lasso-based inference methods: A finite sample analysis
post double Lasso and debiased Lasso. We show that these methods can exhibit
substantial omitted variable biases (OVBs) due to Lasso not selecting relevant
controls. This phenomenon can occur even when the coefficients are sparse and
the sample size is large and larger than the number of controls. Therefore,
relying on the existing asymptotic inference theory can be problematic in
empirical applications. We compare the Lasso-based inference methods to modern
high-dimensional OLS-based methods and provide practical guidance.
arXiv link: http://arxiv.org/abs/1903.08704v9
State-Building through Public Land Disposal? An Application of Matrix Completion for Counterfactual Prediction
for settlement, influenced the development of American frontier states. It uses
a treatment propensity-weighted matrix completion model to estimate the
counterfactual size of these states without homesteading. In simulation
studies, the method shows lower bias and variance than other estimators,
particularly in higher complexity scenarios. The empirical analysis reveals
that homestead policies significantly and persistently reduced state government
expenditure and revenue. These findings align with continuous
difference-in-differences estimates using 1.46 million land patent records.
This study's extension of the matrix completion method to include propensity
score weighting for causal effect estimation in panel data, especially in
staggered treatment contexts, enhances policy evaluation by improving the
precision of long-term policy impact assessments.
arXiv link: http://arxiv.org/abs/1903.08028v4
Bayesian MIDAS Penalized Regressions: Estimation, Selection, and Prediction
high-dimensional environment that resorts to Group Lasso penalization and
Bayesian techniques for estimation and inference. In particular, to improve the
prediction properties of the model and its sparse recovery ability, we consider
a Group Lasso with a spike-and-slab prior. Penalty hyper-parameters governing
the model shrinkage are automatically tuned via an adaptive MCMC algorithm. We
establish good frequentist asymptotic properties of the posterior of the
in-sample and out-of-sample prediction error, we recover the optimal posterior
contraction rate, and we show optimality of the posterior predictive density.
Simulations show that the proposed models have good selection and forecasting
performance in small samples, even when the design matrix presents
cross-correlation. When applied to forecasting U.S. GDP, our penalized
regressions can outperform many strong competitors. Results suggest that
financial variables may have some, although very limited, short-term predictive
content.
arXiv link: http://arxiv.org/abs/1903.08025v3
An Integrated Panel Data Approach to Modelling Economic Growth
parameter heterogeneity and cross-sectional dependence --- which are addressed
independently from each other in most studies. The purpose of this study is to
propose an integrated framework that extends the conventional linear growth
regression model to allow for parameter heterogeneity and cross-sectional error
dependence, while simultaneously performing variable selection. We also derive
the asymptotic properties of the estimator under both low and high dimensions,
and further investigate the finite sample performance of the estimator through
Monte Carlo simulations. We apply the framework to a dataset of 89 countries
over the period from 1960 to 2014. Our results reveal some cross-country
patterns not found in previous studies (e.g., "middle income trap hypothesis",
"natural resources curse hypothesis", "religion works via belief, not
practice", etc.).
arXiv link: http://arxiv.org/abs/1903.07948v1
Deciding with Judgment
boundary of the confidence interval. This statistical decision rule is
admissible and does not perform worse than the judgmental decision with a
probability equal to the confidence level, which is interpreted as a
coefficient of statistical risk aversion. The confidence level is related to
the decision maker's aversion to uncertainty and can be elicited with
laboratory experiments using urns a la Ellsberg. The decision rule is applied
to a problem of asset allocation for an investor whose judgmental decision is
to keep all her wealth in cash.
arXiv link: http://arxiv.org/abs/1903.06980v1
Inference for First-Price Auctions with Guerre, Perrigne, and Vuong's Estimator
first-price sealed-bid auctions model within the independent private value
paradigm. We show the asymptotic normality of the two-step nonparametric
estimator of Guerre, Perrigne, and Vuong (2000) (GPV), and propose an easily
implementable and consistent estimator of the asymptotic variance. We prove the
validity of the pointwise percentile bootstrap confidence intervals based on
the GPV estimator. Lastly, we use the intermediate Gaussian approximation
approach to construct bootstrap-based asymptotically valid uniform confidence
bands for the density of the valuations.
arXiv link: http://arxiv.org/abs/1903.06401v1
A statistical analysis of time trends in atmospheric ethane
and an important precursor of tropospheric ozone through various chemical
pathways. Ethane is also an indirect greenhouse gas (global warming potential),
influencing the atmospheric lifetime of methane through the consumption of the
hydroxyl radical (OH). Understanding the development of trends and identifying
trend reversals in atmospheric ethane is therefore crucial. Our dataset
consists of four series of daily ethane columns obtained from ground-based FTIR
measurements. As many other decadal time series, our data are characterized by
autocorrelation, heteroskedasticity, and seasonal effects. Additionally,
missing observations due to instrument failure or unfavorable measurement
conditions are common in such series. The goal of this paper is therefore to
analyze trends in atmospheric ethane with statistical tools that correctly
address these data features. We present selected methods designed for the
analysis of time trends and trend reversals. We consider bootstrap inference on
broken linear trends and smoothly varying nonlinear trends. In particular, for
the broken trend model, we propose a bootstrap method for inference on the
break location and the corresponding changes in slope. For the smooth trend
model we construct simultaneous confidence bands around the nonparametrically
estimated trend. Our autoregressive wild bootstrap approach, combined with a
seasonal filter, is able to handle all issues mentioned above.
arXiv link: http://arxiv.org/abs/1903.05403v2
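To make the autoregressive wild bootstrap mentioned above concrete, here is a minimal, hypothetical Python sketch that generates one bootstrap sample around a fitted linear trend; the choice of the AR parameter rho and the seasonal filtering used in the paper are omitted. In practice this draw would be repeated many times to build bootstrap confidence bands for the trend.

    import numpy as np

    def arwb_sample(y, t, rho=0.9, rng=None):
        # Fit a linear trend, then rescale its residuals by an AR(1) sequence
        # of standard-normal multipliers (the "autoregressive wild" part).
        rng = np.random.default_rng() if rng is None else rng
        X = np.column_stack([np.ones_like(t, dtype=float), t])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        u = y - X @ beta
        xi = np.empty(len(y))
        xi[0] = rng.standard_normal()
        for s in range(1, len(y)):
            xi[s] = rho * xi[s - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()
        return X @ beta + xi * u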
From interpretability to inference: an estimation framework for universal approximators
class of universal approximators. Estimation is based on the decomposition of
model predictions into Shapley values. Inference relies on analyzing the bias
and variance properties of individual Shapley components. We show that Shapley
value estimation is asymptotically unbiased, and we introduce Shapley
regressions as a tool to uncover the true data generating process from noisy
data alone. The well-known case of the linear regression is the special case in
our framework if the model is linear in parameters. We present theoretical,
numerical, and empirical results for the estimation of heterogeneous treatment
effects as our guiding example.
arXiv link: http://arxiv.org/abs/1903.04209v6
Estimating Dynamic Conditional Spread Densities to Optimise Daily Storage Trading of Electricity
similar representations, to model and forecast electricity price spreads
between different hours of the day. This supports an optimal day-ahead storage
and discharge schedule, and thereby facilitates a bidding strategy for a
merchant arbitrage facility into the day-ahead auctions for wholesale
electricity. The four latent moments of the density functions are dynamic and
conditional upon exogenous drivers, thereby permitting the mean, variance,
skewness and kurtosis of the densities to respond hourly to such factors as
weather and demand forecasts. The best specification for each spread is
selected based on the Pinball Loss function, following the closed form
analytical solutions of the cumulative density functions. Those analytical
properties also allow the calculation of risk associated with the spread
arbitrages. From these spread densities, the optimal daily operation of a
battery storage facility is determined.
arXiv link: http://arxiv.org/abs/1903.06668v1
A Varying Coefficient Model for Assessing the Returns to Growth to Account for Poverty and Inequality
of the middle class for economic growth. When explaining why these measures of
the income distribution are added to the growth regression, it is often
mentioned that poor people behave differently, which may translate to the economy
as a whole. However, simply adding explanatory variables does not reflect this
behavior. Using a varying coefficient model, we show that the returns to growth
differ substantially depending on poverty and inequality. Furthermore, we
investigate how these returns differ between the poorer and the richer parts of
society. We argue that the differences in the coefficients imply, on the one
hand, that mean coefficients are not informative and, on the other hand, that
the credibility of the economic interpretation is challenged. In short, we show
that, when estimating mean coefficients without accounting for poverty and
inequality, the estimation is likely to suffer from a serious endogeneity bias.
arXiv link: http://arxiv.org/abs/1903.02390v1
The Africa-Dummy: Gone with the Millennium?
and estimate the Africa-Dummy in one regression step so that its correct
standard errors as well as correlations with other coefficients can easily be
estimated. We estimate the Nickell bias and find it to be negligible.
Semiparametric extensions check whether the Africa-Dummy is simply a result of
misspecification of the functional form. In particular, we show that the
returns to growth factors are different for Sub-Saharan African countries
compared to the rest of the world. For example, returns to population growth
are positive and beta-convergence is faster. When extending the model to
identify the development of the Africa-Dummy over time, we see that it has
changed dramatically and that the penalty for Sub-Saharan African
countries has decreased incrementally, reaching insignificance around the
turn of the millennium.
arXiv link: http://arxiv.org/abs/1903.02357v1
Experimenting in Equilibrium
unit does not affect other units. There are many important settings, however,
where this non-interference assumption does not hold, as when running
experiments on supply-side incentives on a ride-sharing platform or subsidies
in an energy marketplace. In this paper, we introduce a new approach to
experimental design in large-scale stochastic systems with considerable
cross-unit interference, under an assumption that the interference is
structured enough that it can be captured via mean-field modeling. Our approach
enables us to accurately estimate the effect of small changes to system
parameters by combining unobtrusive randomization with lightweight modeling,
all while remaining in equilibrium. We can then use these estimates to optimize
the system by gradient descent. Concretely, we focus on the problem of a
platform that seeks to optimize supply-side payments p in a centralized
marketplace where different suppliers interact via their effects on the overall
supply-demand equilibrium, and show that our approach enables the platform to
optimize p in large systems using vanishingly small perturbations.
arXiv link: http://arxiv.org/abs/1903.02124v5
ppmlhdfe: Fast Poisson Estimation with High-Dimensional Fixed Effects
(pseudo) Poisson regression models with multiple high-dimensional fixed effects
(HDFE). Estimation is implemented using a modified version of the iteratively
reweighted least-squares (IRLS) algorithm that allows for fast estimation in
the presence of HDFE. Because the code is built around the reghdfe package, it
has similar syntax, supports many of the same functionalities, and benefits
from reghdfe's fast convergence properties for computing high-dimensional least
squares problems.
Performance is further enhanced by some new techniques we introduce for
accelerating HDFE-IRLS estimation specifically. ppmlhdfe also implements a
novel and more robust approach to check for the existence of (pseudo) maximum
likelihood estimates.
arXiv link: http://arxiv.org/abs/1903.01690v3
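The following Python sketch conveys the flavour of the estimation strategy described above under simplifying assumptions: it runs Poisson IRLS and absorbs a single fixed effect by weighted group demeaning inside each iteration, whereas ppmlhdfe handles several fixed-effect dimensions via reghdfe-style alternating projections and adds acceleration and separation checks. The function names here are hypothetical, not part of the package.

    import numpy as np

    def weighted_demean(v, groups, w):
        # subtract weighted group means (absorbs one set of fixed effects)
        out = np.array(v, dtype=float)
        for g in np.unique(groups):
            m = groups == g
            out[m] -= np.average(out[m], weights=w[m], axis=0)
        return out

    def poisson_one_fe(y, X, groups, tol=1e-8, max_iter=100):
        beta = np.zeros(X.shape[1])
        eta = np.zeros(len(y))
        for _ in range(max_iter):
            mu = np.exp(eta)
            w = mu                              # IRLS weights
            z = eta + (y - mu) / mu             # working response
            Xt = weighted_demean(X, groups, w)
            zt = weighted_demean(z, groups, w)
            sw = np.sqrt(w)
            beta_new, *_ = np.linalg.lstsq(Xt * sw[:, None], zt * sw, rcond=None)
            eta = (z - zt) + Xt @ beta_new      # adds back the absorbed fixed effect
            if np.max(np.abs(beta_new - beta)) < tol:
                break
            beta = beta_new
        return beta_new

    rng = np.random.default_rng(0)
    n = 500
    g = rng.integers(0, 20, n)                  # fixed-effect groups
    x = rng.standard_normal((n, 2))
    lam = np.exp(0.5 * x[:, 0] - 0.3 * x[:, 1] + 0.02 * g)
    y = rng.poisson(lam).astype(float)
    print(poisson_one_fe(y, x, g))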
When do common time series estimands have nonparametric causal meaning?
framework for analyzing dynamic causal effects of assignments on outcomes in
observational time series settings. We provide conditions under which common
predictive time series estimands, such as the impulse response function,
generalized impulse response function, local projection, and local projection
instrumental variables, have a nonparametric causal interpretation in terms of
dynamic causal effects. The direct potential outcome system therefore provides
a foundation for analyzing popular reduced-form methods for estimating the
causal effect of macroeconomic shocks on outcomes in time series settings.
arXiv link: http://arxiv.org/abs/1903.01637v4
Verifying the existence of maximum likelihood estimates for generalized linear models
estimates are not guaranteed to exist. Though nonexistence is a well known
problem in the binary choice literature, it presents significant challenges for
other models as well and is not as well understood in more general settings.
These challenges are only magnified for models that feature many fixed effects
and other high-dimensional parameters. We address the current ambiguity
surrounding this topic by studying the conditions that govern the existence of
estimates for (pseudo-)maximum likelihood estimators used to estimate a wide
class of generalized linear models (GLMs). We show that some, but not all, of
these GLM estimators can still deliver consistent estimates of at least some of
the linear parameters when these conditions fail to hold. We also demonstrate
how to verify these conditions in models with high-dimensional parameters, such
as panel data models with multiple levels of fixed effects.
arXiv link: http://arxiv.org/abs/1903.01633v7
Finite Sample Inference for the Maximum Score Estimand
a semiparametric binary response model under a conditional median restriction
originally studied by Manski (1975, 1985). Our inference method is valid for
any sample size and irrespective of whether the structural parameters are point
identified or partially identified, for example due to the lack of a
continuously distributed covariate with large support. Our inference approach
exploits distributional properties of observable outcomes conditional on the
observed sequence of exogenous variables. Moment inequalities conditional on
this size n sequence of exogenous covariates are constructed, and the test
statistic is a monotone function of violations of sample moment inequalities.
The critical value used for inference is provided by the appropriate quantile
of a known function of n independent Rademacher random variables. We
investigate power properties of the underlying test and provide simulation
studies to support the theoretical findings.
arXiv link: http://arxiv.org/abs/1903.01511v2
Limit Theorems for Network Dependent Random Variables
observations are interconnected through an observed network. Following Doukhan
and Louhichi (1999), we measure the strength of dependence by covariances of
nonlinearly transformed variables. We provide a law of large numbers and
central limit theorem for network dependent variables. We also provide a method
of calculating standard errors robust to general forms of network dependence.
For that purpose, we rely on a network heteroskedasticity and autocorrelation
consistent (HAC) variance estimator, and show its consistency. The results rely
on conditions characterized by tradeoffs between the rate of decay of
dependence across a network and the network's denseness. Our approach can
accommodate data generated by network formation models, random fields on
graphs, conditional dependency graphs, and large functional-causal systems of
equations.
arXiv link: http://arxiv.org/abs/1903.01059v6
Model Selection in Utility-Maximizing Binary Prediction
viewed as cost-sensitive binary classification; thus, its in-sample overfitting
issue is similar to that of perceptron learning. A utility-maximizing
prediction rule (UMPR) is constructed to alleviate the in-sample overfitting of
the maximum utility estimation. We establish non-asymptotic upper bounds on the
difference between the maximal expected utility and the generalized expected
utility of the UMPR. Simulation results show that the UMPR with an appropriate
data-dependent penalty achieves larger generalized expected utility than common
estimators in the binary classification if the conditional probability of the
binary outcome is misspecified.
arXiv link: http://arxiv.org/abs/1903.00716v3
Approximation Properties of Variational Bayes for Vector Autoregressions
It has the merit of being a fast and scalable alternative to Markov Chain Monte
Carlo (MCMC) but its approximation error is often unknown. In this paper, we
derive the approximation error of VB in terms of mean, mode, variance,
predictive density and KL divergence for the linear Gaussian multi-equation
regression. Our results indicate that VB approximates the posterior mean
perfectly. Factors affecting the magnitude of underestimation in posterior
variance and mode are revealed. Importantly, we demonstrate that VB estimates
predictive densities accurately.
arXiv link: http://arxiv.org/abs/1903.00617v1
Robust Nearly-Efficient Estimation of Large Panels with Factor Structures
heterogeneous coefficients, when both the regressors and the residual contain a
possibly common, latent, factor structure. Our theory is (nearly) efficient,
because it is based on the GLS principle, and also robust to the specification
of the factor structure, because it requires neither information on the number
of factors nor estimation of the factor structure itself. We first show how the
unfeasible GLS estimator not only affords an efficiency improvement but, more
importantly, provides a bias-adjusted estimator with the conventional limiting
distribution, for situations where the OLS is affected by a first-order bias.
The technical challenge resolved in the paper is to show how these properties
are preserved for a class of feasible GLS estimators in a double-asymptotics
setting. Our theory is illustrated by means of Monte Carlo exercises and, then,
with an empirical application using individual asset returns and firms'
characteristics data.
arXiv link: http://arxiv.org/abs/1902.11181v1
Integrability and Identification in Multinomial Choice Models
workhorse of applied research. We establish shape-restrictions under which
multinomial choice-probability functions can be rationalized via random-utility
models with nonparametric unobserved heterogeneity and general income-effects.
When combined with an additional restriction, the above conditions are
equivalent to the canonical Additive Random Utility Model. The
sufficiency-proof is constructive, and facilitates nonparametric identification
of preference-distributions without requiring identification-at-infinity type
arguments. A corollary shows that Slutsky-symmetry, a key condition for
previous rationalizability results, is equivalent to absence of income-effects.
Our results imply theory-consistent nonparametric bounds for
choice-probabilities on counterfactual budget-sets. They also apply to widely
used random-coefficient models, upon conditioning on observable choice
characteristics. The theory of partial differential equations plays a key role
in our analysis.
arXiv link: http://arxiv.org/abs/1902.11017v4
The Empirical Content of Binary Choice Models
prediction on counterfactual budget sets arising from potential
policy-interventions. Such predictions are more credible when made without
arbitrary functional-form/distributional assumptions, and instead based solely
on economic rationality, i.e. that choice is consistent with utility
maximization by a heterogeneous population. This paper investigates
nonparametric economic rationality in the empirically important context of
binary choice. We show that under general unobserved heterogeneity, economic
rationality is equivalent to a pair of Slutsky-like shape-restrictions on
choice-probability functions. The forms of these restrictions differ from
Slutsky-inequalities for continuous goods. Unlike McFadden-Richter's stochastic
revealed preference, our shape-restrictions (a) are global, i.e. their forms do
not depend on which and how many budget-sets are observed, (b) are closed-form,
hence easy to impose on parametric/semi/non-parametric models in practical
applications, and (c) provide computationally simple, theory-consistent bounds
on demand and welfare predictions on counterfactual budget-sets.
arXiv link: http://arxiv.org/abs/1902.11012v4
Granger Causality Testing in High-Dimensional VARs: a Post-Double-Selection Procedure
based on penalized least squares estimation. To obtain a test that retains the
appropriate size after the variable selection performed by the lasso, we propose a
post-double-selection procedure to partial out the effects of nuisance variables
and establish its uniform asymptotic validity. We conduct an extensive set of
Monte-Carlo simulations that show our tests perform well under different data
generating processes, even without sparsity. We apply our testing procedure to
find networks of volatility spillovers and we find evidence that causal
relationships become clearer in high-dimensional compared to standard
low-dimensional VARs.
arXiv link: http://arxiv.org/abs/1902.10991v4
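A schematic rendering of the post-double-selection idea in Python (a deliberately simplified, hypothetical sketch: the paper works with blocks of VAR lags and establishes uniform validity): select controls with the lasso in both the outcome equation and the equation for the regressor of interest, then refit by least squares on the union of selected controls.

    import numpy as np
    from sklearn.linear_model import LassoCV

    def post_double_selection(y, d, W):
        # y: outcome series; d: regressor whose (Granger-)causal effect is tested;
        # W: matrix of nuisance controls (e.g., the other lags in the equation)
        s_y = np.flatnonzero(LassoCV(cv=5).fit(W, y).coef_)
        s_d = np.flatnonzero(LassoCV(cv=5).fit(W, d).coef_)
        keep = np.union1d(s_y, s_d).astype(int)
        Z = np.column_stack([np.ones(len(y)), d, W[:, keep]])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        return coef[1]   # pair with a robust Wald test in applications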
Estimation of Dynamic Panel Threshold Model using Stata
estimation of the dynamic panel threshold model, which Seo and Shin (2016,
Journal of Econometrics 195: 169-186) have proposed. Furthermore, we derive the
asymptotic variance formula for a kink-constrained GMM estimator of the dynamic
threshold model and include an estimation algorithm. We also propose a fast
bootstrap algorithm to implement the bootstrap for the linearity test. The use
of the command is illustrated through a Monte Carlo simulation and an economic
application.
arXiv link: http://arxiv.org/abs/1902.10318v1
Penalized Sieve GEL for Weighted Average Derivatives of Nonparametric Quantile IV Regressions
derivative (WAD) of a nonparametric quantile instrumental variables regression
(NPQIV). NPQIV is a non-separable and nonlinear ill-posed inverse problem,
which might be why there is no published work on the asymptotic properties of
any estimator of its WAD. We first characterize the semiparametric efficiency
bound for a WAD of a NPQIV, which, unfortunately, depends on an unknown
conditional derivative operator and hence an unknown degree of ill-posedness,
making it difficult to know if the information bound is singular or not. In
either case, we propose a penalized sieve generalized empirical likelihood
(GEL) estimation and inference procedure, which is based on the unconditional
WAD moment restriction and an increasing number of unconditional moments that
are implied by the conditional NPQIV restriction, where the unknown quantile
function is approximated by a penalized sieve. Under some regularity
conditions, we show that the self-normalized penalized sieve GEL estimator of
the WAD of a NPQIV is asymptotically standard normal. We also show that the
quasi likelihood ratio statistic based on the penalized sieve GEL criterion is
asymptotically chi-square distributed regardless of whether or not the
information bound is singular.
arXiv link: http://arxiv.org/abs/1902.10100v1
Semiparametric estimation of heterogeneous treatment effects under the nonignorable assignment condition
heterogeneous treatment effects (HTE). The HTE is the solution to a certain
integral equation belonging to the class of Fredholm integral equations of the
first kind, which is known to be an ill-posed problem. Naive semi/nonparametric
methods do not provide stable solutions to such problems. We therefore propose
to approximate the function of interest by an orthogonal series under a
constraint that makes the inverse mapping of the integral continuous and
eliminates the ill-posedness. We illustrate the performance of the proposed
estimator through
simulation experiments.
arXiv link: http://arxiv.org/abs/1902.09978v1
Binscatter Regressions
developed by Cattaneo, Crump, Farrell, and Feng (2024b,a). The package includes
seven commands: binsreg, binslogit, binsprobit, binsqreg, binstest, binspwc,
and binsregselect. The first four commands implement binscatter plotting, point
estimation, and uncertainty quantification (confidence intervals and confidence
bands) for least squares linear binscatter regression (binsreg) and for
nonlinear binscatter regression (binslogit for Logit regression, binsprobit for
Probit regression, and binsqreg for quantile regression). The next two commands
focus on pointwise and uniform inference: binstest implements hypothesis
testing procedures for parametric specifications and for nonparametric shape
restrictions of the unknown regression function, while binspwc implements
multi-group pairwise statistical comparisons. Finally, the command
binsregselect implements data-driven number of bins selectors. The commands
offer binned scatter plots, and allow for covariate adjustment, weighting,
clustering, and multi-sample analysis, which is useful when studying treatment
effect heterogeneity in randomized and observational studies, among many other
features.
arXiv link: http://arxiv.org/abs/1902.09615v5
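For intuition only, here is a bare-bones Python version of the binned scatter idea implemented by these commands (hypothetical code; binsreg adds covariate adjustment, data-driven bin selection, and valid confidence bands).

    import numpy as np

    def binscatter(x, y, n_bins=20):
        # quantile-spaced bins; return within-bin means of x and y
        edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
        idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
        xm = np.array([x[idx == b].mean() for b in range(n_bins)])
        ym = np.array([y[idx == b].mean() for b in range(n_bins)])
        return xm, ym

    rng = np.random.default_rng(0)
    x = rng.standard_normal(2000)
    y = np.sin(x) + 0.5 * rng.standard_normal(2000)
    print(*zip(*binscatter(x, y, 10)), sep="\n")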
On Binscatter
conducting informal specification testing. We study the properties of this
method formally and develop enhanced visualization and econometric binscatter
tools. These include estimating conditional means with optimal binning and
quantifying uncertainty. We also highlight a methodological problem related to
covariate adjustment that can yield incorrect conclusions. We revisit two
applications using our methodology and find substantially different results
relative to those obtained using prior informal binscatter methods. General
purpose software in Python, R, and Stata is provided. Our technical work is of
independent interest for the nonparametric partition-based estimation
literature.
arXiv link: http://arxiv.org/abs/1902.09608v5
Robust Principal Component Analysis with Non-Sparse Errors
matrix and a random error matrix with independent entries, the low-rank
component can be consistently estimated by solving a convex minimization
problem. We develop a new theoretical argument to establish consistency without
assuming sparsity or the existence of any moments of the error matrix, so that
fat-tailed continuous random errors such as Cauchy are allowed. The results are
illustrated by simulations.
arXiv link: http://arxiv.org/abs/1902.08735v2
Counterfactual Inference in Duration Models with Random Censoring
exogenous covariates and unobserved heterogeneity of unrestricted
dimensionality in duration models with random censoring. Under some regularity
conditions, we establish the joint weak convergence of the proposed
counterfactual estimator and the unconditional Kaplan-Meier (1958) estimator.
Applying the functional delta method, we make inference on the cumulative
hazard policy effect, that is, the change of duration dependence in response to
a counterfactual policy. We also evaluate the finite sample performance of the
proposed counterfactual estimation method in a Monte Carlo study.
arXiv link: http://arxiv.org/abs/1902.08502v1
Nonparametric Counterfactuals in Random Utility Models
utility model of demand, i.e. if observable choices are repeated cross-sections
and one allows for unrestricted, unobserved heterogeneity. In this setting,
tight bounds are developed on counterfactual discrete choice probabilities and
on the expectation and c.d.f. of (functionals of) counterfactual stochastic
demand.
arXiv link: http://arxiv.org/abs/1902.08350v2
Robust Ranking of Happiness Outcomes: A Median Regression Perspective
mean ranking of happiness outcomes (and other ordinal data) across groups.
However, it has recently been highlighted that such a ranking may not be
identified in most happiness applications. We suggest that researchers focus on
median comparisons instead of the mean, because the median rank can be
identified even if the mean rank is not. Furthermore, median ranks in probit
and logit models can be readily estimated using standard statistical software.
The median ranking, as well as ranking for other quantiles, can also be
estimated semiparametrically and we provide a new constrained mixed integer
optimization procedure for implementation. We apply it to estimate a happiness
equation using General Social Survey data from the US.
arXiv link: http://arxiv.org/abs/1902.07696v3
Eliciting ambiguity with mixing bets
single event. The validity of the approach is discussed for multiple preference
classes including maxmin, maxmax, variational, and smooth second-order
preferences. An experimental implementation suggests that participants perceive
almost as much ambiguity for the stock index and actions of other participants
as for the Ellsberg urn, indicating the importance of ambiguity in real-world
decision-making.
arXiv link: http://arxiv.org/abs/1902.07447v5
Estimation and Inference for Synthetic Control Methods with Spillover Effects
with panel data where only a few units are treated and a small number of
post-treatment periods are available. Current estimation and inference
procedures for synthetic control methods do not allow for the existence of
spillover effects, which are plausible in many applications. In this paper, we
consider estimation and inference for synthetic control methods, allowing for
spillover effects. We propose estimators for both direct treatment effects and
spillover effects and show they are asymptotically unbiased. In addition, we
propose an inferential procedure and show it is asymptotically unbiased. Our
estimation and inference procedure applies to cases with multiple treated units
or periods, and where the underlying factor model is either stationary or
cointegrated. In simulations, we confirm that the presence of spillovers
renders current methods biased and their tests distorted in size, whereas our methods
yield properly sized tests and retain reasonable power. We apply our method to
a classic empirical example that investigates the effect of California's
tobacco control program as in Abadie et al. (2010) and find evidence of
spillovers.
arXiv link: http://arxiv.org/abs/1902.07343v2
Estimating Network Effects Using Naturally Occurring Peer Notification Queue Counterfactuals
of a feature on the behavior of users by creating two parallel universes in
which members are simultaneously assigned to treatment and control. However, in
social network settings, members interact, such that the impact of a feature is
not always contained within the treatment group. Researchers have developed a
number of experimental designs to estimate network effects in social settings.
Alternatively, naturally occurring exogenous variation, or 'natural
experiments,' allow researchers to recover causal estimates of peer effects
from observational data in the absence of experimental manipulation. Natural
experiments trade off the engineering costs and some of the ethical concerns
associated with network randomization with the search costs of finding
situations with natural exogenous variation. To mitigate the search costs
associated with discovering natural counterfactuals, we identify a common
engineering requirement used to scale massive online systems, in which natural
exogenous variation is likely to exist: notification queueing. We identify two
natural experiments on the LinkedIn platform based on the order of notification
queues to estimate the causal impact of a received message on the engagement of
a recipient. We show that receiving a message from another member significantly
increases a member's engagement, but that some popular observational
specifications, such as fixed-effects estimators, overestimate this effect by
as much as 2.7x. We then apply the estimated network effect coefficients to a
large body of past experiments to quantify the extent to which it changes our
interpretation of experimental results. The study points to the benefits of
using messaging queues to discover naturally occurring counterfactuals for the
estimation of causal effects without experimenter intervention.
arXiv link: http://arxiv.org/abs/1902.07133v1
Discrete Choice under Risk with Limited Consideration
on observed choices from a finite set of risky alternatives. We propose a
discrete choice model with unobserved heterogeneity in consideration sets and
in standard risk aversion. We obtain sufficient conditions for the model's
semi-nonparametric point identification, including in cases where consideration
depends on preferences and on some of the exogenous variables. Our method
yields an estimator that is easy to compute and is applicable in markets with
large choice sets. We illustrate its properties using a dataset on property
insurance purchases.
arXiv link: http://arxiv.org/abs/1902.06629v3
Semiparametric correction for endogenous truncation bias with Vox Populi based participation decision
development of semiparametric endogenous truncation-proof algorithm, correcting
for truncation bias due to endogenous self-selection. This synthesis improves
the algorithm's accuracy, efficiency and applicability. Improving upon the
covariate shift assumption, data are intrinsically affected and largely
generated by their own behavior (cognition). Refining the concept of Vox Populi
(Wisdom of Crowd) allows data points to sort themselves out depending on their
estimated latent reference group opinion space. Monte Carlo simulations, based
on 2,000,000 different distribution functions, practically generating 100
million realizations, attest to a very high accuracy of our model.
arXiv link: http://arxiv.org/abs/1902.06286v1
Weak Identification and Estimation of Social Interaction Models
variation, the structure of the network or the relative position in the
network. I provide easy-to-verify necessary conditions for identification of
undirected network models based on the number of distinct eigenvalues of the
adjacency matrix. Identification of network effects is possible, although in
many empirical situations existing identification strategies may require the
use of many instruments or instruments that could be strongly correlated with
each other. The use of highly correlated instruments or many instruments may
lead to weak identification or many instruments bias. This paper proposes
regularized versions of the two-stage least squares (2SLS) estimators as a
solution to these problems. The proposed estimators are consistent and
asymptotically normal. A Monte Carlo study illustrates the properties of the
regularized estimators. An empirical application, assessing a local government
tax competition model, shows the empirical relevance of using regularization
methods.
arXiv link: http://arxiv.org/abs/1902.06143v1
Partial Identification in Matching Models for the Marriage Market
one-to-one matching model with perfectly transferable utilities. We do so
without imposing parametric distributional assumptions on the unobserved
heterogeneity and with data on one large market. We provide a tractable
characterisation of the identified set under various classes of nonparametric
distributional assumptions on the unobserved heterogeneity. Using our
methodology, we re-examine some of the relevant questions in the empirical
literature on the marriage market, which have been previously studied under the
Logit assumption. Our results reveal that many findings in the aforementioned
literature are primarily driven by such parametric restrictions.
arXiv link: http://arxiv.org/abs/1902.05610v6
Censored Quantile Regression Forests
limited in their usage in the presence of randomly censored observations and,
when naively applied, can exhibit poor predictive performance due to the induced
biases. Based on a local adaptive representation of random forests, we develop
a regression adjustment for randomly censored regression quantile models. The
regression adjustment is based on new estimating equations that adapt to
censoring and reduce to the usual quantile score whenever the data do not exhibit
censoring. The proposed procedure, named censored quantile regression forest,
allows us to estimate quantiles of time-to-event outcomes without any parametric
modeling assumption. We establish its consistency under mild model
specifications. Numerical studies showcase a clear advantage of the proposed
procedure.
arXiv link: http://arxiv.org/abs/1902.03327v1
Testing the Order of Multivariate Normal Mixture Models
empirical applications in diverse fields such as statistical genetics and
statistical finance. Testing the number of components in multivariate normal
mixture models is a long-standing challenge even in the most important case of
testing homogeneity. This paper develops likelihood-based tests of the null
hypothesis of $M_0$ components against the alternative hypothesis of $M_0 + 1$
components for a general $M_0 \geq 1$. For heteroscedastic normal mixtures, we
propose an EM test and derive the asymptotic distribution of the EM test
statistic. For homoscedastic normal mixtures, we derive the asymptotic
distribution of the likelihood ratio test statistic. We also derive the
asymptotic distribution of the likelihood ratio test statistic and EM test
statistic under local alternatives and show the validity of the parametric
bootstrap. The simulations show that the proposed test has good finite sample
size and power properties.
arXiv link: http://arxiv.org/abs/1902.02920v1
A Bootstrap Test for the Existence of Moments for GARCH Processes
and the innovation moments by means of bootstrap to test for the existence of
moments for GARCH(p,q) processes. We propose a residual bootstrap to mimic the
joint distribution of the quasi-maximum likelihood estimators and the empirical
moments of the residuals and also prove its validity. A bootstrap-based test
for the existence of moments is proposed, which provides asymptotically
correctly-sized tests without losing its consistency property. It is simple to
implement and extends to other GARCH-type settings. A simulation study
demonstrates the test's size and power properties in finite samples and an
empirical application illustrates the testing approach.
arXiv link: http://arxiv.org/abs/1902.01808v3
A General Framework for Prediction in Time Series Models
series models and show how a wide class of popular time series models satisfies
this framework. We postulate a set of high-level assumptions, and formally
verify these assumptions for the aforementioned time series models. Our
framework coincides with that of Beutner et al. (2019, arXiv:1710.00643) who
establish the validity of conditional confidence intervals for predictions made
in this framework. The current paper therefore complements the results in
Beutner et al. (2019, arXiv:1710.00643) by providing practically relevant
applications of their theory.
arXiv link: http://arxiv.org/abs/1902.01622v1
Asymptotic Theory for Clustered Samples
a large number of independent groups, generalizing the classic laws of large
numbers, uniform laws, central limit theory, and clustered covariance matrix
estimation. Our theory allows for clustered observations with heterogeneous and
unbounded cluster sizes. Our conditions cleanly nest the classical results for
i.n.i.d. observations, in the sense that our conditions specialize to the
classical conditions under independent sampling. We use this theory to develop
a full asymptotic distribution theory for estimation based on linear
least-squares, 2SLS, nonlinear MLE, and nonlinear GMM.
arXiv link: http://arxiv.org/abs/1902.01497v1
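As a small practical companion to this theory, the sketch below computes the standard cluster-robust ("sandwich") variance estimator for OLS in Python, without the finite-sample corrections commonly applied in practice (illustrative code, not taken from the paper).

    import numpy as np

    def ols_cluster_vcov(y, X, clusters):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        u = y - X @ beta
        bread = np.linalg.inv(X.T @ X)
        meat = np.zeros((X.shape[1], X.shape[1]))
        for g in np.unique(clusters):
            m = clusters == g
            s = X[m].T @ u[m]            # cluster-level score
            meat += np.outer(s, s)
        return beta, bread @ meat @ bread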
A Sieve-SMM Estimator for Dynamic Models
for the parameters and the distribution of the shocks in nonlinear dynamic
models where the likelihood and the moments are not tractable. An important
concern with SMM, which matches sample moments with simulated moments, is that a
parametric distribution is required. However, economic quantities that depend
on this distribution, such as welfare and asset-prices, can be sensitive to
misspecification. The Sieve-SMM estimator addresses this issue by flexibly
approximating the distribution of the shocks with a Gaussian and tails mixture
sieve. The asymptotic framework provides consistency, rate of convergence and
asymptotic normality results, extending existing results to a new framework
with more general dynamics and latent variables. An application to asset
pricing in a production economy shows a large decline in the estimates of
relative risk-aversion, highlighting the empirical relevance of
misspecification bias.
arXiv link: http://arxiv.org/abs/1902.01456v4
Factor Investing: A Bayesian Hierarchical Approach
predictable. We introduce a market-timing Bayesian hierarchical (BH) approach
that adopts heterogeneous time-varying coefficients driven by lagged
fundamental characteristics. Our approach includes a joint estimation of
conditional expected returns and covariance matrix and considers estimation
risk for portfolio analysis. The hierarchical prior allows modeling different
assets separately while sharing information across assets. We demonstrate the
performance of our approach on the U.S. equity market. Though the Bayesian
forecast is slightly biased, our BH approach outperforms most alternative
methods in point and interval prediction. Applied to sector investment over the
past twenty years, our BH approach delivers an average monthly return of 0.92%
and a significant Jensen's alpha of 0.32%. We also find that technology, energy,
and manufacturing have been important sectors in the past decade, and that size,
investment, and short-term reversal factors are heavily weighted. Finally, the
stochastic discount factor
constructed by our BH approach explains most anomalies.
arXiv link: http://arxiv.org/abs/1902.01015v3
Approaches Toward the Bayesian Estimation of the Stochastic Volatility Model with Leverage
volatility (SV) models is known to depend heavily on the actual parameter
values, and the effectiveness of samplers based on different parameterizations
varies significantly. We derive novel algorithms for the centered and the
non-centered parameterizations of the practically highly relevant SV model with
leverage, where the return process and innovations of the volatility process
are allowed to correlate. Moreover, based on the idea of
ancillarity-sufficiency interweaving (ASIS), we combine the resulting samplers
in order to guarantee stable sampling efficiency irrespective of the baseline
parameterization. We carry out an extensive comparison with existing
sampling methods for this model using simulated as well as real-world data.
arXiv link: http://arxiv.org/abs/1901.11491v2
A dynamic factor model approach to incorporate Big Data in state space models for official statistics
models using a dynamic factor approach to incorporate auxiliary information
from high-dimensional data sources. We apply the methodology to unemployment
estimation as done by Statistics Netherlands, who uses a multivariate state
space model to produce monthly figures for the unemployment using series
observed with the labour force survey (LFS). We extend the model by including
auxiliary series from Google Trends on job search and economic uncertainty,
as well as claimant counts, partially observed at higher frequencies. Our factor model
allows for nowcasting the variable of interest, providing reliable unemployment
estimates in real-time before LFS data become available.
arXiv link: http://arxiv.org/abs/1901.11355v2
Volatility Models Applied to Geophysics and High Frequency Financial Market Data
series. A class of volatility models with time-varying parameters is presented
to forecast the volatility of time series in a stationary environment. The
modeling of stationary time series with consistent properties facilitates
prediction with greater certainty. Using the GARCH and stochastic volatility
models, we forecast one-step-ahead volatility with +/- 2 standard prediction
errors, estimated via maximum likelihood. We compare the stochastic volatility
model, which relies on a filtering technique for the conditional volatility,
with the GARCH model. We conclude that the stochastic volatility model is a
better forecasting tool than GARCH(1,1), since it is less conditioned by
autoregressive past information.
arXiv link: http://arxiv.org/abs/1901.09145v1
Orthogonal Statistical Learning
a setting where the population risk with respect to which we evaluate the
target parameter depends on an unknown nuisance parameter that must be
estimated from data. We analyze a two-stage sample splitting meta-algorithm
that takes as input arbitrary estimation algorithms for the target parameter
and nuisance parameter. We show that if the population risk satisfies a
condition called Neyman orthogonality, the impact of the nuisance estimation
error on the excess risk bound achieved by the meta-algorithm is of second
order. Our theorem is agnostic to the particular algorithms used for the target
and nuisance and only makes an assumption on their individual performance. This
enables the use of a plethora of existing results from machine learning to give
new guarantees for learning with a nuisance component. Moreover, by focusing on
excess risk rather than parameter estimation, we can provide rates under weaker
assumptions than in previous works and accommodate settings in which the target
parameter belongs to a complex nonparametric class. We provide conditions on
the metric entropy of the nuisance and target classes such that oracle rates of
the same order as if we knew the nuisance parameter are achieved.
arXiv link: http://arxiv.org/abs/1901.09036v4
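A hedged, minimal illustration of the two-stage, sample-splitting idea in a familiar special case (a partially linear model with a Neyman-orthogonal residual-on-residual moment); the paper's meta-algorithm is far more general and agnostic to the learners used, and the function below is hypothetical.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def crossfit_theta(Y, D, X, n_folds=2, seed=0):
        # Y = theta*D + g(X) + e ; nuisances E[Y|X], E[D|X] fitted on held-out folds
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(Y)), n_folds)
        num = den = 0.0
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            mY = RandomForestRegressor(random_state=0).fit(X[train], Y[train])
            mD = RandomForestRegressor(random_state=0).fit(X[train], D[train])
            rY = Y[test] - mY.predict(X[test])   # residualized outcome
            rD = D[test] - mD.predict(X[test])   # residualized treatment
            num += rD @ rY
            den += rD @ rD
        return num / den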
The Wisdom of a Kalman Crowd
statistics during the 20th century. Its purpose is to measure the state of a
system by processing the noisy data received from different electronic sensors.
In comparison, a useful resource for managers in their effort to make the right
decisions is the wisdom of crowds. This phenomenon allows managers to combine
judgments by different employees to get estimates that are often more accurate
and reliable than the estimates managers produce alone. Since harnessing the
collective intelligence of employees and filtering signals from multiple noisy
sensors appear related, we looked at the possibility of applying the Kalman
Filter to estimates made by people. Our predictions suggest, and our findings
based on the
Survey of Professional Forecasters reveal, that the Kalman Filter can help
managers solve their decision-making problems by giving them stronger signals
before they choose. Indeed, when used on a subset of forecasters identified by
the Contribution Weighted Model, the Kalman Filter beat that rule clearly,
across all the forecasting horizons in the survey.
arXiv link: http://arxiv.org/abs/1901.08133v1
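For concreteness, a textbook scalar (local-level) Kalman filter of the kind referred to above can be written in a few lines of Python; the paper's application to Survey of Professional Forecasters data involves additional modelling choices not reproduced here, and the noise variances below are placeholders.

    import numpy as np

    def kalman_filter(z, q=0.1, r=1.0, x0=0.0, p0=1.0):
        # z: noisy observations; q: state noise variance; r: observation noise variance
        x, p, out = x0, p0, []
        for obs in z:
            p = p + q                    # predict
            k = p / (p + r)              # Kalman gain
            x = x + k * (obs - x)        # update
            p = (1 - k) * p
            out.append(x)
        return np.array(out)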
lassopack: Model selection and prediction with regularized regression in Stata
regression in Stata. lassopack implements lasso, square-root lasso, elastic
net, ridge regression, adaptive lasso and post-estimation OLS. The methods are
suitable for the high-dimensional setting where the number of predictors $p$
may be large and possibly greater than the number of observations, $n$. We
offer three different approaches for selecting the penalization ('tuning')
parameters: information criteria (implemented in lasso2), $K$-fold
cross-validation and $h$-step ahead rolling cross-validation for cross-section,
panel and time-series data (cvlasso), and theory-driven ('rigorous')
penalization for the lasso and square-root lasso for cross-section and panel
data (rlasso). We discuss the theoretical framework and practical
considerations for each approach. We also present Monte Carlo results to
compare the performance of the penalization approaches.
arXiv link: http://arxiv.org/abs/1901.05397v1
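A rough Python analogue of two of the tuning routes mentioned above (information criteria and K-fold cross-validation), using scikit-learn rather than the Stata package; the 'rigorous' penalization implemented by rlasso has no direct counterpart here.

    import numpy as np
    from sklearn.linear_model import LassoCV, LassoLarsIC

    rng = np.random.default_rng(1)
    n, p = 100, 50
    X = rng.standard_normal((n, p))
    beta = np.zeros(p); beta[:3] = [1.0, -2.0, 0.5]
    y = X @ beta + rng.standard_normal(n)

    ic_fit = LassoLarsIC(criterion="bic").fit(X, y)   # information-criterion route
    cv_fit = LassoCV(cv=5).fit(X, y)                  # K-fold cross-validation route
    print("BIC lambda:", ic_fit.alpha_, "CV lambda:", cv_fit.alpha_)
    print("nonzero (BIC):", np.flatnonzero(ic_fit.coef_))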
Inference on Functionals under First Order Degeneracy
conducting inference on parameters of the form $\phi(\theta_0)$, where
$\theta_0$ is unknown but can be estimated by $\hat\theta_n$, and $\phi$ is a
known map that admits null first order derivative at $\theta_0$. For a large
number of examples in the literature, the second order Delta method reveals a
nondegenerate weak limit for the plug-in estimator $\phi(\hat\theta_n)$. We
show, however, that the 'standard' bootstrap is consistent if and only if the
second order derivative $\phi''_{\theta_0}=0$ under regularity conditions,
i.e., the standard bootstrap is inconsistent if $\phi''_{\theta_0}\neq 0$, and
provides degenerate limits unhelpful for inference otherwise. We thus identify
a source of bootstrap failures distinct from that in Fang and Santos (2018)
because the problem (of consistently bootstrapping a nondegenerate
limit) persists even if $\phi$ is differentiable. We show that the correction
procedure in Babu (1984) can be extended to our general setup. Alternatively, a
modified bootstrap is proposed when the map is in addition second
order nondifferentiable. Both are shown to provide local size control under
some conditions. As an illustration, we develop a test of common conditional
heteroskedastic (CH) features, a setting with both degeneracy and
nondifferentiability -- the latter is because the Jacobian matrix is degenerate
at zero and we allow the existence of multiple common CH features.
arXiv link: http://arxiv.org/abs/1901.04861v1
Mastering Panel 'Metrics: Causal Impact of Democracy on Growth
interest. We revisit the panel data analysis of this relationship by Acemoglu,
Naidu, Restrepo and Robinson (forthcoming) using state-of-the-art econometric
methods. We argue that this and many other panel data settings in economics
are in fact high-dimensional, causing the principal estimators -- the fixed
effects (FE) and Arellano-Bond (AB) estimators -- to be biased to a degree
that invalidates statistical inference. We can, however, remove these biases by
using simple analytical and sample-splitting methods, and thereby restore valid
statistical inference. We find that the debiased FE and AB estimators produce
substantially higher estimates of the long-run effect of democracy on growth,
providing even stronger support for the key hypothesis in Acemoglu, Naidu,
Restrepo and Robinson (forthcoming). Given the ubiquitous nature of panel data,
we conclude that the use of debiased panel data estimators should substantially
improve the quality of empirical inference in economics.
arXiv link: http://arxiv.org/abs/1901.03821v1
Non-Parametric Inference Adaptive to Intrinsic Dimension
models in high dimensions. We show that even when the dimension $D$ of the
conditioning variable is larger than the sample size $n$, estimation and
inference is feasible as long as the distribution of the conditioning variable
has small intrinsic dimension $d$, as measured by locally low doubling
measures. Our estimation is based on a sub-sampled ensemble of the $k$-nearest
neighbors ($k$-NN) $Z$-estimator. We show that if the intrinsic dimension of
the covariate distribution is equal to $d$, then the finite sample estimation
error of our estimator is of order $n^{-1/(d+2)}$ and our estimate is
$n^{1/(d+2)}$-asymptotically normal, irrespective of $D$. The sub-sampling size
required for achieving these results depends on the unknown intrinsic dimension
$d$. We propose an adaptive data-driven approach for choosing this parameter
and prove that it achieves the desired rates. We discuss extensions and
applications to heterogeneous treatment effect estimation.
arXiv link: http://arxiv.org/abs/1901.03719v3
Community Matters: Heterogeneous Impacts of a Sanitation Intervention
aimed at improving sanitation using a cluster-randomized controlled trial (RCT)
in Nigerian communities. The intervention, Community-Led Total Sanitation
(CLTS), is currently part of national sanitation policy in more than 25
countries. While average impacts are exiguous almost three years after
implementation at scale, the results hide important heterogeneity: the
intervention has strong and lasting effects on sanitation practices in poorer
communities. These are realized through increased sanitation investments. We
show that community wealth, widely available in secondary data, is a key
statistic for effective intervention targeting. Using data from five other
similar randomized interventions in various contexts, we find that
community-level wealth heterogeneity can rationalize the wide range of impact
estimates in the literature. This exercise provides plausible external validity
to our findings, with implications for intervention scale-up. JEL Codes: O12,
I12, I15, I18.
arXiv link: http://arxiv.org/abs/1901.03544v5
Estimating population average treatment effects from experiments with noncompliance
effects, but often use samples that are non-representative of the actual
population of interest. We propose a reweighting method for estimating
population average treatment effects in settings with noncompliance.
Simulations show the proposed compliance-adjusted population estimator
outperforms its unadjusted counterpart when compliance is relatively low and
can be predicted by observed covariates. We apply the method to evaluate the
effect of Medicaid coverage on health care use for a target population of
adults who may benefit from expansions to the Medicaid program. We draw RCT
data from the Oregon Health Insurance Experiment, where less than one-third of
those randomly selected to receive Medicaid benefits actually enrolled.
arXiv link: http://arxiv.org/abs/1901.02991v3
Dynamic tail inference with log-Laplace volatility
time-varying extreme event probabilities in heavy-tailed and nonlinearly
dependent time series. The model is a white noise process with conditionally
log-Laplace stochastic volatility. In contrast to other, similar stochastic
volatility formalisms, this process has analytic expressions for its
conditional probabilistic structure that enable straightforward estimation of
dynamically changing extreme event probabilities. The process and volatility
are conditionally Pareto-tailed, with tail exponent given by the reciprocal of
the log-volatility's mean absolute innovation. This formalism can accommodate a
wide variety of nonlinear dependence, as well as conditional power law-tail
behavior ranging from weakly non-Gaussian to Cauchy-like tails. We provide a
computationally straightforward estimation procedure that uses an asymptotic
approximation of the process' dynamic large deviation probabilities. We
demonstrate the estimator's utility with a simulation study. We then show the
method's predictive capabilities on a simulated nonlinear time series where the
volatility is driven by the chaotic Lorenz system. Lastly we provide an
empirical application, which shows that this simple modeling method can be
effectively used for dynamic and predictive tail inference in financial time
series.
arXiv link: http://arxiv.org/abs/1901.02419v5
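A minimal simulation sketch of a process in the spirit of the abstract above: log-volatility follows an AR(1) driven by Laplace innovations, so that conditionally the volatility is log-Laplace and the tails are approximately Pareto with exponent around 1/b. The AR coefficient, the scale b, and the crude Hill-type check are illustrative assumptions, not the paper's model or estimation procedure.

import numpy as np

rng = np.random.default_rng(2)
b = 0.4   # Laplace scale of the log-volatility shocks; implied tail exponent is roughly 1/b

def simulate_loglaplace_sv(T=20000, rho=0.95, b=0.4, rng=rng):
    """Toy process: x_t = exp(h_t) * eps_t, with h_t an AR(1) driven by Laplace(scale=b) shocks."""
    h = np.zeros(T)
    x = np.zeros(T)
    for t in range(1, T):
        h[t] = rho * h[t - 1] + rng.laplace(scale=b)
        x[t] = np.exp(h[t]) * rng.normal()
    return x

x = simulate_loglaplace_sv(b=b)
# Crude Hill-type check of the tail exponent using the largest absolute observations.
abs_x = np.sort(np.abs(x))[::-1]
k = 500
hill = 1.0 / np.mean(np.log(abs_x[:k] / abs_x[k]))
print(f"Hill tail-exponent estimate: {hill:.2f}  (rough theoretical value 1/b = {1/b:.2f})")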
Semi-parametric dynamic contextual pricing
consider the problem of revenue-maximization in a setting where the seller can
leverage contextual information describing the customer's history and the
product's type to predict her valuation of the product. However, her true
valuation is unobservable to the seller; only a binary outcome, the success or
failure of a transaction, is observed. Unlike in usual contextual bandit
settings, the optimal price/arm given a covariate in our setting is sensitive
to the detailed characteristics of the residual uncertainty distribution. We
develop a semi-parametric model in which the residual distribution is
non-parametric and provide the first algorithm which learns both regression
parameters and residual distribution with $\tilde O(n)$ regret. We
empirically test a scalable implementation of our algorithm and observe good
performance.
arXiv link: http://arxiv.org/abs/1901.02045v4
Shrinkage for Categorical Regressors
estimation risk of group means stemming from e.g. categorical regressors,
(quasi-)experimental data or panel data models. The loss function is penalized
by adding weighted squared $\ell_2$-norm differences between group location
parameters and informative first-stage estimates. Under quadratic loss, the
penalized estimation problem has a simple interpretable closed-form solution
that nests methods established in the literature on ridge regression,
discretized support smoothing kernels and model averaging methods. We derive
risk-optimal penalty parameters and propose a plug-in approach for estimation.
The large sample properties are analyzed in an asymptotic local to zero
framework by introducing a class of sequences for close and distant systems of
locations that is sufficient for describing a large range of data generating
processes. We provide the asymptotic distributions of the shrinkage estimators
under different penalization schemes. The proposed plug-in estimator uniformly
dominates the ordinary least squares estimator in terms of asymptotic risk if the number
of groups is larger than three. Monte Carlo simulations reveal robust
improvements over standard methods in finite samples. Real data examples of
estimating time trends in a panel and a difference-in-differences study
illustrate potential applications.
arXiv link: http://arxiv.org/abs/1901.01898v1
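For intuition, here is a minimal sketch of the kind of closed-form shrinkage described above: group means penalized toward first-stage estimates (the grand mean by default). The penalty weight lam is fixed by hand for display, whereas the paper derives risk-optimal penalties with a plug-in rule; the simulated groups are illustrative.

import numpy as np

rng = np.random.default_rng(3)

# Simulated data with G groups (e.g. categories of a regressor).
G, n_per = 8, 15
true_means = rng.normal(0, 0.5, G)
groups = np.repeat(np.arange(G), n_per)
y = true_means[groups] + rng.normal(0, 1, G * n_per)

def shrunken_group_means(y, groups, lam, first_stage=None):
    """
    Minimize sum_i (y_i - mu_g(i))^2 + lam * sum_g (mu_g - m_g)^2.
    Closed form: mu_g = (n_g * ybar_g + lam * m_g) / (n_g + lam),
    i.e. group means shrunk toward first-stage estimates m_g
    (here the pooled grand mean by default).
    """
    G = groups.max() + 1
    n_g = np.bincount(groups, minlength=G)
    ybar_g = np.bincount(groups, weights=y, minlength=G) / n_g
    m_g = np.full(G, y.mean()) if first_stage is None else first_stage
    return (n_g * ybar_g + lam * m_g) / (n_g + lam)

for lam in (0.0, 5.0, 50.0):
    mu = shrunken_group_means(y, groups, lam)
    print(f"lambda={lam:5.1f}  estimation risk={np.mean((mu - true_means) ** 2):.4f}")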
Nonparametric Instrumental Variables Estimation Under Misspecification
conditional moment restriction. We show that if this moment condition is even
slightly misspecified, say because instruments are not quite valid, then NPIV
estimates can be subject to substantial asymptotic error and the identified set
under a relaxed moment condition may be large. Imposing strong a priori
smoothness restrictions mitigates the problem but induces bias if the
restrictions are too strong. In order to manage this trade-off we develop
methods for empirical sensitivity analysis and apply them to the consumer
demand data previously analyzed in Blundell (2007) and Horowitz (2011).
arXiv link: http://arxiv.org/abs/1901.01241v7
Modeling Dynamic Transport Network with Matrix Factor Models: with an Application to International Trade Flow
and shed light on wider issues relating to poverty, development, migration,
productivity, and economy. With recent advances in information technology,
global and regional agencies distribute an enormous amount of internationally
comparable trading data among a large number of countries over time, providing
a goldmine for empirical analysis of international trade. Meanwhile, an array
of new statistical methods has recently been developed for dynamic network analysis.
However, these advanced methods have not been utilized for analyzing such
massive dynamic cross-country trading data. International trade data can be
viewed as a dynamic transport network because it emphasizes the amount of goods
moving across a network. Most literature on dynamic network analysis
concentrates on the connectivity network that focuses on link formation or
deformation rather than the transport moving across the network. We take a
different perspective from the pervasive node-and-edge level modeling: the
dynamic transport network is modeled as a time series of relational matrices.
We adopt a matrix factor model of Wang et al. (2018), with a specific
interpretation for the dynamic transport network. Under the model, the observed
surface network is assumed to be driven by a latent dynamic transport network
with lower dimensions. The proposed method is able to unveil the latent dynamic
structure and achieve the objective of dimension reduction. We applied the
proposed framework and methodology to a data set of monthly trading volumes
among 24 countries and regions from 1982 to 2015. Our findings shed light on
trading hubs, centrality, trends and patterns of international trade and show
matching change points to trading policies. The dataset also provides a fertile
ground for future research on international trade.
arXiv link: http://arxiv.org/abs/1901.00769v1
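The sketch below illustrates one common moment-based estimation strategy for a matrix factor model Y_t = R F_t C' + E_t: take the leading eigenvectors of the row and column second-moment matrices as loading-space estimates. It is a schematic illustration on simulated data, not necessarily the exact estimator applied in the paper, and the dimensions and factor numbers are arbitrary.

import numpy as np

rng = np.random.default_rng(4)

# Simulate a small matrix-valued time series Y_t = R F_t C' + E_t.
T, p, q, k1, k2 = 200, 24, 24, 3, 3
R = rng.normal(size=(p, k1))
C = rng.normal(size=(q, k2))
F = rng.normal(size=(T, k1, k2))
Y = np.einsum('pi,tij,qj->tpq', R, F, C) + rng.normal(scale=0.5, size=(T, p, q))

def matrix_factor_loadings(Y, k1, k2):
    """
    Estimate row and column loading spaces from the leading eigenvectors of
    M1 = sum_t Y_t Y_t' and M2 = sum_t Y_t' Y_t (a common moment-based strategy).
    """
    M1 = np.einsum('tpq,trq->pr', Y, Y)
    M2 = np.einsum('tpq,tpr->qr', Y, Y)
    _, vec1 = np.linalg.eigh(M1)
    _, vec2 = np.linalg.eigh(M2)
    R_hat = vec1[:, -k1:]          # leading eigenvectors (eigh sorts eigenvalues ascending)
    C_hat = vec2[:, -k2:]
    F_hat = np.einsum('pi,tpq,qj->tij', R_hat, Y, C_hat)   # latent factors, up to rotation
    return R_hat, C_hat, F_hat

R_hat, C_hat, F_hat = matrix_factor_loadings(Y, k1, k2)
print("estimated row loadings shape:", R_hat.shape, " latent factor series shape:", F_hat.shape)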
Salvaging Falsified Instrumental Variable Models
four constructive answers. First, researchers can measure the extent of
falsification. To do this, we consider continuous relaxations of the baseline
assumptions of concern. We then define the falsification frontier: The smallest
relaxations of the baseline model which are not refuted. This frontier provides
a quantitative measure of the extent of falsification. Second, researchers can
present the identified set for the parameter of interest under the assumption
that the true model lies somewhere on this frontier. We call this the
falsification adaptive set. This set generalizes the standard baseline estimand
to account for possible falsification. Third, researchers can present the
identified set for a specific point on this frontier. Finally, as a sensitivity
analysis, researchers can present identified sets for points beyond the
frontier. To illustrate these four ways of salvaging falsified models, we study
overidentifying restrictions in two instrumental variable models: a homogeneous
effects linear model, and heterogeneous effect models with either binary or
continuous outcomes. In the linear model, we consider the classical
overidentifying restrictions implied when multiple instruments are observed. We
generalize these conditions by considering continuous relaxations of the
classical exclusion restrictions. By sufficiently weakening the assumptions, a
falsified baseline model becomes non-falsified. We obtain analogous results in
the heterogeneous effect models, where we derive identified sets for marginal
distributions of potential outcomes, falsification frontiers, and falsification
adaptive sets under continuous relaxations of the instrument exogeneity
assumptions. We illustrate our results in four different empirical
applications.
arXiv link: http://arxiv.org/abs/1812.11598v3
Dynamic Models with Robust Decision Makers: Identification and Estimation
in which the decision maker (DM) is uncertain about the data-generating
process. The DM surrounds a benchmark model that he or she fears is
misspecified by a set of models. Decisions are evaluated under a worst-case
model delivering the lowest utility among all models in this set. The DM's
benchmark model and preference parameters are jointly underidentified. With the
benchmark model held fixed, primitive conditions are established for
identification of the DM's worst-case model and preference parameters. The key
step in the identification analysis is to establish existence and uniqueness of
the DM's continuation value function, allowing for an unbounded state space and
unbounded utilities. To do so, fixed-point results are derived for monotone,
convex operators that act on a Banach space of thin-tailed functions arising
naturally from the structure of the continuation value recursion. The
fixed-point results are quite general; applications to models with learning and
Rust-type dynamic discrete choice models are also discussed. For estimation, a
perturbation result is derived which provides a necessary and sufficient
condition for consistent estimation of continuation values and the worst-case
model. The result also allows convergence rates of estimators to be
characterized. An empirical application studies an endowment economy where the
DM's benchmark model may be interpreted as an aggregate of experts' forecasting
models. The application reveals time-variation in the way the DM
pessimistically distorts benchmark probabilities. Consequences for asset
pricing are explored and connections are drawn with the literature on
macroeconomic uncertainty.
arXiv link: http://arxiv.org/abs/1812.11246v3
Predicting "Design Gaps" in the Market: Deep Consumer Choice Models under Probabilistic Design Constraints
a fundamental goal of product design firms. There is accordingly a long history
of quantitative approaches that aim to capture diverse consumer preferences,
and then translate those preferences to corresponding "design gaps" in the
market. We extend this work by developing a deep learning approach to predict
design gaps in the market. These design gaps represent clusters of designs that
do not yet exist, but are predicted to be both (1) highly preferred by
consumers, and (2) feasible to build under engineering and manufacturing
constraints. This approach is tested on the entire U.S. automotive market using
data on millions of real purchases. We retroactively predict design gaps in the
market, and compare predicted design gaps with actual known successful designs.
Our preliminary results give evidence that it may be possible to predict design
gaps, suggesting this approach has promise for early identification of market
opportunity.
arXiv link: http://arxiv.org/abs/1812.11067v1
Decentralization Estimators for Instrumental Variable Quantile Regression Models
Hansen, 2005) is a popular tool for estimating causal quantile effects with
endogenous covariates. However, estimation is complicated by the non-smoothness
and non-convexity of the IVQR GMM objective function. This paper shows that the
IVQR estimation problem can be decomposed into a set of conventional quantile
regression sub-problems which are convex and can be solved efficiently. This
reformulation leads to new identification results and to fast, easy to
implement, and tuning-free estimators that do not require the availability of
high-level "black box" optimization routines.
arXiv link: http://arxiv.org/abs/1812.10925v4
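For intuition about reducing IVQR estimation to conventional quantile regressions, here is a simple grid-based (inverse quantile regression) illustration in Python using statsmodels. It conveys the reduction to convex quantile-regression subproblems but is not the paper's decentralization algorithm; the simulated design, grid, and quantile level are arbitrary choices.

import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(5)

# Simulated data with an endogenous regressor d instrumented by z.
n = 2000
z = rng.binomial(1, 0.5, n)
u = rng.normal(size=n)
d = 0.8 * z + 0.5 * u + rng.normal(size=n)      # endogenous: correlated with u
y = 1.0 * d + u                                 # true coefficient on d is 1.0
tau = 0.5

# For each candidate coefficient alpha on d, run an ordinary quantile regression
# of y - alpha*d on (1, z) and record the coefficient on the instrument z.
alphas = np.linspace(0.0, 2.0, 81)
z_coefs = []
X = sm.add_constant(z)
for a in alphas:
    res = QuantReg(y - a * d, X).fit(q=tau)
    z_coefs.append(res.params[1])
alpha_hat = alphas[np.argmin(np.abs(z_coefs))]  # pick alpha making the z coefficient closest to zero
print("IVQR estimate of the coefficient on d at tau=0.5:", alpha_hat)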
Semiparametric Difference-in-Differences with Potentially Many Control Variables
exist many control variables, potentially more than the sample size. In this
case, traditional estimation methods, which require a limited number of
variables, do not work. One may consider using statistical or machine learning
(ML) methods. However, by the well-known theory of inference for ML methods
developed in Chernozhukov et al. (2018), directly applying ML methods to the
conventional semiparametric DID estimators will cause significant bias and make
these DID estimators fail to be $\sqrt{N}$-consistent. This article proposes three
new DID estimators for three different data structures, which are able to
shrink the bias and achieve $\sqrt{N}$-consistency and asymptotic normality with
mean zero when applying ML methods. This leads to straightforward inferential
procedures. In addition, I show that these new estimators have the small bias
property (SBP), meaning that their bias converges to zero faster than the
pointwise bias of the nonparametric estimators on which they are based.
arXiv link: http://arxiv.org/abs/1812.10846v3
Debiasing and $t$-tests for synthetic control inference on average causal effects
treatment effects estimated by synthetic controls. We develop a $K$-fold
cross-fitting procedure for bias correction. To avoid the difficult estimation
of the long-run variance, inference is based on a self-normalized
$t$-statistic, which has an asymptotically pivotal $t$-distribution. Our
$t$-test is easy to implement, provably robust against misspecification, and
valid with stationary and non-stationary data. It demonstrates an excellent
small sample performance in application-based simulations and performs well
relative to other methods. We illustrate the usefulness of the $t$-test by
revisiting the effect of carbon taxes on emissions.
arXiv link: http://arxiv.org/abs/1812.10820v9
How to avoid the zero-power trap in testing for correlation
tests can be very low for strongly correlated errors. This counterintuitive
phenomenon has become known as the "zero-power trap". Despite a considerable
amount of literature devoted to this problem, mainly focusing on its detection,
a convincing solution has not yet been found. In this article we first discuss
theoretical results concerning the occurrence of the zero-power trap
phenomenon. Then, we suggest and compare three ways to avoid it. Given an
initial test that suffers from the zero-power trap, the method we recommend for
practice leads to a modified test whose power converges to one as the
correlation gets very strong. Furthermore, the modified test has approximately
the same power function as the initial test, and thus approximately preserves
all of its optimality properties. We also provide some numerical illustrations
in the context of testing for network generated correlation.
arXiv link: http://arxiv.org/abs/1812.10752v1
Synthetic Difference in Differences
insights behind the widely used difference in differences and synthetic control
methods. Relative to these methods we find, both theoretically and empirically,
that this "synthetic difference in differences" estimator has desirable
robustness properties, and that it performs well in settings where the
conventional estimators are commonly used in practice. We study the asymptotic
behavior of the estimator when the systematic part of the outcome model
includes latent unit factors interacted with latent time factors, and we
present conditions for consistency and asymptotic normality.
arXiv link: http://arxiv.org/abs/1812.09970v4
Robust Tests for Convergence Clubs
cross-sectional units is large and the number of time periods is small. In these
situations asymptotic tests based on an omnibus null hypothesis are
characterised by a number of problems. In this paper we propose a multiple
pairwise comparisons method based on a recursive bootstrap to test for
convergence with no prior information on the composition of convergence clubs.
Monte Carlo simulations suggest that our bootstrap-based test performs well to
correctly identify convergence clubs when compared with other similar tests
that rely on asymptotic arguments. Across a potentially large number of
regions, using both cross-country and regional data for the European Union, we
find that the size distortion which afflicts standard tests, biasing them
towards finding less convergence, is ameliorated when we utilise our
bootstrap test.
arXiv link: http://arxiv.org/abs/1812.09518v1
Modified Causal Forests for Estimating Heterogeneous Causal Effects
decisions at various levels of granularity provides substantial value to
decision makers. This paper develops new estimation and inference procedures
for multiple treatment models in a selection-on-observables framework by
modifying the Causal Forest approach suggested by Wager and Athey (2018) in
several dimensions. The new estimators have desirable theoretical,
computational and practical properties for various aggregation levels of the
causal effects. While an Empirical Monte Carlo study suggests that they
outperform previously suggested estimators, an application to the evaluation of
an active labour market programme shows the value of the new methods for
applied research.
arXiv link: http://arxiv.org/abs/1812.09487v2
Functional Sequential Treatment Allocation
observing each outcome before the next subject arrives. Initially, it is
unknown which treatment is best, but the sequential nature of the problem
permits learning about the effectiveness of the treatments. While the
multi-armed-bandit literature has shed much light on the situation when the
policy maker compares the effectiveness of the treatments through their mean,
much less is known about other targets. This is restrictive, because a cautious
decision maker may prefer to target a robust location measure such as a
quantile or a trimmed mean. Furthermore, socio-economic decision making often
requires targeting purpose-specific characteristics of the outcome
distribution, such as its inherent degree of inequality, welfare or poverty. In
the present paper we introduce and study sequential learning algorithms when
the distributional characteristic of interest is a general functional of the
outcome distribution. Minimax expected regret optimality results are obtained
within the subclass of explore-then-commit policies, and for the unrestricted
class of all policies.
arXiv link: http://arxiv.org/abs/1812.09408v8
Many Average Partial Effects: with An Application to Text Regression
intervals for many average partial effects of lasso Logit. Focusing on
high-dimensional, cluster-sampling environments, we propose a new average
partial effect estimator and explore its asymptotic properties. Practical
penalty choices compatible with our asymptotic theory are also provided. The
proposed estimator allows for valid inference without requiring the oracle property.
We provide easy-to-implement algorithms for cluster-robust high-dimensional
hypothesis testing and construction of simultaneously valid confidence
intervals using a multiplier cluster bootstrap. We apply the proposed
algorithms to the text regression model of Wu (2018) to examine the presence of
gendered language on the internet.
arXiv link: http://arxiv.org/abs/1812.09397v5
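As a point of reference for the object of interest above, the snippet below computes naive plug-in average partial effects from an $\ell_1$-penalized logit in scikit-learn. The simulated design and penalty level are arbitrary, and the sketch deliberately omits the paper's contributions: the debiased estimator, theory-compatible penalty choice, and cluster-robust multiplier-bootstrap inference.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)

# Moderately high-dimensional logit: only the first few coefficients are nonzero.
n, p = 1000, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [1.0, -0.5, 0.25]
prob = 1 / (1 + np.exp(-(X @ beta)))
y = rng.binomial(1, prob)

# L1-penalized ("lasso") logit fit.
fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
b = fit.coef_.ravel()
p_hat = fit.predict_proba(X)[:, 1]

# Naive plug-in average partial effects: APE_j = mean_i b_j * p_i * (1 - p_i).
ape = b * np.mean(p_hat * (1 - p_hat))
print("plug-in APEs for the first five regressors:", np.round(ape[:5], 3))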
Selection and the Distribution of Female Hourly Wages in the U.S
observed distribution of female hourly wages in the United States using CPS
data for the years 1975 to 2020. We account for the selection bias from the
employment decision by modeling the distribution of the number of working hours
and estimating a nonseparable model of wages. We decompose changes in the wage
distribution into composition, structural and selection effects. Composition
effects have increased wages at all quantiles while the impact of the
structural effects varies by time period and quantile. Changes in the role of
selection only appear at the lower quantiles of the wage distribution. The
evidence suggests that there is positive selection in the 1970s which
diminishes until the late 1990s. This reduces wages at lower quantiles and
increases wage inequality. Post 2000 there appears to be an increase in
positive sorting which reduces the selection effects on wage inequality.
arXiv link: http://arxiv.org/abs/1901.00419v5
Multivariate Fractional Components Analysis
formulated in terms of latent integrated and short-memory components. It
accommodates nonstationary processes with different fractional orders and
cointegration of different strengths and is applicable in high-dimensional
settings. In an application to realized covariance matrices, we find that
orthogonal short- and long-memory components provide a reasonable fit and
competitive out-of-sample performance compared to several competing methods.
arXiv link: http://arxiv.org/abs/1812.09149v2
Approximate State Space Modelling of Unobserved Fractional Components
multivariate unobserved components models with fractional integration and
cointegration. Based on finite-order ARMA approximations in the state space
representation, maximum likelihood estimation can make use of the EM algorithm
and related techniques. The approximation outperforms the frequently used
autoregressive or moving average truncation, both in terms of computational
costs and with respect to approximation quality. Monte Carlo simulations reveal
good estimation properties of the proposed methods for processes of different
complexity and dimension.
arXiv link: http://arxiv.org/abs/1812.09142v3
Econometric modelling and forecasting of intraday electricity prices
Continuous electricity market using an econometric time series model. A
multivariate approach is conducted for hourly and quarter-hourly products
separately. We estimate the model using lasso and elastic net techniques and
perform an out-of-sample, very short-term forecasting study. The model's
performance is compared with benchmark models and is discussed in detail.
Forecasting results provide new insights into the German Intraday Continuous
electricity market regarding its efficiency and the ID$_3$-Price behaviour.
arXiv link: http://arxiv.org/abs/1812.09081v2
Multifractal cross-correlations between the World Oil and other Financial Markets in 2012-2017
expressed in US dollars in relation to the most traded currencies, as well as to
gold futures and E-mini S&P500 futures prices, based on 5-minute intraday
recordings over the period January 2012 - December 2017, are studied. It is shown
that in most of the cases the tails of return distributions of the considered
financial instruments follow the inverse cubic power law. The only exception is
the Russian ruble for which the distribution tail is heavier and scales with
the exponent close to 2. From the perspective of multiscaling the analysed time
series reveal the multifractal organization with the left-sided asymmetry of
the corresponding singularity spectra. Moreover, all the considered financial
instruments appear to be multifractally cross-correlated with oil, especially
at the level of medium-size fluctuations, as an analysis based on multifractal
cross-correlation analysis (MFCCA) and the detrended cross-correlation
coefficient $\rho_q$ shows. The degree of such cross-correlations, however,
varies among the financial instruments. The strongest ties to oil characterize
the currencies of oil-extracting countries. The strength of this multifractal
coupling also appears to depend on the oil market trend: over the analysed
period the level of cross-correlations systematically increases during the bear
phase of the oil market and saturates after the trend reversal in the first
half of 2016. The same methodology is also applied to identify possible causal
relations between the considered observables. Searching for a related asymmetry
in the information flow mediating the cross-correlations indicates that, over
the period considered, it was the oil price that led the Russian ruble rather
than vice versa.
arXiv link: http://arxiv.org/abs/1812.08548v2
A Primal-dual Learning Algorithm for Personalized Dynamic Pricing with an Inventory Constraint
an exogenously given stock of a product over a finite selling horizon to
different consumer types. We assume that the type of an arriving consumer can
be observed but the demand function associated with each type is initially
unknown. The firm sets personalized prices dynamically for each type and
attempts to maximize the revenue over the season. We provide a learning
algorithm that is near-optimal when the demand and capacity scale in
proportion. The algorithm utilizes the primal-dual formulation of the problem
and learns the dual optimal solution explicitly. This allows the algorithm to
overcome the curse of dimensionality (the rate of regret is independent of the
number of types) and sheds light on novel algorithmic designs for learning
problems with resource constraints.
arXiv link: http://arxiv.org/abs/1812.09234v3
Fuzzy Difference-in-Discontinuities: Identification Theory and Application to the Affordable Care Act
multiple treatments are applied at the threshold. The identification results
show that, under the very strong assumption that the change in the probability
of treatment at the cutoff is equal across treatments, a
difference-in-discontinuities estimator identifies the treatment effect of
interest. The point estimates of the treatment effect using a simple fuzzy
difference-in-discontinuities design are biased if the change in the
probability of a treatment applying at the cutoff differs across treatments.
Modifications of the fuzzy difference-in-discontinuities approach that rely on
milder assumptions are also proposed. Our results suggest caution is needed
when applying before-and-after methods in the presence of fuzzy
discontinuities. Using data from the National Health Interview Survey, we apply
this new identification strategy to evaluate the causal effect of the
Affordable Care Act (ACA) on older Americans' health care access and
utilization.
arXiv link: http://arxiv.org/abs/1812.06537v3
What Is the Value Added by Using Causal Machine Learning Methods in a Welfare Experiment Evaluation?
estimate conditional average treatment effects (CATEs). In this study, I
investigate whether CML methods add value compared to conventional CATE
estimators by re-evaluating Connecticut's Jobs First welfare experiment. This
experiment entails a mix of positive and negative work incentives. Previous
studies show that it is hard to tackle the effect heterogeneity of Jobs First
by means of CATEs. I report evidence that CML methods can provide support for
the theoretical labor supply predictions. Furthermore, I document reasons why
some conventional CATE estimators fail and discuss the limitations of CML
methods.
arXiv link: http://arxiv.org/abs/1812.06533v3
Closing the U.S. gender wage gap requires understanding its heterogeneity
significantly less than comparable men. The extent to which women were affected
by gender inequality in earnings, however, depended greatly on socio-economic
characteristics, such as marital status or educational attainment. In this
paper, we analyzed data from the 2016 American Community Survey using a
high-dimensional wage regression and applying double lasso to quantify
heterogeneity in the gender wage gap. We found that the gap varied
substantially across women and was driven primarily by marital status, having
children at home, race, occupation, industry, and educational attainment. We
recommend that policy makers use these insights to design policies that will
reduce discrimination and unequal pay more effectively.
arXiv link: http://arxiv.org/abs/1812.04345v2
A supreme test for periodic explosive GARCH
strictly stationary GARCH$(r,s)$ (generalized autoregressive conditional
heteroskedasticity) process. Namely, we test the null hypothesis of a globally
stable GARCH process with constant parameters against an alternative where
there is an 'abnormal' period with changed parameter values. During this
period, the change may lead to an explosive behavior of the volatility process.
It is assumed that both the magnitude and the timing of the breaks are unknown.
We develop a double supreme test for the existence of a break, and then provide
an algorithm to identify the period of change. Our theoretical results hold
under mild moment assumptions on the innovations of the GARCH process.
Technically, the existing properties for the QMLE in the GARCH model need to be
reinvestigated to hold uniformly over all possible periods of change. The key
results involve a uniform weak Bahadur representation for the estimated
parameters, which leads to weak convergence of the test statistic to the
supremum of a Gaussian process. In simulations we show that the test has good
size and power for reasonably large time series lengths. We apply the test to
Apple asset returns and Bitcoin returns.
arXiv link: http://arxiv.org/abs/1812.03475v1
Improved Inference on the Rank of a Matrix
of an unknown matrix $\Pi_0$. A defining feature of our setup is the null
hypothesis of the form $\mathrm{H}_0: \mathrm{rank}(\Pi_0)\le r$. The problem is
of first-order importance because the previous literature focuses on
$\mathrm{H}_0': \mathrm{rank}(\Pi_0)= r$ by implicitly assuming away
$\mathrm{rank}(\Pi_0)<r$, which may lead to invalid rank tests due to
over-rejections. In particular, we show that limiting distributions of test
statistics under $\mathrm{H}_0'$ may not stochastically dominate those under
$\mathrm{rank}(\Pi_0)<r$. A multiple test on the nulls
$\mathrm{rank}(\Pi_0)=0,\ldots,r$, though valid, may be substantially
conservative. We employ a testing statistic whose limiting distributions under
$\mathrm H_0$ are highly nonstandard due to the inherent irregular natures of
the problem, and then construct bootstrap critical values that deliver size
control and improved power. Since our procedure relies on a tuning parameter, a
two-step procedure is designed to mitigate concerns on this nuisance. We
additionally argue that our setup is also important for estimation. We
illustrate the empirical relevance of our results through testing
identification in linear IV models that allows for clustered data and inference
on sorting dimensions in a two-sided matching model with transferable utility.
arXiv link: http://arxiv.org/abs/1812.02337v2
Identifying the Effect of Persuasion
interpretation has been obscure in the literature. By using the potential
outcome framework, we define the causal persuasion rate by a proper conditional
probability of taking the action of interest with a persuasive message
conditional on not taking the action without the message. We then formally
study identification under empirically relevant data scenarios and show that
the commonly adopted measure generally does not estimate, but often overstates,
the causal rate of persuasion. We discuss several new parameters of interest
and provide practical methods for causal inference.
arXiv link: http://arxiv.org/abs/1812.02276v6
Necessary and Probably Sufficient Test for Finding Valid Instrumental Variables
(IV) methods are widely used to identify causal effects, testing their validity
from observed data remains a challenge. This is because validity of an IV
depends on two assumptions, exclusion and as-if-random, that are largely
believed to be untestable from data. In this paper, we show that under certain
conditions, testing for instrumental variables is possible. We build upon prior
work on necessary tests to derive a test that characterizes the odds of being a
valid instrument, thus yielding the name "necessary and probably sufficient".
The test works by defining the class of invalid-IV and valid-IV causal models
as Bayesian generative models and comparing their marginal likelihood based on
observed data. When all variables are discrete, we also provide a method to
efficiently compute these marginal likelihoods.
We evaluate the test on an extensive set of simulations for binary data,
inspired by an open problem for IV testing proposed in past work. We find that
the test is most powerful when an instrument follows monotonicity---effect on
treatment is either non-decreasing or non-increasing---and has moderate-to-weak
strength; incidentally, such instruments are commonly used in observational
studies. Among as-if-random and exclusion, it detects exclusion violations with
higher power. Applying the test to IVs from two seminal studies on instrumental
variables and five recent studies from the American Economic Review shows that
many of the instruments may be flawed, at least when all variables are
discretized. The proposed test opens the possibility of data-driven validation
and search for instrumental variables.
arXiv link: http://arxiv.org/abs/1812.01412v1
Column Generation Algorithms for Nonparametric Analysis of Random Utility Models
constraints, when these are represented as vertices of a polyhedron instead
of its faces. They implement this test for an application to nonparametric
tests of Random Utility Models. As they note in their paper, testing such
models is computationally challenging. In this paper, we develop and implement
more efficient algorithms, based on column generation, to carry out the test.
These improved algorithms allow us to tackle larger datasets.
arXiv link: http://arxiv.org/abs/1812.01400v1
Doubly Robust Difference-in-Differences Estimators
effect on the treated (ATT) in difference-in-differences (DID) research
designs. In contrast to alternative DID estimators, the proposed estimators are
consistent if either (but not necessarily both) a propensity score or an outcome
regression working model is correctly specified. We also derive the
semiparametric efficiency bound for the ATT in DID designs when either panel or
repeated cross-section data are available, and show that our proposed
estimators attain the semiparametric efficiency bound when the working models
are correctly specified. Furthermore, we quantify the potential efficiency
gains of having access to panel data instead of repeated cross-section data.
Finally, by paying particular attention to the estimation method used to
estimate the nuisance parameters, we show that one can sometimes construct
doubly robust DID estimators for the ATT that are also doubly robust for
inference. Simulation studies and an empirical application illustrate the
desirable finite-sample performance of the proposed estimators. Open-source
software for implementing the proposed policy evaluation tools is available.
arXiv link: http://arxiv.org/abs/1812.01723v3
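A compact sketch of a doubly robust DID estimator for two-period panel data in the spirit of the abstract above: the ATT combines an inverse-probability weighting term with an outcome-regression adjustment fitted on the comparison group. The simulated design and the simple parametric nuisance fits are illustrative assumptions, and the sketch omits standard errors, the repeated cross-section case, and the paper's software.

import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(7)

# Two-period panel: D = treated group, dy = Y_post - Y_pre, X = covariates.
n = 5000
X = rng.normal(size=(n, 2))
pscore_true = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
D = rng.binomial(1, pscore_true)
att_true = 1.0
dy = 0.5 + X[:, 0] + att_true * D + rng.normal(size=n)   # parallel trends conditional on X

# Nuisance models: propensity score and outcome-change regression on the untreated.
ps = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]
m0 = LinearRegression().fit(X[D == 0], dy[D == 0]).predict(X)

# Doubly robust ATT: consistent if either nuisance model is correctly specified.
w1 = D / D.mean()
odds = ps * (1 - D) / (1 - ps)
w0 = odds / odds.mean()
att_hat = np.mean((w1 - w0) * (dy - m0))
print("DR-DID ATT estimate:", round(att_hat, 3), "(true value 1.0)")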
Distribution Regression with Sample Selection, with an Application to Wage Decompositions in the UK
This model is a semi-parametric generalization of the Heckman selection model.
It accommodates much richer effects of the covariates on outcome distribution
and patterns of heterogeneity in the selection process, and allows for drastic
departures from the Gaussian error structure, while maintaining the same level
of tractability as the classical model. The model applies to continuous, discrete
and mixed outcomes. We provide identification, estimation, and inference
methods, and apply them to obtain wage decomposition for the UK. Here we
decompose the difference between the male and female wage distributions into
composition, wage structure, selection structure, and selection sorting
effects. After controlling for endogenous employment selection, we still find a
substantial gender wage gap, ranging from 21% to 40% throughout the (latent)
offered wage distribution, that is not explained by composition. We also uncover
positive sorting for single men and negative sorting for married women that
accounts for a substantive fraction of the gender wage gap at the top of the
distribution.
arXiv link: http://arxiv.org/abs/1811.11603v6
Simple Local Polynomial Density Estimators
density estimator based on local polynomial techniques. The estimator is fully
boundary adaptive and automatic, but does not require pre-binning or any other
transformation of the data. We study the main asymptotic properties of the
estimator, and use these results to provide principled estimation, inference,
and bandwidth selection methods. As a substantive application of our results,
we develop a novel discontinuity in density testing procedure, an important
problem in regression discontinuity designs and other program evaluation
settings. An illustrative empirical application is given. Two companion Stata
and R software packages are provided.
arXiv link: http://arxiv.org/abs/1811.11512v2
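To convey the basic construction (not the full boundary-adaptive estimator, bandwidth selector, inference procedures, or the companion Stata and R packages mentioned above), here is a minimal Python sketch: locally regress the empirical distribution function on a polynomial around the evaluation point and read the density off the slope coefficient. The kernel, bandwidth, and polynomial degree are arbitrary choices.

import numpy as np

rng = np.random.default_rng(8)

def lpdensity(x_eval, data, h, degree=2):
    """
    Minimal local polynomial density estimate: weighted polynomial regression of
    the empirical CDF on (x_i - x_eval) near x_eval; the coefficient on the
    linear term estimates the density f(x_eval).
    """
    data = np.sort(data)
    n = len(data)
    Fhat = np.arange(1, n + 1) / n                   # empirical CDF at the data points
    u = (data - x_eval) / h
    w = np.maximum(0.0, 0.75 * (1 - u ** 2))         # Epanechnikov kernel weights
    keep = w > 0
    Xp = np.vander(data[keep] - x_eval, degree + 1, increasing=True)
    W = np.diag(w[keep])
    beta = np.linalg.solve(Xp.T @ W @ Xp, Xp.T @ W @ Fhat[keep])
    return beta[1]                                   # slope term = density estimate

data = rng.normal(size=2000)
for x0 in (-1.0, 0.0, 1.0):
    est = lpdensity(x0, data, h=0.5)
    true = np.exp(-x0 ** 2 / 2) / np.sqrt(2 * np.pi)
    print(f"x={x0:+.1f}  estimate={est:.3f}  true normal density={true:.3f}")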
Simulation of Stylized Facts in Agent-Based Computational Economic Market Models
several agent-based computational economic market (ABCEM) models. We perform
our simulations with the SABCEMM (Simulator for Agent-Based Computational
Economic Market Models) tool recently introduced by the authors (Trimborn et
al. 2019). Furthermore, we present novel ABCEM models created by recombining
existing models and study them with respect to stylized facts as well. This can
be efficiently performed by the SABCEMM tool thanks to its object-oriented
software design. The code is available on GitHub (Trimborn et al. 2018), such
that all results can be reproduced by the reader.
arXiv link: http://arxiv.org/abs/1812.02726v2
A Residual Bootstrap for Conditional Expected Shortfall
estimator of Francq and Zako\"ian (2015) associated with the conditional
Expected Shortfall. For a general class of volatility models the bootstrap is
shown to be asymptotically valid under the conditions imposed by Beutner et al.
(2018). A simulation study is conducted revealing that the average coverage
rates are satisfactory for most settings considered. There is no clear evidence
to have a preference for any of the three proposed bootstrap intervals. This
contrasts results in Beutner et al. (2018) for the VaR, for which the
reversed-tails interval has a superior performance.
arXiv link: http://arxiv.org/abs/1811.11557v1
Estimation of a Heterogeneous Demand Function with Berkson Errors
demand this form of measurement error occurs when the price an individual pays
is measured by the (weighted) average price paid by individuals in a specified
group (e.g., a county), rather than the true transaction price. We show the
importance of such measurement errors for the estimation of demand in a setting
with nonseparable unobserved heterogeneity. We develop a consistent estimator
using external information on the true distribution of prices. Examining the
demand for gasoline in the U.S., we document substantial within-market price
variability, and show that there are significant spatial differences in the
magnitude of Berkson errors across regions of the U.S. Accounting for Berkson
errors is found to be quantitatively important for estimating price effects and
for welfare calculations. Imposing the Slutsky shape constraint greatly reduces
the sensitivity to Berkson errors.
arXiv link: http://arxiv.org/abs/1811.10690v2
LM-BIC Model Selection in Semiparametric Models
develops a consistent series-based model selection procedure based on a
Bayesian Information Criterion (BIC) type criterion to select between several
classes of models. The procedure selects a model by minimizing the
semiparametric Lagrange Multiplier (LM) type test statistic from Korolev (2018)
but additionally rewards simpler models. The paper also develops consistent
upward testing (UT) and downward testing (DT) procedures based on the
semiparametric LM type specification test. The proposed semiparametric LM-BIC
and UT procedures demonstrate good performance in simulations. To illustrate
the use of these semiparametric model selection procedures, I apply them to the
parametric and semiparametric gasoline demand specifications from Yatchew and
No (2001). The LM-BIC procedure selects the semiparametric specification that
is nonparametric in age but parametric in all other variables, which is in line
with the conclusions in Yatchew and No (2001). The results of the UT and DT
procedures heavily depend on the choice of tuning parameters and assumptions
about the model errors.
arXiv link: http://arxiv.org/abs/1811.10676v1
Generalized Dynamic Factor Models and Volatilities: Consistency, rates, and prediction intervals
dynamic factor structure on the levels or returns, typically also admit a
dynamic factor decomposition. We consider a two-stage dynamic factor model
method recovering the common and idiosyncratic components of both levels and
log-volatilities. Specifically, in a first estimation step, we extract the
common and idiosyncratic shocks for the levels, from which a log-volatility
proxy is computed. In a second step, we estimate a dynamic factor model, which
is equivalent to a multiplicative factor structure for volatilities, for the
log-volatility panel. By exploiting this two-stage factor approach, we build
one-step-ahead conditional prediction intervals for large $n \times T$ panels
of returns. Those intervals are based on empirical quantiles, not on
conditional variances; they can be either equal- or unequal-tailed. We provide
uniform consistency and consistency rates results for the proposed estimators
as both $n$ and $T$ tend to infinity. We study the finite-sample properties of
our estimators by means of Monte Carlo simulations. Finally, we apply our
methodology to a panel of asset returns belonging to the S&P100 index in order
to compute one-step-ahead conditional prediction intervals for the period
2006-2013. A comparison with the componentwise GARCH benchmark (which does not
take advantage of cross-sectional information) demonstrates the superiority of
our approach, which is genuinely multivariate (and high-dimensional),
nonparametric, and model-free.
arXiv link: http://arxiv.org/abs/1811.10045v2
Identification of Treatment Effects under Limited Exogenous Variation
wide class of econometric models. With control variables to correct for
endogeneity, nonparametric identification of treatment effects requires strong
support conditions. To alleviate this requirement, we consider varying
coefficients specifications for the conditional expectation function of the
outcome given a treatment and control variables. This function is expressed as
a linear combination of either known functions of the treatment, with unknown
coefficients varying with the controls, or known functions of the controls,
with unknown coefficients varying with the treatment. We use this modeling
approach to give necessary and sufficient conditions for identification of
average treatment effects. A sufficient condition for identification is
conditional nonsingularity, that the second moment matrix of the known
functions given the variable in the varying coefficients is nonsingular with
probability one. For known treatment functions with sufficient variation, we
find that triangular models with a discrete instrument cannot identify average
treatment effects when the number of support points for the instrument is less
than the number of coefficients. For known functions of the controls, we find
that average treatment effects can be identified in general nonseparable
triangular models with binary or discrete instruments. We extend our analysis
to flexible models of increasing dimension and relate conditional
nonsingularity to the full support condition of Imbens and Newey (2009),
thereby embedding semi- and non-parametric identification into a common
framework.
arXiv link: http://arxiv.org/abs/1811.09837v2
High Dimensional Classification through $\ell_0$-Penalized Empirical Risk Minimization
classification procedure by minimizing the empirical misclassification risk
with a penalty on the number of selected features. We derive non-asymptotic
probability bounds on the estimated sparsity as well as on the excess
misclassification risk. In particular, we show that our method yields a sparse
solution whose $\ell_0$-norm can be arbitrarily close to the true sparsity with high
probability and obtain the rates of convergence for the excess
misclassification risk. The proposed procedure is implemented via the method of
mixed integer linear programming. Its numerical performance is illustrated in
Monte Carlo experiments.
arXiv link: http://arxiv.org/abs/1811.09540v1
Model instability in predictive exchange rate regressions
accounting for uncertainty with respect to the underlying structural
representation. Within a flexible Bayesian non-linear time series framework,
our modeling approach assumes that different regimes are characterized by
commonly used structural exchange rate models, with their evolution being
driven by a Markov process. We assume a time-varying transition probability
matrix with transition probabilities depending on a measure of the monetary
policy stance of the central bank in the home and foreign countries. We apply
this model to a set of eight exchange rates against the US dollar. In a
forecasting exercise, we show that model evidence varies over time and that a
modelling approach that takes this empirical evidence seriously yields improvements in
accuracy of density forecasts for most currency pairs considered.
arXiv link: http://arxiv.org/abs/1811.08818v2
The value of forecasts: Quantifying the economic gains of accurate quarter-hourly electricity price forecasts
quarter-hourly electricity spot markets. While the literature is diverse on
day-ahead prediction approaches, both the intraday continuous and intraday
call-auction prices have not been studied intensively with a clear focus on
predictive power. Besides electricity price forecasting, we check for the
impact of early day-ahead (DA) EXAA prices on intraday forecasts. Another
novelty of this paper is the complementary discussion of economic benefits. A
precise estimate is worthless if it cannot be utilized. We elaborate on possible
trading decisions based upon our forecasting scheme and analyze their monetary
effects. We find that even simple electricity trading strategies can lead to
substantial economic impact if combined with a decent forecasting technique.
arXiv link: http://arxiv.org/abs/1811.08604v1
Bayesian Inference for Structural Vector Autoregressions Identified by Markov-Switching Heteroskedasticity
autoregressive models in which the structural parameters are identified via
Markov-switching heteroskedasticity. In such a model, restrictions that are
just-identifying in the homoskedastic case, become over-identifying and can be
tested. A set of parametric restrictions is derived under which the structural
matrix is globally or partially identified and a Savage-Dickey density ratio is
used to assess the validity of the identification conditions. The latter is
facilitated by analytical derivations that make the computations fast and
numerical standard errors small. As an empirical example, monetary models are
compared using heteroskedasticity as an additional device for identification.
The empirical results support models with money in the interest rate reaction
function.
arXiv link: http://arxiv.org/abs/1811.08167v1
Complete Subset Averaging with Many Instruments
the equal-weighted average over a complete subset with $k$ instruments among
$K$ available, which we call the complete subset averaging (CSA) 2SLS. The
approximate mean squared error (MSE) is derived as a function of the subset
size $k$ by the Nagar (1959) expansion. The subset size is chosen by minimizing
the sample counterpart of the approximate MSE. We show that this method
achieves the asymptotic optimality among the class of estimators with different
subset sizes. To deal with averaging over a growing set of irrelevant
instruments, we generalize the approximate MSE to find that the optimal $k$ is
larger than otherwise. An extensive simulation experiment shows that the
CSA-2SLS estimator outperforms the alternative estimators when instruments are
correlated. As an empirical illustration, we estimate the logistic demand
function in Berry, Levinsohn, and Pakes (1995) and find the CSA-2SLS estimate
is better supported by economic theory than the alternative estimates.
arXiv link: http://arxiv.org/abs/1811.08083v6
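The sketch below illustrates the mechanics of equal-weighted averaging of 2SLS over complete instrument subsets on simulated data. The subset size k is simply looped over for display; the paper instead selects k by minimizing a sample analogue of the approximate MSE, which this sketch does not implement.

import numpy as np
from itertools import combinations

rng = np.random.default_rng(9)

# One endogenous regressor x, K instruments of varying strength.
n, K = 500, 6
Z = rng.normal(size=(n, K))
pi = np.linspace(0.6, 0.1, K)                 # first-stage coefficients
u = rng.normal(size=n)
x = Z @ pi + 0.7 * u + rng.normal(size=n)     # endogenous regressor
y = 1.0 * x + u                               # true coefficient is 1.0

def tsls(y, x, Zsub):
    """2SLS with a single endogenous regressor and instrument matrix Zsub."""
    Pz = Zsub @ np.linalg.solve(Zsub.T @ Zsub, Zsub.T)
    xhat = Pz @ x
    return (xhat @ y) / (xhat @ x)

def csa_tsls(y, x, Z, k):
    """Equal-weighted average of 2SLS over all complete subsets of k instruments."""
    ests = [tsls(y, x, Z[:, list(s)]) for s in combinations(range(Z.shape[1]), k)]
    return np.mean(ests)

for k in range(1, K + 1):
    print(f"k={k}: CSA-2SLS estimate = {csa_tsls(y, x, Z, k):.3f}")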
Optimal Iterative Threshold-Kernel Estimation of Jump Diffusion Processes
jump-diffusion processes, which iteratively applies thresholding and kernel
methods in an approximately optimal way to achieve improved finite-sample
performance. We use the expected number of jump misclassifications as the
objective function to optimally select the threshold parameter of the jump
detection scheme. We prove that the objective function is quasi-convex and
obtain a new second-order infill approximation of the optimal threshold in
closed form. The approximate optimal threshold depends not only on the spot
volatility, but also on the jump intensity and the value of the jump density at
the origin. Estimation methods for these quantities are then developed, where
the spot volatility is estimated by a kernel estimator with thresholding and
the value of the jump density at the origin is estimated by a density kernel
estimator applied to those increments deemed to contain jumps by the chosen
thresholding criterion. Due to the interdependency between the model parameters
and the approximate optimal estimators built to estimate them, a type of
iterative fixed-point algorithm is developed to implement them. Simulation
studies for a prototypical stochastic volatility model show that it is not only
feasible to implement the higher-order local optimal threshold scheme but also
that this is superior to those based only on the first order approximation
and/or on average values of the parameters over the estimation time period.
arXiv link: http://arxiv.org/abs/1811.07499v4
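The code below sketches only the basic iterate-between-thresholding-and-volatility-estimation idea on a simulated jump diffusion. The constant c, the simple global (rather than kernel-based spot) volatility estimate, and the threshold form are simplifying assumptions; this is not the paper's higher-order optimal threshold.

import numpy as np

rng = np.random.default_rng(10)

# Simulate a simple jump-diffusion path observed at high frequency.
T, n = 1.0, 5000
dt = T / n
sigma_true, lam, jump_sd = 0.3, 20.0, 0.5
dW = sigma_true * np.sqrt(dt) * rng.normal(size=n)
jumps = rng.binomial(1, lam * dt, n) * rng.normal(0, jump_sd, n)
dX = dW + jumps

def iterative_threshold_vol(dX, dt, n_iter=5, c=3.0):
    """
    Iterate: (i) estimate volatility from increments currently classified as
    continuous; (ii) reclassify an increment as a jump if it exceeds a threshold
    proportional to sigma_hat * sqrt(dt * log(1/dt)).
    """
    keep = np.ones_like(dX, dtype=bool)
    for _ in range(n_iter):
        sigma_hat = np.sqrt(np.sum(dX[keep] ** 2) / (dt * keep.sum()))
        threshold = c * sigma_hat * np.sqrt(dt * np.log(1 / dt))
        keep = np.abs(dX) <= threshold
    return sigma_hat, ~keep

sigma_hat, jump_flags = iterative_threshold_vol(dX, dt)
print("estimated volatility:", round(sigma_hat, 3), " true:", sigma_true)
print("increments flagged as jumps:", jump_flags.sum(), " actual jumps:", (jumps != 0).sum())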
MALTS: Matching After Learning to Stretch
matches for causal inference. Most prior work in matching uses ad-hoc distance
metrics, often leading to poor quality matches, particularly when there are
irrelevant covariates. In this work, we learn an interpretable distance metric
for matching, which leads to substantially higher quality matches. The learned
distance metric stretches the covariate space according to each covariate's
contribution to outcome prediction: this stretching means that mismatches on
important covariates carry a larger penalty than mismatches on irrelevant
covariates. Our ability to learn flexible distance metrics leads to matches
that are interpretable and useful for the estimation of conditional average
treatment effects.
arXiv link: http://arxiv.org/abs/1811.07415v9
Estimation of High-Dimensional Seemingly Unrelated Regression Models
that allow the number of equations (N) to be large, and to be comparable to the
number of observations in each equation (T). It is well known in the
literature that the conventional SUR estimator, for example, the generalized
least squares (GLS) estimator of Zellner (1962) does not perform well. As the
main contribution of the paper, we propose a new feasible GLS estimator called
the feasible graphical lasso (FGLasso) estimator. For a feasible implementation
of the GLS estimator, we use the graphical lasso estimation of the precision
matrix (the inverse of the covariance matrix of the equation system errors)
assuming that the underlying unknown precision matrix is sparse. We derive
asymptotic theories of the new estimator and investigate its finite sample
properties via Monte-Carlo simulations.
arXiv link: http://arxiv.org/abs/1811.05567v1
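Here is a small-scale sketch of the two-step idea described above: equation-by-equation OLS residuals, a graphical-lasso estimate of the sparse error precision matrix, then feasible GLS on the stacked system. The dimensions, the regularization level alpha, and the simulated sparsity pattern are illustrative, and the sketch operates far below the high-dimensional regime the paper targets.

import numpy as np
from sklearn.covariance import GraphicalLasso
from scipy.linalg import block_diag

rng = np.random.default_rng(11)

# Small SUR system: N equations, T observations, different regressors per equation,
# and a sparse error precision matrix (errors correlated only between neighbours).
N, T, k = 5, 200, 2
prec_true = np.eye(N) + 0.4 * (np.eye(N, k=1) + np.eye(N, k=-1))
cov_true = np.linalg.inv(prec_true)
E = rng.multivariate_normal(np.zeros(N), cov_true, size=T)
Xs = [rng.normal(size=(T, k)) for _ in range(N)]
betas_true = [rng.normal(size=k) for _ in range(N)]
Ys = np.column_stack([Xs[i] @ betas_true[i] + E[:, i] for i in range(N)])

# Step 1: equation-by-equation OLS residuals.
resid = np.column_stack([
    Ys[:, i] - Xs[i] @ np.linalg.lstsq(Xs[i], Ys[:, i], rcond=None)[0]
    for i in range(N)
])

# Step 2: graphical-lasso estimate of the (sparse) error precision matrix.
prec_hat = GraphicalLasso(alpha=0.05).fit(resid).precision_

# Step 3: feasible GLS on the stacked system using the estimated precision.
Xbig = block_diag(*Xs)             # (N*T) x (N*k), equations stacked block by block
ybig = Ys.T.reshape(-1)            # outcomes stacked in the same equation order
W = np.kron(prec_hat, np.eye(T))   # estimated Omega^{-1} (x) I_T
beta_fgls = np.linalg.solve(Xbig.T @ W @ Xbig, Xbig.T @ W @ ybig)
print("FGLS coefficients, first equation:", np.round(beta_fgls[:k], 3))
print("true coefficients, first equation:", np.round(betas_true[0], 3))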
Identification and estimation of multinomial choice models with latent special covariates
special covariates that have full support. This paper shows how these
identification results can be extended to a large class of multinomial choice
models when all covariates are bounded. I also provide a new
$\sqrt{n}$-consistent asymptotically normal estimator of the finite-dimensional
parameters of the model.
arXiv link: http://arxiv.org/abs/1811.05555v3
Capital Structure and Speed of Adjustment in U.S. Firms. A Comparative Study in Microeconomic and Macroeconomic Conditions - A Quantille Regression Approach
"quickly", in different macroeconomic states, companies adjust their capital
structure to their leverage targets. This study extends the empirical research
on the topic of capital structure by focusing on a quantile regression method
to investigate the behavior of firm-specific characteristics and macroeconomic
factors across all quantiles of the distribution of leverage (book leverage and
market leverage). Based on a partial adjustment model, we find
that the adjustment speed fluctuated in different stages of book versus market
leverage. Furthermore, while macroeconomic states change, we detect clear
differentiations of the contribution and the effects of the firm-specific and
the macroeconomic variables between market leverage and book leverage debt
ratios. Consequently, we deduce that across different macroeconomic states the
nature and maturity of borrowing influence the persistence and endurance of the
relation between determinants and borrowing.
arXiv link: http://arxiv.org/abs/1811.04473v1
The Augmented Synthetic Control Method
impact of a treatment on a single unit in panel data settings. The "synthetic
control" is a weighted average of control units that balances the treated
unit's pre-treatment outcomes as closely as possible. A critical feature of the
original proposal is to use SCM only when the fit on pre-treatment outcomes is
excellent. We propose Augmented SCM as an extension of SCM to settings where
such pre-treatment fit is infeasible. Analogous to bias correction for inexact
matching, Augmented SCM uses an outcome model to estimate the bias due to
imperfect pre-treatment fit and then de-biases the original SCM estimate. Our
main proposal, which uses ridge regression as the outcome model, directly
controls pre-treatment fit while minimizing extrapolation from the convex hull.
This estimator can also be expressed as a solution to a modified synthetic
controls problem that allows negative weights on some donor units. We bound the
estimation error of this approach under different data generating processes,
including a linear factor model, and show how regularization helps to avoid
over-fitting to noise. We demonstrate gains from Augmented SCM with extensive
simulation studies and apply this framework to estimate the impact of the 2012
Kansas tax cuts on economic growth. We implement the proposed method in the new
augsynth R package.
arXiv link: http://arxiv.org/abs/1811.04170v3
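Below is a self-contained Python sketch of the ridge-augmented idea: compute simplex-constrained SCM weights on pre-treatment outcomes, fit a ridge outcome model on the donor pool, and correct the SCM prediction for the remaining pre-treatment imbalance. The simulated panel, the SLSQP solver for the simplex program, and the ridge penalty are illustrative assumptions; the augsynth package cited above is the reference implementation.

import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import Ridge

rng = np.random.default_rng(12)

# Panel with one treated unit, J donors, T0 pre-treatment periods, one post period.
J, T0 = 20, 12
X0 = rng.normal(size=(J, T0)).cumsum(axis=1)          # donors' pre-treatment outcome paths
y0_post = X0[:, -1] + rng.normal(scale=0.5, size=J)   # donors' post-treatment outcomes
x1 = X0[:3].mean(axis=0) + 2.0                        # treated pre path, shifted outside the donor hull
y1_post_0 = x1[-1]                                    # untreated outcome implied by the donor mapping

def scm_weights(x1, X0):
    """Standard SCM weights: simplex-constrained least squares on pre-treatment periods."""
    J = X0.shape[0]
    obj = lambda w: np.sum((x1 - w @ X0) ** 2)
    cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)
    res = minimize(obj, np.full(J, 1.0 / J), bounds=[(0, 1)] * J,
                   constraints=cons, method='SLSQP')
    return res.x

w = scm_weights(x1, X0)

# Ridge outcome model on the donors, used to de-bias the SCM estimate.
ridge = Ridge(alpha=1.0).fit(X0, y0_post)
scm_est = w @ y0_post
augmented_est = scm_est + ridge.coef_ @ (x1 - w @ X0)
print("SCM estimate of Y1(0):    ", round(scm_est, 2))
print("Augmented SCM estimate:   ", round(augmented_est, 2))
print("implied untreated outcome:", round(y1_post_0, 2))

Because the treated unit's pre-treatment path lies outside the convex hull of the donors in this toy design, the plain SCM estimate is biased toward the hull, while the ridge correction recovers most of the gap, which is the bias-correction logic described above.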
Bootstrapping Structural Change Tests
in linear models estimated via Two Stage Least Squares (2SLS). Two types of
test are considered: one where the null hypothesis is of no change and the
alternative hypothesis involves discrete change at k unknown break-points in
the sample; and a second test where the null hypothesis is that there is
discrete parameter change at l break-points in the sample against an
alternative in which the parameters change at l + 1 break-points. In both
cases, we consider inferences based on a sup-Wald-type statistic using either
the wild recursive bootstrap or the wild fixed bootstrap. We establish the
asymptotic validity of these bootstrap tests under a set of general conditions
that allow the errors to exhibit conditional and/or unconditional
heteroskedasticity, and report results from a simulation study that indicate
the tests yield reliable inferences in the sample sizes often encountered in
macroeconomics. The analysis covers the cases where the first-stage estimation
of 2SLS involves a model whose parameters are either constant or themselves
subject to discrete parameter change. If the errors exhibit unconditional
heteroskedasticity and/or the reduced form is unstable then the bootstrap
methods are particularly attractive because the limiting distributions of the
test statistics are not pivotal.
arXiv link: http://arxiv.org/abs/1811.04125v1
How does stock market volatility react to oil shocks?
We jointly analyze three different structural oil market shocks (i.e.,
aggregate demand, oil supply, and oil-specific demand shocks) and stock market
volatility using a structural vector autoregressive model. Identification is
achieved by assuming that the price of crude oil reacts to stock market
volatility only with delay. This implies that innovations to the price of crude
oil are not strictly exogenous, but predetermined with respect to the stock
market. We show that volatility responds significantly to oil price shocks
caused by unexpected changes in aggregate and oil-specific demand, whereas the
impact of supply-side shocks is negligible.
arXiv link: http://arxiv.org/abs/1811.03820v1
Estimation of a Structural Break Point in Linear Regression Models
structural break in linear regression models. If the break magnitude is small,
the least-squares estimator of the break date has two modes at the ends of the
finite sample period, regardless of the true break location. To solve this
problem, I suggest an alternative estimator based on a modification of the
least-squares objective function. The modified objective function incorporates
estimation uncertainty that varies across potential break dates. The new break
point estimator is consistent and has a unimodal finite sample distribution
under small break magnitudes. A limit distribution is provided under an in-fill
asymptotic framework. Monte Carlo simulation results suggest that the new
estimator outperforms the least-squares estimator. I apply the method to
estimate the break date in U.S. real GDP growth and U.S. and UK stock return
prediction models.
arXiv link: http://arxiv.org/abs/1811.03720v3
Nonparametric maximum likelihood methods for binary response models with random coefficients
been extensively employed in many econometric settings under various parametric
specifications of the distribution of the random coefficients. Nonparametric
maximum likelihood estimation (NPMLE) as proposed by Cosslett (1983) and
Ichimura and Thompson (1998), in contrast, has received less attention in
applied work due primarily to computational difficulties. We propose a new
approach to the computation of NPMLEs for binary response models that
significantly increases their computational tractability, thereby facilitating
greater flexibility in applications. Our approach, which relies on recent developments
involving the geometry of hyperplane arrangements, is contrasted with the
recently proposed deconvolution method of Gautier and Kitamura (2013). An
application to modal choice for the journey to work in the Washington DC area
illustrates the methods.
arXiv link: http://arxiv.org/abs/1811.03329v3
Nonparametric Analysis of Finite Mixtures
model unobserved heterogeneity, which plays major roles in labor economics,
industrial organization and other fields. Mixtures are also convenient in
dealing with contaminated sampling models and models with multiple equilibria.
This paper shows that finite mixture models are nonparametrically identified
under weak assumptions that are plausible in economic applications. The key is
to utilize the identification power implied by variation in covariates. First,
three identification approaches are presented, under distinct
and non-nested sets of sufficient conditions. Observable features of data
inform us which of the three approaches is valid. These results apply to
general nonparametric switching regressions, as well as to structural
econometric models, such as auction models with unobserved heterogeneity.
Second, some extensions of the identification results are developed. In
particular, a mixture regression where the mixing weights depend on the value
of the regressors in a fully unrestricted manner is shown to be
nonparametrically identifiable. This means a finite mixture model with
function-valued unobserved heterogeneity can be identified in a cross-section
setting, without restricting the dependence pattern between the regressor and
the unobserved heterogeneity. In this aspect it is akin to fixed effects panel
data models which permit unrestricted correlation between unobserved
heterogeneity and covariates. Third, the paper shows that fully nonparametric
estimation of the entire mixture model is possible, by forming a sample
analogue of one of the new identification strategies. The estimator is shown to
possess a desirable polynomial rate of convergence as in a standard
nonparametric estimation problem, despite nonregular features of the model.
arXiv link: http://arxiv.org/abs/1811.02727v1
Randomization Tests for Equality in Dependence Structure
structure is identical between two groups. Rather than relying on a single
index such as Pearson's correlation coefficient or Kendall's Tau, we consider
the entire dependence structure by investigating the dependence functions
(copulas). The critical values are obtained by a modified randomization
procedure designed to exploit asymptotic group invariance conditions.
Implementation of the test is intuitive and simple, and does not require any
specification of a tuning parameter or weight function. At the same time, the
test exhibits excellent finite sample performance, with the null rejection
rates almost equal to the nominal level even when the sample size is extremely
small. Two empirical applications concerning the dependence between income and
consumption, and the Brexit effect on European financial market integration are
provided.
arXiv link: http://arxiv.org/abs/1811.02105v1
Treatment Effect Estimation with Noisy Conditioning Variables
measurements of unobserved confounding factors are available. I use proxy
variables to construct a random variable conditional on which treatment
variables become exogenous. The key idea is that, under appropriate conditions,
there exists a one-to-one mapping between the distribution of unobserved
confounding factors and the distribution of proxies. To ensure sufficient
variation in the constructed control variable, I use an additional variable,
termed excluded variable, which satisfies certain exclusion restrictions and
relevance conditions. I establish asymptotic distributional results for
semiparametric and flexible parametric estimators of causal parameters. I
illustrate the empirical relevance and usefulness of my results by estimating
causal effects of attending a selective college on earnings.
arXiv link: http://arxiv.org/abs/1811.00667v4
Partial Mean Processes with Generated Regressors: Continuous Treatment Effects and Nonseparable Models
problems, such as the distribution of potential outcomes with continuous
treatments and the quantile structural function in a nonseparable triangular
model. This paper proposes a nonparametric estimator for the partial mean
process, where the second step consists of a kernel regression on regressors
that are estimated in the first step. The main contribution is a uniform
expansion that characterizes in detail how the estimation error associated with
the generated regressor affects the limiting distribution of the marginal
integration estimator. The general results are illustrated with two examples:
the generalized propensity score for a continuous treatment (Hirano and Imbens,
2004) and control variables in triangular models (Newey, Powell, and Vella,
1999; Imbens and Newey, 2009). An empirical application to the Job Corps
program evaluation demonstrates the usefulness of the method.
arXiv link: http://arxiv.org/abs/1811.00157v1
Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence
estimators for heterogeneous causal effects at different aggregation levels. We
employ an Empirical Monte Carlo Study that relies on arguably realistic data
generation processes (DGPs) based on actual data. We consider 24 different
DGPs, eleven different causal machine learning estimators, and three
aggregation levels of the estimated effects. In the main DGPs, we allow for
selection into treatment based on a rich set of observable covariates. We
provide evidence that the estimators can be categorized into three groups. The
first group performs consistently well across all DGPs and aggregation levels.
These estimators have multiple steps to account for the selection into the
treatment and the outcome process. The second group shows competitive
performance only for particular DGPs. The third group is clearly outperformed
by the other estimators.
arXiv link: http://arxiv.org/abs/1810.13237v2
Dynamic Assortment Optimization with Changing Contextual Information
finite selling season of length $T$. At each time period, the seller offers an
arriving customer an assortment of substitutable products under a cardinality
constraint, and the customer makes the purchase among offered products
according to a discrete choice model. Most existing work associates each
product with a real-valued fixed mean utility and assumes a multinomial logit
(MNL) choice model. In many practical applications, feature/contextual
information of products is readily available. In this paper, we incorporate the
feature information by assuming a linear relationship between the mean utility
and the feature. In addition, we allow the feature information of products to
change over time so that the underlying choice model can also be
non-stationary. To solve the dynamic assortment optimization under this
changing contextual MNL model, we need to simultaneously learn the underlying
unknown coefficient and make the decision on the assortment. To this end, we
develop an upper confidence bound (UCB) based policy and establish the regret
bound on the order of $\widetilde O(d\sqrt{T})$, where $d$ is the dimension of
the feature and $\widetilde O$ suppresses logarithmic dependence. We further
establish a lower bound of $\Omega(d\sqrt{T}/K)$, where $K$ is the cardinality
constraint of an offered assortment, which is usually small. When $K$ is a
constant, our policy is optimal up to logarithmic factors. In the exploitation
phase of the UCB algorithm, we need to solve a combinatorial optimization
problem for assortment planning based on the learned information. We further develop an
approximation algorithm and an efficient greedy heuristic. The effectiveness of
the proposed policy is further demonstrated by our numerical studies.
arXiv link: http://arxiv.org/abs/1810.13069v2
Semiparametrically efficient estimation of the average linear regression function
vector of pre-treatment control variables. Here X may include (combinations of)
continuous, discrete, and/or non-mutually exclusive "treatments". Consider the
linear regression of Y onto X in a subpopulation homogeneous in W = w (formally
a conditional linear predictor). Let b0(w) be the coefficient vector on X in
this regression. We introduce a semiparametrically efficient estimate of the
average beta0 = E[b0(W)]. When X is binary-valued (multi-valued) our procedure
recovers the (a vector of) average treatment effect(s). When X is
continuously-valued, or consists of multiple non-exclusive treatments, our
estimand coincides with the average partial effect (APE) of X on Y when the
underlying potential response function is linear in X, but otherwise
heterogeneous across agents. When the potential response function takes a
general nonlinear/heterogeneous form, and X is continuously-valued, our
procedure recovers a weighted average of the gradient of this response across
individuals and values of X. We provide a simple, and semiparametrically
efficient, method of covariate adjustment for settings with complicated
treatment regimes. Our method generalizes familiar methods of covariate
adjustment used for program evaluation as well as methods of semiparametric
regression (e.g., the partially linear regression model).
arXiv link: http://arxiv.org/abs/1810.12511v1
Robust Inference Using Inverse Probability Weighting
economics and other disciplines. As Gaussian approximations perform poorly in
the presence of "small denominators," trimming is routinely employed as a
regularization strategy. However, ad hoc trimming of the observations renders
usual inference procedures invalid for the target estimand, even in large
samples. In this paper, we first show that the IPW estimator can have different
(Gaussian or non-Gaussian) asymptotic distributions, depending on how "close to
zero" the probability weights are and on how large the trimming threshold is.
As a remedy, we propose an inference procedure that is robust not only to small
probability weights entering the IPW estimator but also to a wide range of
trimming threshold choices, by adapting to these different asymptotic
distributions. This robustness is achieved by employing resampling techniques
and by correcting a non-negligible trimming bias. We also propose an
easy-to-implement method for choosing the trimming threshold by minimizing an
empirical analogue of the asymptotic mean squared error. In addition, we show
that our inference procedure remains valid with the use of a data-driven
trimming threshold. We illustrate our method by revisiting a dataset from the
National Supported Work program.
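As a point of reference, the sketch below shows the conventional trimmed IPW estimator of E[Y(1)] that the paper takes as its starting point; the proposed robust inference procedure (resampling plus trimming-bias correction and a data-driven threshold choice) is not implemented here.
```python
import numpy as np

def trimmed_ipw_mean(y, d, pscore, threshold=0.05):
    """Trimmed IPW estimate of E[Y(1)]: observations whose estimated propensity
    score falls below the trimming threshold are simply discarded. Illustrative
    only; the paper's inference procedure adapts to small probability weights
    and corrects the non-negligible trimming bias, which is not shown here."""
    keep = pscore >= threshold
    return np.mean(d[keep] * y[keep] / pscore[keep])
```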
arXiv link: http://arxiv.org/abs/1810.11397v2
Factor-Driven Two-Regime Regression
driven by a vector of possibly unobservable factors. When the factors are
latent, we estimate them by principal component analysis of a panel data
set. We show that the optimization problem can be reformulated as mixed integer
optimization, and we present two alternative computational algorithms. We
derive the asymptotic distribution of the resulting estimator under the scheme
that the threshold effect shrinks to zero. In particular, we establish a phase
transition that describes the effect of first-stage factor estimation as the
cross-sectional dimension of panel data increases relative to the time-series
dimension. Moreover, we develop bootstrap inference and illustrate our methods
via numerical studies.
arXiv link: http://arxiv.org/abs/1810.11109v4
Nuclear Norm Regularized Estimation of Panel Regression Models
effects. We propose two new estimation methods that are based on minimizing
convex objective functions. The first method minimizes the sum of squared
residuals with a nuclear (trace) norm regularization. The second method
minimizes the nuclear norm of the residuals. We establish the consistency of
the two resulting estimators. Those estimators have a very important
computational advantage compared to the existing least squares (LS) estimator,
in that they are defined as minimizers of a convex objective function. In
addition, the nuclear norm penalization helps to resolve a potential
identification problem for interactive fixed effect models, in particular when
the regressors are low-rank and the number of factors is unknown. We also
show how to construct estimators that are asymptotically equivalent to the
least squares (LS) estimator in Bai (2009) and Moon and Weidner (2017) by using
our nuclear norm regularized or minimized estimators as initial values for a
finite number of LS minimizing iteration steps. This iteration avoids any
non-convex minimization, while the original LS estimation problem is generally
non-convex, and can have multiple local minima.
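A minimal sketch of the first (nuclear-norm-regularized least squares) estimator for a single scalar regressor is given below, using alternating exact updates: ordinary least squares for the slope given the low-rank component, and singular value soft-thresholding for the low-rank component given the slope. This is an illustrative solver for the convex objective, not the paper's implementation or tuning.
```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def nuclear_norm_panel(Y, X, lam, n_iter=200):
    """Minimize 0.5*||Y - beta*X - Gamma||_F^2 + lam*||Gamma||_* by alternating
    exact updates (the objective is jointly convex in (beta, Gamma)).
    Illustrative sketch with a single scalar regressor.

    Y, X : (N, T) panels of the outcome and the regressor
    """
    beta, Gamma = 0.0, np.zeros_like(Y)
    for _ in range(n_iter):
        R = Y - Gamma
        beta = np.sum(X * R) / np.sum(X * X)   # least squares given the low-rank part
        Gamma = svt(Y - beta * X, lam)         # nuclear-norm proximal step
    return beta, Gamma
```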
arXiv link: http://arxiv.org/abs/1810.10987v4
Spanning Tests for Markowitz Stochastic Dominance
points of real valued continuous stochastic processes. This facilitates the
derivation of the first-order asymptotic properties of tests for stochastic
spanning given some stochastic dominance relation. We define the concept of
Markowitz stochastic dominance spanning, and develop an analytical
representation of the spanning property. We construct a non-parametric test for
spanning based on subsampling, and derive its asymptotic exactness and
consistency. The spanning methodology determines whether introducing new
securities or relaxing investment constraints improves the investment
opportunity set of investors driven by Markowitz stochastic dominance. In an
application to standard data sets of historical stock market returns, we reject
market portfolio Markowitz efficiency as well as two-fund separation. Hence, we
find evidence that equity management through base assets can outperform the
market, for investors with Markowitz type preferences.
arXiv link: http://arxiv.org/abs/1810.10800v1
Model Selection Techniques -- An Overview
or machine learning methods for observed data in order to facilitate scientific
discoveries or gain predictive power. Whatever data and fitting procedures are
employed, a crucial step is to select the most appropriate model or method from
a set of candidates. Model selection is a key ingredient in data analysis for
reliable and reproducible statistical inference or prediction, and thus central
to scientific studies in fields such as ecology, economics, engineering,
finance, political science, biology, and epidemiology. There has been a long
history of model selection techniques that arise from research in statistics,
information theory, and signal processing. A considerable number of methods
have been proposed, following different philosophies and exhibiting varying
performances. The purpose of this article is to bring a comprehensive overview
of them, in terms of their motivation, large sample performance, and
applicability. We provide integrated and practically relevant discussions on
theoretical properties of state-of-the-art model selection approaches. We also
share our thoughts on some controversial views on the practice of model
selection.
arXiv link: http://arxiv.org/abs/1810.09583v1
Forecasting Time Series with VARMA Recursions on Graphs
issues in modeling multivariate time series. However, there is yet no complete
understanding of how the underlying structure could be exploited to ease this
task. This work provides contributions in this direction by considering the
forecasting of a process evolving over a graph. We make use of the
(approximate) time-vertex stationarity assumption, i.e., time-varying graph
signals whose first and second order statistical moments are invariant over
time and correlated to a known graph topology. The latter is combined with VAR
and VARMA models to tackle the dimensionality issues present in predicting the
temporal evolution of multivariate time series. We find that by projecting
the data to the graph spectral domain: (i) the multivariate model estimation
reduces to that of fitting a number of uncorrelated univariate ARMA models and
(ii) an optimal low-rank data representation can be exploited so as to further
reduce the estimation costs. In the case that the multivariate process can be
observed at a subset of nodes, the proposed models extend naturally to Kalman
filtering on graphs allowing for optimal tracking. Numerical experiments with
both synthetic and real data validate the proposed approach and highlight its
benefits over state-of-the-art alternatives.
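The core recipe, projecting the signals onto the graph Fourier basis and fitting one univariate ARMA model per graph frequency, can be sketched as follows; the statsmodels-based implementation and the default ARMA order are illustrative assumptions, and the low-rank and Kalman-filtering extensions are omitted.
```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def graph_arma_forecast(Y, L, order=(1, 0, 1), steps=1):
    """Forecast a graph signal time series by fitting univariate ARMA models
    in the graph spectral domain (illustrative sketch).

    Y : (T, N) time series of signals on N graph nodes
    L : (N, N) graph Laplacian of the known topology
    """
    evals, U = np.linalg.eigh(L)           # graph Fourier basis (eigenvectors of L)
    Y_spec = Y @ U                          # project each time slice onto the basis
    fc_spec = np.empty((steps, Y.shape[1]))
    for j in range(Y.shape[1]):             # one uncorrelated ARMA model per graph frequency
        fit = ARIMA(Y_spec[:, j], order=order).fit()
        fc_spec[:, j] = fit.forecast(steps)
    return fc_spec @ U.T                    # transform the forecasts back to the vertex domain
```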
arXiv link: http://arxiv.org/abs/1810.08581v2
Probabilistic Forecasting in Day-Ahead Electricity Markets: Simulating Peak and Off-Peak Prices
forecasting and forecasting evaluation. We work with off-peak and peak time
series from the German-Austrian day-ahead price, hence we analyze bivariate
data. We first estimate the mean of the two time series, and then in a second
step we estimate the residuals. The mean equation is estimated by OLS and
elastic net and the residuals are estimated by maximum likelihood. Our
contribution is to include a bivariate jump component in a mean-reverting jump
diffusion model for the residuals. The models' forecasts are evaluated using
four different criteria, including the energy score to measure whether the
correlation structure between the time series is properly captured. The results
show that the models with bivariate jumps perform better on the energy score,
which means that it is important to consider
this structure in order to properly forecast correlated time series.
arXiv link: http://arxiv.org/abs/1810.08418v2
Treatment Effect Models with Strategic Interaction in Treatment Decisions
decisions can affect both one's own treatment and outcome. Focusing on the case
of two-player interactions, we formulate treatment decision behavior as a
complete information game with multiple equilibria. Using a latent index
framework and assuming a stochastic equilibrium selection, we prove that the
marginal treatment effect from one's own treatment and that from the partner
are identifiable on the conditional supports of certain threshold variables
determined through the game model. Based on our constructive identification
results, we propose a two-step semiparametric procedure for estimating the
marginal treatment effects using series approximation. We show that the
proposed estimator is uniformly consistent and asymptotically normally
distributed. As an empirical illustration, we investigate the impacts of risky
behaviors on adolescents' academic performance.
arXiv link: http://arxiv.org/abs/1810.08350v11
Quantile Regression Under Memory Constraint
large sample size $n$ but under a limited memory constraint, where the memory
can only store a small batch of data of size $m$. A natural method is the
na\"ive divide-and-conquer approach, which splits data into batches of size
$m$, computes the local QR estimator for each batch, and then aggregates the
estimators via averaging. However, this method only works when $n=o(m^2)$ and
is computationally expensive. This paper proposes a computationally efficient
method, which only requires an initial QR estimator on a small batch of data
and then successively refines the estimator via multiple rounds of
aggregations. Theoretically, as long as $n$ grows polynomially in $m$, we
establish the asymptotic normality for the obtained estimator and show that our
estimator with only a few rounds of aggregations achieves the same efficiency
as the QR estimator computed on all the data. Moreover, our result allows the
case that the dimensionality $p$ goes to infinity. The proposed method can also
be applied to address the QR problem under distributed computing environment
(e.g., in a large-scale sensor network) or for real-time streaming data.
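For reference, the naive divide-and-conquer baseline described above can be sketched in a few lines; the paper's refinement via successive aggregation rounds is not reproduced here, and the statsmodels-based implementation is an illustrative assumption.
```python
import numpy as np
import statsmodels.api as sm

def dc_quantile_regression(X, y, tau=0.5, batch_size=10_000):
    """Naive divide-and-conquer quantile regression: fit a QR on each batch of
    size m and average the coefficient vectors. Illustrative baseline only; the
    paper's estimator refines this with successive rounds of aggregation.

    X : (n, p) regressors, y : (n,) response
    """
    n = len(y)
    coefs = []
    for start in range(0, n, batch_size):
        Xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        if len(yb) <= X.shape[1] + 1:    # skip a too-small trailing batch
            continue
        res = sm.QuantReg(yb, sm.add_constant(Xb)).fit(q=tau)
        coefs.append(res.params)
    return np.mean(coefs, axis=0)        # averaged intercept and slopes
```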
arXiv link: http://arxiv.org/abs/1810.08264v1
A Consistent Heteroskedasticity Robust LM Type Specification Test for Semiparametric Models
Multiplier (LM) type specification test for semiparametric conditional mean
models. Consistency is achieved by turning a conditional moment restriction
into a growing number of unconditional moment restrictions using series
methods. The proposed test statistic is straightforward to compute and is
asymptotically standard normal under the null. Compared with the earlier
literature on series-based specification tests in parametric models, I rely on
the projection property of series estimators and derive a different
normalization of the test statistic. Compared with the recent test in Gupta
(2018), I use a different way of accounting for heteroskedasticity. I
demonstrate using Monte Carlo studies that my test has superior finite sample
performance compared with the existing tests. I apply the test to one of the
semiparametric gasoline demand specifications from Yatchew and No (2001) and
find no evidence against it.
arXiv link: http://arxiv.org/abs/1810.07620v3
Accounting for Unobservable Heterogeneity in Cross Section Using Spatial First Differences
the presence of unobservable heterogeneity without instruments. When units are
dense in physical space, it may be sufficient to regress the "spatial first
differences" (SFD) of the outcome on the treatment and omit all covariates. The
identifying assumptions of SFD are similar in mathematical structure and
plausibility to other quasi-experimental designs. We use SFD to obtain new
estimates for the effects of time-invariant geographic factors, soil and
climate, on long-run agricultural productivities --- relationships crucial for
economic decisions, such as land management and climate policy, but notoriously
confounded by unobservables.
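A minimal sketch of the SFD idea, assuming units can be ordered along a single spatial axis, is given below; it differences adjacent neighbors and runs a no-constant regression of the differenced outcome on the differenced treatment, omitting all covariates.
```python
import numpy as np

def spatial_first_differences(y, d, coords):
    """Spatial first differences (SFD) sketch: difference each unit against its
    nearest neighbor along one spatial axis and regress the differenced outcome
    on the differenced treatment with no covariates and no constant.

    y, d   : (n,) outcome and treatment
    coords : (n,) position along the differencing axis (e.g. longitude)
    """
    order = np.argsort(coords)
    dy = np.diff(y[order])
    dd = np.diff(d[order])
    return np.sum(dd * dy) / np.sum(dd * dd)   # no-constant OLS slope in first differences
```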
arXiv link: http://arxiv.org/abs/1810.07216v2
Using generalized estimating equations to estimate nonlinear models with spatial data
data using two-step generalized estimating equations (GEE) in the quasi-maximum
likelihood estimation (QMLE) framework. In the interest of improving
efficiency, we propose a grouping estimator to account for the potential
spatial correlation in the underlying innovations. We use a Poisson model and a
Negative Binomial II model for count data and a Probit model for binary
response data to demonstrate the GEE procedure. Under mild weak dependency
assumptions, results on estimation consistency and asymptotic normality are
provided. Monte Carlo simulations show the efficiency gain of our approach in
comparison with different estimation methods for count data and binary response
data. Finally, we apply the GEE approach to study the determinants of the inflow
of foreign direct investment (FDI) to China.
arXiv link: http://arxiv.org/abs/1810.05855v1
Stochastic Revealed Preferences with Measurement Error
observed purchase decisions satisfy the revealed preference (RP) axioms of the
utility maximization theory (UMT). Researchers using survey or experimental
panel data sets on prices and consumption to answer this question face the
well-known problem of measurement error. We show that ignoring measurement
error in the RP approach may lead to overrejection of the UMT. To solve this
problem, we propose a new statistical RP framework for consumption panel data
sets that allows for testing the UMT in the presence of measurement error. Our
test is applicable to all consumer models that can be characterized by their
first-order conditions. Our approach is nonparametric, allows for unrestricted
heterogeneity in preferences, and requires only a centering condition on
measurement error. We develop two applications that provide new evidence about
the UMT. First, we find support in a survey data set for the dynamic and
time-consistent UMT in single-individual households, in the presence of
nonclassical measurement error in consumption. In the second
application, we cannot reject the static UMT in a widely used experimental data
set in which measurement error in prices is assumed to be the result of price
misperception due to the experimental design. The first finding stands in
contrast to the conclusions drawn from the deterministic RP test of Browning
(1989). The second finding reverses the conclusions drawn from the
deterministic RP test of Afriat (1967) and Varian (1982).
arXiv link: http://arxiv.org/abs/1810.05287v2
Offline Multi-Action Policy Learning: Generalization and Optimization
maps from observable characteristics of an individual to an action. Examples
include selecting offers, prices, advertisements, or emails to send to
consumers, as well as the problem of determining which medication to prescribe
to a patient. While there is a growing body of literature devoted to this
problem, most existing results are focused on the case where data comes from a
randomized experiment, and further, there are only two possible actions, such
as giving a drug to a patient or not. In this paper, we study the offline
multi-action policy learning problem with observational data and where the
policy may need to respect budget constraints or belong to a restricted policy
class such as decision trees. We build on the theory of efficient
semi-parametric inference in order to propose and implement a policy learning
algorithm that achieves asymptotically minimax-optimal regret. To the best of
our knowledge, this is the first result of this type in the multi-action setup,
and it provides a substantial performance improvement over the existing
learning algorithms. We then consider additional computational challenges that
arise in implementing our method for the case where the policy is restricted to
take the form of a decision tree. We propose two different approaches, one
using a mixed integer program formulation and the other using a tree-search
based algorithm.
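The semiparametrically efficient scoring of a candidate policy can be illustrated with a doubly robust (AIPW) value estimate for a finite action set, as sketched below; the optimization over a restricted policy class (e.g., decision trees) and the exact construction in the paper are not reproduced.
```python
import numpy as np

def dr_policy_value(actions, rewards, pi_obs, mu_hat, policy):
    """Doubly robust estimate of the value of a deterministic multi-action
    policy from observational data (illustrative sketch).

    actions : (n,) observed actions in {0, ..., K-1}
    rewards : (n,) observed outcomes
    pi_obs  : (n,) estimated probability of the observed action given covariates
    mu_hat  : (n, K) estimated outcome model E[Y | X, a] for each action
    policy  : (n,) action the candidate policy assigns to each observation
    """
    n = len(rewards)
    direct = mu_hat[np.arange(n), policy]                       # outcome-model term
    correction = (actions == policy) / pi_obs * (
        rewards - mu_hat[np.arange(n), actions])                # IPW residual correction
    return np.mean(direct + correction)                         # AIPW / doubly robust value
```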
arXiv link: http://arxiv.org/abs/1810.04778v2
Prices, Profits, Proxies, and Production
heterogeneous firms that can be ranked in terms of productivity. Our approach
works when quantities and prices are latent, rendering standard approaches
inapplicable. Instead, we require observation of profits or other
optimizing-values such as costs or revenues, and either prices or price proxies
of flexibly chosen variables. We extend classical duality results for
price-taking firms to a setup with discrete heterogeneity, endogeneity, and
limited variation in possibly latent prices. Finally, we show that convergence
results for nonparametric estimators may be directly converted to convergence
results for production sets.
arXiv link: http://arxiv.org/abs/1810.04697v4
The Incidental Parameters Problem in Testing for Remaining Cross-section Correlation
for cross-section correlation when applied to residuals obtained from panel
data models with many estimated parameters. We show that the presence of
period-specific parameters leads the CD test statistic to diverge as the length of
the time dimension of the sample grows. This result holds even if cross-section
dependence is correctly accounted for and hence constitutes an example of the
Incidental Parameters Problem. The relevance of this problem is investigated
both for the classical Time Fixed Effects estimator as well as the Common
Correlated Effects estimator of Pesaran (2006). We suggest a weighted CD test
statistic which re-establishes standard normal inference under the null
hypothesis. Given the widespread use of the CD test statistic to test for
remaining cross-section correlation, our results have far reaching implications
for empirical researchers.
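For context, the standard (unweighted) Pesaran CD statistic computed from panel residuals is sketched below; the paper's weighted CD statistic, which restores standard normal inference under the null, differs and is not shown.
```python
import numpy as np

def cd_statistic(resid):
    """Pesaran's CD statistic from a (T, N) array of panel residuals
    (standard, unweighted version; the paper's weighted variant differs)."""
    T, N = resid.shape
    R = np.corrcoef(resid, rowvar=False)           # (N, N) pairwise correlations
    sum_upper = R[np.triu_indices(N, k=1)].sum()   # sum over pairs i < j
    return np.sqrt(2.0 * T / (N * (N - 1))) * sum_upper
```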
arXiv link: http://arxiv.org/abs/1810.03715v4
Evaluating regulatory reform of network industries: a survey of empirical models based on categorical proxies
increasingly used in empirical evaluation models. We surveyed 63 studies that
rely on such indices to analyze the effects of entry liberalization,
privatization, unbundling, and independent regulation of the electricity,
natural gas, and telecommunications sectors. We highlight methodological issues
related to the use of these proxies. Next, taking stock of the literature, we
provide practical advice for the design of the empirical strategy and discuss
the selection of control and instrumental variables to attenuate endogeneity
problems undermining identification of the effects of regulatory reforms.
arXiv link: http://arxiv.org/abs/1810.03348v1
Simple Inference on Functionals of Set-Identified Parameters Defined by Linear Moments
linear functionals or scalar subvectors of a partially identified parameter
defined by linear moment inequalities. The procedure amounts to bootstrapping
the value functions of randomly perturbed linear programming problems, and does
not require the researcher to grid over the parameter space. The low-level
conditions for uniform validity rely on genericity results for linear programs.
The unconventional perturbation approach produces a confidence set with a
coverage probability of 1 over the identified set, but obtains exact coverage
on an outer set, is valid under weak assumptions, and is computationally simple
to implement.
arXiv link: http://arxiv.org/abs/1810.03180v10
On LASSO for Predictive Regression
strength and various degrees of persistence. Variable selection in such a
context is of great importance. In this paper, we explore the pitfalls and
possibilities of the LASSO methods in this predictive regression framework. In
the presence of stationary, local unit root, and cointegrated predictors, we
show that the adaptive LASSO cannot asymptotically eliminate all cointegrating
variables with zero regression coefficients. This new finding motivates a novel
post-selection adaptive LASSO, which we call the twin adaptive LASSO (TAlasso),
to restore variable selection consistency. Accommodating the system of
heterogeneous regressors, TAlasso achieves the well-known oracle property. In
contrast, conventional LASSO fails to attain coefficient estimation consistency
and variable screening in all components simultaneously. We apply these LASSO
methods to evaluate the short- and long-horizon predictability of S&P 500
excess returns.
arXiv link: http://arxiv.org/abs/1810.03140v4
Granger causality on horizontal sum of Boolean algebras
introduced by C.W.J. Granger in 1969. Granger's model of causality has
become well-known and often used in various econometric models describing
causal systems, e.g., between commodity prices and exchange rates.
Our paper presents a new mathematical model of causality between two measured
objects. We have slightly modified the well-known Kolmogorovian probability
model. In particular, we use the horizontal sum of set $\sigma$-algebras
instead of their direct product.
arXiv link: http://arxiv.org/abs/1810.01654v1
Interpreting OLS Estimands When Treatment Effects Are Heterogeneous: Smaller Groups Get Larger Weights
using linear models with additive effects. I study the interpretation of the
OLS estimands in such models when treatment effects are heterogeneous. I show
that the treatment coefficient is a convex combination of two parameters, which
under certain conditions can be interpreted as the average treatment effects on
the treated and untreated. The weights on these parameters are inversely
related to the proportion of observations in each group. Reliance on these
implicit weights can have serious consequences for applied work, as I
illustrate with two well-known applications. I develop simple diagnostic tools
that empirical researchers can use to avoid potential biases. Software for
implementing these methods is available in R and Stata. In an important special
case, my diagnostics only require the knowledge of the proportion of treated
units.
arXiv link: http://arxiv.org/abs/1810.01576v3
Covariate Distribution Balance via Propensity Scores
maximize the covariate distribution balance among different treatment groups.
Heuristically, our proposed procedure attempts to estimate a propensity score
model by making the underlying covariate distribution of different treatment
groups as close to each other as possible. Our estimators are data-driven, do
not rely on tuning parameters such as bandwidths, admit an asymptotic linear
representation, and can be used to estimate different treatment effect
parameters under different identifying assumptions, including unconfoundedness
and local treatment effects. We derive the asymptotic properties of inverse
probability weighted estimators for the average, distributional, and quantile
treatment effects based on the proposed propensity score estimator and
illustrate their finite sample performance via Monte Carlo simulations and two
empirical applications.
arXiv link: http://arxiv.org/abs/1810.01370v4
Nonparametric Regression with Selectively Missing Covariates
a nonparametric framework. Our approach relies on instrumental variables that
explain variation in the latent covariates but have no direct effect on
selection. The regression function of interest is shown to be a weighted
version of the observed conditional expectation, where the weighting function is a
fraction of selection probabilities. Nonparametric identification of the
fractional probability weight (FPW) function is achieved via a partial
completeness assumption. We provide primitive functional form assumptions for
partial completeness to hold. The identification result is constructive for the
FPW series estimator. We derive the rate of convergence and also the pointwise
asymptotic distribution. In both cases, the asymptotic performance of the FPW
series estimator does not suffer from the ill-posed inverse problem that arises in
the nonparametric instrumental variable approach. In a Monte Carlo study, we
analyze the finite sample properties of our estimator and we compare our
approach to inverse probability weighting, which can be used alternatively for
unconditional moment estimation. In the empirical part, we consider two
applications. We estimate the association between income and health
using linked data from the SHARE survey and administrative pension information
and use pension entitlements as an instrument. In the second application we
revisit the question of how income affects the demand for housing based on data
from the German Socio-Economic Panel Study (SOEP). In this application we use
regional income information on the residential block level as an instrument. In
both applications we show that income is selectively missing and we demonstrate
that standard methods that do not account for the nonrandom selection process
lead to significantly biased estimates for individuals with low income.
arXiv link: http://arxiv.org/abs/1810.00411v4
Proxy Controls and Panel Data
inference of causal effects using `proxy controls': observables that are noisy
but informative proxies for unobserved confounding factors. Our analysis
applies to cross-sectional settings but is particularly well-suited to panel
models. Our identification results motivate a simple and `well-posed'
nonparametric estimator. We derive convergence rates for the estimator and
construct uniform confidence bands with asymptotically correct size. In panel
settings, our methods provide a novel approach to the difficult problem of
identification with non-separable, general heterogeneity and fixed $T$. In
panels, observations from different periods serve as proxies for unobserved
heterogeneity and our key identifying assumptions follow from restrictions on
the serial dependence structure. We apply our methods to two empirical
settings. We estimate consumer demand counterfactuals using panel data and we
estimate causal effects of grade retention on cognitive performance.
arXiv link: http://arxiv.org/abs/1810.00283v8
Deep Neural Networks for Estimation and Inference
establish novel rates of convergence for deep feedforward neural nets. Our new
rates are sufficiently fast (in some cases minimax optimal) to allow us to
establish valid second-step inference after first-step estimation with deep
learning, a result also new to the literature. Our estimation rates and
semiparametric inference results handle the current standard architecture:
fully connected feedforward neural networks (multi-layer perceptrons), with the
now-common rectified linear unit activation function and a depth explicitly
diverging with the sample size. We discuss other architectures as well,
including fixed-width, very deep networks. We establish nonasymptotic bounds
for these deep nets for a general class of nonparametric regression-type loss
functions, which includes as special cases least squares, logistic regression,
and other generalized linear models. We then apply our theory to develop
semiparametric inference, focusing on causal parameters for concreteness, such
as treatment effects, expected welfare, and decomposition effects. Inference in
many other semiparametric contexts can be readily obtained. We demonstrate the
effectiveness of deep learning with a Monte Carlo analysis and an empirical
application to direct mail marketing.
arXiv link: http://arxiv.org/abs/1809.09953v3
Multivariate Stochastic Volatility Model with Realized Volatilities and Pairwise Realized Correlations
conditional heteroscedasticity) models have successfully described the
volatility dynamics of univariate asset returns, extending them to the
multivariate models with dynamic correlations has been difficult due to several
major problems. First, there are too many parameters to estimate if available
data are only daily returns, which results in unstable estimates. One solution
to this problem is to incorporate additional observations based on intraday
asset returns, such as realized covariances. Second, since multivariate asset
returns are not synchronously traded, we have to use the largest time intervals
such that all asset returns are observed in order to compute the realized
covariance matrices. In this way, however, we fail to make full use of the
available intraday information when there are less frequently traded assets.
Third, it is not straightforward to guarantee that the estimated (and the
realized) covariance matrices are positive definite. Our contributions are the
following: (1) we obtain the stable parameter estimates for the dynamic
correlation models using the realized measures, (2) we make full use of
intraday information by using pairwise realized correlations, (3) the
covariance matrices are guaranteed to be positive definite, (4) we avoid the
arbitrariness of the ordering of asset returns, (5) we propose the flexible
correlation structure model (e.g., setting some correlations to zero
if necessary), and (6) a parsimonious specification for the leverage effect
is proposed. Our proposed models are applied to the daily returns of nine U.S.
stocks with their realized volatilities and pairwise realized correlations and
are shown to outperform the existing models with respect to portfolio
performance.
arXiv link: http://arxiv.org/abs/1809.09928v2
Mostly Harmless Simulations? Using Monte Carlo Studies for Estimator Selection
motivated Monte Carlo study to help select a treatment effect estimator under
unconfoundedness. We show theoretically that neither is likely to be
informative except under restrictive conditions that are unlikely to be
satisfied in many contexts. To test empirical relevance, we also apply the
approaches to a real-world setting where estimator performance is known. Both
approaches are worse than random at selecting estimators which minimise
absolute bias. They are better when selecting estimators that minimise mean
squared error. However, using a simple bootstrap is at least as good and often
better. For now researchers would be best advised to use a range of estimators
and compare estimates for robustness.
arXiv link: http://arxiv.org/abs/1809.09527v2
An Automated Approach Towards Sparse Single-Equation Cointegration Modelling
Selector (SPECS) as an automated estimation procedure for dynamic
single-equation models with a large number of potentially (co)integrated
variables. By extending the classical single-equation error correction model,
SPECS enables the researcher to model large cointegrated datasets without
necessitating any form of pre-testing for the order of integration or
cointegrating rank. Under an asymptotic regime in which both the number of
parameters and time series observations jointly diverge to infinity, we show
that SPECS is able to consistently estimate an appropriate linear combination
of the cointegrating vectors that may occur in the underlying DGP. In addition,
SPECS is shown to enable the correct recovery of sparsity patterns in the
parameter space and to possess the same limiting distribution as the OLS oracle
procedure. A simulation study shows strong selective capabilities, as well as
superior predictive performance in the context of nowcasting compared to
high-dimensional models that ignore cointegration. An empirical application to
nowcasting Dutch unemployment rates using Google Trends confirms the strong
practical performance of our procedure.
arXiv link: http://arxiv.org/abs/1809.08889v3
Transmission of Macroeconomic Shocks to Risk Parameters: Their uses in Stress Testing
portfolios under extreme economic conditions. Therefore, we use empirical
measures to characterize the transmission process of macroeconomic shocks to
risk parameters. We propose the use of an extensive family of models, called
General Transfer Function Models, which condense well the characteristics of
the transmission described by the impact measures. The procedure for estimating
the parameters of these models is described employing the Bayesian approach and
using the prior information provided by the impact measures. In addition, we
illustrate the use of the estimated models from the credit risk data of a
portfolio.
arXiv link: http://arxiv.org/abs/1809.07401v3
Focused econometric estimation for noisy and small datasets: A Bayesian Minimum Expected Loss estimator approach
functions of parameters. The mainstream in statistics and econometrics
estimates these quantities based on the plug-in approach without consideration
of the main objective of the inferential situation. We propose the Bayesian
Minimum Expected Loss (MELO) approach focusing explicitly on the function of
interest, and calculating its frequentist variability. Asymptotic properties of
the MELO estimator are similar to the plug-in approach. Nevertheless,
simulation exercises show that our proposal is better in situations
characterized by small sample sizes and noisy models. In addition, we observe
in the applications that our approach gives lower standard errors than
frequently used alternatives when datasets are not very informative.
arXiv link: http://arxiv.org/abs/1809.06996v1
Estimating grouped data models with a binary dependent variable and fixed effects: What are the issues
dependent variable and want to include fixed effects (group specific
intercepts) in the specification, is Ordinary Least Squares (OLS) in any way
superior to a (conditional) logit form? In particular, what are the
consequences of using OLS instead of a fixed effects logit model with respect
to the latter dropping all units which show no variability in the dependent
variable, while the former allows for estimation using all units? First, we show
that the discussion of the incidental parameters problem is based on an
assumption about the kinds of data being studied; for what appears to be the
common use of fixed effect models in political science the incidental
parameters issue is illusory. Turning to linear models, we see that OLS yields
a linear combination of the estimates for the units with and without variation
in the dependent variable, and so the coefficient estimates must be carefully
interpreted. The article then compares two methods of estimating logit models
with fixed effects, and shows that the Chamberlain conditional logit is as good
as or better than a logit analysis which simply includes group specific
intercepts (even though the conditional logit technique was designed to deal
with the incidental parameters problem!). Related to this, the article
discusses the estimation of marginal effects using both OLS and logit. While it
appears that a form of logit with fixed effects can be used to estimate
marginal effects, this method can be improved by starting with conditional
logit and then using those parameter estimates to constrain the logit with
fixed effects model. This method produces estimates of sample average marginal
effects that are at least as good as OLS, and much better when group size is
small or the number of groups is large.
arXiv link: http://arxiv.org/abs/1809.06505v1
Control Variables, Discrete Instruments, and Identification of Structural Functions
in econometric models with nonseparable and/or multidimensional heterogeneity.
We allow for discrete instruments, giving identification results under a
variety of restrictions on the way the endogenous variable and the control
variables affect the outcome. We consider many structural objects of interest,
such as average or quantile treatment effects. We illustrate our results with
an empirical application to Engel curve estimation.
arXiv link: http://arxiv.org/abs/1809.05706v2
On the Choice of Instruments in Mixed Frequency Specification Tests
frequencies. However, it ignores information possibly embedded in high-frequency
data. Mixed data sampling (MIDAS) regression models provide a concise way
to utilize the additional information in high-frequency variables. In this
paper, we propose a specification test to choose between time averaging and
MIDAS models, based on a Durbin-Wu-Hausman test. In particular, a set of
instrumental variables is proposed and theoretically validated when the
frequency ratio is large. As a result, our method tends to be more powerful
than existing methods, as reconfirmed through the simulations.
arXiv link: http://arxiv.org/abs/1809.05503v1
Automatic Debiased Machine Learning of Causal and Structural Effects
policy effects, average derivatives, regression decompositions, average
treatment effects, causal mediation, and parameters of economic structural
models. The regressions may be high dimensional, making machine learning
useful. Plugging machine learners into identifying equations can lead to poor
inference due to bias from regularization and/or model selection. This paper
gives automatic debiasing for linear and nonlinear functions of regressions.
The debiasing is automatic in that it uses Lasso and the function of interest
without requiring the full form of the bias correction. The debiasing can be applied to any
regression learner, including neural nets, random forests, Lasso, boosting, and
other high dimensional methods. In addition to providing the bias correction we
give standard errors that are robust to misspecification, convergence rates for
the bias correction, and primitive conditions for asymptotic inference for
a variety of estimators of structural and causal effects. The
automatic debiased machine learning is used to estimate the average treatment
effect on the treated for the NSW job training data and to estimate demand
elasticities from Nielsen scanner data while allowing preferences to be
correlated with prices and income.
arXiv link: http://arxiv.org/abs/1809.05224v5
Valid Simultaneous Inference in High-Dimensional Settings (with the hdm package for R)
in many research disciplines, valid simultaneous inference becomes more and
more important. For instance, high-dimensional settings might arise in economic
studies due to very rich data sets with many potential covariates or in the
analysis of treatment heterogeneities. Also, the evaluation of potentially more
complicated (non-linear) functional forms of the regression relationship leads
to many potential variables for which simultaneous inferential statements might
be of interest. Here we provide a review of classical and modern methods for
simultaneous inference in (high-dimensional) settings and illustrate their use
by a case study using the R package hdm. The R package hdm implements valid,
powerful, and efficient joint hypothesis tests for a potentially large number of
coefficients, as well as the construction of simultaneous confidence intervals
and, therefore, provides useful methods to perform valid post-selection
inference based on the LASSO.
arXiv link: http://arxiv.org/abs/1809.04951v1
Bayesian shrinkage in mixture of experts models: Identifying robust determinants of class membership
proposed. We introduce a prior structure where information is taken from a set
of independent covariates. Robust class membership predictors are identified
using a normal gamma prior. The resulting model setup is used in a finite
mixture of Bernoulli distributions to find homogeneous clusters of women in
Mozambique based on their information sources on HIV. Fully Bayesian inference
is carried out via the implementation of a Gibbs sampler.
arXiv link: http://arxiv.org/abs/1809.04853v2
Bootstrap Methods in Econometrics
test statistic by re-sampling the data or a model estimated from the data.
Under conditions that hold in a wide variety of econometric applications, the
bootstrap provides approximations to distributions of statistics, coverage
probabilities of confidence intervals, and rejection probabilities of
hypothesis tests that are more accurate than the approximations of first-order
asymptotic distribution theory. The reductions in the differences between true
and nominal coverage or rejection probabilities can be very large. In addition,
the bootstrap provides a way to carry out inference in certain settings where
obtaining analytic distributional approximations is difficult or impossible.
This article explains the usefulness and limitations of the bootstrap in
contexts of interest in econometrics. The presentation is informal and
expository. It provides an intuitive understanding of how the bootstrap works.
Mathematical details are available in references that are cited.
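As a simple concrete example of the kind of refinement discussed, the sketch below computes a bootstrap-t (percentile-t) confidence interval for a population mean by resampling the studentized statistic; details such as the number of bootstrap draws are illustrative.
```python
import numpy as np

def percentile_t_ci(x, level=0.95, n_boot=2_000, seed=None):
    """Bootstrap-t (percentile-t) confidence interval for a population mean:
    bootstrap the studentized statistic rather than the estimator itself,
    which is the source of the higher-order accuracy discussed in the text."""
    rng = np.random.default_rng(seed)
    n = len(x)
    xbar, se = np.mean(x), np.std(x, ddof=1) / np.sqrt(n)
    t_stars = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x, size=n, replace=True)
        t_stars[b] = (np.mean(xb) - xbar) / (np.std(xb, ddof=1) / np.sqrt(n))
    lo, hi = np.quantile(t_stars, [(1 - level) / 2, (1 + level) / 2])
    return xbar - hi * se, xbar - lo * se   # note the reversed bootstrap quantiles
```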
arXiv link: http://arxiv.org/abs/1809.04016v1
Regression Discontinuity Designs Using Covariates
estimation. We examine local polynomial estimators that include discrete or
continuous covariates in an additively separable way, but without imposing any
parametric restrictions on the underlying population regression functions. We
recommend a covariate-adjustment approach that retains consistency under
intuitive conditions, and characterize the potential for estimation and
inference improvements. We also present new covariate-adjusted mean squared
error expansions and robust bias-corrected inference procedures, with
heteroskedasticity-consistent and cluster-robust standard errors. An empirical
illustration and an extensive simulation study are presented. All methods are
implemented in R and Stata software packages.
arXiv link: http://arxiv.org/abs/1809.03904v1
Non-Asymptotic Inference in Instrumental Variables Estimation
variety of possibly nonlinear IV models under weak assumptions. The method is
non-asymptotic in the sense that it provides a finite sample bound on the
difference between the true and nominal probabilities of rejecting a correct
null hypothesis. The method is a non-Studentized version of the Anderson-Rubin
test but is motivated and analyzed differently. In contrast to the conventional
Anderson-Rubin test, the method proposed here does not require restrictive
distributional assumptions, linearity of the estimated model, or simultaneous
equations. Nor does it require knowledge of whether the instruments are strong
or weak. It does not require testing or estimating the strength of the
instruments. The method can be applied to quantile IV models that may be
nonlinear and can be used to test a parametric IV model against a nonparametric
alternative. The results presented here hold in finite samples, regardless of
the strength of the instruments.
arXiv link: http://arxiv.org/abs/1809.03600v1
Characteristic-Sorted Portfolios: Estimation and Inference
has been widely used to identify pricing anomalies. Despite its popularity,
little attention has been paid to the statistical properties of the procedure.
We develop a general framework for portfolio sorting by casting it as a
nonparametric estimator. We present valid asymptotic inference methods and a
valid mean square error expansion of the estimator leading to an optimal choice
for the number of portfolios. In practical settings, the optimal choice may be
much larger than the standard choices of 5 or 10. To illustrate the relevance
of our results, we revisit the size and momentum anomalies.
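The estimator being formalized is the familiar sort-and-spread construction, sketched below for equal-weighted quantile portfolios; the paper's asymptotic theory and the data-driven choice of the number of portfolios are not implemented here.
```python
import numpy as np

def sorted_portfolio_spread(char, returns, n_portfolios=10):
    """Characteristic-sorted portfolio spread: sort assets into quantile bins by
    the characteristic and compare the extreme portfolios' equal-weighted mean
    returns (the paper treats this as a nonparametric estimator and derives an
    optimal number of portfolios)."""
    edges = np.quantile(char, np.linspace(0, 1, n_portfolios + 1))
    bins = np.digitize(char, edges[1:-1])          # bin indices 0, ..., n_portfolios - 1
    means = np.array([returns[bins == j].mean() for j in range(n_portfolios)])
    return means[-1] - means[0]                    # high-minus-low spread
```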
arXiv link: http://arxiv.org/abs/1809.03584v3
Bayesian dynamic variable selection in high dimensions
efficient posterior and predictive inference in time-varying parameter (TVP)
models. Within this context we specify a new dynamic variable/model selection
strategy for TVP dynamic regression models in the presence of a large number of
predictors. This strategy allows for assessing in individual time periods which
predictors are relevant (or not) for forecasting the dependent variable. The
new algorithm is evaluated numerically using synthetic data and its
computational advantages are established. Using macroeconomic data for the US
we find that regression models that combine time-varying parameters with the
information in many predictors have the potential to improve forecasts of price
inflation over a number of alternative forecasting models.
arXiv link: http://arxiv.org/abs/1809.03031v2
Change-Point Testing for Risk Measures in Time Series
estimators of expected shortfall and related risk measures in weakly dependent
time series. We can detect general multiple structural changes in the tails of
marginal distributions of time series under general assumptions.
Self-normalization allows us to avoid the issues of standard error estimation.
The theoretical foundations for our methods are functional central limit
theorems, which we develop under weak assumptions. An empirical study of S&P
500 and US Treasury bond returns illustrates the practical use of our methods
in detecting and quantifying instability in the tails of financial time series.
arXiv link: http://arxiv.org/abs/1809.02303v3
Efficient Difference-in-Differences Estimation with High-Dimensional Common Trend Confounding
under different assumptions on the relation between the treatment group
identifier, time and covariates for cross-sectional and panel data. The
variance lower bound is shown to be sensitive to the model assumptions imposed,
implying a robustness-efficiency trade-off. The obtained efficient influence
functions lead to estimators that are rate doubly robust and have desirable
asymptotic properties under weak first stage convergence conditions. This
enables the use of sophisticated machine-learning algorithms that can cope with
settings where common trend confounding is high-dimensional. The usefulness of
the proposed estimators is assessed in an empirical example. It is shown that
the efficiency-robustness trade-offs and the choice of first stage predictors
can lead to divergent empirical results in practice.
arXiv link: http://arxiv.org/abs/1809.01643v5
Shape-Enforcing Operators for Point and Interval Estimators
estimate and make inference on functions that satisfy shape restrictions. For
example, distribution functions are nondecreasing and range between zero and
one, height growth charts are nondecreasing in age, and production functions
are nondecreasing and quasi-concave in input quantities. We propose a method to
enforce these restrictions ex post on point and interval estimates of the
target function by applying functional operators. If an operator satisfies
certain properties that we make precise, the shape-enforced point estimates are
closer to the target function than the original point estimates and the
shape-enforced interval estimates have greater coverage and shorter length than
the original interval estimates. We show that these properties hold for six
different operators that cover commonly used shape restrictions in practice:
range, convexity, monotonicity, monotone convexity, quasi-convexity, and
monotone quasi-convexity. We illustrate the results with two empirical
applications to the estimation of a height growth chart for infants in India
and a production function for chemical firms in China.
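For the monotonicity case, the enforcing operator amounts to rearrangement (sorting) applied to the point estimate and to each interval endpoint on an increasing grid, as in the minimal sketch below; the other operators (range, convexity, and so on) are applied analogously but are not shown.
```python
import numpy as np

def enforce_monotone(point, lower, upper):
    """Monotone rearrangement applied ex post to point and interval estimates
    evaluated on an increasing grid: sort the point estimate and each interval
    endpoint separately (one of the shape-enforcing operators discussed)."""
    return np.sort(point), np.sort(lower), np.sort(upper)
```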
arXiv link: http://arxiv.org/abs/1809.01038v5
Optimal Bandwidth Choice for Robust Bias Corrected Inference in Regression Discontinuity Designs
local polynomial estimation and inference with a mean square error (MSE)
optimal bandwidth choice. This bandwidth yields an MSE-optimal RD treatment
effect estimator, but is by construction invalid for inference. Robust bias
corrected (RBC) inference methods are valid when using the MSE-optimal
bandwidth, but we show they yield suboptimal confidence intervals in terms of
coverage error. We establish valid coverage error expansions for RBC confidence
interval estimators and use these results to propose new inference-optimal
bandwidth choices for forming these intervals. We find that the standard
MSE-optimal bandwidth for the RD point estimator is too large when the goal is
to construct RBC confidence intervals with the smallest coverage error. We
further optimize the constant terms behind the coverage error to derive new
optimal choices for the auxiliary bandwidth required for RBC inference. Our
expansions also establish that RBC inference yields higher-order refinements
(relative to traditional undersmoothing) in the context of RD designs. Our main
results cover sharp and sharp kink RD designs under conditional
heteroskedasticity, and we discuss extensions to fuzzy and other RD designs,
clustered sampling, and pre-intervention covariate adjustments. The
theoretical findings are illustrated with a Monte Carlo experiment and an
empirical application, and the main methodological results are available in
R and Stata packages.
arXiv link: http://arxiv.org/abs/1809.00236v4
Identifying the Discount Factor in Dynamic Discrete Choice Models
shifts expected discounted future utilities, but not current utilities, as an
intuitive source of information on time preferences. We study the
identification of dynamic discrete choice models under such economically
motivated exclusion restrictions on primitive utilities. We show that each
exclusion restriction leads to an easily interpretable moment condition with
the discount factor as the only unknown parameter. The identified set of
discount factors that solves this condition is finite, but not necessarily a
singleton. Consequently, in contrast to common intuition, an exclusion
restriction does not in general give point identification. Finally, we show
that exclusion restrictions have nontrivial empirical content: The implied
moment conditions impose restrictions on choices that are absent from the
unconstrained model.
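To see the mechanics, consider a hedged sketch: under one exclusion restriction the discount factor is the only unknown in a scalar moment condition, and the (possibly non-singleton) identified set can be traced out by a grid search. The moment function below is purely hypothetical and is chosen only so that two values solve the condition.

```python
import numpy as np

def identified_set(moment, grid, tol=1e-3):
    """Return the grid points at which the scalar moment condition (approximately) holds."""
    values = np.array([moment(delta) for delta in grid])
    return grid[np.abs(values) < tol]

# Hypothetical moment function with two roots in (0, 1): the identified set is finite but not a singleton.
moment = lambda delta: (delta - 0.6) * (delta - 0.9)
grid = np.linspace(0.0, 0.99, 991)
print(identified_set(moment, grid))  # grid points clustered around 0.6 and 0.9
```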
arXiv link: http://arxiv.org/abs/1808.10651v4
A Self-Attention Network for Hierarchical Data Structures with an Application to Claims Management
these claims are non-fraudulent, fraud detection is core for insurance
companies. The ultimate goal is a predictive model to single out the fraudulent
claims and pay out the non-fraudulent ones immediately. Modern machine learning
methods are well suited for this kind of problem. Health care claims often have
a data structure that is hierarchical and of variable length. We propose one
model based on piecewise feed-forward neural networks (deep learning) and another model based on self-attention neural networks for the task of claims
management. We show that the proposed methods outperform bag-of-words based
models, hand designed features, and models based on convolutional neural
networks, on a data set of two million health care claims. The proposed
self-attention method performs the best.
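As a rough, self-contained illustration of the building block involved (not the authors' architecture; the dimensions and weight matrices below are hypothetical), a scaled dot-product self-attention layer over the variable-length items of one claim, followed by mean pooling, can be written as:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over the rows of X (one row per claim item)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))  # row-wise softmax
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n_items, d = 7, 16                                   # e.g. 7 billing positions, 16 features each
X = rng.normal(size=(n_items, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
claim_embedding = self_attention(X, Wq, Wk, Wv).mean(axis=0)  # pooled, length-invariant claim summary
```

Because the attention weights are computed from the items themselves, the same layer handles claims with different numbers of items, which is what makes this family of models attractive for hierarchical, variable-length claim data.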
arXiv link: http://arxiv.org/abs/1808.10543v1
Uniform Inference in High-Dimensional Gaussian Graphical Models
dependencies within a large set of variables and are key for representing
causal structures. We provide results for uniform inference on high-dimensional
graphical models with the number of target parameters $d$ being possibly much larger than the sample size. This is in particular important when certain features
or structures of a causal model should be recovered. Our results highlight how
in high-dimensional settings graphical models can be estimated and recovered
with modern machine learning methods in complex data sets. To construct
simultaneous confidence regions on many target parameters, sufficiently fast
estimation rates of the nuisance functions are crucial. In this context, we
establish uniform estimation rates and sparsity guarantees of the square-root
estimator in a random design under approximate sparsity conditions that might
be of independent interest for related problems in high-dimensions. We also
demonstrate in a comprehensive simulation study that our procedure has good
small sample properties.
arXiv link: http://arxiv.org/abs/1808.10532v2
House Price Modeling with Digital Census
In literature, house price modeling is based on socioeconomic variables from
traditional census, which is not real-time, dynamic, or comprehensive. Inspired
by the emerging concept of "digital census" - using large-scale digital records
of human activities to measure urban population dynamics and socioeconomic
conditions, we introduce three typical datasets, namely 311 complaints, crime
complaints and taxi trips, into house price modeling. Based on the individual
housing sales data in New York City, we provide comprehensive evidence that
these digital census datasets can substantially improve the modeling
performances on both house price levels and changes, regardless of whether
traditional census is included or not. Hence, digital census can serve as both
effective alternatives and complements to traditional census for house price
modeling.
arXiv link: http://arxiv.org/abs/1809.03834v1
Inference based on Kotlarski's Identity
However, how to conduct inference based on this popular identification approach
has been an open question for two decades. This paper addresses this open
problem by constructing a novel confidence band for the density function of a
latent variable in a repeated measurement error model. The confidence band builds
on our finding that we can rewrite Kotlarski's identity as a system of linear
moment restrictions. The confidence band controls the asymptotic size uniformly
over a class of data generating processes, and it is consistent against all
fixed alternatives. Simulation studies support our theoretical results.
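For reference, one standard statement of Kotlarski's identity (the object the paper rewrites as linear moment restrictions) is the following: with two noisy measurements $Y_1 = X + \varepsilon_1$ and $Y_2 = X + \varepsilon_2$, where $X$, $\varepsilon_1$, $\varepsilon_2$ are mutually independent, $E[\varepsilon_1] = 0$, and the relevant characteristic functions do not vanish,

$$\varphi_X(t) = \exp\left( \int_0^t \frac{E\big[\, i\, Y_1\, e^{\,i s Y_2}\big]}{E\big[\, e^{\,i s Y_2}\big]}\, ds \right),$$

so the density of the latent variable $X$ is recovered by Fourier inversion of $\varphi_X$.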
arXiv link: http://arxiv.org/abs/1808.09375v3
A Residual Bootstrap for Conditional Value-at-Risk
estimator of Francq and Zako\"ian (2015) associated with the conditional
Value-at-Risk. The bootstrap's consistency is proven for a general class of
volatility models and intervals are constructed for the conditional
Value-at-Risk. A simulation study reveals that the equal-tailed percentile
bootstrap interval tends to fall short of its nominal coverage level. In contrast, the
reversed-tails bootstrap interval yields accurate coverage. We also compare the
theoretically analyzed fixed-design bootstrap with the recursive-design
bootstrap. It turns out that the fixed-design bootstrap performs equally well
in terms of average coverage, yet leads on average to shorter intervals in
smaller samples. An empirical application illustrates the interval estimation.
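The following is a deliberately simplified sketch of the idea, not the paper's procedure: it uses an ARCH(1) model fitted by least squares on squared returns (in place of the QML estimator of Francq and Zako\"ian), a plain percentile summary of the bootstrap draws, and hypothetical simulated returns.

```python
import numpy as np

def fit_arch1(r):
    """Crude ARCH(1) fit by least squares on squared returns (stand-in for QML)."""
    y, x = r[1:] ** 2, r[:-1] ** 2
    X = np.column_stack([np.ones_like(x), x])
    omega, alpha = np.linalg.lstsq(X, y, rcond=None)[0]
    return max(omega, 1e-8), min(max(alpha, 0.0), 0.99)

def var_bootstrap(r, level=0.05, B=499, seed=0):
    """Point estimate and fixed-design residual-bootstrap draws of next-period conditional VaR."""
    rng = np.random.default_rng(seed)
    omega, alpha = fit_arch1(r)
    sigma = np.sqrt(omega + alpha * r[:-1] ** 2)   # estimated volatility path (kept fixed in the bootstrap)
    eta = r[1:] / sigma                            # standardized residuals
    point = -np.sqrt(omega + alpha * r[-1] ** 2) * np.quantile(eta, level)
    draws = np.empty(B)
    for b in range(B):
        eta_b = rng.choice(eta, size=eta.size, replace=True)
        r_b = np.concatenate(([r[0]], sigma * eta_b))      # bootstrap returns on the fixed design
        omega_b, alpha_b = fit_arch1(r_b)
        sigma_b = np.sqrt(omega_b + alpha_b * r_b[:-1] ** 2)
        q_b = np.quantile(r_b[1:] / sigma_b, level)
        draws[b] = -np.sqrt(omega_b + alpha_b * r[-1] ** 2) * q_b
    return point, draws

# Hypothetical ARCH(1) returns and a simple percentile interval from the bootstrap draws.
rng = np.random.default_rng(1)
n, omega0, alpha0 = 1000, 0.1, 0.3
r = np.zeros(n)
for t in range(1, n):
    r[t] = np.sqrt(omega0 + alpha0 * r[t - 1] ** 2) * rng.standard_normal()
point, draws = var_bootstrap(r)
lo, hi = np.quantile(draws, [0.025, 0.975])
```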
arXiv link: http://arxiv.org/abs/1808.09125v4
Tests for price indices in a dynamic item universe
consumer price index due to the underlying dynamic item universe. Traditionally
axiomatic tests are defined for a fixed universe. We propose five tests
explicitly formulated for a dynamic item universe, and motivate them both from
the perspectives of a cost-of-goods index and a cost-of-living index. None of the indices currently available for making use of scanner data that comprises the whole item universe satisfies all the tests at the same time. The set of tests provides a rigorous diagnostic for whether an index
is completely appropriate in a dynamic item universe, as well as pointing
towards the directions of possible remedies. We thus outline a large index
family that potentially can satisfy all the tests.
arXiv link: http://arxiv.org/abs/1808.08995v2
Supporting Crowd-Powered Science in Economics: FRACTI, a Conceptual Framework for Large-Scale Collaboration and Transparent Investigation in Financial Markets
to store, share, and replicate results and methods of experiments that are
often multidisciplinary and yield a massive amount of data. Given the
increasing complexity and growing interaction across diverse bodies of
knowledge it is becoming imperative to define a platform to properly support
collaborative research and track origin, accuracy and use of data. This paper
starts by defining a set of methods leveraging scientific principles and
advocating the importance of those methods in multidisciplinary, computer
intensive fields like computational finance. The next part of this paper
defines a class of systems called scientific support systems, vis-a-vis usages
in other research fields such as bioinformatics, physics and engineering. We
outline a basic set of fundamental concepts, and list our goals and motivation
for leveraging such systems to enable large-scale investigation, "crowd powered
science", in economics. The core of this paper provides an outline of FRACTI in
five steps. First we present definitions related to scientific support systems
intrinsic to finance and describe common characteristics of financial use
cases. The second step concentrates on what can be exchanged through the
definition of shareable entities called contributions. The third step is the
description of a classification system for building blocks of the conceptual
framework, called facets. The fourth step introduces the meta-model that will
enable provenance tracking and representation of data fragments and simulation.
Finally we describe intended cases of use to highlight main strengths of
FRACTI: application of the scientific method for investigation in computational
finance, large-scale collaboration and simulation.
arXiv link: http://arxiv.org/abs/1808.07959v1
Optimizing the tie-breaker regression discontinuity design
tie-breaker designs which are hybrids of randomized controlled trials (RCTs)
and regression discontinuity designs (RDDs). We quantify the statistical
efficiency of a tie-breaker design in which a proportion $\Delta$ of observed
subjects are in the RCT. In a two-line regression, statistical efficiency
increases monotonically with $\Delta$, so efficiency is maximized by an RCT. We
point to additional advantages of tie-breakers versus RDD: for a nonparametric
regression the boundary bias is much less severe and for quadratic regression,
the variance is greatly reduced. For a two-line model we can quantify the short
term value of the treatment allocation and this comparison favors smaller
$\Delta$ with the RDD being best. We solve for the optimal tradeoff between
these exploration and exploitation goals. The usual tie-breaker design applies
an RCT on the middle $\Delta$ subjects as ranked by the assignment variable. We
quantify the efficiency of other designs such as experimenting only in the
second decile from the top. We also show that in some general parametric models
a Monte Carlo evaluation can be replaced by matrix algebra.
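A minimal sketch of the allocation rule under study (the assignment variable, sample size, and $\Delta$ values below are hypothetical): the middle proportion $\Delta$ of subjects, ranked by the assignment variable, is randomized, subjects above that window are treated, and subjects below it are not.

```python
import numpy as np

def tie_breaker_assign(x, delta, rng):
    """Treatment assignment: RCT in the middle `delta` share of x, threshold rule outside it."""
    lo, hi = np.quantile(x, [(1 - delta) / 2, (1 + delta) / 2])
    z = (x > hi).astype(int)                   # above the window: always treated
    in_rct = (x >= lo) & (x <= hi)             # middle window: randomized 50/50
    z[in_rct] = rng.integers(0, 2, size=in_rct.sum())
    return z

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)                            # hypothetical assignment variable
z_rct = tie_breaker_assign(x, delta=1.0, rng=rng)      # pure RCT
z_rdd = tie_breaker_assign(x, delta=0.0, rng=rng)      # pure RDD (cutoff at the median here)
z_mix = tie_breaker_assign(x, delta=0.25, rng=rng)     # hybrid tie-breaker design
```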
arXiv link: http://arxiv.org/abs/1808.07563v3
Sensitivity Analysis using Approximate Moment Condition Models
show that near-optimal confidence intervals (CIs) can be formed by taking a
generalized method of moments (GMM) estimator, and adding and subtracting the
standard error times a critical value that takes into account the potential
bias from misspecification of the moment conditions. In order to optimize
performance under potential misspecification, the weighting matrix for this GMM
estimator takes into account this potential bias, and therefore differs from
the one that is optimal under correct specification. To formally show the
near-optimality of these CIs, we develop asymptotic efficiency bounds for
inference in the locally misspecified GMM setting. These bounds may be of
independent interest, due to their implications for the possibility of using
moment selection procedures when conducting inference in moment condition
models. We apply our methods in an empirical application to automobile demand,
and show that adjusting the weighting matrix can shrink the CIs by a factor of
3 or more.
arXiv link: http://arxiv.org/abs/1808.07387v5
Deep learning, deep change? Mapping the development of the Artificial Intelligence General Purpose Technology
are an important driver of economic growth and national and regional
competitiveness. In spite of this, the geography of their development and
diffusion has not received significant attention in the literature. We address
this with an analysis of Deep Learning (DL), a core technique in Artificial
Intelligence (AI) increasingly being recognized as the latest GPT. We identify
DL papers in a novel dataset from ArXiv, a popular preprints website, and use
CrunchBase, a technology business directory, to measure industrial capabilities
related to it. After showing that DL conforms with the definition of a GPT,
having experienced rapid growth and diffusion into new fields where it has
generated an impact, we describe changes in its geography. Our analysis shows
China's rise in AI rankings and relative decline in several European countries.
We also find that initial volatility in the geography of DL has been followed
by consolidation, suggesting that the window of opportunity for new entrants
might be closing down as new DL research hubs become dominant. Finally, we
study the regional drivers of DL clustering. We find that competitive DL
clusters tend to be based in regions combining research and industrial
activities related to it. This could be because GPT developers and adopters
located close to each other can collaborate and share knowledge more easily,
thus overcoming coordination failures in GPT deployment. Our analysis also
reveals a Chinese comparative advantage in DL after we control for other
explanatory factors, perhaps underscoring the importance of access to data and
supportive policies for the successful development of this complex, `omni-use'
technology.
arXiv link: http://arxiv.org/abs/1808.06355v1
Quantifying the Computational Advantage of Forward Orthogonal Deviations
on the first-difference (FD) transformation is numerically equal to one-step
GMM based on the forward orthogonal deviations (FOD) transformation. However,
when the number of time periods ($T$) is not small, the FOD transformation
requires less computational work. This paper shows that the computational
complexity of the FD and FOD transformations increases with the number of
individuals ($N$) linearly, but the computational complexity of the FOD
transformation increases with $T$ at the rate $T^{4}$, while the computational complexity of the FD transformation increases at the rate $T^{6}$. Simulations illustrate that calculations exploiting the FOD
transformation are performed orders of magnitude faster than those using the FD
transformation. The results in the paper indicate that, when one-step GMM based
on the FD and FOD transformations are the same, Monte Carlo experiments can be
conducted much faster if the FOD version of the estimator is used.
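For concreteness, a minimal sketch of the two transformations applied to a single individual's $T$ observations (standard definitions, not the paper's code); the computational comparison in the paper concerns the GMM algebra built on these transforms rather than the transforms themselves:

```python
import numpy as np

def fod(x):
    """Forward orthogonal deviations: deviation of each observation from the mean of its future values."""
    T = x.shape[0]
    out = np.empty(T - 1)
    for t in range(T - 1):
        c_t = np.sqrt((T - t - 1) / (T - t))   # scaling that keeps transformed errors homoskedastic
        out[t] = c_t * (x[t] - x[t + 1:].mean())
    return out

def fd(x):
    """First differences."""
    return np.diff(x)

x = np.array([1.0, 2.0, 1.5, 3.0, 2.5])
print(fod(x))
print(fd(x))
```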
arXiv link: http://arxiv.org/abs/1808.05995v1
Estimation in a Generalization of Bivariate Probit Models with Dummy Endogenous Regressors
who use a class of bivariate threshold crossing models with dummy endogenous
variables. A common practice employed by the researchers is the specification
of the joint distribution of the unobservables as a bivariate normal
distribution, which results in a bivariate probit model. To address the problem
of misspecification in this practice, we propose an easy-to-implement
semiparametric estimation framework with parametric copula and nonparametric
marginal distributions. We establish asymptotic theory, including root-n
normality, for the sieve maximum likelihood estimators that can be used to
conduct inference on the individual structural parameters and the average
treatment effect (ATE). In order to show the practical relevance of the
proposed framework, we conduct a sensitivity analysis via extensive Monte Carlo
simulation exercises. The results suggest that the estimates of the parameters,
especially the ATE, are sensitive to parametric specification, while
semiparametric estimation exhibits robustness to underlying data generating
processes. We then provide an empirical illustration where we estimate the
effect of health insurance on doctor visits. In this paper, we also show that
the absence of excluded instruments may result in identification failure, in
contrast to what some practitioners believe.
arXiv link: http://arxiv.org/abs/1808.05792v2
When Do Households Invest in Solar Photovoltaics? An Application of Prospect Theory
the world, the policy tools that do so are still poorly understood, leading to
costly misadjustments in many cases. As a case study, the deployment dynamics
of residential solar photovoltaics (PV) invoked by the German feed-in tariff
legislation are investigated. Here we report a model showing that the question
of when people invest in residential PV systems is found to be not only
determined by profitability, but also by profitability's change compared to the
status quo. This finding is interpreted in the light of loss aversion, a
concept developed in Kahneman and Tversky's Prospect Theory. The model is able
to reproduce most of the dynamics of the uptake with only a few financial and
behavioral assumptions.
arXiv link: http://arxiv.org/abs/1808.05572v1
Design-based Analysis in Difference-In-Differences Settings with Staggered Adoption
effects in a setting with panel data. We focus on the setting where units,
e.g., individuals, firms, or states, adopt the policy or treatment of interest
at a particular point in time, and then remain exposed to this treatment at all
times afterwards. We take a design perspective where we investigate the
properties of estimators and procedures given assumptions on the assignment
process. We show that under random assignment of the adoption date the standard
Difference-In-Differences estimator is an unbiased estimator of a particular weighted average causal effect. We characterize the properties of this
estimand, and show that the standard variance estimator is conservative.
arXiv link: http://arxiv.org/abs/1808.05293v3
Can GDP measurement be further improved? Data revision and reconciliation
U.S. real output (GDE) growth with income-side estimates (GDI) to improve
estimates of real GDP growth. We show how to incorporate information from
multiple releases of noisy data to provide more precise estimates while
avoiding some of the identifying assumptions required in earlier work. This
relies on a new insight: using multiple data releases allows us to distinguish
news and noise measurement errors in situations where a single vintage does
not.
Our new measure, GDP++, fits the data better than GDP+, the GDP growth
measure of Aruoba et al. (2016) published by the Federal Reserve Bank of
Philadelphia. Historical decompositions show that GDE releases are more
informative than GDI, while the use of multiple data releases is particularly
important in the quarters leading up to the Great Recession.
arXiv link: http://arxiv.org/abs/1808.04970v1
A Unified Framework for Efficient Estimation of General Treatment Models
binary, multi-valued, continuous, as well as mixtures of discrete and continuous
treatment, under the unconfounded treatment assignment. With a general loss
function, the framework includes the average, quantile and asymmetric least
squares causal effect of treatment as special cases. For this general
framework, we first derive the semiparametric efficiency bound for the causal
effect of treatment, extending the existing bound results to a wider class of
models. We then propose a generalized optimization estimation for the causal
effect with weights estimated by solving an expanding set of equations. Under
some sufficient conditions, we establish consistency and asymptotic normality
of the proposed estimator of the causal effect and show that the estimator
attains our semiparametric efficiency bound, thereby extending the existing
literature on efficient estimation of causal effect to a wider class of
applications. Finally, we discuss estimation of some causal effect functionals
such as the treatment effect curve and the average outcome. To evaluate the
finite sample performance of the proposed procedure, we conduct a small scale
simulation study and find that the proposed estimation has practical value. To
illustrate the applicability of the procedure, we revisit the literature on
campaign advertising and campaign contributions. Unlike the existing procedures, which produce mixed results, we find no evidence of an effect of campaign advertising on campaign contributions.
arXiv link: http://arxiv.org/abs/1808.04936v2
Extrapolating Treatment Effects in Multi-Cutoff Regression Discontinuity Designs
of the most credible identification strategies for program evaluation and
causal inference. However, RD treatment effect estimands are necessarily local,
making statistical methods for the extrapolation of these effects a key area
for development. We introduce a new method for extrapolation of RD effects that
relies on the presence of multiple cutoffs, and is therefore design-based. Our
approach employs an easy-to-interpret identifying assumption that mimics the
idea of "common trends" in difference-in-differences designs. We illustrate our
methods with data on a subsidized loan program on post-education attendance in
Colombia, and offer new evidence on program effects for students with test
scores away from the cutoff that determined program eligibility.
arXiv link: http://arxiv.org/abs/1808.04416v3
Engineering and Economic Analysis for Electric Vehicle Charging Infrastructure --- Placement, Pricing, and Market Design
vehicle (EV) charging and the power system. We address three important issues
pertaining to EV charging and integration into the power system: (1) charging
station placement, (2) pricing policy and energy management strategy, and (3)
electricity trading market and distribution network design to facilitate
integrating EV and renewable energy source (RES) into the power system.
For the charging station placement problem, we propose a multi-stage consumer
behavior based placement strategy with incremental EV penetration rates and
model the EV charging industry as an oligopoly where the entire market is
dominated by a few charging service providers (oligopolists). The optimal
placement policy for each service provider is obtained by solving a Bayesian
game.
For pricing and energy management of EV charging stations, we provide
guidelines for charging service providers to determine charging price and
manage electricity reserve to balance the competing objectives of improving
profitability, enhancing customer satisfaction, and reducing impact on the
power system. Two algorithms --- a stochastic dynamic programming (SDP) algorithm and a greedy algorithm (as benchmark) --- are applied to derive the pricing
and electricity procurement strategy.
We design a novel electricity trading market and distribution network, which
supports seamless RES integration, grid to vehicle (G2V), vehicle to grid
(V2G), vehicle to vehicle (V2V), and distributed generation (DG) and storage.
We apply a sharing economy model to the electricity sector to stimulate
different entities to exchange and monetize their underutilized electricity. A
fitness-score (FS)-based supply-demand matching algorithm is developed by
considering consumer surplus, electricity network congestion, and economic
dispatch.
arXiv link: http://arxiv.org/abs/1808.03897v1
BooST: Boosting Smooth Trees for Partial Effect Estimation in Nonlinear Regressions
regression called the Boosted Smooth Transition Regression Trees (BooST), which
is a combination of boosting algorithms with smooth transition regression
trees. The main advantage of the BooST model is the estimation of the
derivatives (partial effects) of very general nonlinear models. Therefore, the
model can provide more interpretation about the mapping between the covariates
and the dependent variable than other tree-based models, such as Random
Forests. We present several examples with both simulated and real data.
arXiv link: http://arxiv.org/abs/1808.03698v5
A Panel Quantile Approach to Attrition Bias in Big Data: Evidence from a Randomized Experiment
with individual heterogeneity and attrition. The method is motivated by the
fact that attrition bias is often encountered in Big Data applications. For
example, many users sign up for the latest program but few remain active users
several months later, making the evaluation of such interventions inherently
very challenging. Building on earlier work by Hausman and Wise (1979), we
provide a simple identification strategy that leads to a two-step estimation
procedure. In the first step, the coefficients of interest in the selection
equation are consistently estimated using parametric or nonparametric methods.
In the second step, standard panel quantile methods are employed on a subset of
weighted observations. The estimator is computationally easy to implement in
Big Data applications with a large number of subjects. We investigate the
conditions under which the parameter estimator is asymptotically Gaussian and
we carry out a series of Monte Carlo simulations to investigate the finite
sample properties of the estimator. Lastly, using a simulation exercise, we
apply the method to the evaluation of a recent Time-of-Day electricity pricing
experiment inspired by the work of Aigner and Hausman (1980).
arXiv link: http://arxiv.org/abs/1808.03364v1
Change Point Estimation in Panel Data with Time-Varying Individual Effects
data models with unobserved individual effects via ordinary least-squares
(OLS). Typically, in this setting, the OLS slope estimators are inconsistent
due to the unobserved individual effects bias. As a consequence, existing
methods remove the individual effects before change point estimation through
data transformations such as first-differencing. We prove that under reasonable
assumptions, the unobserved individual effects bias has no impact on the
consistent estimation of change points. Our simulations show that since our
method does not remove any variation in the dataset before change point
estimation, it performs better in small samples compared to first-differencing
methods. We focus on short panels because they are commonly used in practice,
and allow for the unobserved individual effects to vary over time. Our method
is illustrated via two applications: the environmental Kuznets curve and the
U.S. house price expectations after the financial crisis.
arXiv link: http://arxiv.org/abs/1808.03109v1
Machine Learning for Dynamic Discrete Choice
its dimension in order to achieve valid inference. I propose a novel two-stage
estimator for the set-identified structural parameter that incorporates a
high-dimensional state space into the dynamic model of imperfect competition.
In the first stage, I estimate the state variable's law of motion and the
equilibrium policy function using machine learning tools. In the second stage,
I plug the first-stage estimates into a moment inequality and solve for the
structural parameter. The moment function is presented as the sum of two
components, where the first one expresses the equilibrium assumption and the
second one is a bias correction term that makes the sum insensitive (i.e.,
orthogonal) to first-stage bias. The proposed estimator uniformly converges at
the root-N rate and I use it to construct confidence regions. The results
developed here can be used to incorporate high-dimensional state space into
classic dynamic discrete choice models, for example, those considered in Rust
(1987), Bajari et al. (2007), and Scott (2013).
arXiv link: http://arxiv.org/abs/1808.02569v2
Coverage Error Optimal Confidence Intervals for Local Polynomial Regression
polynomial regression methods under random sampling. We prove Edgeworth
expansions for $t$ statistics and coverage error expansions for interval
estimators that (i) hold uniformly in the data generating process, (ii) allow
for the uniform kernel, and (iii) cover estimation of derivatives of the
regression function. The terms of the higher-order expansions, and their
associated rates as a function of the sample size and bandwidth sequence,
depend on the smoothness of the population regression function, the smoothness
exploited by the inference procedure, and on whether the evaluation point is in
the interior or on the boundary of the support. We prove that robust bias
corrected confidence intervals have the fastest coverage error decay rates in
all cases, and we use our results to deliver novel, inference-optimal bandwidth
selectors. The main methodological results are implemented in companion
R and Stata software packages.
arXiv link: http://arxiv.org/abs/1808.01398v4
A Theory of Dichotomous Valuation with Applications to Variable Selection
new variable to the model, and a marginal loss if we remove an existing
variable from the model. Assuming equality of opportunity among all candidate
variables, we derive a valuation framework by the expected marginal gain and
marginal loss in all potential modeling scenarios. However, marginal gain and
loss are not symmetric; thus, we introduce three unbiased solutions. When used
in variable selection, our new approaches significantly outperform several
popular methods used in practice. The results also explore some novel traits of
the Shapley value.
arXiv link: http://arxiv.org/abs/1808.00131v5
On the Unbiased Asymptotic Normality of Quantile Regression with Fixed Effects
important set of tools for describing microeconometric data. In a large class
of such models (including probit, proportional hazard and quantile regression
to name just a few) it is impossible to difference out individual effects, and
inference is usually justified in a `large n large T' asymptotic framework.
However, there is a considerable gap in the type of assumptions that are
currently imposed in models with smooth score functions (such as probit, and
proportional hazard) and quantile regression. In the present paper we show that
this gap can be bridged and establish unbiased asymptotic normality for
quantile regression panels under conditions on n,T that are very close to what
is typically assumed in standard nonlinear panels. Our results considerably
improve upon existing theory and show that quantile regression is applicable to
the same type of panel data (in terms of n,T) as other commonly used nonlinear
panel data models. Thorough numerical experiments confirm our theoretical
findings.
arXiv link: http://arxiv.org/abs/1807.11863v2
The econometrics of happiness: Are we underestimating the returns to education and income?
inherent to surveys of human feelings and opinions in which subjective
responses are elicited on numerical scales. The paper also proposes a solution.
The problem is a tendency by some individuals -- particularly those with low
levels of education -- to simplify the response scale by considering only a
subset of possible responses such as the lowest, middle, and highest. In
principle, this “focal value rounding” (FVR) behavior renders invalid even
the weak ordinality assumption often used in analysis of such data. With
“happiness” or life satisfaction data as an example, descriptive methods and
a multinomial logit model both show that the effect is large and that education
and, to a lesser extent, income level are predictors of FVR behavior.
A model simultaneously accounting for the underlying wellbeing and for the
degree of FVR is able to estimate the latent subjective wellbeing, i.e. the
counterfactual full-scale responses for all respondents, the biases associated
with traditional estimates, and the fraction of respondents who exhibit FVR.
Addressing this problem helps to resolve a longstanding puzzle in the life
satisfaction literature, namely that the returns to education, after adjusting
for income, appear to be small or negative. Due to the same econometric
problem, the marginal utility of income in a subjective wellbeing sense has
been consistently underestimated.
arXiv link: http://arxiv.org/abs/1807.11835v3
Local Linear Forests
limited in their ability to fit smooth signals, and can show poor predictive
performance in the presence of strong, smooth effects. Taking the perspective
of random forests as an adaptive kernel method, we pair the forest kernel with
a local linear regression adjustment to better capture smoothness. The
resulting procedure, local linear forests, enables us to improve on asymptotic
rates of convergence for random forests with smooth signals, and provides
substantial gains in accuracy on both real and simulated data. We prove a
central limit theorem valid under regularity conditions on the forest and
smoothness constraints, and propose a computationally efficient construction
for confidence intervals. Moving to a causal inference application, we discuss
the merits of local regression adjustments for heterogeneous treatment effect
estimation, and give an example on a dataset exploring the effect word choice
has on attitudes to the social safety net. Last, we include simulation results
on real and generated data.
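A minimal sketch of the local linear adjustment step (with a simple nearest-neighbour kernel standing in for the forest kernel, an arbitrary ridge penalty, and simulated data): given weights for a test point $x_0$, the prediction is the intercept of a weighted ridge regression on covariates centered at $x_0$.

```python
import numpy as np

def local_linear_fit(x0, X, y, weights, lam=0.1):
    """Weighted local-linear (ridge) fit at x0 using supplied kernel weights."""
    D = np.column_stack([np.ones(len(y)), X - x0])   # intercept plus centered covariates
    W = np.diag(weights)
    R = lam * np.eye(D.shape[1])
    R[0, 0] = 0.0                                    # do not penalize the local intercept
    theta = np.linalg.solve(D.T @ W @ D + R, D.T @ W @ y)
    return theta[0]                                  # local intercept = prediction at x0

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-1, 1, size=(n, 2))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(n)
x0 = np.array([0.2, -0.3])

# Stand-in for the forest kernel: uniform weights on the 10% nearest neighbours of x0.
dist = np.linalg.norm(X - x0, axis=1)
weights = (dist <= np.quantile(dist, 0.1)).astype(float)
weights /= weights.sum()
print(local_linear_fit(x0, X, y, weights))
```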
arXiv link: http://arxiv.org/abs/1807.11408v4
Two-Step Estimation and Inference with Possibly Many Included Covariates
estimate entering a two-step estimation procedure. We find that a first order
bias emerges when the number of included covariates is "large"
relative to the square-root of sample size, rendering standard inference
procedures invalid. We show that the jackknife is able to estimate this "many
covariates" bias consistently, thereby delivering a new automatic
bias-corrected two-step point estimator. The jackknife also consistently
estimates the standard error of the original two-step point estimator. For
inference, we develop a valid post-bias-correction bootstrap approximation that
accounts for the additional variability introduced by the jackknife
bias-correction. We find that the jackknife bias-corrected point estimator and
the bootstrap post-bias-correction inference perform very well in simulations,
offering important improvements over conventional two-step point estimators and
inference procedures, which are not robust to including many covariates. We
apply our results to an array of distinct treatment effect, policy evaluation,
and other applied microeconomics settings. In particular, we discuss production
function and marginal treatment effect estimation in detail.
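A hedged sketch of the delete-one jackknife mechanics only (the two-step estimator below is hypothetical, and the sketch omits the paper's post-bias-correction bootstrap): the bias-corrected point estimate and the jackknife standard error are formed from the leave-one-out estimates.

```python
import numpy as np

def jackknife(estimator, data):
    """Delete-one jackknife: bias-corrected point estimate and jackknife standard error."""
    n = len(data)
    theta_full = estimator(data)
    loo = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    theta_bc = n * theta_full - (n - 1) * loo.mean()
    se = np.sqrt((n - 1) / n * ((loo - loo.mean()) ** 2).sum())
    return theta_bc, se

def two_step(d):
    """Hypothetical two-step estimator: fit many covariates, then average a residual moment."""
    y, t, X = d[:, 0], d[:, 1], d[:, 2:]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]     # first step with (possibly many) covariates
    return np.mean((y - X @ beta) * t)              # second step: a simple moment of the residuals

rng = np.random.default_rng(0)
n, k = 300, 40
X = rng.normal(size=(n, k))
t = rng.normal(size=n)
y = X[:, 0] + 0.5 * t + rng.normal(size=n)
print(jackknife(two_step, np.column_stack([y, t, X])))
```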
arXiv link: http://arxiv.org/abs/1807.10100v1
Score Permutation Based Finite Sample Inference for Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) Models
that the variance of a process changes over time, is the Generalized
AutoRegressive Conditional Heteroskedasticity (GARCH) model, which is
especially important for economics and finance. GARCH models are typically
estimated by the Quasi-Maximum Likelihood (QML) method, which works under mild
statistical assumptions. Here, we suggest a finite sample approach, called
ScoPe, to construct distribution-free confidence regions around the QML
estimate, which have exact coverage probabilities, even though no additional assumptions about moments are made. ScoPe is inspired by the recently developed
Sign-Perturbed Sums (SPS) method, which however cannot be applied in the GARCH
case. ScoPe works by perturbing the score function using randomly permuted
residuals. This produces alternative samples which lead to exact confidence
regions. Experiments on simulated and stock market data are also presented, and
ScoPe is compared with the asymptotic theory and bootstrap approaches.
arXiv link: http://arxiv.org/abs/1807.08390v1
EMU and ECB Conflicts
according to the exchange rate of payment of fixed rates and fixed rates of fixed income (EMU) convergence criteria, such as the public debt / GDP ratio. The method consists of calculating private and public debt management in a public debt management system; there is no mechanism to allow naturally for this adjustment.
arXiv link: http://arxiv.org/abs/1807.08097v1
Asymptotic results under multiway clustering
economics, surprisingly few theoretical results justify this practice. This
paper aims to fill this gap. We first prove, under nearly the same conditions
as with i.i.d. data, the weak convergence of empirical processes under multiway
clustering. This result implies central limit theorems for sample averages but
is also key for showing the asymptotic normality of nonlinear estimators such
as GMM estimators. We then establish consistency of various asymptotic variance
estimators, including that of Cameron et al. (2011) but also a new estimator
that is positive by construction. Next, we show the general consistency, for
linear and nonlinear estimators, of the pigeonhole bootstrap, a resampling
scheme adapted to multiway clustering. Monte Carlo simulations suggest that
inference based on our two preferred methods may be accurate even with very few
clusters, and significantly improve upon inference based on Cameron et al.
(2011).
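To fix ideas, a minimal sketch for the simplest case of a sample mean (the data-generating choices below are hypothetical): an inclusion-exclusion variance estimator in the spirit of Cameron et al. (2011) adds the two one-way cluster variances and subtracts the variance clustered on the intersection, which is exactly why it can fail to be positive and why a positive-by-construction alternative is useful.

```python
import numpy as np

def oneway_var_of_mean(x, cluster):
    """One-way cluster-robust variance estimate for the sample mean."""
    d = x - x.mean()
    sums = np.array([d[cluster == c].sum() for c in np.unique(cluster)])
    return (sums ** 2).sum() / x.size ** 2

def twoway_var_of_mean(x, g, h):
    """Inclusion-exclusion (Cameron et al. style) variance for two-way clustering."""
    gh = np.array([f"{a}|{b}" for a, b in zip(g, h)])   # clusters formed by the intersection
    return oneway_var_of_mean(x, g) + oneway_var_of_mean(x, h) - oneway_var_of_mean(x, gh)

rng = np.random.default_rng(0)
n, G, H = 2000, 12, 15
g, h = rng.integers(0, G, n), rng.integers(0, H, n)
x = rng.normal(size=G)[g] + rng.normal(size=H)[h] + rng.normal(size=n)   # two-way dependent data
print(twoway_var_of_mean(x, g, h))    # can turn out negative in unlucky samples
```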
arXiv link: http://arxiv.org/abs/1807.07925v2
Stability in EMU
of recurring controversy. First, there is debate about the role and impact of
these criteria in the initial phase of the introduction of the single currency.
Secondly, it must be specified how these will then be applied, in a permanent
regime, when the single currency is well established.
arXiv link: http://arxiv.org/abs/1807.07730v1
Machine Learning Classifiers Do Not Improve the Prediction of Academic Risk: Evidence from Australia
prediction. In the prediction of academic achievement, ML models have not shown
substantial improvement over logistic regression. So far, these results have
almost entirely focused on college achievement, due to the availability of
administrative datasets, and have contained relatively small sample sizes by ML
standards. In this article we apply popular machine learning models to a large
dataset ($n=1.2$ million) containing primary and middle school performance on a
standardized test given annually to Australian students. We show that machine
learning models do not outperform logistic regression for detecting students
who will perform in the `below standard' band of achievement upon sitting their
next test, even in a large-$n$ setting.
arXiv link: http://arxiv.org/abs/1807.07215v4
Take a Look Around: Using Street View and Satellite Images to Estimate House Prices
structural features, its accessibility to work, and the neighborhood amenities.
Some amenities, such as air quality, are measurable while others, such as the
prestige or the visual impression of a neighborhood, are difficult to quantify.
Despite the well-known impacts intangible housing features have on house
prices, limited attention has been given to systematically quantifying these
difficult to measure amenities. Two issues have led to this neglect. Not only
do few quantitative methods exist that can measure the urban environment, but the collection of such data is also costly and subjective.
We show that street image and satellite image data can capture these urban
qualities and improve the estimation of house prices. We propose a pipeline
that uses a deep neural network model to automatically extract visual features
from images to estimate house prices in London, UK. We make use of traditional
housing features such as age, size, and accessibility as well as visual
features from Google Street View images and Bing aerial images in estimating
the house price model. We find encouraging results where learning to
characterize the urban quality of a neighborhood improves house price
prediction, even when generalizing to previously unseen London boroughs.
We explore the use of non-linear vs. linear methods to fuse these cues with
conventional models of house pricing, and show how the interpretability of
linear models allows us to directly extract proxy variables for visual
desirability of neighborhoods that are both of interest in their own right, and
could be used as inputs to other econometric methods. This is particularly
valuable as once the network has been trained with the training data, it can be
applied elsewhere, allowing us to generate vivid dense maps of the visual
appeal of London streets.
arXiv link: http://arxiv.org/abs/1807.07155v2
A New Index of Human Capital to Predict Economic Growth
that often relates to nations' economic growth. Such a relationship, however, is misleading when the proxy of such accumulation is the average years of education. In this paper, we show that the predictive power of this proxy started to dwindle in 1990, when nations' schooling began to homogenize. We propose a metric of human capital that is less sensitive than average years of education and remains a significant predictor of economic growth when tested with both cross-section data and panel data. We argue that future research on economic growth will discard educational variables based on quantity as predictors, given the thresholds that these variables are reaching.
arXiv link: http://arxiv.org/abs/1807.07051v1
Cross Validation Based Model Selection via Generalized Method of Moments
a large class of structural models are estimated through the generalized method
of moments (GMM). Traditionally, selection of structural models has been
performed based on model fit upon estimation, which uses the entire observed sample. In this paper, we propose a model selection procedure based on cross-validation (CV), which utilizes a sample-splitting technique to avoid issues such as over-fitting. While CV is widely used in machine learning communities, we are the first to prove its consistency for model selection in the GMM framework. Its empirical properties are compared to existing methods in simulations of IV regressions and an oligopoly market model. In addition, we propose a way to apply our method to the Mathematical Programming with Equilibrium Constraints (MPEC) approach. Finally, we apply our method to online-retail sales data to compare a dynamic market model with a static one.
arXiv link: http://arxiv.org/abs/1807.06993v1
Quantile-Regression Inference With Adaptive Control of Size
densities of the response variable given regressors. This paper develops a new
estimate of the asymptotic variance of regression quantiles that leads any
resulting Wald-type test or confidence region to behave as well in large
samples as its infeasible counterpart in which the true conditional response
densities are embedded. We give explicit guidance on implementing the new
variance estimator to control adaptively the size of any resulting Wald-type
test. Monte Carlo evidence indicates the potential of our approach to deliver
powerful tests of heterogeneity of quantile treatment effects in covariates
with good size performance over different quantile levels, data-generating
processes and sample sizes. We also include an empirical example. Supplementary
material is available online.
arXiv link: http://arxiv.org/abs/1807.06977v2
Pink Work: Same-Sex Marriage, Employment and Discrimination
affected gay and lesbian couples in the labor market. Results from a
difference-in-difference model show that both partners in same-sex couples were
more likely to be employed, to have a full-time contract, and to work longer
hours in states that legalized same-sex marriage. In line with a theoretical
search model of discrimination, suggestive empirical evidence supports the
hypothesis that marriage equality led to an improvement in employment outcomes
among gays and lesbians and lower occupational segregation thanks to a decrease
in discrimination towards sexual minorities.
arXiv link: http://arxiv.org/abs/1807.06698v1
Limit Theorems for Factor Models
valid inference in factor models. We consider a setting where many
counties/regions/assets are observed for many time periods, and when estimation
of a global parameter includes aggregation of a cross-section of heterogeneous
micro-parameters estimated separately for each entity. The central limit
theorem applies for quantities involving both cross-sectional and time series
aggregation, as well as for quadratic forms in time-aggregated errors. The
paper studies the conditions when one can consistently estimate the asymptotic
variance, and proposes a bootstrap scheme for cases when one cannot. A small
simulation study illustrates performance of the asymptotic and bootstrap
procedures. The results are useful for making inferences in two-step estimation
procedures related to factor models, as well as in other related contexts. Our
treatment avoids structural modeling of cross-sectional dependence but imposes
time-series independence.
arXiv link: http://arxiv.org/abs/1807.06338v3
A Simple and Efficient Estimation of the Average Treatment Effect in the Presence of Unmeasured Confounders
the average treatment effect when some confounders are unmeasured. Under their
identification condition, they showed that the semiparametric efficient
influence function depends on five unknown functionals. They proposed to
parameterize all functionals and estimate the average treatment effect from the
efficient influence function by replacing the unknown functionals with
estimated functionals. They established that their estimator is consistent when
certain functionals are correctly specified and attains the semiparametric
efficiency bound when all functionals are correctly specified. In applications,
it is likely that those functionals could all be misspecified. Consequently
their estimator could be inconsistent or consistent but not efficient. This
paper presents an alternative estimator that does not require parameterization
of any of the functionals. We establish that the proposed estimator is always
consistent and always attains the semiparametric efficiency bound. A simple and
intuitive estimator of the asymptotic variance is presented, and a small scale
simulation study reveals that the proposed estimator outperforms the existing
alternatives in finite samples.
arXiv link: http://arxiv.org/abs/1807.05678v1
Analysis of a Dynamic Voluntary Contribution Mechanism Public Good Game
derive its potential outcomes. In each period, players endogenously determine
contribution productivity by engaging in costly investment. The level of
contribution productivity carries from period to period, creating a dynamic
link between periods. The investment mimics investing in the stock of
technology for producing public goods such as national defense or a clean
environment. After investing, players decide how much of their remaining money
to contribute to provision of the public good, as in traditional public good
games. I analyze three kinds of outcomes of the game: the lowest payoff
outcome, the Nash Equilibria, and socially optimal behavior. In the lowest
payoff outcome, all players receive payoffs of zero. Nash Equilibrium occurs
when players invest any amount and contribute all or nothing depending on the
contribution productivity. Therefore, there are infinitely many Nash Equilibria
strategies. Finally, the socially optimal result occurs when players invest
everything in early periods, then at some point switch to contributing
everything. My goal is to discover and explain this point. I use mathematical
analysis and computer simulation to derive the results.
arXiv link: http://arxiv.org/abs/1807.04621v2
Heterogeneous Effects of Unconventional Monetary Policy on Loan Demand and Supply. Insights from the Bank Lending Survey
the euro area, providing evidence that the channel is indeed working. The
analysis of the transmission mechanism is based on structural impulse responses
to an unconventional monetary policy shock on bank loans. The Bank Lending
Survey (BLS) is exploited in order to get insights on developments of loan
demand and supply. The contribution of this paper is to use country-specific
data to analyze the consequences of unconventional monetary policy, instead of
taking an aggregate stance by using euro area data. This approach provides a
deeper understanding of the bank lending channel and its effects. That is, an
expansionary monetary policy shock leads to an increase in loan demand, supply
and output growth. A small north-south disparity between the countries can be
observed.
arXiv link: http://arxiv.org/abs/1807.04161v1
Factor models with many assets: strong factors, weak factors, and the two-pass procedure
pricing models. Typically, the data used in the empirical literature are
characterized by weakness of some pricing factors, strong cross-sectional
dependence in the errors, and (moderately) high cross-sectional dimensionality.
Using an asymptotic framework where the number of assets/portfolios grows with
the time span of the data while the risk exposures of weak factors are
local-to-zero, we show that the conventional two-pass estimation procedure
delivers inconsistent estimates of the risk premia. We propose a new estimation
procedure based on sample-splitting instrumental variables regression. The
proposed estimator of risk premia is robust to weak included factors and to the
presence of strong unaccounted cross-sectional error dependence. We derive the
many-asset weak factor asymptotic distribution of the proposed estimator, show
how to construct its standard errors, verify its performance in simulations,
and revisit some empirical studies.
arXiv link: http://arxiv.org/abs/1807.04094v2
Clustering Macroeconomic Time Series
many fields. However, as an unsupervised learning method, it requires making
choices that are nontrivially influenced by the nature of the data involved.
The aim of this paper is to verify the usefulness of the time series clustering
method for macroeconomics research, and to develop the most suitable
methodology.
By extensively testing various possibilities, we arrive at a choice of a
dissimilarity measure (compression-based dissimilarity measure, or CDM) which
is particularly suitable for clustering macroeconomic variables. We check that
the results are stable in time and reflect large-scale phenomena such as
crises. We also successfully apply our findings to analysis of national
economies, specifically to identifying their structural relations.
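As a quick sketch of the dissimilarity measure involved (using the common definition CDM$(x,y) = C(xy)/(C(x)+C(y))$ with an off-the-shelf compressor; the series below are simulated and the formatting choices are arbitrary):

```python
import zlib
import numpy as np

def cdm(x, y, digits=3):
    """Compression-based dissimilarity measure between two numeric series."""
    def size(series):
        text = ",".join(f"{v:.{digits}f}" for v in series)
        return len(zlib.compress(text.encode())), text
    cx, sx = size(x)
    cy, sy = size(y)
    cxy = len(zlib.compress((sx + sy).encode()))
    return cxy / (cx + cy)

rng = np.random.default_rng(0)
a = np.cumsum(rng.standard_normal(300))
b = a.copy()
b[::50] += 0.5                                 # nearly identical series
c = np.cumsum(rng.standard_normal(300))        # unrelated series
print(cdm(a, b), cdm(a, c))                    # the first value should typically be smaller
```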
arXiv link: http://arxiv.org/abs/1807.04004v2
Simulation Modelling of Inequality in Cancer Service Access
exercise in assessing spatial inequality in cancer service access in regional
areas. We propose a mathematical model for accessing chemotherapy among local
government areas (LGAs). Our model incorporates a distance factor. With a
simulation we report results for a single inequality measure: the Lorenz curve
is depicted for our illustrative data. We develop this approach in order to
move incrementally towards its application to actual data and real-world health
service regions. We seek to develop exercises that can lead policy makers to relevant policy information on the most useful data to collect and the modeling of cancer service access in regional areas.
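A minimal sketch of the inequality measure reported (the access scores below are hypothetical): the Lorenz curve plots cumulative population share against cumulative share of access, and a Gini coefficient can be read off it with the trapezoidal rule.

```python
import numpy as np

def lorenz_curve(access):
    """Cumulative population share vs. cumulative share of access, for sorted access scores."""
    v = np.sort(np.asarray(access, dtype=float))
    cum_share = np.concatenate(([0.0], np.cumsum(v) / v.sum()))
    pop_share = np.linspace(0.0, 1.0, v.size + 1)
    return pop_share, cum_share

def gini(access):
    """Gini coefficient: one minus twice the area under the Lorenz curve (trapezoidal rule)."""
    p, L = lorenz_curve(access)
    area = np.sum((L[1:] + L[:-1]) / 2 * np.diff(p))
    return 1.0 - 2.0 * area

lga_access = [0.2, 0.5, 0.9, 1.4, 3.0, 7.5]   # hypothetical chemotherapy access scores for six LGAs
print(gini(lga_access))
```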
arXiv link: http://arxiv.org/abs/1807.03048v1
Cancer Risk Messages: Public Health and Economic Welfare
85" have appeared in public spaces. The meaning drawn from such statements
affects economic welfare, not just public health. Both markets and government
use risk information on all kinds of risks; useful information can, in turn, improve economic welfare, whereas inaccuracy can lower it. We adapt the
contingency table approach so that a quoted risk is cross-classified with the
states of nature. We show that bureaucratic objective functions regarding the
accuracy of a reported cancer risk can then be stated.
arXiv link: http://arxiv.org/abs/1807.03045v2
Cancer Risk Messages: A Light Bulb Model
in y people gets cancer by age z" can be improved. One assumption commonly
invoked is that there is no other cause of death, a confusing assumption. We
develop a light bulb model to clarify cumulative risk and we use Markov chain
modeling, incorporating the assumption widely in place, to evaluate transition
probabilities. Age-progression in the cancer risk is then reported on
Australian data. Future modelling can elicit realistic assumptions.
arXiv link: http://arxiv.org/abs/1807.03040v2
Transaction costs and institutional change of trade litigations in Bulgaria
costs of trade litigations in Bulgaria are used in the current paper. For the purposes of the research, an indicative model measuring this type of cost at the microeconomic level is applied in the study. The main purpose of the model is to forecast the rational behavior of trade litigation parties in accordance with the transaction costs incurred in enforcing the execution of the signed commercial contract. The application of the model allows a more accurate measurement of transaction costs at the microeconomic level, which could lead to better prediction and management of these costs so that market efficiency and economic growth can be achieved. In addition, an attempt is made to analyse the efficiency of the institutional change of the commercial justice system and the impact of the judicial reform on economic turnover. An increase, or lack of reduction, in the transaction costs of trade litigations would indicate inefficiency of the judicial reform. JEL Codes: O43, P48, D23, K12
arXiv link: http://arxiv.org/abs/1807.03034v1
Measurement Errors as Bad Leverage Points
and progress depends in part on new identifying assumptions. I characterize
measurement error as bad-leverage points and assume that fewer than half the
sample observations are heavily contaminated, in which case a high-breakdown
robust estimator may be able to isolate and downweight or discard the problematic data. In simulations of simple and multiple regression where errors-in-variables (EIV) affect 25% of the data and R-squared is mediocre, certain high-breakdown
estimators have small bias and reliable confidence intervals.
arXiv link: http://arxiv.org/abs/1807.02814v2
Maximizing Welfare in Social Networks under a Utility Driven Influence Diffusion Model
maximization (IM) has been extensively studied in the literature. The goal is
to select a small number of users to adopt an item such that it results in a
large cascade of adoptions by others. Existing works have three key
limitations. (1) They do not account for economic considerations of a user in
buying/adopting items. (2) Most studies on multiple items focus on competition,
with complementary items receiving limited attention. (3) For the network
owner, maximizing social welfare is important to ensure customer loyalty, which
is not addressed in prior work in the IM literature. In this paper, we address
all three limitations and propose a novel model called UIC that combines
utility-driven item adoption with influence propagation over networks. Focusing
on the mutually complementary setting, we formulate the problem of social
welfare maximization in this novel setting. We show that while the objective
function is neither submodular nor supermodular, surprisingly a simple greedy
allocation algorithm achieves a factor of $(1-1/e-\epsilon)$ of the optimum
expected social welfare. We develop bundleGRD, a scalable version of
this approximation algorithm, and demonstrate, with comprehensive experiments
on real and synthetic datasets, that it significantly outperforms all
baselines.
arXiv link: http://arxiv.org/abs/1807.02502v2
Autoregressive Wild Bootstrap Inference for Nonparametric Trends
confidence bands around a smooth deterministic trend. The bootstrap method is
easy to implement and does not require any adjustments in the presence of
missing data, which makes it particularly suitable for climatological
applications. We establish the asymptotic validity of the bootstrap method for
both pointwise and simultaneous confidence bands under general conditions,
allowing for general patterns of missing data, serial dependence and
heteroskedasticity. The finite sample properties of the method are studied in a
simulation study. We use the method to study the evolution of trends in daily
measurements of atmospheric ethane obtained from a weather station in the Swiss
Alps, where the method can easily deal with the many missing observations due
to adverse weather conditions.
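A stylized sketch of one common formulation of the autoregressive wild bootstrap (the trend fit, the AR parameter, and the simulated data below are illustrative stand-ins, not the paper's choices): residuals are multiplied by a smooth AR(1) multiplier series, and missing observations simply stay missing.

```python
import numpy as np

def awb_samples(trend_hat, residuals, gamma=0.9, B=199, seed=0):
    """Autoregressive wild bootstrap: rescale residuals with an AR(1) multiplier series."""
    rng = np.random.default_rng(seed)
    T = residuals.size
    out = np.empty((B, T))
    for b in range(B):
        xi = np.empty(T)
        xi[0] = rng.standard_normal()
        shocks = np.sqrt(1.0 - gamma ** 2) * rng.standard_normal(T)
        for t in range(1, T):
            xi[t] = gamma * xi[t - 1] + shocks[t]
        out[b] = trend_hat + xi * residuals        # NaN residuals (missing data) remain NaN
    return out

rng = np.random.default_rng(1)
T = 200
t = np.arange(T)
y = 0.01 * t + np.sin(t / 25) + np.convolve(rng.standard_normal(T + 4), np.ones(5) / 5, "valid")
trend_hat = np.poly1d(np.polyfit(t, y, deg=3))(t)   # crude stand-in for a nonparametric trend fit
residuals = y - trend_hat
boot = awb_samples(trend_hat, residuals)
```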
arXiv link: http://arxiv.org/abs/1807.02357v2
State-Varying Factor Models of Large Dimensions
large dimensions. Unlike constant factor models, loadings are general functions
of some recurrent state process. We develop an estimator for the latent factors
and state-varying loadings under a large cross-section and time dimension. Our
estimator combines nonparametric methods with principal component analysis. We
derive the rate of convergence and limiting normal distribution for the
factors, loadings and common components. In addition, we develop a statistical
test for a change in the factor structure in different states. We apply the
estimator to U.S. Treasury yields and S&P500 stock returns. The systematic
factor structure in treasury yields differs in times of booms and recessions as
well as in periods of high market volatility. State-varying factors based on
the VIX capture significantly more variation and pricing information in
individual stocks than constant factor models.
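A rough sketch of the kind of estimator described, assuming a Gaussian-kernel weighting of the time periods around a target state value followed by plain principal components; the normalizations are simplified and all names are placeholders, so this is not the paper's estimator.

```python
# Kernel-weighted principal components at a given state value: observations are
# reweighted by a Gaussian kernel in the state variable, then PCA is applied.
import numpy as np

def state_varying_pc(X, s, s0, h, r):
    """X: (T, N) panel, s: (T,) state process, s0: evaluation state,
    h: bandwidth, r: number of factors. Returns factors (T, r), loadings (N, r)."""
    w = np.exp(-0.5 * ((s - s0) / h) ** 2)          # kernel weights in the state
    Xw = X * np.sqrt(w)[:, None]                    # reweight rows
    U, S, Vt = np.linalg.svd(Xw, full_matrices=False)
    loadings = Vt[:r].T * np.sqrt(X.shape[1])       # loadings at s0, Lambda'Lambda/N = I
    factors = X @ loadings / X.shape[1]             # factors from cross-sectional projection
    return factors, loadings
```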
arXiv link: http://arxiv.org/abs/1807.02248v4
Minimizing Sensitivity to Model Misspecification
misspecified. We rely on a local asymptotic approach where the degree of
misspecification is indexed by the sample size. We construct estimators whose
mean squared error is minimax in a neighborhood of the reference model, based
on one-step adjustments. In addition, we provide confidence intervals that
contain the true parameter under local misspecification. As a tool to interpret
the degree of misspecification, we map it to the local power of a specification
test of the reference model. Our approach allows for systematic sensitivity
analysis when the parameter of interest may be partially or irregularly
identified. As illustrations, we study three applications: an empirical
analysis of the impact of conditional cash transfers in Mexico where
misspecification stems from the presence of stigma effects of the program, a
cross-sectional binary choice model where the error distribution is
misspecified, and a dynamic panel data binary choice model where the number of
time periods is small and the distribution of individual effects is
misspecified.
arXiv link: http://arxiv.org/abs/1807.02161v6
Fixed Effects and the Generalized Mundlak Estimator
observational studies with unobserved group-level heterogeneity. We consider a
general model with group-level unconfoundedness and provide conditions under
which aggregate balancing statistics -- group-level averages of functions of
treatments and covariates -- are sufficient to eliminate differences between
groups. Building on these results, we reinterpret commonly used linear
fixed-effect regression estimators by writing them in the Mundlak form as
linear regression estimators without fixed effects but including group
averages. We use this representation to develop Generalized Mundlak Estimators
(GMEs) that capture group differences through group averages of (functions of)
the unit-level variables and adjust for these group differences in flexible and
robust ways in the spirit of the modern causal literature.
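A minimal sketch of the Mundlak representation, assuming a pandas DataFrame with placeholder column names: group fixed effects are replaced by group averages of the treatment and covariates entered as additional regressors.

```python
# Mundlak-form regression: outcome on treatment, unit-level covariates, and
# group averages of those variables, instead of group fixed effects.
import pandas as pd
import statsmodels.formula.api as smf

def mundlak_regression(df, outcome, treatment, covariates, group):
    means = df.groupby(group)[[treatment] + covariates].transform("mean")
    means.columns = [c + "_bar" for c in means.columns]     # group averages
    data = pd.concat([df, means], axis=1)
    rhs = " + ".join([treatment] + covariates + list(means.columns))
    return smf.ols(f"{outcome} ~ {rhs}", data=data).fit()
```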
arXiv link: http://arxiv.org/abs/1807.02099v9
On the Identifying Content of Instrument Monotonicity
assumption of Imbens and Angrist (1994) on the distribution of potential
outcomes in a model with a binary outcome, a binary treatment and an exogenous
binary instrument. Specifically, I derive necessary and sufficient conditions
on the distribution of the data under which the identified set for the
distribution of potential outcomes when the instrument monotonicity assumption
is imposed can be a strict subset of that when it is not imposed.
arXiv link: http://arxiv.org/abs/1807.01661v2
Indirect inference through prediction
minimization and by using regularized regressions, we can bypass the three
major problems of estimation: selecting the summary statistics, defining the
distance function and minimizing it numerically. By substituting regression
with classification we can extend this approach to model selection as well. We
present three examples: a statistical fit, the parametrization of a simple real
business cycle model and heuristics selection in a fishery agent-based model.
The outcome is a method that automatically chooses summary statistics, weighs
them and use them to parametrize models without running any direct
minimization.
arXiv link: http://arxiv.org/abs/1807.01579v1
Bring a friend! Privately or Publicly?
the type of communication channels among consumers. The seller faces a
partially uninformed population of consumers, interconnected through a directed
social network. In the network, the seller offers rewards to informed consumers
(influencers) conditional on inducing purchases by uninformed consumers
(influenced). Rewards are needed to bear a communication cost and to induce
word-of-mouth (WOM) either privately (cost-per-contact) or publicly (fixed cost
to inform all friends). From the seller's viewpoint, eliciting Private WOM is
more costly than eliciting Public WOM. We investigate (i) the incentives for
the seller to move to a denser network, inducing either Private or Public WOM
and (ii) the optimal mix between the two types of communication. A denser
network is found to be always better, not only for information diffusion but
also for the seller's profits, as far as Private WOM is concerned. By contrast,
under Public WOM, the seller may prefer an environment with less competition
between informed consumers and the presence of highly connected influencers
(hubs) is the main driver to make network density beneficial to profits. When
the seller is able to discriminate between Private and Public WOM, the optimal
strategy is to cheaply incentivize the more connected people to pass on the
information publicly and then offer a high bonus for Private WOM.
arXiv link: http://arxiv.org/abs/1807.01994v2
Stochastic model specification in Markov switching vector error correction models
model specification in Markov switching vector error correction models. We
assume that a common distribution gives rise to the regime-specific regression
coefficients. The mean as well as the variances of this distribution are
treated as fully stochastic and suitable shrinkage priors are used. These
shrinkage priors make it possible to assess which coefficients differ across regimes in a
flexible manner. In the case of similar coefficients, our model pushes the
respective regions of the parameter space towards the common distribution. This
allows for selecting a parsimonious model while still maintaining sufficient
flexibility to control for sudden shifts in the parameters, if necessary. We
apply our modeling approach to real-time Euro area data and assume transition
probabilities between expansionary and recessionary regimes to be driven by the
cointegration errors. The results suggest that the regime allocation is
governed by a subset of short-run adjustment coefficients and regime-specific
variance-covariance matrices. These findings are complemented by an
out-of-sample forecast exercise, illustrating the advantages of the model for
predicting Euro area inflation in real time.
arXiv link: http://arxiv.org/abs/1807.00529v2
Maastricht and Monetary Cooperation
regard to international monetary cooperation. Even though the institutional and
intellectual assistance to the coordination of monetary policy in the EU will
probably be strengthened with the EMU, one of the shortcomings of the Maastricht
Treaty concerns the relationship between the founding members and those
countries that wish to remain outside the monetary union.
arXiv link: http://arxiv.org/abs/1807.00419v1
The Bretton Woods Experience and ERM
made with the current evolution of the EMS.
arXiv link: http://arxiv.org/abs/1807.00418v1
Subvector Inference in Partially Identified Models with Many Moment Inequalities
partially identified model with many moment inequalities. This framework allows
the number of moment conditions to grow with the sample size, possibly at
exponential rates. Our main motivating application is subvector inference,
i.e., inference on a single component of the partially identified parameter
vector associated with a treatment effect or a policy variable of interest.
Our inference method compares a MinMax test statistic (minimum over
parameters satisfying $H_0$ and maximum over moment inequalities) against
critical values that are based on bootstrap approximations or analytical
bounds. We show that this method controls asymptotic size uniformly over a
large class of data generating processes despite the partially identified many
moment inequality setting. The finite sample analysis allows us to obtain
explicit rates of convergence on the size control. Our results are based on
combining non-asymptotic approximations and new high-dimensional central limit
theorems for the MinMax of the components of random matrices. Unlike the
previous literature on functional inference in partially identified models, our
results do not rely on weak convergence results based on Donsker's class
assumptions and, in fact, our test statistic may not even converge in
distribution. Our bootstrap approximation requires the choice of a tuning
parameter sequence that can avoid the excessive concentration of our test
statistic. To this end, we propose an asymptotically valid data-driven method
to select this tuning parameter sequence. This method generalizes the selection
of tuning parameter sequences to problems outside the Donsker's class
assumptions and may also be of independent interest. Our procedures based on
self-normalized moderate deviation bounds are relatively more conservative but
easier to implement.
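A rough numerical sketch of such a MinMax statistic, assuming inequalities of the form E[m_j(W, theta)] <= 0, a user-supplied moment function, and a grid over the null-restricted parameter values; the paper's bootstrap and analytical critical values are not reproduced here.

```python
# MinMax statistic for moment inequalities: maximum studentized sample moment,
# minimized over a grid of parameter values consistent with H0.
import numpy as np

def minmax_statistic(moments, data, theta_grid_H0):
    """moments(data, theta) must return an (n, J) array of moment evaluations."""
    values = []
    for theta in theta_grid_H0:
        m = moments(data, theta)                 # (n, J)
        n = m.shape[0]
        mbar = m.mean(axis=0)
        sd = m.std(axis=0, ddof=1) + 1e-12       # guard against zero variance
        values.append(np.max(np.sqrt(n) * mbar / sd))
    return min(values)
```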
arXiv link: http://arxiv.org/abs/1806.11466v1
Quantitative analysis on the disparity of regional economic development in China and its evolution from 1952 to 2000
disparity and its evolution in China, but their conclusions differ
considerably. We think this is mainly due to differences in analytic
approaches, perspectives, spatial units, statistical indicators and study
periods. On the basis of previous analyses and findings, we have carried out
further quantitative computation and empirical study, and revealed the
inter-provincial and regional disparities of economic development and their
evolution trends from 1952 to 2000. The results show that (a) regional
disparity in economic development in China, including inter-provincial,
inter-regional and intra-regional disparity, has existed for years; (b) the
Gini and Theil coefficients reveal a similar dynamic trend for comparative
disparity in economic development between provinces in China: from 1952 to
1978, except for the "Great Leap Forward" period, comparative disparity
basically followed an upward trend, it declined slowly from 1979 to 1990, and
from 1991 to 2000 it rose slowly again; (c) a comparison between Shanghai and
Guizhou shows that the absolute inter-provincial disparity has been quite large
for years; and (d) the Hurst exponent (H=0.5) for the period 1966-1978
indicates that the comparative inter-provincial disparity of economic
development evolved randomly, whereas the Hurst exponent (H>0.5) for the period
1979-2000 indicates that in this period its evolution had a persistent,
long-enduring character.
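For reference, minimal implementations of the two disparity measures used above (the Gini coefficient and the Theil index), computed from a vector of, e.g., provincial per capita output:

```python
# Reference implementations of the Gini coefficient and the Theil index.
import numpy as np

def gini(x):
    x = np.sort(np.asarray(x, dtype=float))      # ascending order
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

def theil(x):
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    return np.mean((x / mean) * np.log(x / mean))
```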
arXiv link: http://arxiv.org/abs/1806.10794v1
Implementing Convex Optimization in R: Two Econometric Examples
empirical studies with complex big data. Estimation of these models calls for
optimization techniques to handle a large number of parameters. Convex problems
can be effectively executed in modern statistical programming languages. We
complement Koenker and Mizera (2014)'s work on numerical implementation of
convex optimization, with focus on high-dimensional econometric estimators.
Combining R and the convex solver MOSEK achieves faster speed and equivalent
accuracy, demonstrated by examples from Su, Shi, and Phillips (2016) and Shi
(2016). Robust performance of convex optimization is witnessed across platforms.
The convenience and reliability of convex optimization in R make it easy to
turn new ideas into prototypes.
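The paper's examples are written in R with the Rmosek interface; as a hedged Python analogue, the sketch below poses an l1-penalized median regression, a typical high-dimensional econometric estimator, as a convex program in cvxpy, which can be handed to MOSEK or any other installed solver.

```python
# Python analogue of the workflow (the paper itself works in R with Rmosek):
# an l1-penalized median regression written as a convex program in cvxpy.
import cvxpy as cp
import numpy as np

def l1_penalized_median_regression(X, y, lam):
    n, p = X.shape
    beta = cp.Variable(p)
    resid = y - X @ beta
    # Check function for the median is 0.5*|u|; add an l1 penalty on beta
    objective = cp.Minimize(cp.sum(0.5 * cp.abs(resid)) / n + lam * cp.norm1(beta))
    prob = cp.Problem(objective)
    prob.solve()          # e.g. prob.solve(solver=cp.MOSEK) if MOSEK is installed
    return beta.value
```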
arXiv link: http://arxiv.org/abs/1806.10423v2
Point-identification in multivariate nonseparable triangular models
result for nonseparable triangular models with a multivariate first and second
stage. Based on this, we prove point-identification of hedonic models with
multivariate heterogeneity and endogenous observable characteristics, extending
and complementing identification results from the literature which all require
exogeneity. As an additional application of our theoretical result, we show
that the BLP model (Berry et al. 1995) can also be identified without index
restrictions.
arXiv link: http://arxiv.org/abs/1806.09680v1
Non-testability of instrument validity under continuous endogenous variables
about testing the validity of an instrumental variable in hidden variable
models. It implies that instrument validity cannot be tested in the case where
the endogenous treatment is continuously distributed. This stands in contrast
to the classical testability results for instrument validity when the treatment
is discrete. However, imposing weak structural assumptions on the model, such
as continuity between the observable variables, can re-establish theoretical
testability in the continuous setting.
arXiv link: http://arxiv.org/abs/1806.09517v3
Semiparametrically Point-Optimal Hybrid Rank Tests for Unit Roots
in the Locally Asymptotically Brownian Functional limit experiment associated
to the unit root model. The invariance structures naturally suggest tests that
are based on the ranks of the increments of the observations, their average,
and an assumed reference density for the innovations. The tests are
semiparametric in the sense that they are valid, i.e., have the correct
(asymptotic) size, irrespective of the true innovation density. For a correctly
specified reference density, our test is point-optimal and nearly efficient.
For arbitrary reference densities, we establish a Chernoff-Savage type result,
i.e., our test performs as well as commonly used tests under Gaussian
innovations but has improved power under other, e.g., fat-tailed or skewed,
innovation distributions. To avoid nonparametric estimation, we propose a
simplified version of our test that exhibits the same asymptotic properties,
except for the Chernoff-Savage result that we are only able to demonstrate by
means of simulations.
arXiv link: http://arxiv.org/abs/1806.09304v1
The transmission of uncertainty shocks on income inequality: State-level evidence from the United States
income inequality and macroeconomic uncertainty in the United States. Using a
novel large-scale macroeconometric model, we shed light on regional disparities
of inequality responses to a national uncertainty shock. The results suggest
that income inequality decreases in most states, with a pronounced degree of
heterogeneity in terms of shapes and magnitudes of the dynamic responses. By
contrast, some few states, mostly located in the West and South census region,
display increasing levels of income inequality over time. We find that this
directional pattern in responses is mainly driven by the income composition and
labor market fundamentals. In addition, forecast error variance decompositions
allow for a quantitative assessment of the importance of uncertainty shocks in
explaining income inequality. The findings highlight that volatility shocks
account for a considerable fraction of forecast error variance for most states
considered. Finally, a regression-based analysis sheds light on the driving
forces behind differences in state-specific inequality responses.
arXiv link: http://arxiv.org/abs/1806.08278v1
Shift-Share Designs: Theory and Inference
outcome is regressed on a weighted average of sectoral shocks, using regional
sector shares as weights. We conduct a placebo exercise in which we estimate
the effect of a shift-share regressor constructed with randomly generated
sectoral shocks on actual labor market outcomes across U.S. Commuting Zones.
Tests based on commonly used standard errors with 5% nominal significance
level reject the null of no effect in up to 55% of the placebo samples. We use
a stylized economic model to show that this overrejection problem arises
because regression residuals are correlated across regions with similar
sectoral shares, independently of their geographic location. We derive novel
inference methods that are valid under arbitrary cross-regional correlation in
the regression residuals. We show using popular applications of shift-share
designs that our methods may lead to substantially wider confidence intervals
in practice.
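A stylized version of the placebo exercise, assuming placeholder arrays of regional sector shares and an actual regional outcome: random sectoral shocks are drawn, a shift-share regressor is formed, and the rejection rate of a conventional 5% robust t-test is recorded.

```python
# Placebo exercise with randomly generated sectoral shocks and conventional
# heteroskedasticity-robust standard errors; shares and outcome are placeholders.
import numpy as np
import statsmodels.api as sm

def placebo_rejection_rate(shares, outcome, n_placebos=1000, seed=0):
    """shares: (R, S) regional sector shares summing to one by row;
    outcome: (R,) actual regional outcome."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_placebos):
        shocks = rng.normal(size=shares.shape[1])     # random sectoral shocks
        shift_share = shares @ shocks                 # weighted average of shocks
        X = sm.add_constant(shift_share)
        res = sm.OLS(outcome, X).fit(cov_type="HC1")  # conventional robust SEs
        rejections += res.pvalues[1] < 0.05
    return rejections / n_placebos
```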
arXiv link: http://arxiv.org/abs/1806.07928v5
Is VIX still the investor fear gauge? Evidence for the US and BRIC markets
detail, we pick up the analysis from the point left off by (Sarwar, 2012), and
we focus on the period: Jan 2007 - Feb 2018, thus capturing the relations
before, during and after the 2008 financial crisis. Results pinpoint frequent
structural breaks in the VIX and suggest an enhancement around 2008 of the fear
transmission in response to negative market moves; largely depending on
overlaps in trading hours, this has become even stronger post-crisis for the
US, while for the BRIC countries it has gone back towards pre-crisis levels.
arXiv link: http://arxiv.org/abs/1806.07556v2
Adaptive Bayesian Estimation of Mixed Discrete-Continuous Distributions under Smoothness and Sparsity
distribution under anisotropic smoothness conditions and possibly increasing
number of support points for the discrete part of the distribution. For these
settings, we derive lower bounds on the estimation rates in the total variation
distance. Next, we consider a nonparametric mixture of normals model that uses
continuous latent variables for the discrete part of the observations. We show
that the posterior in this model contracts at rates that are equal to the
derived lower bounds up to a log factor. Thus, Bayesian mixture of normals
models can be used for optimal adaptive estimation of mixed discrete-continuous
distributions.
arXiv link: http://arxiv.org/abs/1806.07484v1
Quantum Nash equilibrium in the thermodynamic limit
like quantum Prisoner's dilemma and the quantum game of chicken. A phase
transition is seen in both games as a function of the entanglement in the game.
We observe that for maximal entanglement irrespective of the classical payoffs,
a majority of players choose Quantum strategy over Defect in the thermodynamic
limit.
arXiv link: http://arxiv.org/abs/1806.07343v3
Cluster-Robust Standard Errors for Linear Regression Models with Many Controls
errors when using the linear regression model to estimate some
structural/causal effect of interest. Researchers also often include a large
set of regressors in their model specification in order to control for observed
and unobserved confounders. In this paper we develop inference methods for
linear regression models with many controls and clustering. We show that
inference based on the usual cluster-robust standard errors by Liang and Zeger
(1986) is invalid in general when the number of controls is a non-vanishing
fraction of the sample size. We then propose a new clustered standard errors
formula that is robust to the inclusion of many controls and allows one to carry
out valid inference in a variety of high-dimensional linear regression models,
including fixed effects panel data models and the semiparametric partially
linear model. Monte Carlo evidence supports our theoretical results and shows
that our proposed variance estimator performs well in finite samples. The
proposed method is also illustrated with an empirical application that
re-visits Donohue III and Levitt's (2001) study of the impact of abortion on
crime.
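For concreteness, the conventional Liang and Zeger (1986) cluster-robust sandwich estimator whose validity is at issue can be written as follows; the paper's corrected formula for the many-controls case is not reproduced here.

```python
# Textbook Liang-Zeger cluster-robust variance for OLS.
import numpy as np

def cluster_robust_ols(X, y, clusters):
    X, y = np.asarray(X, float), np.asarray(y, float)
    clusters = np.asarray(clusters)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        sg = X[clusters == g].T @ u[clusters == g]   # cluster score
        meat += np.outer(sg, sg)
    V = XtX_inv @ meat @ XtX_inv                     # sandwich variance
    return beta, np.sqrt(np.diag(V))
```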
arXiv link: http://arxiv.org/abs/1806.07314v3
The Origin and the Resolution of Nonuniqueness in Linear Rational Expectations
discrete-time, linear, constant-coefficients case, the associated free
parameters are coefficients that determine the public's most immediate
reactions to shocks. The requirement of model-consistency may leave these
parameters completely free, yet when their values are appropriately specified,
a unique solution is determined. In a broad class of models, the requirement of
least-square forecast errors determines the parameter values, and therefore
defines a unique solution. This approach is independent of dynamical stability,
and generally does not suppress model dynamics.
Application to a standard New Keynesian example shows that the traditional
solution suppresses precisely those dynamics that arise from rational
expectations. The uncovering of those dynamics reveals their incompatibility
with the new I-S equation and the expectational Phillips curve.
arXiv link: http://arxiv.org/abs/1806.06657v3
Effect of Climate and Geography on worldwide fine resolution economic activity
important elements in shaping socio-economic activities, alongside other
determinants, such as institutions. Here we demonstrate that geography and
climate satisfactorily explain worldwide economic activity as measured by the
per capita Gross Cell Product (GCP-PC) at a fine geographical resolution,
typically much higher than country average. A 1° by 1° GCP-PC dataset
has been key for establishing and testing a direct relationship between 'local'
geography/climate and GCP-PC. Not only have we tested the geography/climate
hypothesis using many possible explanatory variables, importantly we have also
predicted and reconstructed GCP-PC worldwide by retaining the most significant
predictors. While this study confirms that latitude is the most important
predictor for GCP-PC when taken in isolation, the accuracy of the GCP-PC
prediction is greatly improved when other factors mainly related to variations
in climatic variables, such as the variability in air pressure, rather than
average climatic conditions as typically used, are considered. Implications of
these findings include an improved understanding of why economically better-off
societies are geographically placed where they are.
arXiv link: http://arxiv.org/abs/1806.06358v2
On the relation between Sion's minimax theorem and existence of Nash equilibrium in asymmetric multi-players zero-sum game with only one alien
function and a Nash equilibrium in an asymmetric multi-players zero-sum game in
which only one player is different from other players, and the game is
symmetric for the other players. Then,
1. The existence of a Nash equilibrium, which is symmetric for players other
than one player, implies Sion's minimax theorem for pairs of this player and
one of the other players, with symmetry for the remaining players.
2. Sion's minimax theorem for pairs of one player and one of the other players,
with symmetry for the remaining players, implies the existence of a Nash
equilibrium which is symmetric for the other players.
Thus, they are equivalent.
arXiv link: http://arxiv.org/abs/1806.07253v1
Generalized Log-Normal Chain-Ladder
normal chain-ladder model. The theory overcomes the difficulty of convoluting
log normal variables and takes estimation error into account. The results
differ from that of the over-dispersed Poisson model and from the chain-ladder
based bootstrap. We embed the log normal chain-ladder model in a class of
infinitely divisible distributions called the generalized log normal
chain-ladder model. The asymptotic theory uses small $\sigma$ asymptotics where
the dimension of the reserving triangle is kept fixed while the standard
deviation is assumed to decrease. The resulting asymptotic forecast
distributions follow t distributions. The theory is supported by simulations
and an empirical application.
arXiv link: http://arxiv.org/abs/1806.05939v1
Stratification Trees for Adaptive Randomization in Randomized Controlled Trials
randomized controlled trials. The method uses data from a first-wave experiment
in order to determine how to stratify in a second wave of the experiment, where
the objective is to minimize the variance of an estimator for the average
treatment effect (ATE). We consider selection from a class of stratified
randomization procedures which we call stratification trees: these are
procedures whose strata can be represented as decision trees, with differing
treatment assignment probabilities across strata. By using the first wave to
estimate a stratification tree, we simultaneously select which covariates to
use for stratification, how to stratify over these covariates, as well as the
assignment probabilities within these strata. Our main result shows that using
this randomization procedure with an appropriate estimator results in an
asymptotic variance which is minimal in the class of stratification trees.
Moreover, the results we present are able to accommodate a large class of
assignment mechanisms within strata, including stratified block randomization.
In a simulation study, we find that our method, paired with an appropriate
cross-validation procedure, can improve on ad hoc choices of stratification. We
conclude by applying our method to the study in Karlan and Wood (2017), where
we estimate stratification trees using the first wave of their experiment.
arXiv link: http://arxiv.org/abs/1806.05127v7
LASSO-Driven Inference in Time and Space
regression equations allowing for temporal and cross-sectional dependency in
covariates and error processes, covering rather general forms of weak temporal
dependence. A sequence of regressions with many regressors using LASSO (Least
Absolute Shrinkage and Selection Operator) is applied for variable selection
purpose, and an overall penalty level is carefully chosen by a block multiplier
bootstrap procedure to account for multiplicity of the equations and
dependencies in the data. Correspondingly, oracle properties with a jointly
selected tuning parameter are derived. We further provide high-quality
de-biased simultaneous inference on the many target parameters of the system.
We provide bootstrap consistency results of the test procedure, which are based
on a general Bahadur representation for the $Z$-estimators with dependent data.
Simulations demonstrate good performance of the proposed inference procedure.
Finally, we apply the method to quantify spillover effects of textual sentiment
indices in a financial market and to test the connectedness among sectors.
arXiv link: http://arxiv.org/abs/1806.05081v4
Regularized Orthogonal Machine Learning for Nonlinear Semiparametric Models
parameter identified by a single index conditional moment restriction (CMR). In
addition to this parameter, the moment function can also depend on a nuisance
function, such as the propensity score or the conditional choice probability,
which we estimate by modern machine learning tools. We first adjust the moment
function so that the gradient of the future loss function is insensitive
(formally, Neyman-orthogonal) with respect to the first-stage regularization
bias, preserving the single index property. We then take the loss function to
be an indefinite integral of the adjusted moment function with respect to the
single index. The proposed Lasso estimator converges at the oracle rate, where
the oracle knows the nuisance function and solves only the parametric problem.
We demonstrate our method by estimating the short-term heterogeneous impact of
Connecticut's Jobs First welfare reform experiment on women's welfare
participation decision.
arXiv link: http://arxiv.org/abs/1806.04823v8
Asymmetric response to PMI announcements in China's stock returns
Index (PMI) on Manufacturing generally assumes that PMI announcements will
produce an impact on stock markets. International experience suggests that
stock markets react to negative PMI news. In this research, we empirically
investigate the stock market reaction towards PMI in China. The asymmetric
effects of PMI announcements on the stock market are observed: no market
reaction is generated towards negative PMI announcements, while a positive
reaction is generally generated for positive PMI news. We further find that the
positive reaction towards the positive PMI news occurs 1 day before the
announcement and lasts for nearly 3 days, and the positive reaction is observed
in the context of expanding economic conditions. By contrast, the negative
reaction towards negative PMI news is prevalent during downward economic
conditions for stocks with low market value, low institutional shareholding
ratios, or high price-earnings ratios. Our study implies that China's stock market
favors risk to a certain extent given the vast number of individual investors
in the country, and there may exist information leakage in the market.
arXiv link: http://arxiv.org/abs/1806.04347v1
Estimating Trade-Related Adjustment Costs in the Agricultural Sector in Iran
consideration for developing countries, because they are increasingly facing
the difficult task of implementing and harmonizing regional and international
trade commitments. The tariff reform and its costs for the Iranian government
are among the issues examined in this study. Another goal of this paper is to
estimate the cost of trade liberalization. To this end, the value of Iran's
agricultural imports in 2010 was analyzed under two scenarios using the TRIST
method; in both scenarios a VAT policy is used to reform nuisance tariffs. In
the first scenario, import value falls to the same level as in the second
scenario while generating higher tariff revenue. The results show that reducing
the average tariff rate does not always result in a loss of tariff revenue.
This paper shows that different tariff structures can generate different
amounts of revenue even when they imply the same level of liberalization and
have an equal effect on producers. Therefore, a well-designed tariff regime can
help a government generate revenue while increasing social welfare through
liberalization.
arXiv link: http://arxiv.org/abs/1806.04238v1
The Role of Agricultural Sector Productivity in Economic Growth: The Case of Iran's Economic Development Plan
productivity growth evaluations in agricultural sector as one of the most
important sectors in Iran's economic development plan. We use the Solow
residual model to measure the productivity growth share in the value-added
growth of the agricultural sector. Our time series data includes value-added
per worker, employment, and capital in this sector. The results show that the
average total factor productivity growth rate in the agricultural sector is
-0.72% during 1991-2010. Also, during this period, the share of total factor
productivity growth in the value-added growth is -19.6%, while it has been
forecast to be 33.8% in the fourth development plan. Considering the
substantial role of capital in the agricultural sector's low productivity, we
suggest applying productivity management plans (especially with regard to
capital productivity) to achieve future growth goals.
arXiv link: http://arxiv.org/abs/1806.04235v1
A Growth Model with Unemployment
what Keynes (1936) suggests in the "essence" of his general theory. The
theoretical essence is the idea that exogenous changes in investment cause
changes in employment and unemployment. We implement this idea by assuming the
path for capital growth rate is exogenous in the growth model. The result is a
growth model that can explain both long term trends and fluctuations around the
trend. The modified growth model was tested using U.S. economic data from
1947 to 2014. The hypothesized inverse relationship between the capital growth
and changes in unemployment was confirmed, and the structurally estimated model
fits fluctuations in unemployment reasonably well.
arXiv link: http://arxiv.org/abs/1806.04228v1
Inference under Covariate-Adaptive Randomization with Multiple Treatments
covariate-adaptive randomization when there are multiple treatments. More
specifically, we study inference about the average effect of one or more
treatments relative to other treatments or a control. As in Bugni et al.
(2018), covariate-adaptive randomization refers to randomization schemes that
first stratify according to baseline covariates and then assign treatment
status so as to achieve balance within each stratum. In contrast to Bugni et
al. (2018), we not only allow for multiple treatments, but further allow for
the proportion of units being assigned to each of the treatments to vary across
strata. We first study the properties of estimators derived from a fully
saturated linear regression, i.e., a linear regression of the outcome on all
interactions between indicators for each of the treatments and indicators for
each of the strata. We show that tests based on these estimators using the
usual heteroskedasticity-consistent estimator of the asymptotic variance are
invalid; on the other hand, tests based on these estimators and suitable
estimators of the asymptotic variance that we provide are exact. For the
special case in which the target proportion of units being assigned to each of
the treatments does not vary across strata, we additionally consider tests
based on estimators derived from a linear regression with strata fixed effects,
i.e., a linear regression of the outcome on indicators for each of the
treatments and indicators for each of the strata. We show that tests based on
these estimators using the usual heteroskedasticity-consistent estimator of the
asymptotic variance are conservative, but tests based on these estimators and
suitable estimators of the asymptotic variance that we provide are exact. A
simulation study illustrates the practical relevance of our theoretical
results.
arXiv link: http://arxiv.org/abs/1806.04206v3
Determining the dimension of factor structures in non-stationary large datasets
in a large, possibly non-stationary, dataset. Our procedure is designed to
determine whether there are (and how many) common factors (i) with linear
trends, (ii) with stochastic trends, (iii) with no trends, i.e. stationary. Our
analysis is based on the fact that the largest eigenvalues of a suitably scaled
covariance matrix of the data (corresponding to the common factor part)
diverge, as the dimension $N$ of the dataset diverges, whilst the others stay
bounded. Therefore, we propose a class of randomised test statistics for the
null that the $p$-th eigenvalue diverges, based directly on the estimated
eigenvalue. The tests require only minimal assumptions on the data, and no
restrictions on the relative rates of divergence of $N$ and $T$ are imposed.
Monte Carlo evidence shows that our procedure has very good finite sample
properties, clearly dominating competing approaches when no common factors are
present. We illustrate our methodology through an application to US bond yields
with different maturities observed over the last 30 years. A common linear
trend and two common stochastic trends are found and identified as the
classical level, slope and curvature factors.
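A simplified eigenvalue diagnostic in this spirit (not the paper's randomised test): compute the eigenvalues of a scaled covariance matrix of the panel and inspect their ratios, since factor-related eigenvalues diverge with the cross-sectional dimension while the others stay bounded.

```python
# Simplified eigenvalue-ratio diagnostic for the number of common factors.
import numpy as np

def eigenvalue_ratios(X):
    """X: (T, N) panel of observations."""
    T, N = X.shape
    S = (X.T @ X) / (N * T)                     # scaled covariance matrix
    eig = np.sort(np.linalg.eigvalsh(S))[::-1]  # eigenvalues, largest first
    ratios = eig[:-1] / eig[1:]
    return eig, ratios   # a sharp drop after the r-th eigenvalue suggests r factors
```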
arXiv link: http://arxiv.org/abs/1806.03647v1
Orthogonal Random Forest for Causal Inference
Neyman-orthogonality to reduce sensitivity with respect to estimation error of
nuisance parameters with generalized random forests (Athey et al., 2017)--a
flexible non-parametric method for statistical estimation of conditional moment
models using random forests. We provide a consistency rate and establish
asymptotic normality for our estimator. We show that under mild assumptions on
the consistency rate of the nuisance estimator, we can achieve the same error
rate as an oracle with a priori knowledge of these nuisance parameters. We show
that when the nuisance functions have a locally sparse parametrization, then a
local $\ell_1$-penalized regression achieves the required rate. We apply our
method to estimate heterogeneous treatment effects from observational data with
discrete treatments or continuous treatments, and we show that, unlike prior
work, our method provably allows one to control for a high-dimensional set of
variables under standard sparsity conditions. We also provide a comprehensive
empirical evaluation of our algorithm on both synthetic and real data.
arXiv link: http://arxiv.org/abs/1806.03467v4
A hybrid econometric-machine learning approach for relative importance analysis: Prioritizing food policy
when the explanatory aspects of econometric methods are of interest. To this
end, the author briefly reviews the limitations of conventional econometrics in
constructing a reliable measure of variable importance. The author highlights
the relative stature of explanatory and predictive analysis in economics and
the emergence of fruitful collaborations between econometrics and computer
science. Learning lessons from both, the author proposes a hybrid approach
based on conventional econometrics and advanced machine learning (ML)
algorithms, which are otherwise used in predictive analytics. The purpose of
this article is two-fold, to propose a hybrid approach to assess relative
importance and demonstrate its applicability in addressing policy priority
issues with an example of food inflation in India, followed by a broader aim to
introduce the possibility of conflation of ML and conventional econometrics to
an audience of researchers in economics and social sciences, in general.
arXiv link: http://arxiv.org/abs/1806.04517v3
Pricing Engine: Estimating Causal Impacts in Real World Business Settings
estimation techniques in general panel data settings. Customization allows the
user to specify first-stage models, first-stage featurization, second-stage
treatment selection and second-stage causal modeling. We also introduce a
DynamicDML class that allows the user to generate dynamic treatment-aware
forecasts at a range of leads and to understand how the forecasts will vary as
a function of causally estimated treatment parameters. The Pricing Engine is
built on Python 3.5 and can be run on an Azure ML Workbench environment with
the addition of only a few Python packages. This note provides high-level
discussion of the Double ML method, describes the package's intended use and
includes an example Jupyter notebook demonstrating application to some publicly
available data. Installation of the package and additional technical
documentation is available at
https://github.com/bquistorff/pricingengine.
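This is not the Pricing Engine's API, but a generic partialling-out Double ML step with scikit-learn conveys the underlying two-stage idea; the learners and variable names below are illustrative choices.

```python
# Generic partialling-out Double ML: cross-fitted first-stage predictions of the
# outcome and the treatment from controls, then a residual-on-residual regression.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

def double_ml_partial_out(y, d, X, n_splits=5, seed=0):
    model_y = RandomForestRegressor(random_state=seed)
    model_d = RandomForestRegressor(random_state=seed)
    y_res = y - cross_val_predict(model_y, X, y, cv=n_splits)   # outcome residuals
    d_res = d - cross_val_predict(model_d, X, d, cv=n_splits)   # treatment residuals
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    se = np.sqrt(np.mean((y_res - theta * d_res) ** 2 * d_res ** 2)) / (
        np.mean(d_res ** 2) * np.sqrt(len(y)))                  # influence-function SE
    return theta, se
```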
arXiv link: http://arxiv.org/abs/1806.03285v2
A Profit Optimization Approach Based on the Use of Pumped-Hydro Energy Storage Unit and Dynamic Pricing
maximum economic benefit from wind farms with variable and intermittent energy
generation in the day ahead and balancing electricity markets. This method,
which is based on the use of pumped-hydro energy storage unit and wind farm
together, increases the profit from the power plant by taking advantage of the
price changes in the markets and at the same time supports the power system by
supplying a portion of the peak load demand in the system to which the plant is
connected. With the objective of examining the effectiveness of the proposed
method, detailed simulation studies are carried out by making use of actual
wind and price data, and the results are compared to those obtained for the
various cases in which the storage unit is not available and/or the proposed
price-based energy management method is not applied. As a consequence, it is
demonstrated that the pumped-hydro energy storage units are the storage systems
capable of being used effectively for high-power levels and that the proposed
optimization problem is quite successful in the cost-effective implementation
of these systems.
arXiv link: http://arxiv.org/abs/1806.05211v1
Role of Symmetry in Irrational Choice
sciences. Being such a powerful tool, almost all physical theories can be
derived from symmetry, and the effectiveness of such an approach is
astonishing. Since many physicists do not actually believe that symmetry is a
fundamental feature of nature, it seems more likely it is a fundamental feature
of human cognition. According to evolutionary psychologists, humans have a
sensory bias for symmetry. The unconscious quest for symmetrical patterns has
developed as a solution to specific adaptive problems related to survival and
reproduction. Therefore, it comes as no surprise that some fundamental concepts
in psychology and behavioral economics necessarily involve symmetry. The
purpose of this paper is to draw attention to the role of symmetry in
decision-making and to illustrate how it can be algebraically operationalized
through the use of mathematical group theory.
arXiv link: http://arxiv.org/abs/1806.02627v3
High-Dimensional Econometrics and Regularized GMM
estimation and inference in high-dimensional models. High-dimensional models
are characterized by having a number of unknown parameters that is not
vanishingly small relative to the sample size. We first present results in a
framework where estimators of parameters of interest may be represented
directly as approximate means. Within this context, we review fundamental
results including high-dimensional central limit theorems, bootstrap
approximation of high-dimensional limit distributions, and moderate deviation
theory. We also review key concepts underlying inference when many parameters
are of interest such as multiple testing with family-wise error rate or false
discovery rate control. We then turn to a general high-dimensional minimum
distance framework with a special focus on generalized method of moments
problems where we present results for estimation and inference about model
parameters. The presented results cover a wide array of econometric
applications, and we discuss several leading special cases including
high-dimensional linear regression and linear instrumental variables models to
illustrate the general results.
arXiv link: http://arxiv.org/abs/1806.01888v2
A Quantitative Analysis of Possible Futures of Autonomous Transport
amount of attention in recent years. They promise benefits such as reduced crew
costs, increased safety and increased flexibility. This paper explores the
effects of a faster increase in technological performance in maritime shipping
achieved by leveraging fast-improving technological domains such as computer
processors, and advanced energy storage. Based on historical improvement rates
of several modes of transport (Cargo Ships, Air, Rail, Trucking) a simplified
Markov-chain Monte-Carlo (MCMC) simulation of an intermodal transport model
(IMTM) is used to explore the effects of differing technological improvement
rates for AS. The results show that the annual improvement rates of traditional
shipping (Ocean Cargo Ships = 2.6%, Air Cargo = 5.5%, Trucking = 0.6%, Rail =
1.9%, Inland Water Transport = 0.4%) are lower than those of technologies
associated with automation such as Computer Processors (35.6%), Fuel Cells
(14.7%) and Automotive Autonomous Hardware (27.9%). The IMTM simulations up to
the year 2050 show that the introduction of any mode of autonomous transport
will increase competition in lower cost shipping options, but is unlikely to
significantly alter the overall distribution of transport mode costs. Secondly,
if all forms of transport end up converting to autonomous systems, then the
uncertainty surrounding the improvement rates yields a complex intermodal
transport solution involving several options, all at a much lower cost over
time. Ultimately, the research shows a need for more accurate measurement of
current autonomous transport costs and how they are changing over time.
arXiv link: http://arxiv.org/abs/1806.01696v1
Leave-out estimation of variance components
linear models with unrestricted heteroscedasticity. Applications include
analysis of variance and tests of linear restrictions in models with many
regressors. An approximation algorithm is provided that enables accurate
computation of the estimator in very large datasets. We study the large sample
properties of our estimator allowing the number of regressors to grow in
proportion to the number of observations. Consistency is established in a
variety of settings where plug-in methods and estimators predicated on
homoscedasticity exhibit first-order biases. For quadratic forms of increasing
rank, the limiting distribution can be represented by a linear combination of
normal and non-central $\chi^2$ random variables, with normality ensuing under
strong identification. Standard error estimators are proposed that enable tests
of linear restrictions and the construction of uniformly valid confidence
intervals for quadratic forms of interest. We find in Italian social security
records that leave-out estimates of a variance decomposition in a two-way fixed
effects model of wage determination yield substantially different conclusions
regarding the relative contribution of workers, firms, and worker-firm sorting
to wage inequality than conventional methods. Monte Carlo exercises corroborate
the accuracy of our asymptotic approximations, with clear evidence of
non-normality emerging when worker mobility between blocks of firms is limited.
arXiv link: http://arxiv.org/abs/1806.01494v2
A Consistent Variance Estimator for 2SLS When Instruments Identify Different LATEs
instrument-specific local average treatment effect (LATE). With multiple
instruments, the two-stage least squares (2SLS) estimand is a weighted average of
different LATEs. What is often overlooked in the literature is that the
postulated moment condition evaluated at the 2SLS estimand does not hold unless
those LATEs are the same. If so, the conventional heteroskedasticity-robust
variance estimator would be inconsistent, and 2SLS standard errors based on
such estimators would be incorrect. I derive the correct asymptotic
distribution, and propose a consistent asymptotic variance estimator by using
the result of Hall and Inoue (2003, Journal of Econometrics) on misspecified
moment condition models. This can be used to correctly calculate the standard
errors regardless of whether there is more than one LATE.
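For reference, plain 2SLS with the conventional heteroskedasticity-robust variance, the object the paper shows can be inconsistent when instruments identify different LATEs, can be sketched as follows; the proposed corrected variance is not reproduced here.

```python
# 2SLS with the conventional heteroskedasticity-robust sandwich variance.
import numpy as np

def two_sls(y, X, Z):
    """y: (n,), X: (n, k) regressors (endogenous + exogenous), Z: (n, l) instruments, l >= k."""
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)          # projection onto the instruments
    XtPzX_inv = np.linalg.inv(X.T @ Pz @ X)
    beta = XtPzX_inv @ X.T @ Pz @ y
    u = y - X @ beta
    Xhat = Pz @ X
    meat = (Xhat * (u ** 2)[:, None]).T @ Xhat      # sum of u_i^2 * xhat_i xhat_i'
    V = XtPzX_inv @ meat @ XtPzX_inv
    return beta, np.sqrt(np.diag(V))
```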
arXiv link: http://arxiv.org/abs/1806.01457v1
Asymptotic Refinements of a Misspecification-Robust Bootstrap for Generalized Method of Moments Estimators
for t tests and confidence intervals based on GMM estimators even when the
model is misspecified. In addition, my bootstrap does not require recentering
the moment function, which has been considered critical for GMM. Regardless
of model misspecification, the proposed bootstrap achieves the same sharp
magnitude of refinements as the conventional bootstrap methods which establish
asymptotic refinements by recentering in the absence of misspecification. The
key idea is to link the misspecified bootstrap moment condition to the large
sample theory of GMM under misspecification of Hall and Inoue (2003). Two
examples are provided: Combining data sets and invalid instrumental variables.
arXiv link: http://arxiv.org/abs/1806.01450v1
Driving by the Elderly and their Awareness of their Driving Difficulties (Hebrew)
reasons. One is the higher proportion of elderly people in the population, and
the other is the rise in the share of the elderly who drive. This paper examines
the features of their driving and the level of their awareness of problems
relating to it, by analyzing a preference survey that included interviews with
205 drivers aged between 70 and 80. The interviewees exhibited a level of
optimism and self-confidence in their driving that is out of line with the real
situation. There is also a discrepancy between how their driving is viewed by
others and their own assessment, and between their self-assessment and their
assessment of the driving of other elderly drivers, which they rate lower than
their own. They attributed great importance to safety features in cars, although
they did not think that they themselves needed them, and most elderly drivers
did not think there was any reason that they should stop driving, despite
suggestions from family members and others that they should do so. A declared
preference survey was undertaken to assess the degree of difficulty elderly
drivers attribute to driving conditions. It was found that they are concerned
mainly about weather conditions, driving at night, and long journeys. Worry
about night driving was most marked among women, the oldest drivers, and those
who drove less frequently. In light of the findings, imposing greater
responsibility on the health system should be considered. Consideration should
also be given to issuing partial licenses to the elderly for daytime driving
only, or restricted to certain weather conditions, depending on their medical
condition. Such flexibility will enable the elderly to maintain their life
style and independence for a longer period on the one hand, and on the other,
will minimize the risks to themselves and others.
arXiv link: http://arxiv.org/abs/1806.03254v1
The Impact of Supervision and Incentive Process in Explaining Wage Profile and Variance
workers may lead to wage variance that stems from employer and employee
optimization. The harder it is to assess the nature of the labor output, the
more important such a process becomes, and the greater its influence on wage
growth. The dynamic model presented in this paper shows that
an employer will choose to pay a worker a starting wage that is less than what
he deserves, resulting in a wage profile that fits the classic profile in the
human-capital literature. The wage profile and wage variance rise at times of
technological advancements, which leads to increased turnover as older workers
are replaced by younger workers due to a rise in the relative marginal cost of
the former.
arXiv link: http://arxiv.org/abs/1806.01332v1
Limit Theory for Moderate Deviation from Integrated GARCH Processes
moderately deviates from IGARCH process towards both stationary and explosive
regimes. The GARCH(1,1) process is defined by equations $u_t = \sigma_t
\varepsilon_t$, $\sigma_t^2 = \omega + \alpha_n u_{t-1}^2 +
\beta_n\sigma_{t-1}^2$, where $\alpha_n + \beta_n$ approaches unity as the sample
size goes to infinity. The asymptotic theory developed in this paper extends
Berkes et al. (2005) by allowing the parameters to have a slower convergence
rate. The results can be applied to unit root test for processes with
mildly-integrated GARCH innovations (e.g. Boswijk (2001), Cavaliere and Taylor
(2007, 2009)) and deriving limit theory of estimators for models involving
mildly-integrated GARCH processes (e.g. Jensen and Rahbek (2004), Francq and
Zakoïan (2012, 2013)).
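A direct simulation of this recursion with $\alpha_n + \beta_n$ drifting towards one as the sample size grows may help fix ideas; the drift rate below is an illustrative choice, not the paper's assumption.

```python
# Simulate u_t = sigma_t * eps_t, sigma_t^2 = omega + alpha_n*u_{t-1}^2 + beta_n*sigma_{t-1}^2,
# with alpha_n + beta_n = 1 - c * n^(-kappa) approaching unity (illustrative drift).
import numpy as np

def simulate_near_igarch(n, omega=0.1, c=0.5, kappa=0.3, seed=0):
    rng = np.random.default_rng(seed)
    alpha_n = 0.3
    beta_n = 1.0 - alpha_n - c / n ** kappa     # alpha_n + beta_n -> 1 as n grows
    eps = rng.normal(size=n)
    u = np.zeros(n)
    sigma2 = np.full(n, omega / (1.0 - alpha_n - beta_n))   # start at stationary variance
    for t in range(1, n):
        sigma2[t] = omega + alpha_n * u[t - 1] ** 2 + beta_n * sigma2[t - 1]
        u[t] = np.sqrt(sigma2[t]) * eps[t]
    return u, sigma2
```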
arXiv link: http://arxiv.org/abs/1806.01229v3
Quasi-Experimental Shift-Share Research Designs
of shocks with exposure share weights. We provide a new econometric framework
for shift-share instrumental variable (SSIV) regressions in which
identification follows from the quasi-random assignment of shocks, while
exposure shares are allowed to be endogenous. The framework is motivated by an
equivalence result: the orthogonality between a shift-share instrument and an
unobserved residual can be represented as the orthogonality between the
underlying shocks and a shock-level unobservable. SSIV regression coefficients
can similarly be obtained from an equivalent shock-level regression, motivating
shock-level conditions for their consistency. We discuss and illustrate several
practical insights of this framework in the setting of Autor et al. (2013),
estimating the effect of Chinese import competition on manufacturing employment
across U.S. commuting zones.
arXiv link: http://arxiv.org/abs/1806.01221v9
Asymptotic Refinements of a Misspecification-Robust Bootstrap for Generalized Empirical Likelihood Estimators
likelihood, the exponential tilting, and the exponentially tilted empirical
likelihood estimators that achieves asymptotic refinements for t tests and
confidence intervals, and Wald tests and confidence regions based on such
estimators. Furthermore, the proposed bootstrap is robust to model
misspecification, i.e., it achieves asymptotic refinements regardless of
whether the assumed moment condition model is correctly specified or not. This
result is new, because asymptotic refinements of the bootstrap based on these
estimators have not been established in the literature even under correct model
specification. Monte Carlo experiments are conducted in a dynamic panel data
setting to support the theoretical finding. As an application, bootstrap
confidence intervals for the returns to schooling of Hellerstein and Imbens
(1999) are calculated. The result suggests that the returns to schooling may be
higher.
arXiv link: http://arxiv.org/abs/1806.00953v2
Identification of Conduit Countries and Community Structures in the Withholding Tax Networks
laws and tax treaties, has been forced to work as a single network. However,
each jurisdiction (country or region) has not designed its economic law under
the assumption that it functions as an element of one network, and this has
produced unexpected results. We argue that these results are precisely
international tax avoidance. To contribute to the solution of international tax
avoidance, we investigate which parts of the network are vulnerable.
Specifically, focusing on treaty shopping, one of the main international tax
avoidance methods, we attempt to identify which jurisdictions are likely to be
used for treaty shopping, based on tax liabilities, and the relationship between
jurisdictions likely to be used for treaty shopping and the others. For
that purpose, based on withholding tax rates imposed on dividends, interest,
and royalties by jurisdictions, we produced weighted multiple directed graphs,
computed the centralities and detected the communities. As a result, we
clarified the jurisdictions that are likely to be used for treaty shopping and
pointed out that there are community structures. The results of this study
suggested that fewer jurisdictions need to introduce more regulations for
prevention of treaty abuse worldwide.
arXiv link: http://arxiv.org/abs/1806.00799v1
Ill-posed Estimation in High-Dimensional Models with Instrumental Variables
high-dimensional parameter vector $\beta^0$ which is identified through
instrumental variables. We allow for eigenvalues of the expected outer product
of included and excluded covariates, denoted by $M$, to shrink to zero as the
sample size increases. We propose a novel estimator based on desparsification
of an instrumental variable Lasso estimator, which is a regularized version of
2SLS with an additional correction term. This estimator converges to $\beta^0$
at a rate depending on the mapping properties of $M$ captured by a sparse link
condition. Linear combinations of our estimator of $\beta^0$ are shown to be
asymptotically normally distributed. Based on consistent covariance estimation,
our method allows for constructing confidence intervals and statistical tests
for single or low-dimensional components of $\beta^0$. In Monte-Carlo
simulations we analyze the finite sample behavior of our estimator.
arXiv link: http://arxiv.org/abs/1806.00666v2
Introducing shrinkage in heavy-tailed state space models to predict equity excess returns
state space model with non-Gaussian features at several levels. More precisely,
we control for overparameterization via novel global-local shrinkage priors on
the state innovation variances as well as the time-invariant part of the state
space model. The shrinkage priors are complemented by heavy tailed state
innovations that cater for potentially large breaks in the latent states.
Moreover, we allow for leptokurtic stochastic volatility in the observation
equation. The empirical findings indicate that several variants of the proposed
approach outperform typical competitors frequently used in the literature, both
in terms of point and density forecasts.
arXiv link: http://arxiv.org/abs/1805.12217v2
Estimation and Inference for Policy Relevant Treatment Effects
switching from a status-quo policy to a counterfactual policy. Estimation of
the PRTE involves estimation of multiple preliminary parameters, including
propensity scores, conditional expectation functions of the outcome and
covariates given the propensity score, and marginal treatment effects. These
preliminary estimators can affect the asymptotic distribution of the PRTE
estimator in complicated and intractable manners. In this light, we propose an
orthogonal score for double debiased estimation of the PRTE, whereby the
asymptotic distribution of the PRTE estimator is obtained without any influence
of preliminary parameter estimators as far as they satisfy mild requirements of
convergence rates. To our knowledge, this paper is the first to develop limit
distribution theories for inference about the PRTE.
arXiv link: http://arxiv.org/abs/1805.11503v4
Stationarity and ergodicity of vector STAR models
nonlinearities in univariate and multivariate time series. Existence of
stationary solution is typically assumed, implicitly or explicitly. In this
paper we describe conditions for stationarity and ergodicity of vector STAR
models. The key condition is that the joint spectral radius of certain matrices
is below 1, which is not guaranteed if only separate spectral radii are below
1. Our result makes it possible to use recently introduced toolboxes from computational
mathematics to verify the stationarity and ergodicity of vector STAR models.
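To make the key condition concrete, the brute-force sketch below brackets the joint spectral radius of a finite set of matrices by enumerating all products of a fixed length; if the upper bound is below 1, the stationarity condition described above holds. This is only an illustrative check with made-up matrices; the computational toolboxes referred to in the abstract use far more efficient methods.
```python
# Illustrative check (not the toolboxes cited in the paper): bracket the joint
# spectral radius (JSR) of a finite matrix set by brute-force enumeration.
# For any product length k:  max rho(P)^(1/k)  <=  JSR  <=  max ||P||^(1/k).
import itertools
import numpy as np

def jsr_bounds(matrices, k=6):
    lower, upper = 0.0, 0.0
    for combo in itertools.product(matrices, repeat=k):
        P = np.linalg.multi_dot(combo) if k > 1 else combo[0]
        rho = max(abs(np.linalg.eigvals(P)))   # spectral radius of the product
        norm = np.linalg.norm(P, 2)            # spectral norm (submultiplicative)
        lower = max(lower, rho ** (1.0 / k))
        upper = max(upper, norm ** (1.0 / k))
    return lower, upper

# Two illustrative regime matrices of a bivariate STAR model (hypothetical values).
A1 = np.array([[0.5, 0.2], [0.1, 0.4]])
A2 = np.array([[0.3, -0.4], [0.2, 0.6]])

lo, up = jsr_bounds([A1, A2], k=6)
print(f"JSR in [{lo:.3f}, {up:.3f}]; condition holds if the upper bound is < 1")
```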
arXiv link: http://arxiv.org/abs/1805.11311v3
Modeling the residential electricity consumption within a restructured power market
the federal level. The market thus provides a unique testing environment for
the market organization structure. At the same time, the econometric modeling
and forecasting of electricity market consumption become more challenging.
Import and export, which generally follow simple rules in European countries,
can be a result of direct market behaviors. This paper seeks to build a general
model of power consumption and to use it to test several hypotheses.
arXiv link: http://arxiv.org/abs/1805.11138v2
Tilting Approximate Models
quasi-structural models. The paper considers the econometric properties of
estimators that utilize projections to reimpose information about the exact
model in the form of conditional moments. The resulting estimator efficiently
combines the information provided by the approximate law of motion and the
moment conditions. The paper develops the corresponding asymptotic theory and
provides simulation evidence that tilting substantially reduces the mean
squared error for parameter estimates. It applies the methodology to pricing
long-run risks in aggregate consumption in the US, where the model is solved
using the Campbell and Shiller (1988) approximation. Tilting improves empirical
fit and results suggest that approximation error is a source of upward bias in
estimates of risk aversion and downward bias in the elasticity of intertemporal
substitution.
arXiv link: http://arxiv.org/abs/1805.10869v5
Flexible shrinkage in high-dimensional Bayesian spatial autoregressive models
priors to enable stochastic variable selection in the context of
high-dimensional matrix exponential spatial specifications. Existing approaches
to dealing with overparameterization problems in spatial
autoregressive specifications typically rely on computationally demanding
Bayesian model-averaging techniques. The proposed shrinkage priors can be
implemented using Markov chain Monte Carlo methods in a flexible and efficient
way. A simulation study is conducted to evaluate the performance of each of the
shrinkage priors. Results suggest that they perform particularly well in
high-dimensional environments, especially when the number of parameters to
estimate exceeds the number of observations. For an empirical illustration we
use pan-European regional economic growth data.
arXiv link: http://arxiv.org/abs/1805.10822v1
Inference Related to Common Breaks in a Multivariate System with Joined Segmented Trends with Applications to Global and Hemispheric Temperatures
forcing seem to be characterized by a linear trend with two changes in the rate
of growth. The first occurs in the early 60s and indicates a very large
increase in the rate of growth of both temperature and radiative forcing
series. This was termed the "onset of sustained global warming". The second
is related to the more recent so-called hiatus period, which suggests that
temperatures and total radiative forcing have increased less rapidly since the
mid-90s compared to the larger rate of increase from 1960 to 1990. There are
two issues that remain unresolved. The first is whether the breaks in the slope
of the trend functions of temperatures and radiative forcing are common. This
is important because common breaks coupled with the basic science of climate
change would strongly suggest a causal effect from anthropogenic factors to
temperatures. The second issue relates to establishing formally via a proper
testing procedure that takes into account the noise in the series, whether
there was indeed a `hiatus period' for temperatures since the mid 90s. This is
important because such a test would counter the widely held view that the
hiatus is the product of natural internal variability. Our paper provides tests
related to both issues. The results show that the breaks in temperatures and
radiative forcing are common and that the hiatus is characterized by a
significant decrease in their rate of growth. The statistical results are of
independent interest and applicable more generally.
arXiv link: http://arxiv.org/abs/1805.09937v1
Identification in Nonparametric Models for Dynamic Treatment Effects
outcomes and treatment choices influence one another in a dynamic manner. In
this setting, we are interested in identifying the average outcome for
individuals in each period, had a particular treatment sequence been assigned.
The identification of this quantity allows us to identify the average treatment
effects (ATE's) and the ATE's on transitions, as well as the optimal treatment
regimes, namely, the regimes that maximize the (weighted) sum of the average
potential outcomes, possibly less the cost of the treatments. The main
contribution of this paper is to relax the sequential randomization assumption
widely used in the biostatistics literature by introducing a flexible
choice-theoretic framework for a sequence of endogenous treatments. We show
that the parameters of interest are identified under each period's two-way
exclusion restriction, i.e., with instruments excluded from the
outcome-determining process and other exogenous variables excluded from the
treatment-selection process. We also consider partial identification in the
case where the latter variables are not available. Lastly, we extend our
results to a setting where treatments do not appear in every period.
arXiv link: http://arxiv.org/abs/1805.09397v3
A Double Machine Learning Approach to Estimate the Effects of Musical Practice on Student's Skills
development. Identification is based on the conditional independence assumption
and estimation is implemented using a recent double machine learning estimator.
The study proposes solutions to two highly practically relevant questions that
arise for these new methods: (i) How to investigate sensitivity of estimates to
tuning parameter choices in the machine learning part? (ii) How to assess
covariate balancing in high-dimensional settings? The results show that
improvements in objectively measured cognitive skills require at least medium
intensity, while improvements in school grades are already observed for low
intensity of practice.
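The estimator itself is not reproduced here, but the following sketch shows the generic cross-fitted partialling-out form of a double machine learning estimator under conditional independence, with random forests for the nuisance functions. The variables and data are placeholders, and the tuning-sensitivity and covariate-balancing diagnostics proposed in the paper are not implemented.
```python
# Generic cross-fitted partialling-out DML sketch (placeholder data; not the
# paper's exact estimator or diagnostics).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 2000, 10
X = rng.normal(size=(n, p))                      # controls
D = 0.5 * X[:, 0] + rng.normal(size=n)           # treatment intensity (e.g., practice hours)
Y = 0.3 * D + X[:, 0] + rng.normal(size=n)       # outcome (e.g., test score)

d_res, y_res = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    m_d = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], D[train])
    m_y = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], Y[train])
    d_res[test] = D[test] - m_d.predict(X[test])  # out-of-fold treatment residuals
    y_res[test] = Y[test] - m_y.predict(X[test])  # out-of-fold outcome residuals

theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
eps = y_res - theta * d_res
se = np.sqrt(np.sum(d_res ** 2 * eps ** 2)) / np.sum(d_res ** 2)
print(f"effect estimate: {theta:.3f} (robust SE {se:.3f})")
```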
arXiv link: http://arxiv.org/abs/1805.10300v2
Model Selection in Time Series Analysis: Using Information Criteria as an Alternative to Hypothesis Testing
Since the true model in such research is not known, which model should be used
from among various potential ones is an empirical question. There might exist
several competitive models. A typical approach to dealing with this is classic
hypothesis testing using an arbitrarily chosen significance level based on the
underlying assumption that a true null hypothesis exists. In this paper we
investigate how successful this approach is in determining the correct model
for different data generating processes using time series data. An alternative
approach based on more formal model selection techniques using an information
criterion or cross-validation is suggested and evaluated in the time series
environment via Monte Carlo experiments. This paper also explores the
effectiveness of deciding what type of general relation exists between two
variables (e.g. relation in levels or relation in first differences) using
various strategies based on hypothesis testing and on information criteria with
the presence or absence of unit roots.
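As a concrete illustration of the information-criterion route, the sketch below selects the lag order of an autoregression by AIC and BIC using statsmodels. The simulated series and the candidate lag range are placeholders, not the experimental designs studied in the paper.
```python
# Illustrative lag-order selection by information criteria (placeholder data,
# not the paper's Monte Carlo designs).
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(1)
y = np.zeros(500)
for t in range(2, 500):                      # simulate an AR(2) process
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

max_lag = 8
results = {}
for p in range(1, max_lag + 1):
    # hold_back keeps the estimation sample identical across lag orders,
    # so the criteria are comparable.
    fit = AutoReg(y, lags=p, hold_back=max_lag).fit()
    results[p] = (fit.aic, fit.bic)

best_aic = min(results, key=lambda p: results[p][0])
best_bic = min(results, key=lambda p: results[p][1])
print(f"AIC selects p={best_aic}, BIC selects p={best_bic}")
```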
arXiv link: http://arxiv.org/abs/1805.08991v1
Sensitivity of Regular Estimators
estimates. We define sensitivity of a target estimate to a control estimate to
be the directional derivative of the target functional with respect to the
gradient direction of the control functional. Sensitivity according to the
information metric on the model manifold is the asymptotic covariance of
regular efficient estimators. Sensitivity according to a general policy metric
on the model manifold can be obtained from influence functions of regular
efficient estimators. Policy sensitivity has a local counterfactual
interpretation, where the ceteris paribus change to a counterfactual
distribution is specified by the combination of a control parameter and a
Riemannian metric on the model manifold.
arXiv link: http://arxiv.org/abs/1805.08883v1
Multiple Treatments with Strategic Interaction
treatments on outcomes of interest when the treatments are the result of
strategic interaction (e.g., bargaining, oligopolistic entry, peer effects). We
consider a model where agents play a discrete game with complete information
whose equilibrium actions (i.e., binary treatments) determine a post-game
outcome in a nonseparable model with endogeneity. Due to the simultaneity in
the first stage, the model as a whole is incomplete and the selection process
fails to exhibit the conventional monotonicity. Without imposing parametric
restrictions or large support assumptions, this poses challenges in recovering
treatment parameters. To address these challenges, we first establish a
monotonic pattern of the equilibria in the first-stage game in terms of the
number of treatments selected. Based on this finding, we derive bounds on the
average treatment effects (ATEs) under nonparametric shape restrictions and the
existence of excluded exogenous variables. We show that instrument variation
that compensates for strategic substitution helps solve the multiple equilibria
problem. We apply our method to data on airlines and air pollution in cities in
the U.S. We find that (i) the causal effect of each airline on pollution is
positive, and (ii) the effect is increasing in the number of firms but at a
decreasing rate.
arXiv link: http://arxiv.org/abs/1805.08275v2
On testing substitutability
for testing whether the choice function induced by a (strict) preference list
of length $N$ over a universe $U$ is substitutable. The running times of these
algorithms are $O(|U|^3\cdot N^3)$ and $O(|U|^2\cdot N^3)$, respectively. In this
note we present an algorithm with running time $O(|U|^2\cdot N^2)$. Note that
$N$ may be exponential in the size $|U|$ of the universe.
arXiv link: http://arxiv.org/abs/1805.07642v1
Bitcoin price and its marginal cost of production: support for a fundamental value
the digital currency bitcoin. Results from both conventional regression and
vector autoregression (VAR) models show that the marginal cost of production
plays an important role in explaining bitcoin prices, challenging recent
allegations that bitcoins are essentially worthless. Even with markets pricing
bitcoin in the thousands of dollars each, the valuation model seems robust. The
data show that a price bubble that began in the Fall of 2017 resolved itself in
early 2018, converging with the marginal cost model. This suggests that while
bubbles may appear in the bitcoin market, prices will tend to this bound and
not collapse to zero.
arXiv link: http://arxiv.org/abs/1805.07610v1
Learning non-smooth models: instrumental variable quantile regressions and related problems
instrumental variable quantile regressions (IVQR) and related methods with
statistical guarantees. This is much needed when we investigate heterogeneous
treatment effects since interactions between the endogenous treatment and
control variables lead to an increased number of endogenous covariates. We
prove that the GMM formulation of IVQR is NP-hard and finding an approximate
solution is also NP-hard. Hence, solving the problem from a purely
computational perspective seems unlikely. Instead, we aim to obtain an estimate
that has good statistical properties and is not necessarily the global solution
of any optimization problem.
The proposal consists of employing $k$-step correction on an initial
estimate. The initial estimate exploits the latest advances in mixed integer
linear programming and can be computed within seconds. One theoretical
contribution is that such initial estimators and the Jacobian of the moment
condition used in the $k$-step correction need not even be consistent, and merely
$k=4\log n$ fast iterations are needed to obtain an efficient estimator. The
overall proposal scales well to handle extremely large sample sizes because
lack of consistency requirement allows one to use a very small subsample to
obtain the initial estimate and the k-step iterations on the full sample can be
implemented efficiently. Another contribution that is of independent interest
is to propose a tuning-free estimation for the Jacobian matrix, whose
definition involves conditional densities. This Jacobian estimator generalizes
bootstrap quantile standard errors and can be efficiently computed via
closed-form solutions. We evaluate the performance of the proposal in
simulations and an empirical example on the heterogeneous treatment effect of
Job Training Partnership Act.
arXiv link: http://arxiv.org/abs/1805.06855v4
Happy family of stable marriages
distinguished marriage plans: the fully transferable case, where money can be
transferred between the participants, and the fully non-transferable case, where
each participant has a rigid preference list over the other gender.
We then discuss intermediate, partially transferable cases. Partially
transferable plans can be approached either as special cases of cooperative
games using the notion of a core, or as a generalization of the cyclical
monotonicity property of the fully transferable case (fake promises). We
introduce these two approaches and prove the existence of stable marriages for
the fully transferable and non-transferable plans.
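For the fully non-transferable case, existence of a stable marriage is classically established by the Gale-Shapley deferred-acceptance algorithm; the sketch below is that textbook construction (not necessarily the proof technique used in the paper), run on small hypothetical preference lists.
```python
# Textbook Gale-Shapley deferred acceptance for the non-transferable case
# (hypothetical preference lists; not the paper's construction).
def gale_shapley(men_prefs, women_prefs):
    free_men = list(men_prefs)                # men not yet engaged
    next_choice = {m: 0 for m in men_prefs}   # index of the next woman to propose to
    rank = {w: {m: i for i, m in enumerate(prefs)} for w, prefs in women_prefs.items()}
    engaged = {}                              # woman -> man

    while free_men:
        m = free_men.pop(0)
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if w not in engaged:
            engaged[w] = m
        elif rank[w][m] < rank[w][engaged[w]]:
            free_men.append(engaged[w])       # current partner becomes free
            engaged[w] = m
        else:
            free_men.append(m)                # w rejects m
    return {m: w for w, m in engaged.items()}

men = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}
print(gale_shapley(men, women))               # a stable matching: {'m2': 'w1', 'm1': 'w2'}
```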
arXiv link: http://arxiv.org/abs/1805.06687v1
Data-Driven Investment Decision-Making: Applying Moore's Law and S-Curves to Business Strategies
(i.e. Moore's Law) and technology adoption curves (i.e. S-Curves). There has
been considerable research surrounding Moore's Law and the generalized versions
applied to the time dependence of performance for other technologies. The prior
work has culminated with methodology for quantitative estimation of
technological improvement rates for nearly any technology. This paper examines
the implications of such regular time dependence for performance upon the
timing of key events in the technological adoption process. We propose a simple
crossover point in performance which is based upon the technological
improvement rates and current level differences for target and replacement
technologies. The timing for the cross-over is hypothesized as corresponding to
the first "knee" in the technology adoption "S-curve" and signals when the
market for a given technology will start to be rewarding for innovators. This
is also when potential entrants are likely to intensely experiment with
product-market fit and when the competition to achieve a dominant design
begins. This conceptual framework is then back-tested by examining two
technological changes brought about by the internet, namely music and video
transmission. The uncertainty analysis around the cases highlights opportunities
for organizations to reduce future technological uncertainty. Overall, the
results from the case studies support the reliability and utility of the
conceptual framework in strategic business decision-making with the caveat that
while technical uncertainty is reduced, it is not eliminated.
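Under the assumption of exponential (Moore's-Law-type) performance improvement, the proposed crossover point has a simple closed form; the sketch below computes it for hypothetical improvement rates and starting performance levels. The numbers are illustrative and not taken from the paper's case studies.
```python
# Performance crossover under exponential improvement (illustrative numbers).
# P_new(t) = P_new0 * (1 + r_new)**t ; P_old(t) = P_old0 * (1 + r_old)**t
# Crossover when P_new(t) = P_old(t):
#   t* = ln(P_old0 / P_new0) / ln((1 + r_new) / (1 + r_old))
import math

def crossover_time(p_old0, r_old, p_new0, r_new):
    if r_new <= r_old:
        raise ValueError("replacement technology must improve faster")
    return math.log(p_old0 / p_new0) / math.log((1 + r_new) / (1 + r_old))

# Hypothetical: incumbent is 10x better today but improves 5%/yr vs 35%/yr.
t_star = crossover_time(p_old0=10.0, r_old=0.05, p_new0=1.0, r_new=0.35)
print(f"performance crossover in about {t_star:.1f} years")
```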
arXiv link: http://arxiv.org/abs/1805.06339v1
The Finite Sample Performance of Treatment Effects Estimators based on the Lasso
machine learning inspired methods by studying the performance of different
estimators based on the Lasso. Building on recent work in the field of
high-dimensional statistics, we use the semiparametric efficient score
estimation structure to compare different estimators. Alternative weighting
schemes are considered and their suitability for the incorporation of machine
learning estimators is assessed using theoretical arguments and various Monte
Carlo experiments. Additionally, we propose our own estimator based on doubly
robust kernel matching, which is argued to be more robust to nuisance parameter
misspecification. In the simulation study we verify theory-based intuition and
find good finite sample properties of alternative weighting scheme estimators
like the one we propose.
arXiv link: http://arxiv.org/abs/1805.05067v1
A Dynamic Analysis of Nash Equilibria in Search Models with Fiat Money
economy by developing a method that can determine dynamic Nash equilibria for a
class of search models with genuinely heterogeneous agents. We also address open
issues regarding the stability properties of pure-strategy equilibria and the
presence of multiple equilibria. Experiments illustrate the liquidity
conditions that favor the transition from partial to full acceptance of fiat
money, and the effects of inflationary shocks on production, liquidity, and
trade.
arXiv link: http://arxiv.org/abs/1805.04733v1
Efficiency in Micro-Behaviors and FL Bias
pari-mutuel betting system under two hypotheses on the behavior of bettors: 1.
The amount of bets increases very rapidly as the deadline for betting comes
near. 2. Each bettor bets on the horse that gives the largest expected benefit.
The results can be interpreted as showing that such efficient behaviors do not
serve to extinguish the FL bias but instead produce an even stronger FL bias.
arXiv link: http://arxiv.org/abs/1805.04225v1
Density Forecasts in Panel Data Models: A Semiparametric Bayesian Perspective
firms or households using a dynamic linear model with common and heterogeneous
coefficients as well as cross-sectional heteroskedasticity. The panel
considered in this paper features a large cross-sectional dimension N but short
time series T. Due to the short T, traditional methods have difficulty in
disentangling the heterogeneous parameters from the shocks, which contaminates
the estimates of the heterogeneous parameters. To tackle this problem, I assume
that there is an underlying distribution of heterogeneous parameters, model
this distribution nonparametrically allowing for correlation between
heterogeneous parameters and initial conditions as well as individual-specific
regressors, and then estimate this distribution by combining information from
the whole panel. Theoretically, I prove that in cross-sectional homoskedastic
cases, both the estimated common parameters and the estimated distribution of
the heterogeneous parameters achieve posterior consistency, and that the
density forecasts asymptotically converge to the oracle forecast.
Methodologically, I develop a simulation-based posterior sampling algorithm
specifically addressing the nonparametric density estimation of unobserved
heterogeneous parameters. Monte Carlo simulations and an empirical application
to young firm dynamics demonstrate improvements in density forecasts relative
to alternative approaches.
arXiv link: http://arxiv.org/abs/1805.04178v3
News Sentiment as Leading Indicators for Recessions
scoring methods to construct a novel metric that serves as a leading indicator
in recession prediction models. We hypothesize that the inclusion of such a
sentiment indicator, derived purely from unstructured news data, will improve
our capabilities to forecast future recessions because it provides a direct
measure of the polarity of the information consumers and producers are exposed
to. We go on to show that the inclusion of our proposed news sentiment
indicator, with traditional sentiment data, such as the Michigan Index of
Consumer Sentiment and the Purchasing Managers' Index, and common factors
derived from a large panel of economic and financial indicators helps improve
model performance significantly.
arXiv link: http://arxiv.org/abs/1805.04160v2
Sufficient Statistics for Unobserved Heterogeneity in Structural Dynamic Logit Models
dynamic panel data logit models where decisions are forward-looking and the
joint distribution of unobserved heterogeneity and observable state variables
is nonparametric, i.e., fixed-effects model. We consider models with two
endogenous state variables: the lagged decision variable, and the time duration
in the last choice. This class of models includes as particular cases important
economic applications such as models of market entry-exit, occupational choice,
machine replacement, inventory and investment decisions, or dynamic demand of
differentiated products. The identification of structural parameters requires a
sufficient statistic that controls for unobserved heterogeneity not only in
current utility but also in the continuation value of the forward-looking
decision problem. We obtain the minimal sufficient statistic and prove
identification of some structural parameters using a conditional likelihood
approach. We apply this estimator to a machine replacement model.
arXiv link: http://arxiv.org/abs/1805.04048v1
A mixture autoregressive model based on Student's $t$-distribution
proposed. A key feature of our model is that the conditional $t$-distributions
of the component models are based on autoregressions that have multivariate
$t$-distributions as their (low-dimensional) stationary distributions. That
autoregressions with such stationary distributions exist is not immediate. Our
formulation implies that the conditional mean of each component model is a
linear function of past observations and the conditional variance is also time
varying. Compared to previous mixture autoregressive models our model may
therefore be useful in applications where the data exhibits rather strong
conditional heteroskedasticity. Our formulation also has the theoretical
advantage that conditions for stationarity and ergodicity are always met and
these properties are much more straightforward to establish than is common in
nonlinear autoregressive models. An empirical example employing a realized
kernel series based on S&P 500 high-frequency data shows that the proposed
model performs well in volatility forecasting.
arXiv link: http://arxiv.org/abs/1805.04010v1
Structural Breaks in Time Series
computation for models involving structural changes. Our aim is to review
developments as they relate to econometric applications based on linear models.
Substantial advances have been made to cover models at a level of generality
that allow a host of interesting practical applications. These include models
with general stationary regressors and errors that can exhibit temporal
dependence and heteroskedasticity, models with trending variables and possible
unit roots and cointegrated models, among others. Advances have been made
pertaining to computational aspects of constructing estimates, their limit
distributions, tests for structural changes, and methods to determine the
number of changes present. A variety of topics are covered. The first part
summarizes and updates developments described in an earlier review, Perron
(2006), with the exposition following heavily that of Perron (2008). Additions
are included for recent developments: testing for common breaks, models with
endogenous regressors (emphasizing that simply using least-squares is
preferable to instrumental variables methods), quantile regressions, methods
based on the Lasso, panel data models, testing for changes in forecast accuracy,
factor models and methods of inference based on a continuous record
asymptotic framework. Our focus is on the so-called off-line methods whereby
one wants to retrospectively test for breaks in a given sample of data and form
confidence intervals about the break dates. The aim is to provide the readers
with an overview of methods that are of direct usefulness in practice as
opposed to issues that are mostly of theoretical interest.
arXiv link: http://arxiv.org/abs/1805.03807v1
Optimal Linear Instrumental Variables Approximations
approximation of a structural regression function. The parameter in the linear
approximation is called the Optimal Linear Instrumental Variables Approximation
(OLIVA). This paper shows that a necessary condition for standard inference on
the OLIVA is also sufficient for the existence of an IV estimand in a linear
model. The instrument in the IV estimand is unknown and may not be identified.
A Two-Step IV (TSIV) estimator based on Tikhonov regularization is proposed,
which can be implemented by standard regression routines. We establish the
asymptotic normality of the TSIV estimator assuming neither completeness nor
identification of the instrument. As an important application of our analysis,
we robustify the classical Hausman test for exogeneity against misspecification
of the linear structural model. We also discuss extensions to weighted least
squares criteria. Monte Carlo simulations suggest an excellent finite sample
performance for the proposed inferences. Finally, in an empirical application
estimating the elasticity of intertemporal substitution (EIS) with US data, we
obtain TSIV estimates that are much larger than their standard IV counterparts,
with our robust Hausman test failing to reject the null hypothesis of
exogeneity of real interest rates.
arXiv link: http://arxiv.org/abs/1805.03275v3
Endogenous growth - A dynamic technology augmentation of the Solow model
exogenous economic growth model by including a measurement which tries to
explain and quantify the size of technological innovation ($A$) endogenously. I
do not agree that technology is a "constant" exogenous variable, because it is
humans who create all technological innovations, and innovation depends on how much
human and physical capital is allocated to its research. I inspect several
possible approaches to doing this, and then I test my model against both sample
and real-world evidence. I call this method "dynamic" because it tries to
model the details in resource allocations between research, labor and capital,
by affecting each other interactively. In the end, I point out which is the new
residual and the parts of the economic growth model which can be further
improved.
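The paper's exact law of motion for $A$ is not reproduced here; purely as an illustration of the general idea, the sketch below simulates a discrete-time Solow economy in which a fraction of output is allocated to research and the growth rate of technology depends on that research input. All functional forms and parameter values are hypothetical.
```python
# Illustrative discrete-time Solow economy with an endogenous technology term
# (hypothetical functional forms and parameters; not the paper's model).
import numpy as np

alpha, s, delta, n = 0.33, 0.25, 0.05, 0.01   # capital share, saving, depreciation, pop. growth
phi, lam = 0.10, 0.3                          # share of output spent on research; research elasticity

T = 100
K, L, A = np.zeros(T), np.zeros(T), np.zeros(T)
K[0], L[0], A[0] = 1.0, 1.0, 1.0

for t in range(T - 1):
    Y = A[t] * K[t] ** alpha * L[t] ** (1 - alpha)       # output
    R = phi * Y                                          # resources allocated to research
    A[t + 1] = A[t] * (1 + 0.02 * (R / Y) ** lam)        # technology responds to research intensity
    K[t + 1] = (1 - delta) * K[t] + s * (1 - phi) * Y    # capital accumulation out of remaining output
    L[t + 1] = (1 + n) * L[t]

y = A * K ** alpha * L ** (1 - alpha) / L                # output per worker
print(f"output per worker grows {y[-1] / y[0]:.2f}x over {T} periods")
```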
arXiv link: http://arxiv.org/abs/1805.00668v1
Identifying Effects of Multivalued Treatments
assumptions: ordered choice, and more recently unordered monotonicity. We show
how treatment effects can be identified in a more general class of models that
allows for multidimensional unobserved heterogeneity. Our results rely on two
main assumptions: treatment assignment must be a measurable function of
threshold-crossing rules, and enough continuous instruments must be available.
We illustrate our approach for several classes of models.
arXiv link: http://arxiv.org/abs/1805.00057v1
Interpreting Quantile Independence
independence, like quantile independence? In the context of identifying causal
effects of a treatment variable, we argue that such deviations should be chosen
based on the form of selection on unobservables they allow. For quantile
independence, we characterize this form of treatment selection. Specifically,
we show that quantile independence is equivalent to a constraint on the average
value of either a latent propensity score (for a binary treatment) or the cdf
of treatment given the unobservables (for a continuous treatment). In both
cases, this average value constraint requires a kind of non-monotonic treatment
selection. Using these results, we show that several common treatment selection
models are incompatible with quantile independence. We introduce a class of
assumptions which weakens quantile independence by removing the average value
constraint, and therefore allows for monotonic treatment selection. In a
potential outcomes model with a binary treatment, we derive identified sets for
the ATT and QTT under both classes of assumptions. In a numerical example we
show that the average value constraint inherent in quantile independence has
substantial identifying power. Our results suggest that researchers should
carefully consider the credibility of this non-monotonicity property when using
quantile independence to weaken full independence.
arXiv link: http://arxiv.org/abs/1804.10957v1
Chain effects of clean water: The Mills-Reincke phenomenon in early twentieth-century Japan
known as the "Mills-Reincke phenomenon," in early twentieth-century Japan.
Recent studies have reported that water purification systems made
substantial contributions to human capital. Although some studies have
investigated the instantaneous effects of water-supply systems in pre-war
Japan, little is known about the chain effects of these systems. By analyzing
city-level cause-specific mortality data from 1922-1940, we find that a decline
in typhoid deaths by one per 1,000 people decreased the risk of death due to
non-waterborne diseases such as tuberculosis and pneumonia by 0.742-2.942 per
1,000 people. Our finding suggests that the observed Mills-Reincke phenomenon
could have resulted in the relatively rapid decline in the mortality rate in
early twentieth-century Japan.
arXiv link: http://arxiv.org/abs/1805.00875v3
New HSIC-based tests for independence between two stationary multivariate time series
between two multivariate stationary time series. These new tests apply the
Hilbert-Schmidt independence criterion (HSIC) to test the independence between
the innovations of both time series. Under regularity conditions, the limiting
null distributions of our HSIC-based tests are established. Next, our
HSIC-based tests are shown to be consistent. Moreover, a residual bootstrap
method is used to obtain the critical values for our HSIC-based tests, and its
validity is justified. Compared with the existing cross-correlation-based tests
for linear dependence, our tests examine the general (including both linear and
non-linear) dependence to give investigators more complete information on the
causal relationship between two multivariate time series. The merits of our
tests are illustrated by some simulation results and a real example.
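To fix ideas, the sketch below computes the (biased) empirical HSIC statistic with Gaussian kernels and a simple permutation p-value for two i.i.d. samples. The paper's actual tests apply HSIC to estimated innovations of the two time series and calibrate critical values by a residual bootstrap, neither of which is implemented here.
```python
# Biased empirical HSIC with Gaussian kernels and a naive permutation p-value
# (i.i.d. illustration; the paper applies HSIC to fitted innovations and uses
# a residual bootstrap instead).
import numpy as np

def gaussian_gram(Z):
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)     # squared distances
    bw = np.median(d2[d2 > 0])                               # median heuristic bandwidth
    return np.exp(-d2 / bw)

def hsic(X, Y):
    n = len(X)
    K, L = gaussian_gram(X), gaussian_gram(Y)
    H = np.eye(n) - np.ones((n, n)) / n                      # centering matrix
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
Y = np.column_stack([X[:, 0] ** 2, rng.normal(size=n)])      # nonlinearly dependent on X

stat = hsic(X, Y)
perms = [hsic(X, Y[rng.permutation(n)]) for _ in range(200)]
pval = np.mean([p >= stat for p in perms])
print(f"HSIC = {stat:.4f}, permutation p-value = {pval:.3f}")
```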
arXiv link: http://arxiv.org/abs/1804.09866v1
Deep Learning for Predicting Asset Returns
Predictability is achieved via multiple layers of composite factors as opposed
to additive ones. Viewed in this way, asset pricing studies can be revisited
using multi-layer deep learners, such as rectified linear units (ReLU) or
long-short-term-memory (LSTM) for time-series effects. State-of-the-art
algorithms including stochastic gradient descent (SGD), TensorFlow and dropout
design provide implementation and efficient factor exploration. To illustrate
our methodology, we revisit the equity market risk premium dataset of Welch and
Goyal (2008). We find the existence of nonlinear factors which explain
predictability of returns, in particular at the extremes of the characteristic
space. Finally, we conclude with directions for future research.
arXiv link: http://arxiv.org/abs/1804.09314v2
Economic inequality and Islamic Charity: An exploratory agent-based modeling approach
social policy makers across the world to ensure sustainable economic growth
and justice. In the mainstream school of economics, namely neoclassical
theory, economic issues are dealt with in a mechanistic manner. This
mainstream framework focuses mainly on investigating a socio-economic
system based on an axiomatic scheme in which a reductionist approach plays a vital
role. The major limitations of such theories include the unbounded rationality of
economic agents, the reduction of economic aggregates to a set of predictable
factors, and a lack of attention to the adaptability and evolutionary nature of
economic agents. To tackle the deficiencies of conventional economic models,
several new approaches have been adopted over the past two decades. One of these
novel approaches is the complex adaptive systems (CAS) framework, which has
shown very promising performance in practice. In contrast to the mainstream school,
under this framework economic phenomena are studied in an organic manner in
which economic agents are assumed to be both boundedly rational and
adaptive. Accordingly, economic aggregates emerge from the ways the
agents of a system decide and interact. As a powerful way of modeling CASs,
agent-based models (ABMs) have found growing application among academics
and practitioners. ABMs show how simple behavioral rules of agents and
local interactions among them at the micro scale can generate surprisingly complex
patterns at the macro scale. In this paper, ABMs are used to show (1) how
economic inequality emerges in a system and (2) how sadaqah, an
Islamic charity rule, can substantially help alleviate this inequality and how
resource allocation strategies adopted by charity entities can accelerate this
alleviation.
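The paper's actual simulation rules are not reproduced here; the toy agent-based sketch below only illustrates the general mechanism: random pairwise exchanges generate inequality, and a simple charity rule that transfers a share of wealth from agents above a threshold to the poorest agents reduces the Gini coefficient. All behavioral rules and parameter values are hypothetical.
```python
# Toy ABM sketch (hypothetical rules, not the paper's model): random exchanges
# generate inequality; a simple charity transfer rule reduces it.
import numpy as np

def gini(w):
    w = np.sort(w)
    n = len(w)
    return (2 * np.arange(1, n + 1) - n - 1) @ w / (n * w.sum())

rng = np.random.default_rng(0)
n_agents, steps = 500, 20_000
wealth = np.full(n_agents, 100.0)

for _ in range(steps):                          # random pairwise exchange: pool and split randomly
    i, j = rng.integers(n_agents, size=2)
    if i == j:
        continue
    pooled = wealth[i] + wealth[j]
    share = rng.uniform()
    wealth[i], wealth[j] = share * pooled, (1 - share) * pooled

print(f"Gini before charity: {gini(wealth):.3f}")

threshold, rate = 150.0, 0.05                   # charity: richer agents give a share to the poorest
donors = wealth > threshold
pot = (rate * wealth[donors]).sum()
wealth[donors] *= (1 - rate)
poorest = np.argsort(wealth)[: n_agents // 5]
wealth[poorest] += pot / len(poorest)

print(f"Gini after one round of charity: {gini(wealth):.3f}")
```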
arXiv link: http://arxiv.org/abs/1804.09284v1
Statistical and Economic Evaluation of Time Series Models for Forecasting Arrivals at Call Centers
distributional forecasts of call arrivals in order to achieve an optimal
balance between service quality and operating costs. We present a strategy for
selecting forecast models of call arrivals which is based on three pillars: (i)
flexibility of the loss function; (ii) statistical evaluation of forecast
accuracy; (iii) economic evaluation of forecast performance using money
metrics. We implement fourteen time series models and seven forecast
combination schemes on three series of daily call arrivals. Although we focus
mainly on point forecasts, we also analyze density forecast evaluation. We show
that modeling second moments is important both for point and density
forecasting and that the simple Seasonal Random Walk model is always
outperformed by more general specifications. Our results suggest that call
center managers should invest in the use of forecast models which describe both
first and second moments of call arrivals.
arXiv link: http://arxiv.org/abs/1804.08315v1
Econometric Modeling of Regional Electricity Spot Prices in the Australian Market
interconnectors, and inter-regional trade in electricity is growing. To model
this, we consider a spatial equilibrium model of price formation, where
constraints on inter-regional flows result in three distinct equilibria in
prices. We use this to motivate an econometric model for the distribution of
observed electricity spot prices that captures many of their unique empirical
characteristics. The econometric model features supply and inter-regional trade
cost functions, which are estimated using Bayesian monotonic regression
smoothing methodology. A copula multivariate time series model is employed to
capture additional dependence -- both cross-sectional and serial -- in regional
prices. The marginal distributions are nonparametric, with means given by the
regression means. The model has the advantage of preserving the heavy
right-hand tail in the predictive densities of price. We fit the model to
half-hourly spot price data in the five interconnected regions of the
Australian national electricity market. The fitted model is then used to
measure how both supply and price shocks in one region are transmitted to the
distribution of prices in all regions in subsequent periods. Finally, to
validate our econometric model, we show that prices forecast using the proposed
model compare favorably with those from some benchmark alternatives.
arXiv link: http://arxiv.org/abs/1804.08218v1
Price Competition with Geometric Brownian motion in Exchange Rate Uncertainty
against exchange rate uncertainties and competition. We consider a single
product and a single period. Because of long lead times, the capacity investment
must be made before the selling season begins, when the exchange rate between the
two countries is still uncertain. We consider duopoly competition in the foreign
country. We model the exchange rate as a random variable. We investigate the
impact of competition and exchange rate on optimal capacities and optimal
prices. We show how competition can impact the decision of the home
manufacturer to enter the foreign market.
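As a small illustration of the exchange-rate uncertainty in the title, the sketch below simulates terminal values of an exchange rate following a geometric Brownian motion and the resulting home-currency revenue for a chosen foreign price. All parameters and the demand curve are hypothetical, and the competitive game itself is not modeled.
```python
# Hypothetical illustration: exchange rate as geometric Brownian motion,
# home-currency revenue for a given foreign price and demand curve.
# (The duopoly game in the paper is not modeled here.)
import numpy as np

rng = np.random.default_rng(0)
S0, mu, sigma, T = 1.10, 0.01, 0.15, 1.0      # spot rate, drift, volatility, horizon (years)
n_paths = 100_000

# Terminal exchange rate under GBM: S_T = S0 * exp((mu - sigma^2/2) T + sigma sqrt(T) Z)
Z = rng.standard_normal(n_paths)
S_T = S0 * np.exp((mu - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * Z)

price = 8.0                                    # foreign-currency price (illustrative)
demand = max(0.0, 100.0 - 5.0 * price)         # linear demand in the foreign market (illustrative)
home_revenue = price * demand * S_T            # revenue converted at the realized rate

print(f"expected home-currency revenue: {home_revenue.mean():.1f} "
      f"(5th-95th pct: {np.percentile(home_revenue, 5):.1f}-{np.percentile(home_revenue, 95):.1f})")
```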
arXiv link: http://arxiv.org/abs/1804.08153v1
Empirical Equilibrium
equilibrium that is based on a non-parametric characterization of empirical
distributions of behavior in games (Velez and Brown, 2020b, arXiv:1907.12408).
The refinement can be alternatively defined as those Nash equilibria that do
not refute the regular QRE theory of Goeree, Holt, and Palfrey (2005). By
contrast, some empirical equilibria may refute monotone additive randomly
disturbed payoff models. As a by-product, we show that empirical equilibrium
does not coincide with refinements based on approximation by monotone additive
randomly disturbed payoff models, and further our understanding of the
empirical content of these models.
arXiv link: http://arxiv.org/abs/1804.07986v3
Transaction Costs in Collective Waste Recovery Systems in the EU
management model by analysing the economic model of extended producer
responsibility and collective waste management systems and to create a model
for measuring the transaction costs borne by waste recovery organizations. The
model was validated by analysing the Bulgarian collective waste management
systems that have been complying with the EU legislation for the last 10 years.
The analysis focuses on waste oils because of their economic importance and the
limited number of studies and analyses in this field as the predominant body of
research to date has mainly addressed packaging waste, mixed household waste or
discarded electrical and electronic equipment. The study aims to support the
process of establishing a circular economy in the EU, which was initiated in
2015.
arXiv link: http://arxiv.org/abs/1804.06792v1
Estimating Treatment Effects in Mover Designs
estimate causal effects. While these "mover regressions" are often motivated by
a linear constant-effects model, it is not clear what they capture under weaker
quasi-experimental assumptions. I show that binary treatment mover regressions
recover a convex average of four difference-in-difference comparisons and are
thus causally interpretable under a standard parallel trends assumption.
Estimates from multiple-treatment models, however, need not be causal without
stronger restrictions on the heterogeneity of treatment effects and
time-varying shocks. I propose a class of two-step estimators to isolate and
combine the large set of difference-in-difference quasi-experiments generated
by a mover design, identifying mover average treatment effects under
conditional-on-covariate parallel trends and effect homogeneity restrictions. I
characterize the efficient estimators in this class and derive specification
tests based on the model's overidentifying restrictions. Future drafts will
apply the theory to the Finkelstein et al. (2016) movers design, analyzing the
causal effects of geography on healthcare utilization.
arXiv link: http://arxiv.org/abs/1804.06721v1
Revisiting the thermal and superthermal two-class distribution of incomes: A critical perspective
the income distribution performed by physicists over the past decade. Their
findings rely on the graphical analysis of the observed distribution of
normalized incomes. Two central observations lead to the conclusion that the
majority of incomes are exponentially distributed, but neither each individual
piece of evidence nor their concurrent observation robustly proves that the
thermal and superthermal mixture fits the observed distribution of incomes
better than reasonable alternatives. A formal analysis using popular measures
of fit shows that while an exponential distribution with a power-law tail
provides a better fit of the IRS income data than the log-normal distribution
(often assumed by economists), the thermal and superthermal mixture's fit can
be improved upon further by adding a log-normal component. The economic
implications of the thermal and superthermal distribution of incomes, and the
expanded mixture are explored in the paper.
arXiv link: http://arxiv.org/abs/1804.06341v1
Dissection of Bitcoin's Multiscale Bubble History from January 2012 to February 2018
dynamics from January 2012 to February 2018. We introduce a robust automatic
peak detection method that classifies price time series into periods of
uninterrupted market growth (drawups) and regimes of uninterrupted market
decrease (drawdowns). In combination with the Lagrange Regularisation Method
for detecting the beginning of a new market regime, we identify 3 major peaks
and 10 additional smaller peaks that have punctuated the dynamics of Bitcoin
price during the analyzed time period. We explain this classification of long
and short bubbles by a number of quantitative metrics and graphs to understand
the main socio-economic drivers behind the ascent of Bitcoin over this period.
Then, a detailed analysis of the growing risks associated with the three long
bubbles is performed using the Log-Periodic Power Law Singularity (LPPLS) model, based on
the LPPLS Confidence Indicators, defined as the fraction of qualified fits of
the LPPLS model over multiple time windows. Furthermore, for various fictitious
'present' times $t_2$ before the crashes, we employ a clustering method to
group the predicted critical times $t_c$ of the LPPLS fits over different time
scales, where $t_c$ is the most probable time for the ending of the bubble.
Each cluster is proposed as a plausible scenario for the subsequent Bitcoin
price evolution. We present these predictions for the three long bubbles and
the four short bubbles that our time scale of analysis was able to resolve.
Overall, our predictive scheme provides useful information to warn of an
imminent crash risk.
arXiv link: http://arxiv.org/abs/1804.06261v4
Quantifying the Economic Case for Electric Semi-Trucks
transport, particularly heavy-duty trucks to downscale the greenhouse-gas (GHG)
emissions from the transportation sector. However, the economic competitiveness
of electric semi-trucks is uncertain as there are substantial additional
initial costs associated with the large battery packs required. In this work,
we analyze the trade-off between the initial investment and the operating cost
for realistic usage scenarios to compare a fleet of electric semi-trucks with a
range of 500 miles with a fleet of diesel trucks. For the baseline case with
30% of the fleet requiring battery pack replacements and a price differential of
US$50,000, we find a payback period of about 3 years. Based on sensitivity
analysis, we find that the fraction of the fleet that requires battery pack
replacements is a major factor. For the case with 100% replacement fraction,
the payback period could be as high as 5-6 years. We identify the price of
electricity as the second most important variable: at a price of
US$0.14/kWh, the payback period could go up to 5 years. Electric semi-trucks
are expected to lead to savings due to reduced repairs, and the magnitude of these
savings could play a crucial role in the payback period as well. With increased
penetration of autonomous vehicles, the annual mileage of semi-trucks could
substantially increase and this heavily sways in favor of electric semi-trucks,
bringing down the payback period to around 2 years at an annual mileage of
120,000 miles. There is an undeniable economic case for electric semi-trucks
and developing battery packs with longer cycle life and higher specific energy
would make this case even stronger.
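The basic payback logic can be illustrated with a few lines of arithmetic: the upfront price premium (US$50,000 in the baseline above) is recovered through per-mile fuel and maintenance savings, net of amortized battery replacements. The per-mile consumption, diesel price, and maintenance figures below are hypothetical placeholders, not the paper's inputs.
```python
# Simple payback arithmetic for an electric vs. diesel semi-truck.
# Price premium and electricity price echo the abstract's baseline; the
# per-mile consumption, diesel price and maintenance figures are hypothetical.
price_premium = 50_000.0                      # extra upfront cost of the electric truck (US$)
annual_miles = 80_000.0

diesel_cost_per_mile = 3.0 / 6.0              # $3/gal at 6 mpg (hypothetical)
electric_cost_per_mile = 2.0 * 0.14           # 2 kWh/mile at $0.14/kWh (hypothetical)
maintenance_saving_per_mile = 0.01            # hypothetical repair savings

battery_replacement_cost = 20_000.0           # hypothetical
replacement_fraction = 0.30                   # baseline share of fleet needing replacement
truck_life_years = 10.0
battery_cost_per_year = replacement_fraction * battery_replacement_cost / truck_life_years

annual_saving = annual_miles * (
    diesel_cost_per_mile - electric_cost_per_mile + maintenance_saving_per_mile
) - battery_cost_per_year

print(f"payback period: {price_premium / annual_saving:.1f} years")
```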
arXiv link: http://arxiv.org/abs/1804.05974v1
Bitcoin market route to maturity? Evidence from return fluctuations, temporal correlations and multiscaling effects
properties of the rapidly-emerging Bitcoin (BTC) market are assessed over
chosen sub-periods, in terms of return distributions, volatility
autocorrelation, Hurst exponents and multiscaling effects. The findings are
compared to the stylized facts of mature world markets. While early trading was
affected by system-specific irregularities, it is found that over the months
preceding Apr 2018 all these statistical indicators approach the features
hallmarking maturity. This can be taken as an indication that the Bitcoin
market, and possibly other cryptocurrencies, carry concrete potential of
imminently becoming a regular market, alternative to the foreign exchange
(Forex). Since high-frequency price data have been available from the beginning of
trading, Bitcoin offers a unique window into the statistical
characteristics of a market maturation trajectory.
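One of the indicators mentioned, the Hurst exponent, can be estimated in a few lines; the sketch below uses a basic rescaled-range (R/S) regression on simulated returns as a stand-in for BTC data. The paper's multiscaling analysis relies on more refined multifractal estimators not shown here.
```python
# Basic rescaled-range (R/S) Hurst exponent estimate on simulated returns
# (stand-in for BTC return data; the paper uses more refined multifractal tools).
import numpy as np

def rs_statistic(x, n):
    blocks = len(x) // n
    vals = []
    for b in range(blocks):
        seg = x[b * n:(b + 1) * n]
        dev = np.cumsum(seg - seg.mean())          # cumulative deviations from the block mean
        r = dev.max() - dev.min()                  # range
        s = seg.std(ddof=1)                        # standard deviation
        if s > 0:
            vals.append(r / s)
    return np.mean(vals)

rng = np.random.default_rng(0)
returns = rng.standard_t(df=3, size=10_000)        # heavy-tailed i.i.d. returns (no long memory)

sizes = np.array([16, 32, 64, 128, 256, 512])
rs_vals = np.array([rs_statistic(returns, n) for n in sizes])
H = np.polyfit(np.log(sizes), np.log(rs_vals), 1)[0]
print(f"estimated Hurst exponent: {H:.2f}")
```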
arXiv link: http://arxiv.org/abs/1804.05916v2
Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects
use two-way fixed effects regressions that include leads and lags of the
treatment. We show that in settings with variation in treatment timing across
units, the coefficient on a given lead or lag can be contaminated by effects
from other periods, and apparent pretrends can arise solely from treatment
effects heterogeneity. We propose an alternative estimator that is free of
contamination, and illustrate the relative shortcomings of two-way fixed
effects regressions with leads and lags through an empirical application.
arXiv link: http://arxiv.org/abs/1804.05785v2
Triggers for cooperative behavior in the thermodynamic limit: a case study in Public goods game
behavior in the thermodynamic limit by taking recourse to the Public goods
game. Using the idea of mapping the 1D Ising model Hamiltonian with nearest
neighbor coupling to payoffs in game theory, we calculate the magnetisation
of the game in the thermodynamic limit. We see a phase transition in the
thermodynamic limit of the two-player Public goods game. We observe that
punishment acts as an external field for the two-player Public goods game,
triggering cooperation (the "provide" strategy), while cost can be a trigger for
suppressing cooperation (free riding). Finally, reward also acts as a trigger
for providing, while the role of inverse temperature (fluctuations in choices)
is to introduce randomness into strategic choices.
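For reference, the thermodynamic-limit magnetisation of the 1D nearest-neighbour Ising model in an external field, which is the quantity being reinterpreted game-theoretically, takes the standard closed form below; the specific mapping from game payoffs to the coupling $J$ and field $h$ is the paper's contribution and is not reproduced here.
$$ m(\beta, J, h) = \frac{\sinh(\beta h)}{\sqrt{\sinh^{2}(\beta h) + e^{-4\beta J}}} $$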
arXiv link: http://arxiv.org/abs/1804.06465v2
Shapley Value Methods for Attribution Modeling in Online Advertising
the area of online advertising. As a credit allocation solution in cooperative
game theory, the Shapley value method directly quantifies the contribution of
online advertising inputs to the advertising key performance indicator (KPI)
across multiple channels. We simplify its calculation by developing an
alternative mathematical formulation. The new formula significantly improves
the computational efficiency and therefore extends the scope of applicability.
Based on the simplified formula, we further develop the ordered Shapley value
method. The proposed method is able to take into account the order of channels
visited by users. We claim that it provides a more comprehensive insight by
evaluating the attribution of channels at different stages of user conversion
journeys. The proposed approaches are illustrated using a real-world online
advertising campaign dataset.
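The simplified and ordered formulas developed in the paper are not reproduced here, but the textbook Shapley decomposition they build on can be computed directly for a small number of channels, as in the sketch below; the coalition-level conversion counts are hypothetical.
```python
# Textbook Shapley attribution for a small channel set (hypothetical coalition
# KPI values; the paper's simplified and ordered formulas are not shown).
from itertools import combinations
from math import factorial

channels = ["search", "display", "email"]

# v(S): conversions attributed to users exposed to coalition S (illustrative).
v = {
    frozenset(): 0, frozenset({"search"}): 120, frozenset({"display"}): 40,
    frozenset({"email"}): 30, frozenset({"search", "display"}): 180,
    frozenset({"search", "email"}): 160, frozenset({"display", "email"}): 80,
    frozenset({"search", "display", "email"}): 220,
}

def shapley(channel):
    n = len(channels)
    others = [c for c in channels if c != channel]
    phi = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            S = frozenset(subset)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            phi += weight * (v[S | {channel}] - v[S])
    return phi

for c in channels:
    print(f"{c}: {shapley(c):.1f}")
print("total:", sum(shapley(c) for c in channels))   # equals v(grand coalition) = 220
```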
arXiv link: http://arxiv.org/abs/1804.05327v1
Aid and Growth in the Countries of the West African Economic and Monetary Union (UEMOA): Revisiting a Controversial Relationship
development assistance (ODA) on economic growth in WAEMU zone countries. To
achieve this, the study is based on OECD and WDI data covering the period
1980-2015 and used Hansen's Panel Threshold Regression (PTR) model to
"bootstrap" aid threshold above which its effectiveness is effective. The
evidence strongly supports the view that the relationship between aid and
economic growth is non-linear with a unique threshold which is 12.74% GDP.
Above this value, the marginal effect of aid is 0.69 points, "all things being
equal to otherwise". One of the main contribution of this paper is to show that
WAEMU countries need investments that could be covered by the foreign aid. This
later one should be considered just as a complementary resource. Thus, WEAMU
countries should continue to strengthen their efforts in internal resource
mobilization in order to fulfil this need.
arXiv link: http://arxiv.org/abs/1805.00435v1
Large Sample Properties of Partitioning-Based Series Estimators
nonparametric regression, a popular method for approximating conditional
expectation functions in statistics, econometrics, and machine learning. First,
we obtain a general characterization of their leading asymptotic bias. Second,
we establish integrated mean squared error approximations for the point
estimator and propose feasible tuning parameter selection. Third, we develop
pointwise inference methods based on undersmoothing and robust bias correction.
Fourth, employing different coupling approaches, we develop uniform
distributional approximations for the undersmoothed and robust bias-corrected
t-statistic processes and construct valid confidence bands. In the univariate
case, our uniform distributional approximations require seemingly minimal rate
restrictions and improve on approximation rates known in the literature.
Finally, we apply our general results to three partitioning-based estimators:
splines, wavelets, and piecewise polynomials. The supplemental appendix
includes several other general and example-specific technical and
methodological results. A companion R package is provided.
arXiv link: http://arxiv.org/abs/1804.04916v3
Moment Inequalities in the Context of Simulated and Predicted Variables
inference methods based on moment inequalities. Commonly used confidence sets
for parameters are level sets of criterion functions whose boundary points may
depend on sample moments in an irregular manner. Due to this feature,
simulation errors can affect the performance of inference in non-standard ways.
In particular, a (first-order) bias due to the simulation errors may remain in
the estimated boundary of the confidence set. We demonstrate, through Monte
Carlo experiments, that simulation errors can significantly reduce the coverage
probabilities of confidence sets in small samples. The size distortion is
particularly severe when the number of inequality restrictions is large. These
results highlight the danger of ignoring the sampling variations due to the
simulation errors in moment inequality models. Similar issues arise when using
predicted variables in moment inequalities models. We propose a method for
properly correcting for these variations based on regularizing the intersection
of moments in parameter space, and we show that our proposed method performs
well theoretically and in practice.
arXiv link: http://arxiv.org/abs/1804.03674v1
Inference on Local Average Treatment Effects for Misclassified Treatment
the binary treatment contains a measurement error. The standard instrumental
variable estimator is inconsistent for the parameter since the measurement
error is non-classical by construction. We correct the problem by identifying
the distribution of the measurement error based on the use of an exogenous
variable that can even be a binary covariate. The moment conditions derived
from the identification lead to generalized method of moments estimation with
asymptotically valid inferences. Monte Carlo simulations and an empirical
illustration demonstrate the usefulness of the proposed procedure.
arXiv link: http://arxiv.org/abs/1804.03349v1
Varying Random Coefficient Models
when observed characteristics are modeled nonlinearly. The proposed model
builds on varying random coefficients (VRC) that are determined by nonlinear
functions of observed regressors and additively separable unobservables. This
paper proposes a novel estimator of the VRC density based on weighted sieve
minimum distance. The main example of a sieve basis is Hermite functions, which
yield a numerically stable estimation procedure. This paper shows inference
results that go beyond what has been shown in ordinary RC models. We provide in
each case rates of convergence and also establish pointwise limit theory of
linear functionals, where a prominent example is the density of potential
outcomes. In addition, a multiplier bootstrap procedure is proposed to
construct uniform confidence bands. A Monte Carlo study examines finite sample
properties of the estimator and shows that it performs well even when the
regressors associated with the RC are far from being heavy-tailed. Finally, the
methodology is applied to analyze heterogeneity in income elasticity of demand
for housing.
arXiv link: http://arxiv.org/abs/1804.03110v4
Statistical inference for autoregressive models under heteroscedasticity of unknown form
model under (conditional) heteroscedasticity of unknown form with a finite
variance. We first establish the asymptotic normality of the weighted least
absolute deviations estimator (LADE) for the model. Second, we develop the
random weighting (RW) method to estimate its asymptotic covariance matrix,
leading to the implementation of the Wald test. Third, we construct a
portmanteau test for model checking, and use the RW method to obtain its
critical values. As a special weighted LADE, the feasible adaptive LADE (ALADE)
is proposed and proved to have the same efficiency as its infeasible
counterpart. The importance of our entire methodology based on the feasible
ALADE is illustrated by simulation results and the real data analysis on three
U.S. economic data sets.
arXiv link: http://arxiv.org/abs/1804.02348v2
Simultaneous Mean-Variance Regression
and approximation of conditional mean functions. In the presence of
heteroskedasticity of unknown form, our method accounts for varying dispersion
in the regression outcome across the support of conditioning variables by using
weights that are jointly determined with the mean regression parameters.
Simultaneity generates outcome predictions that are guaranteed to improve over
ordinary least-squares prediction error, with corresponding parameter standard
errors that are automatically valid. Under shape misspecification of the
conditional mean and variance functions, we establish existence and uniqueness
of the resulting approximations and characterize their formal interpretation
and robustness properties. In particular, we show that the corresponding
mean-variance regression location-scale model weakly dominates the ordinary
least-squares location model under a Kullback-Leibler measure of divergence,
with strict improvement in the presence of heteroskedasticity. The simultaneous
mean-variance regression loss function is globally convex and the corresponding
estimator is easy to implement. We establish its consistency and asymptotic
normality under misspecification, provide robust inference methods, and present
numerical simulations that show large improvements over ordinary and weighted
least-squares in terms of estimation and inference in finite samples. We
further illustrate our method with two empirical applications to the estimation
of the relationship between economic prosperity in 1500 and today, and demand
for gasoline in the United States.
arXiv link: http://arxiv.org/abs/1804.01631v2
A Bayesian panel VAR model to analyze the impact of climate change on high-income economies
agricultural commodities and a set of macroeconomic quantities for multiple
high-income economies. To capture relations among countries, markets, and
climate shocks, this paper proposes parsimonious methods to estimate
high-dimensional panel VARs. We assume that coefficients associated with
domestic lagged endogenous variables arise from a Gaussian mixture model while
further parsimony is achieved using suitable global-local shrinkage priors on
several regions of the parameter space. Our results point towards pronounced
global reactions of key macroeconomic quantities to climate shocks. Moreover,
the empirical findings highlight substantial linkages between regionally
located climate shifts and global commodity markets.
arXiv link: http://arxiv.org/abs/1804.01554v3
Should We Adjust for the Test for Pre-trends in Difference-in-Difference Designs?
parallel trends prior to treatment assignment, yet typical estimation and
inference does not account for the fact that this test has occurred. I analyze
the properties of the traditional DiD estimator conditional on having passed
(i.e. not rejected) the test for parallel pre-trends. When the DiD design is
valid and the test for pre-trends confirms it, the typical DiD estimator is
unbiased, but traditional standard errors are overly conservative.
Additionally, there exists an alternative unbiased estimator that is more
efficient than the traditional DiD estimator under parallel trends. However,
when in population there is a non-zero pre-trend but we fail to reject the
hypothesis of parallel pre-trends, the DiD estimator is generally biased
relative to the population DiD coefficient. Moreover, if the trend is monotone,
then under reasonable assumptions the bias from conditioning exacerbates the
bias relative to the true treatment effect. I propose new estimation and
inference procedures that account for the test for parallel trends, and compare
their performance to that of the traditional estimator in a Monte Carlo
simulation.
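A small Monte Carlo sketch of the conditioning problem (illustrative parameter values, not the paper's simulation design): generate a design with a non-zero differential trend, keep only the draws in which a pre-trends test fails to reject, and compare the conditional and unconditional bias of the DiD estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims, n, trend, tau, sigma = 5000, 200, 0.05, 0.0, 1.0

all_did, passed_did = [], []
for _ in range(n_sims):
    # treated-minus-control mean differences at t = -1, 0, 1, with sampling noise
    noise = rng.normal(0.0, sigma / np.sqrt(n), size=3)
    diff = trend * np.array([-1, 0, 1]) + noise
    diff[2] += tau                                   # treatment effect at t = 1
    pre_change = diff[1] - diff[0]                   # pre-period trend estimate
    se = sigma * np.sqrt(2.0 / n)
    did = diff[2] - diff[1]                          # usual DiD estimate
    all_did.append(did)
    if abs(pre_change / se) < 1.96:                  # "passed" the pre-trends test
        passed_did.append(did)

print("unconditional bias:      ", round(np.mean(all_did) - tau, 3))
print("bias conditional on pass:", round(np.mean(passed_did) - tau, 3))
```

With a positive differential trend, the draws that survive the pre-test tend to have offsetting pre-period noise, which pushes the conditional bias above the unconditional one, consistent with the exacerbation result described above.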
arXiv link: http://arxiv.org/abs/1804.01208v2
Continuous Record Laplace-based Inference about the Break Date in Structural Change Models
by Casini and Perron (2018a) for inference in structural change models, we
propose a Laplace-based (Quasi-Bayes) procedure for the construction of the
estimate and confidence set for the date of a structural change. It is defined
by an integration rather than an optimization-based method. A transformation of
the least-squares criterion function is evaluated in order to derive a proper
distribution, referred to as the Quasi-posterior. For a given choice of a loss
function, the Laplace-type estimator is the minimizer of the expected risk with
the expectation taken under the Quasi-posterior. Besides providing an
alternative estimate that is more precise (lower mean absolute error (MAE) and
lower root-mean squared error (RMSE)) than the usual least-squares one, the
Quasi-posterior distribution can be used to construct asymptotically valid
inference using the concept of Highest Density Region. The resulting
Laplace-based inferential procedure is shown to have lower MAE and RMSE, and
the confidence sets strike the best balance between empirical coverage rates
and average lengths of the confidence sets relative to traditional long-span
methods, whether the break size is small or large.
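The following sketch conveys the quasi-posterior idea in a simplified one-regressor, long-span setting (our own assumptions, not the continuous-record construction of the paper): exponentiate the negative least-squares criterion over candidate break dates, then report the quasi-posterior mean and a highest-density region.

```python
import numpy as np

def quasi_posterior_break(y, x, trim=10):
    """Quasi-posterior over break dates in y_t = b1*x_t (t <= k) + b2*x_t (t > k) + e_t."""
    T = len(y)
    ssr = np.full(T, np.inf)
    for k in range(trim, T - trim):
        b1 = x[:k] @ y[:k] / (x[:k] @ x[:k])
        b2 = x[k:] @ y[k:] / (x[k:] @ x[k:])
        ssr[k] = np.sum((y[:k] - b1 * x[:k]) ** 2) + np.sum((y[k:] - b2 * x[k:]) ** 2)
    sigma2 = np.min(ssr[np.isfinite(ssr)]) / T
    logpost = -0.5 * ssr / sigma2                            # transformed criterion
    post = np.exp(logpost - np.max(logpost[np.isfinite(logpost)]))
    post /= post.sum()
    mean_date = int(round(np.sum(np.arange(T) * post)))      # quadratic-loss estimate
    order = np.argsort(post)[::-1]                           # 95% highest-density region
    hdr = order[: int(np.searchsorted(np.cumsum(post[order]), 0.95)) + 1]
    return mean_date, np.sort(hdr)

# toy series with a coefficient break at t = 120
rng = np.random.default_rng(2)
T, k0 = 200, 120
x = rng.normal(size=T)
y = np.where(np.arange(T) < k0, 1.0, 2.0) * x + rng.normal(size=T)
est, hdr = quasi_posterior_break(y, x)
print("break date estimate:", est, "| HDR:", hdr.min(), "-", hdr.max())
```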
arXiv link: http://arxiv.org/abs/1804.00232v3
Mortality in a heterogeneous population - Lee-Carter's methodology
attention to risk management methods. The essence of risk management is the
ability to quantify risk and apply methods that reduce uncertainty. In life
insurance, the risk is a consequence of the random variable describing life
expectancy. The article presents a proposal for stochastic mortality modeling
based on the Lee and Carter methodology. The maximum likelihood method is often
used to estimate parameters in mortality models. This method assumes that the
population is homogeneous and that the number of deaths follows a Poisson
distribution. The aim of this article is to change the assumptions about the
distribution of the number of deaths. The results indicate that the model
achieves a better fit to historical data when the number of deaths follows a
negative binomial distribution.
arXiv link: http://arxiv.org/abs/1803.11233v1
Bi-Demographic Changes and Current Account using SVAR Modeling
current account and growth. Using SVAR modeling, we track the dynamic impacts
among these underlying variables. New insights are developed about the dynamic
interrelation between population growth, the current account, and economic
growth. The long-run net impact on economic growth of domestic working
population growth and of labor demand for emigrants is positive, due to the
predominant contribution of skilled emigrant workers. Moreover, the positive
long-run contribution of emigrant workers to current account growth largely
compensates for the negative contribution from the native population, because
skilled workers predominate over unskilled ones. We find that a positive shock
to labor demand for emigrant workers raises the native active-age ratio. Thus,
emigrants appear to be complements rather than substitutes for native workers.
arXiv link: http://arxiv.org/abs/1803.11161v4
Tests for Forecast Instability and Forecast Failure under a Continuous Record Asymptotic Framework
whether the predictive ability of a given forecast model remains stable over
time. We formally define forecast instability from the economic forecaster's
perspective and highlight that the duration of the instability bears no
relationship to the length of the stable period. Our approach is applicable in
forecasting environments involving low-frequency as well as high-frequency
macroeconomic and financial variables. As the sampling interval between
observations shrinks to zero, the sequence of forecast losses is approximated
by a continuous-time stochastic process (i.e., an Ito semimartingale)
possessing certain pathwise properties. We build a hypothesis testing problem
based on the local properties of the continuous-time limit counterpart of the
sequence of losses. The null distribution follows an extreme value
distribution. While controlling the statistical size well, our class of test
statistics features uniform power over the location of the forecast failure in
the sample. The test statistics are designed to have power against general
forms of instability and are robust to common forms of non-stationarity such as
heteroskedasticity and serial correlation. The gains in power are substantial
relative to extant methods, especially when the instability is short-lasting
and when it occurs toward the end of the sample.
arXiv link: http://arxiv.org/abs/1803.10883v2
Continuous Record Asymptotics for Change-Points Models
break, we develop a continuous record asymptotic framework to build inference
methods for the break date. We have $T$ observations with a sampling frequency
$h$ over a fixed time horizon $[0, N]$, and let $T \to \infty$ with $h \to 0$
while keeping the time span $N$ fixed. We impose very mild regularity conditions
on an underlying
continuous-time model assumed to generate the data. We consider the
least-squares estimate of the break date and establish consistency and
convergence rate. We provide a limit theory for shrinking magnitudes of shifts
and locally increasing variances. The asymptotic distribution corresponds to
the location of the extremum of a function of the quadratic variation of the
regressors and of a Gaussian centered martingale process over a certain time
interval. We can account for the asymmetric informational content provided by
the pre- and post-break regimes and show how the location of the break and
shift magnitude are key ingredients in shaping the distribution. We consider a
feasible version based on plug-in estimates, which provides a very good
approximation to the finite sample distribution. We use the concept of Highest
Density Region to construct confidence sets. Overall, our method is reliable
and delivers accurate coverage probabilities and relatively short average
length of the confidence sets. Importantly, it does so irrespective of the size
of the break.
arXiv link: http://arxiv.org/abs/1803.10881v3
Generalized Laplace Inference in Multiple Change-Points Models
Generalized Laplace (GL) inference methods for the change-point dates in a
linear time series regression model with multiple structural changes analyzed
in, e.g., Bai and Perron (1998). The GL estimator is defined by an integration
rather than optimization-based method and relies on the least-squares criterion
function. It is interpreted as a classical (non-Bayesian) estimator and the
inference methods proposed retain a frequentist interpretation. This approach
provides a better approximation of the uncertainty about the change-point
locations than existing methods. On the theoretical side, depending
on some input (smoothing) parameter, the class of GL estimators exhibits a dual
limiting distribution; namely, the classical shrinkage asymptotic distribution,
or a Bayes-type asymptotic distribution. We propose an inference method based
on Highest Density Regions using the latter distribution. We show that it has
attractive theoretical properties not shared by the other popular alternatives,
i.e., it is bet-proof. Simulations confirm that these theoretical properties
translate to better finite-sample performance.
arXiv link: http://arxiv.org/abs/1803.10871v4
Emergence of Cooperation in the thermodynamic limit
of the outstanding problems in evolutionary game theory. For two player games,
cooperation is seldom the Nash equilibrium. However, in the thermodynamic limit
cooperation is the natural recourse regardless of whether we are dealing with
humans or animals. In this work, we use the analogy with the Ising model to
predict how cooperation arises in the thermodynamic limit.
arXiv link: http://arxiv.org/abs/1803.10083v2
A Perfect Specialization Model for Gravity Equation in Bilateral Trade based on Production Structure
the volume of trade between two partners, the gravity equation has been the
focus of several theoretical models that try to explain it. Specialization
models are of great importance in providing solid theoretical ground for the
gravity equation in bilateral trade. Some papers try to improve specialization
models by adding imperfect specialization, but we believe this is an
unnecessary complication. We provide a perfect specialization model based on a
phenomenon we call tradability, which overcomes the problems of the simpler
initial models. We provide empirical evidence, using estimates on panel data of
bilateral trade of 40 countries over 10 years, that supports the theoretical
model. The empirical results imply that tradability is the only reason for
deviations of the data from basic perfect specialization models.
arXiv link: http://arxiv.org/abs/1803.09935v1
Panel Data Analysis with Heterogeneous Dynamics
heterogeneous dynamic structures across observational units. We first compute
the sample mean, autocovariances, and autocorrelations for each unit, and then
estimate the parameters of interest based on their empirical distributions. We
then investigate the asymptotic properties of our estimators using double
asymptotics and propose split-panel jackknife bias correction and inference
based on the cross-sectional bootstrap. We illustrate the usefulness of our
procedures by studying the deviation dynamics of the law of one price. Monte
Carlo simulations confirm that the proposed bias correction is effective and
yields valid inference in small samples.
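A stripped-down sketch of the first steps (unit-level statistics plus a cross-sectional bootstrap; the split-panel jackknife bias correction is omitted, and all names and the AR(1) design are illustrative):

```python
import numpy as np

def unit_statistics(panel):
    """panel: (N, T) array. Return each unit's mean and lag-1 autocorrelation."""
    means = panel.mean(axis=1)
    demeaned = panel - means[:, None]
    gamma0 = (demeaned ** 2).mean(axis=1)
    gamma1 = (demeaned[:, 1:] * demeaned[:, :-1]).mean(axis=1)
    return means, gamma1 / gamma0

def cross_sectional_bootstrap_ci(stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval for the cross-sectional mean of a unit-level statistic."""
    rng = np.random.default_rng(seed)
    N = len(stat)
    draws = np.array([stat[rng.integers(0, N, N)].mean() for _ in range(n_boot)])
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2])

# toy panel: heterogeneous AR(1) dynamics across units
rng = np.random.default_rng(3)
N, T = 300, 100
rho = rng.uniform(0.2, 0.8, size=N)
panel = np.zeros((N, T))
for t in range(1, T):
    panel[:, t] = rho * panel[:, t - 1] + rng.normal(size=N)
means, rho_hat = unit_statistics(panel)
print("mean lag-1 autocorrelation:", rho_hat.mean().round(2))
print("95% cross-sectional bootstrap CI:", cross_sectional_bootstrap_ci(rho_hat).round(2))
```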
arXiv link: http://arxiv.org/abs/1803.09452v2
Efficient Discovery of Heterogeneous Quantile Treatment Effects in Randomized Experiments via Anomalous Pattern Detection
proposed method makes its own set of restrictive assumptions about the
intervention's effects and which subpopulations to explicitly estimate.
Moreover, the majority of the literature provides no mechanism to identify
which subpopulations are the most affected--beyond manual inspection--and
provides little guarantee on the correctness of the identified subpopulations.
Therefore, we propose Treatment Effect Subset Scan (TESS), a new method for
discovering which subpopulation in a randomized experiment is most
significantly affected by a treatment. We frame this challenge as a pattern
detection problem where we efficiently maximize a nonparametric scan statistic
(a measure of the conditional quantile treatment effect) over subpopulations.
Furthermore, we identify the subpopulation which experiences the largest
distributional change as a result of the intervention, while making minimal
assumptions about the intervention's effects or the underlying data generating
process. In addition to the algorithm, we demonstrate that under the sharp null
hypothesis of no treatment effect, the asymptotic Type I and II error can be
controlled, and provide sufficient conditions for detection consistency--i.e.,
exact identification of the affected subpopulation. Finally, we validate the
efficacy of the method by discovering heterogeneous treatment effects in
simulations and in real-world data from a well-known program evaluation study.
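The sketch below is a heavily simplified stand-in: it scans only single-attribute subgroups and uses a Kolmogorov-Smirnov statistic as the measure of distributional change, with a permutation test under the sharp null, whereas TESS efficiently maximizes a nonparametric scan statistic over a much richer class of subpopulations.

```python
import numpy as np
from scipy.stats import ks_2samp

def best_subgroup(y, treat, groups):
    """Return the subgroup label with the largest treated-vs-control KS statistic."""
    scores = {}
    for g in np.unique(groups):
        m = groups == g
        if treat[m].sum() >= 10 and (~treat[m]).sum() >= 10:
            scores[g] = ks_2samp(y[m & treat], y[m & ~treat]).statistic
    g_star = max(scores, key=scores.get)
    return g_star, scores[g_star]

def permutation_pvalue(y, treat, groups, n_perm=500, seed=0):
    """Permute treatment labels (sharp null) and recompute the maximal subgroup score."""
    rng = np.random.default_rng(seed)
    _, obs = best_subgroup(y, treat, groups)
    null = [best_subgroup(y, rng.permutation(treat), groups)[1] for _ in range(n_perm)]
    return (1 + sum(s >= obs for s in null)) / (n_perm + 1)

# toy data: the effect is concentrated in subgroup 2
rng = np.random.default_rng(4)
n = 3000
groups = rng.integers(0, 4, n)
treat = rng.random(n) < 0.5
y = rng.normal(size=n) + np.where((groups == 2) & treat, 1.0, 0.0)
print("most affected subgroup:", best_subgroup(y, treat, groups))
print("permutation p-value:", permutation_pvalue(y, treat, groups))
```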
arXiv link: http://arxiv.org/abs/1803.09159v3
Schooling Choice, Labour Market Matching, and Wages
of agents on one side of the market are endogenous due to pre-matching
investments. The model can be used to measure the impact of frictions in labour
markets using a single cross-section of matched employer-employee data. The
observed matching of workers to firms is the outcome of a discrete, two-sided
matching process where firms with heterogeneous preferences over education
sequentially choose workers according to an index correlated with worker
preferences over firms. The distribution of education arises in equilibrium
from a Bayesian game: workers, knowing the distribution of worker and firm
types, invest in education prior to the matching process. Although the observed
matching exhibits strong cross-sectional dependence due to the matching
process, we propose an asymptotically valid inference procedure that combines
discrete choice methods with simulation.
arXiv link: http://arxiv.org/abs/1803.09020v6
Difference-in-Differences with Multiple Time Periods
procedures for treatment effect parameters using Difference-in-Differences
(DiD) with (i) multiple time periods, (ii) variation in treatment timing, and
(iii) when the "parallel trends assumption" holds potentially only after
conditioning on observed covariates. We show that a family of causal effect
parameters are identified in staggered DiD setups, even if differences in
observed characteristics create non-parallel outcome dynamics between groups.
Our identification results allow one to use outcome regression, inverse
probability weighting, or doubly-robust estimands. We also propose different
aggregation schemes that can be used to highlight treatment effect
heterogeneity across different dimensions as well as to summarize the overall
effect of participating in the treatment. We establish the asymptotic
properties of the proposed estimators and prove the validity of a
computationally convenient bootstrap procedure to conduct asymptotically valid
simultaneous (instead of pointwise) inference. Finally, we illustrate the
relevance of our proposed tools by analyzing the effect of the minimum wage on
teen employment from 2001--2007. Open-source software is available for
implementing the proposed methods.
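As a rough sketch of the group-time building block (the unconditional case with never-treated controls only; the paper's estimators also handle covariates via outcome regression, IPW, or doubly-robust estimands, and the column names below are illustrative):

```python
import numpy as np
import pandas as pd

def att_gt(df):
    """Unconditional group-time ATT(g, t) using never-treated units as controls.

    df columns: id, period, y, group (first treatment period, 0 = never treated)."""
    wide = df.pivot(index="id", columns="period", values="y")
    group = df.groupby("id")["group"].first()
    out = []
    for g in sorted(group[group > 0].unique()):
        base = g - 1                                      # last pre-treatment period
        for t in wide.columns:
            if t < g:
                continue
            d_treat = (wide[t] - wide[base])[group == g].mean()
            d_ctrl = (wide[t] - wide[base])[group == 0].mean()
            out.append({"group": g, "period": t, "att": d_treat - d_ctrl})
    return pd.DataFrame(out)

# toy staggered adoption data with a unit effect of 1 after treatment
rng = np.random.default_rng(5)
ids, periods = np.arange(200), np.arange(1, 7)
group = rng.choice([0, 3, 5], size=len(ids))              # never, early, late adopters
rows = []
for i in ids:
    alpha = rng.normal()
    for t in periods:
        effect = 1.0 if group[i] > 0 and t >= group[i] else 0.0
        rows.append({"id": i, "period": t, "group": group[i],
                     "y": alpha + 0.2 * t + effect + rng.normal(scale=0.5)})
print(att_gt(pd.DataFrame(rows)).round(2))
```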
arXiv link: http://arxiv.org/abs/1803.09015v4
How does monetary policy affect income inequality in Japan? Evidence from grouped data
a novel econometric approach that jointly estimates the Gini coefficient based
on micro-level grouped data of households and the dynamics of macroeconomic
quantities. Our results indicate different effects on income inequality for
different types of households: A monetary tightening increases inequality when
income data is based on households whose head is employed (workers'
households), while the effect reverses over the medium term when considering a
broader definition of households. Differences in the relative strength of the
transmission channels can account for this finding. Finally, we demonstrate
that the proposed joint estimation strategy leads to more informative
inference, while the frequently used two-step estimation approach yields
inconclusive results.
arXiv link: http://arxiv.org/abs/1803.08868v2
Decentralized Pure Exchange Processes on Networks
minimal assumptions converge to a stable set in the space of allocations, and
characterise the Pareto set of these processes. Choosing a specific process
belonging to this class, that we define fair trading, we analyse the trade
dynamics between agents located on a weighted network. We determine the
conditions under which there always exists a one-to-one map between the set of
networks and the set of limit points of the dynamics. This result is used to
understand the effect of the network topology on the trade dynamics and on the
final allocation. We find that positions in the network affect the distribution
of the utility gains, given the initial allocations.
arXiv link: http://arxiv.org/abs/1803.08836v7
Causal Inference for Survival Analysis
function estimation and prediction for subgroups of the data, up to individual
units. Tree ensemble methods, specifically random forests, were modified for
this purpose. A real-world healthcare dataset was used with about 1800 patients
with breast cancer, which has multiple patient covariates as well as disease
free survival days (DFS) and a death event binary indicator (y). We use the
type of cancer curative intervention as the treatment variable (T=0 or 1,
binary treatment case in our example). The algorithm is a 2 step approach. In
step 1, we estimate heterogeneous treatment effects using a causalTree with the
DFS as the dependent variable. Next, in step 2, for each selected leaf of the
causalTree with distinctly different average treatment effect (with respect to
survival), we fit a survival forest to all the patients in that leaf, one
forest each for treatment T=0 as well as T=1 to get estimated patient level
survival curves for each treatment (more generally, any model can be used at
this step). Then, we subtract the patient level survival curves to get the
differential survival curve for a given patient, to compare the survival
function as a result of the 2 treatments. The path to a selected leaf also
gives us the combination of patient features and their values which are
causally important for the treatment effect difference at the leaf.
arXiv link: http://arxiv.org/abs/1803.08218v1
Two-way fixed effects estimators with heterogeneous treatment effects
estimate treatment effects. We show that they estimate weighted sums of the
average treatment effects (ATE) in each group and period, with weights that may
be negative. Due to the negative weights, the linear regression coefficient may
for instance be negative while all the ATEs are positive. We propose another
estimator that solves this issue. In the two applications we revisit, it is
significantly different from the linear regression estimator.
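A compact sketch of the weight calculation behind this result, under assumptions of our own about the data layout (cell-level data with group, period, treatment share, and cell size): the weight on each treated cell is proportional to the cell size times the residual from regressing treatment on group and period fixed effects, so early-treated cells in late periods can receive negative weights.

```python
import numpy as np
import pandas as pd

def twfe_weights(df):
    """Weights that the two-way fixed effects coefficient attaches to each treated
    (group, period) cell: proportional to n * eps, where eps is the residual from
    regressing the cell-level treatment d on group and period dummies."""
    G = pd.get_dummies(df["group"], prefix="g").astype(float)
    P = pd.get_dummies(df["period"], prefix="t", drop_first=True).astype(float)
    X = np.column_stack([G.values, P.values])
    n = df["n"].to_numpy(float)
    d = df["d"].to_numpy(float)
    Xw, dw = X * np.sqrt(n)[:, None], d * np.sqrt(n)       # size-weighted regression
    beta, *_ = np.linalg.lstsq(Xw, dw, rcond=None)
    eps = d - X @ beta
    raw = np.where(d > 0, n * eps, 0.0)
    return df.assign(weight=raw / raw.sum())

# toy staggered design: group 1 treated from period 2, group 2 from period 4, group 3 never
cells = pd.DataFrame({
    "group":  [1] * 4 + [2] * 4 + [3] * 4,
    "period": [1, 2, 3, 4] * 3,
    "d":      [0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    "n":      [100] * 12,
})
print(twfe_weights(cells))   # the early-treated cell in period 4 gets a negative weight
```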
arXiv link: http://arxiv.org/abs/1803.08807v7
Network and Panel Quantile Effects Via Distribution Regression
quantile functions and quantile effects in nonlinear network and panel models
with unobserved two-way effects, strictly exogenous covariates, and possibly
discrete outcome variables. The method is based upon projection of simultaneous
confidence bands for distribution functions constructed from fixed effects
distribution regression estimators. These fixed effects estimators are debiased
to deal with the incidental parameter problem. Under asymptotic sequences where
both dimensions of the data set grow at the same rate, the confidence bands for
the quantile functions and effects have correct joint coverage in large
samples. An empirical application to gravity models of trade illustrates the
applicability of the methods to network data.
arXiv link: http://arxiv.org/abs/1803.08154v3
Testing Continuity of a Density via g-order statistics in the Regression Discontinuity Design
the credibility of the design by testing the continuity of the density of the
running variable at the cut-off, e.g., McCrary (2008). In this paper we propose
an approximate sign test for continuity of a density at a point based on the
so-called g-order statistics, and study its properties under two complementary
asymptotic frameworks. In the first asymptotic framework, the number q of
observations local to the cut-off is fixed as the sample size n diverges to
infinity, while in the second framework q diverges to infinity slowly as n
diverges to infinity. Under both of these frameworks, we show that the test we
propose is asymptotically valid in the sense that it has limiting rejection
probability under the null hypothesis not exceeding the nominal level. More
importantly, the test is easy to implement, asymptotically valid under weaker
conditions than those used by competing methods, and exhibits finite sample
validity under stronger conditions than those needed for its asymptotic
validity. In a simulation study, we find that the approximate sign test
provides good control of the rejection probability under the null hypothesis
while remaining competitive under the alternative hypothesis. We finally apply
our test to the design in Lee (2008), a well-known application of the RDD to
study incumbency advantage.
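The core idea can be sketched in a few lines (fixed q, illustrative values; the paper's construction is based on so-called g-order statistics and covers both fixed-q and slowly diverging-q asymptotics): take the q observations closest to the cut-off and apply a binomial sign test to the number falling on the right.

```python
import numpy as np
from scipy.stats import binomtest

def rdd_sign_test(running, cutoff=0.0, q=50):
    """Approximate sign test for continuity of the running-variable density at the
    cutoff: among the q observations closest to the cutoff, the number on the right
    is approximately Binomial(q, 1/2) under the null of continuity."""
    closest = running[np.argsort(np.abs(running - cutoff))[:q]]
    n_right = int(np.sum(closest >= cutoff))
    return binomtest(n_right, q, 0.5).pvalue

# toy check: continuous density versus bunching just above the cutoff
rng = np.random.default_rng(6)
clean = rng.normal(size=5000)
bunched = np.concatenate([clean, rng.uniform(0.0, 0.02, size=300)])
print("p-value, continuous density:", round(rdd_sign_test(clean), 3))
print("p-value, bunching at cutoff:", round(rdd_sign_test(bunched), 3))
```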
arXiv link: http://arxiv.org/abs/1803.07951v6
Testing for Unobserved Heterogeneous Treatment Effects with Observational Data
policy evaluation literature (see e.g., Heckman and Vytlacil, 2005). This paper
proposes a nonparametric test for unobserved heterogeneous treatment effects in
a treatment effect model with a binary treatment assignment, allowing for
individuals' self-selection to the treatment. Under the standard local average
treatment effects assumptions, i.e., the no defiers condition, we derive
testable model restrictions for the hypothesis of unobserved heterogeneous
treatment effects. Also, we show that if the treatment outcomes satisfy a
monotonicity assumption, these model restrictions are also sufficient. Then, we
propose a modified Kolmogorov-Smirnov-type test which is consistent and simple
to implement. Monte Carlo simulations show that our test performs well in
finite samples. For illustration, we apply our test to study heterogeneous
treatment effects of the Job Training Partnership Act on earnings and the
impacts of fertility on family income, where the null hypothesis of homogeneous
treatment effects gets rejected in the second case but fails to be rejected in
the first application.
arXiv link: http://arxiv.org/abs/1803.07514v2
Adversarial Generalized Method of Moments
described via conditional moment restrictions. Conditional moment restrictions
are widely used, as they are the language by which social scientists describe
the assumptions they make to enable causal inference. We formulate the problem
of estimating the underlying model as a zero-sum game between a modeler and an
adversary and apply adversarial training. Our approach is similar in nature to
Generative Adversarial Networks (GAN), though here the modeler is learning a
representation of a function that satisfies a continuum of moment conditions
and the adversary is identifying violating moments. We outline ways of
constructing effective adversaries in practice, including kernels centered by
k-means clustering, and random forests. We examine the practical performance of
our approach in the setting of non-parametric instrumental variable regression.
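A minimal numerical sketch of the zero-sum game in the linear instrumental-variable case (our own simplification; the paper's modeler is a flexible learner and the adversary is trained rather than found by enumeration): the adversary picks the worst-violated moment among RBF test functions of the instrument centered at k-means centers, and the modeler takes a subgradient step against it.

```python
import numpy as np
from sklearn.cluster import KMeans

def adversarial_iv(y, x, z, n_centers=10, steps=5000, lr=0.5, seed=0):
    """min over theta of max over j of |E_n[f_j(z) * (y - theta0 - theta1 * x)]|,
    where the f_j are RBF kernels of the instrument centered at k-means centers."""
    centers = KMeans(n_clusters=n_centers, n_init=10, random_state=seed).fit(
        z.reshape(-1, 1)).cluster_centers_.ravel()
    F = np.exp(-0.5 * ((z[:, None] - centers[None, :]) / np.std(z)) ** 2)
    X = np.column_stack([np.ones_like(x), x])
    theta, best, best_val = np.zeros(2), np.zeros(2), np.inf
    for t in range(steps):
        moments = F.T @ (y - X @ theta) / len(y)     # violation of each moment
        j = np.argmax(np.abs(moments))               # adversary's best response
        if np.abs(moments[j]) < best_val:            # keep the best iterate so far
            best_val, best = np.abs(moments[j]), theta.copy()
        grad = -np.sign(moments[j]) * (F[:, j] @ X) / len(y)
        theta -= (lr / np.sqrt(t + 1)) * grad        # modeler's subgradient step
    return best

# toy endogenous design: x = z + u and y = 1 + 2 x + u, so OLS is biased
rng = np.random.default_rng(7)
n = 5000
z, u = rng.normal(size=n), rng.normal(size=n)
x = z + u
y = 1.0 + 2.0 * x + u + 0.3 * rng.normal(size=n)
print("adversarial estimate:", adversarial_iv(y, x, z).round(2))
print("OLS estimate:        ",
      np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0].round(2))
```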
arXiv link: http://arxiv.org/abs/1803.07164v2
Large-Scale Dynamic Predictive Regressions
contribute to the literature on forecasting and economic decision making in a
data-rich environment. Under this framework, clusters of predictors generate
different latent states in the form of predictive densities that are later
synthesized within an implied time-varying latent factor model. As a result,
the latent inter-dependencies across predictive densities and biases are
sequentially learned and corrected. Unlike sparse modeling and variable
selection procedures, we do not assume a priori that there is a given subset of
active predictors, which characterize the predictive density of a quantity of
interest. We test our procedure by investigating the predictive content of a
large set of financial ratios and macroeconomic variables on both the equity
premium across different industries and the inflation rate in the U.S., two
contexts of topical interest in finance and macroeconomics. We find that our
predictive synthesis framework generates both statistically and economically
significant out-of-sample benefits while maintaining interpretability of the
forecasting variables. In addition, the main empirical results highlight that
our proposed framework outperforms LASSO-type shrinkage regressions,
factor-based dimension reduction, sequential variable selection, and
equal-weighted linear pooling methodologies.
arXiv link: http://arxiv.org/abs/1803.06738v1
Evaluating Conditional Cash Transfer Policies with Machine Learning Methods
machine learning models and the structural econometric model. Over the past
decade, machine learning has established itself as a powerful tool in many
prediction applications, but this approach is still not widely adopted in
empirical economic studies. To evaluate the benefits of this approach, I use
the most common machine learning algorithms, CART, C4.5, LASSO, random forest,
and adaboost, to construct prediction models for a cash transfer experiment
conducted by the Progresa program in Mexico, and I compare the prediction
results with those of a previous structural econometric study. Two prediction
tasks are performed in this paper: the out-of-sample forecast and the long-term
within-sample simulation. For the out-of-sample forecast, both the mean
absolute error and the root mean square error of the school attendance rates
found by all machine learning models are smaller than those found by the
structural model. Random forest and adaboost have the highest accuracy for the
individual outcomes of all subgroups. For the long-term within-sample
simulation, the structural model has better performance than do all of the
machine learning models. The poor within-sample fitness of the machine learning
model results from the inaccuracy of the income and pregnancy prediction
models. The result shows that the machine learning model performs better than
does the structural model when there are many data to learn; however, when the
data are limited, the structural model offers a more sensible prediction. The
findings of this paper show promise for adopting machine learning in economic
policy analyses in the era of big data.
arXiv link: http://arxiv.org/abs/1803.06401v1
Business Cycles in Economics
economic output variables in economies of scale and scope in the
amplitude/frequency/phase/time domains in economics. Accurate forward-looking
assumptions on business cycle oscillation dynamics can optimize financial
capital investing and/or borrowing by economic agents in the capital markets.
The book's main objective is to study business cycles in economies of scale and
scope, formulating the Ledenyov unified business cycles theory in the Ledenyov
classic and quantum econodynamics.
arXiv link: http://arxiv.org/abs/1803.06108v1
Practical volume computation of structured convex bodies, and an application to modeling portfolio dependencies and financial crises
general convex bodies, defined as the intersection of a simplex by a family of
parallel hyperplanes, and another family of parallel hyperplanes or a family of
concentric ellipsoids. Such convex bodies appear in modeling and predicting
financial crises. The impact of crises on the economy (labor, income, etc.)
makes its detection of prime interest. Certain features of dependencies in the
markets clearly identify times of turmoil. We describe the relationship between
asset characteristics by means of a copula; each characteristic is either a
linear or quadratic form of the portfolio components, hence the copula can be
constructed by computing volumes of convex bodies. We design and implement
practical algorithms in the exact and approximate setting, we experimentally
juxtapose them and study the tradeoff of exactness and accuracy for speed. We
analyze the following methods in order of increasing generality: rejection
sampling relying on uniformly sampling the simplex, which is the fastest
approach, but inaccurate for small volumes; exact formulae based on the
computation of integrals of probability distribution functions; an optimized
Lawrence sign decomposition method, since the polytopes at hand are shown to be
simple; Markov chain Monte Carlo algorithms using random walks based on the
hit-and-run paradigm generalized to nonlinear convex bodies and relying on new
methods for computing an enclosed ball; the latter is experimentally extended to
non-convex bodies with very encouraging results. Our C++ software, based on
CGAL and Eigen and available on github, is shown to be very effective in up to
100 dimensions. Our results offer novel, effective means of computing portfolio
dependencies and an indicator of financial crises, which is shown to correctly
identify past crises.
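As a flavour of the simplest of these methods, the rejection-sampling approach can be sketched as follows (illustrative constraints; as noted above, it is fast but inaccurate when the target region is a small fraction of the simplex):

```python
import numpy as np

def simplex_fraction(constraints, d, n_samples=200_000, seed=0):
    """Estimate the fraction of the unit simplex {w >= 0, sum w = 1} in R^d that
    satisfies linear constraints (a, lb, ub) meaning lb <= a.w <= ub.

    Uniform samples on the simplex are drawn as Dirichlet(1, ..., 1) variates;
    the acceptance rate estimates the relative volume."""
    rng = np.random.default_rng(seed)
    w = rng.dirichlet(np.ones(d), size=n_samples)
    keep = np.ones(n_samples, dtype=bool)
    for a, lb, ub in constraints:
        s = w @ np.asarray(a)
        keep &= (s >= lb) & (s <= ub)
    return keep.mean()

# toy example: 10 assets, portfolios with a return in a band and a cap on asset 1
d = 10
mu = np.linspace(0.01, 0.10, d)                 # hypothetical asset returns
e1 = np.eye(d)[0]
constraints = [(mu, 0.04, 0.06),                # portfolio return between 4% and 6%
               (e1, 0.0, 0.30)]                 # at most 30% in the first asset
print("fraction of the simplex:", simplex_fraction(constraints, d))
```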
arXiv link: http://arxiv.org/abs/1803.05861v1
Are Bitcoin Bubbles Predictable? Combining a Generalized Metcalfe's Law and the LPPLS Model
analyzing the coincidence (and its absence) of fundamental and technical
indicators. Using a generalized Metcalfe's law based on network properties, a
fundamental value is quantified and shown to be heavily exceeded, on at least
four occasions, by bubbles that grow and burst. In these bubbles, we detect a
universal super-exponential unsustainable growth. We model this universal
pattern with the Log-Periodic Power Law Singularity (LPPLS) model, which
parsimoniously captures diverse positive feedback phenomena, such as herding
and imitation. The LPPLS model is shown to provide an ex-ante warning of market
instabilities, quantifying a high crash hazard and probabilistic bracket of the
crash time consistent with the actual corrections, although, as always, the
precise time and trigger (which straw breaks the camel's back) remain exogenous
and unpredictable. Looking forward, our analysis identifies a substantial but
not unprecedented overvaluation in the price of bitcoin, suggesting many months
of volatile sideways bitcoin prices ahead (from the time of writing, March
2018).
arXiv link: http://arxiv.org/abs/1803.05663v1
Does agricultural subsidies foster Italian southern farms? A Spatial Quantile Regression Approach
supporting and stabilising the agricultural sector. In 1962, EU policy-makers
developed the so-called Common Agricultural Policy (CAP) to ensure
competitiveness and a common market organisation for agricultural products,
while the 2003 reform decoupled the CAP from production to focus on income
stabilisation and the sustainability of the agricultural sector. Although
farmers are highly dependent on public support, the literature on the role
played by the CAP in fostering agricultural performance is still scarce and
fragmented. Current CAP policies increase performance differentials between
Northern-Central EU countries and peripheral regions. This paper evaluates the
effectiveness of the CAP in stimulating performance by focusing on Italy's
lagging regions. Moreover, the agricultural sector is deeply rooted in
place-based production processes, so economic analyses that omit spatial
dependence produce biased estimates of performance. Therefore, using data on
subsidies and the economic results of farms from the RICA dataset, which is
part of the Farm Accountancy Data Network (FADN), this paper proposes a spatial
augmented Cobb-Douglas production function to evaluate the effects of subsidies
on farm performance. The major innovation is the implementation of a
micro-founded quantile version of a spatial lag model to examine how the impact
of subsidies varies across the conditional distribution of agricultural
performance. Results show an increasing pattern that switches from negative to
positive at the median and becomes statistically significant at higher
quantiles. Additionally, the spatial autocorrelation parameter is positive and
significant across the entire conditional distribution, suggesting the presence
of significant spatial spillovers in agricultural performance.
arXiv link: http://arxiv.org/abs/1803.05659v1
Limitations of P-Values and $R^2$ for Stepwise Regression Building: A Fairness Demonstration in Health Policy Risk Adjustment
tools, despite their well-known drawbacks. While many of their limitations have
been widely discussed in the literature, other aspects of the use of individual
statistical fit measures, especially in high-dimensional stepwise regression
settings, have not. Giving primacy to individual fit, as is done with p-values
and $R^2$, when group fit may be the larger concern, can lead to misguided
decision making. One of the most consequential uses of stepwise regression is
in health care, where these tools allocate hundreds of billions of dollars to
health plans enrolling individuals with different predicted health care costs.
The main goal of this "risk adjustment" system is to convey incentives to
health plans such that they provide health care services fairly, a component of
which is not to discriminate in access or care for persons or groups likely to
be expensive. We address some specific limitations of p-values and $R^2$ for
high-dimensional stepwise regression in this policy problem through an
illustrated example by additionally considering a group-level fairness metric.
arXiv link: http://arxiv.org/abs/1803.05513v2
Inference on a Distribution from Noisy Draws
estimated by the empirical distribution of noisy measurements of that variable.
This is common practice in, for example, teacher value-added models and other
fixed-effect models for panel data. We use an asymptotic embedding where the
noise shrinks with the sample size to calculate the leading bias in the
empirical distribution arising from the presence of noise. The leading bias in
the empirical quantile function is equally obtained. These calculations are new
in the literature, where only results on smooth functionals such as the mean
and variance have been derived. We provide both analytical and jackknife
corrections that recenter the limit distribution and yield confidence intervals
with correct coverage in large samples. Our approach can be connected to
corrections for selection bias and shrinkage estimation and is to be contrasted
with deconvolution. Simulation results confirm the much-improved sampling
behavior of the corrected estimators. An empirical illustration on
heterogeneity in deviations from the law of one price is equally provided.
arXiv link: http://arxiv.org/abs/1803.04991v5
How Smart Are `Water Smart Landscapes'?
conservation is crucially important for ensuring the security and reliability
of water services for urban residents. We analyze data from one of the
longest-running "cash for grass" policies - the Southern Nevada Water
Authority's Water Smart Landscapes program, where homeowners are paid to
replace grass with xeric landscaping. We use a twelve year long panel dataset
of monthly water consumption records for 300,000 households in Las Vegas,
Nevada. Utilizing a panel difference-in-differences approach, we estimate the
average water savings per square meter of turf removed. We find that
participation in this program reduced the average treated household's
consumption by 18 percent. We find no evidence that water savings degrade as
the landscape ages, or that water savings per unit area are influenced by the
value of the rebate. Depending on the assumed time horizon of benefits from
turf removal, we find that the WSL program cost the water authority about $1.62
per thousand gallons of water saved, which compares favorably to alternative
means of water conservation or supply augmentation.
arXiv link: http://arxiv.org/abs/1803.04593v1
A study of strategy to the remove and ease TBT for increasing export in GCC6 countries
Barriers (NTBs), meaning all trade barriers other than tariff barriers. The
most typical examples are Technical Barriers to Trade (TBT), which refer to
measures such as technical regulations, standards, conformity assessment
procedures, and testing and certification. Therefore, in order to eliminate
TBT, the WTO has all member countries automatically enter into an agreement on
TBT.
arXiv link: http://arxiv.org/abs/1803.03394v3
Does the time horizon of the return predictive effect of investor sentiment vary with stock characteristics? A Granger causality analysis in the frequency domain
for stock returns, whereas few studies have investigated the relationship
between the time horizon of the predictive effect of investor sentiment and
firm characteristics. To this end, using the Granger causality analysis in the
frequency domain proposed by Lemmens et al. (2008), this paper examines whether
the time horizon of the predictive effect of investor sentiment on U.S. stock
returns varies with firm characteristics (e.g., firm size (Size),
book-to-market equity (B/M) ratio, operating profitability (OP), and investment
(Inv)). The empirical results indicate that investor sentiment has a long-term
(more than 12 months) or short-term (less than 12 months) predictive effect on
stock returns, depending on firm characteristics. Specifically, investor
sentiment has strong predictability for the returns of smaller-Size, lower-B/M,
and lower-OP stocks in both the short and the long term, but only short-term
predictability for stocks in the higher quantiles of these characteristics.
Investor sentiment has predictive power for the returns of smaller-Inv stocks
only in the short term, but strong short- and long-term predictability for
larger-Inv stocks. These results have important implications for investors
planning short-run and long-run stock investment strategies.
arXiv link: http://arxiv.org/abs/1803.02962v1
A first look at browser-based Cryptojacking
cryptocurrencies; in particular, the mining of Monero through Coinhive and
similar code-bases. In this model, a user visiting a website will download a
JavaScript code that executes client-side in her browser, mines a
cryptocurrency, typically without her consent or knowledge, and pays out the
seigniorage to the website. Websites may consciously employ this as an
alternative or to supplement advertisement revenue, may offer premium content
in exchange for mining, or may be unwittingly serving the code as a result of a
breach (in which case the seigniorage is collected by the attacker). The
cryptocurrency Monero is preferred seemingly for its unfriendliness to
large-scale ASIC mining that would drive browser-based efforts out of the
market, as well as for its purported privacy features. In this paper, we survey
this landscape, conduct some measurements to establish its prevalence and
profitability, outline an ethical framework for considering whether it should
be classified as an attack or business opportunity, and make suggestions for
the detection, mitigation and/or prevention of browser-based mining for
non-consenting users.
arXiv link: http://arxiv.org/abs/1803.02887v1
Almost Sure Uniqueness of a Global Minimum Without Convexity
almost surely. This paper first formulates a general result that proves almost
sure uniqueness without convexity of the objective function. The general result
is then applied to a variety of applications in statistics. Four applications
are discussed, including uniqueness of M-estimators, both classical likelihood
and penalized likelihood estimators, and two applications of the argmin
theorem, threshold regression and weak identification.
arXiv link: http://arxiv.org/abs/1803.02415v3
A Nonparametric Approach to Measure the Heterogeneous Spatial Association: Under Spatial Temporal Data
about spatial analysis, geography, statistics and so on. Though many
outstanding methods have been proposed and studied, few of them study spatial
association in heterogeneous environments. Additionally, most traditional
methods are based on distance statistics and spatial weight matrices. However,
in some abstract spatial situations, distance statistics cannot be applied
since we cannot even observe the geographical locations directly. Meanwhile,
under these circumstances, the design of the weight matrix cannot entirely
avoid subjectivity because spatial positions are unobservable. In this paper, a
new entropy-based method, which is data-driven and distribution-free, is
proposed to investigate spatial association while fully accounting for the fact
that heterogeneity is widespread. Specifically, the method does not rely on
distance statistics or a weight matrix. Asymmetric dependence is adopted to
reflect the heterogeneity in spatial association for each individual, and the
discussion is carried out on spatio-temporal data assuming only stationarity
and m-dependence over time.
arXiv link: http://arxiv.org/abs/1803.02334v2
An Online Algorithm for Learning Buyer Behavior under Realistic Pricing Restrictions
the purchasing behavior of a utility maximizing buyer, who responds to prices,
in a repeated interaction setting. The key feature of our algorithm is that it
can learn even non-linear buyer utility while working with arbitrary price
constraints that the seller may impose. This overcomes a major shortcoming of
previous approaches, which use unrealistic prices to learn these parameters,
making them unsuitable in practice.
arXiv link: http://arxiv.org/abs/1803.01968v1
Testing a Goodwin model with general capital accumulation rate
accumulation rate is constant but not necessarily equal to one as in the
original model (Goodwin, 1967). In addition to this modification, we find that
addressing the methodological and reporting issues in Harvie (2000) leads to
remarkably better results, with near perfect agreement between the estimates of
equilibrium employment rates and the corresponding empirical averages, as well
as significantly improved estimates of equilibrium wage shares. Despite its
simplicity and obvious limitations, the performance of the modified Goodwin
model implied by our results shows that it can be used as a starting point for
more sophisticated models of endogenous growth cycles.
arXiv link: http://arxiv.org/abs/1803.01536v1
Pricing Mechanism in Information Goods
participants in the data industry from the data supply chain perspective. A
win-win pricing strategy for the players in the data supply chain is proposed.
We obtain analytical solutions in each pricing mechanism, including the
decentralized and centralized pricing, Nash Bargaining pricing, and revenue
sharing mechanism.
arXiv link: http://arxiv.org/abs/1803.01530v1
A comment on 'Testing Goodwin: growth cycles in ten OECD countries'
reporting mistake in some of the estimated parameter values leads to
significantly different conclusions, including realistic parameter values for
the Philips curve and estimated equilibrium employment rates exhibiting on
average one tenth of the relative error of those obtained in Harvie (2000).
arXiv link: http://arxiv.org/abs/1803.01527v1
An Note on Why Geographically Weighted Regression Overcomes Multidimensional-Kernel-Based Varying-Coefficient Model
essentially the same as the varying-coefficient model. In previous research on
varying-coefficient models, scholars tend to use multidimensional-kernel-based
locally weighted estimation (MLWE) so that information on both distance and
direction is considered. However, when the local weight matrix of
geographically weighted estimation is constructed, the distance among
neighboring locations is the only factor controlling the entries of the weight
matrix. In other words, GWR estimation is distance-kernel-based. Thus, in this
paper, under stationary and limited-dependence data with multidimensional
subscripts, we analyze the local mean squared error properties of the estimator
without any assumption on the form of the coefficient functions and compare it
with MLWE. According to the theoretical and simulation results, geographically
weighted locally linear estimation (GWLE) is asymptotically more efficient than
MLWE. Furthermore, a relationship between optimal bandwidth selection and the
design of scale parameters is also obtained.
arXiv link: http://arxiv.org/abs/1803.01402v2
Permutation Tests for Equality of Distributions of Functional Data
continuous time, though observations may occur only at discrete times. For
example, electricity and gas consumption take place in continuous time. Data
generated by a continuous time stochastic process are called functional data.
This paper is concerned with comparing two or more stochastic processes that
generate functional data. The data may be produced by a randomized experiment
in which there are multiple treatments. The paper presents a method for testing
the hypothesis that the same stochastic process generates all the functional
data. The test described here applies to both functional data and multiple
treatments. It is implemented as a combination of two permutation tests. This
ensures that in finite samples, the true and nominal probabilities that each
test rejects a correct null hypothesis are equal. The paper presents upper and
lower bounds on the asymptotic power of the test under alternative hypotheses.
The results of Monte Carlo experiments and an application to an experiment on
billing and pricing of natural gas illustrate the usefulness of the test.
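A bare-bones version of the idea, using a single permutation test with a sup-norm statistic on the mean functions (the paper's procedure combines two permutation tests to obtain exact finite-sample size; the names and the half-hourly grid are illustrative):

```python
import numpy as np

def functional_permutation_test(curves_a, curves_b, n_perm=2000, seed=0):
    """Permutation test that two samples of curves (rows = units, cols = time grid)
    come from the same stochastic process. Statistic: sup over the grid of the
    absolute difference between the sample mean functions."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([curves_a, curves_b])
    n_a = len(curves_a)

    def stat(data, idx_a):
        mask = np.zeros(len(data), dtype=bool)
        mask[idx_a] = True
        return np.max(np.abs(data[mask].mean(axis=0) - data[~mask].mean(axis=0)))

    observed = stat(pooled, np.arange(n_a))
    null = [stat(pooled, rng.choice(len(pooled), n_a, replace=False))
            for _ in range(n_perm)]
    return (1 + sum(s >= observed for s in null)) / (n_perm + 1)

# toy example: consumption-like daily curves under two pricing treatments
rng = np.random.default_rng(8)
grid = np.linspace(0, 1, 48)                                  # half-hourly grid
base = np.sin(2 * np.pi * grid)
a = base + rng.normal(scale=0.5, size=(60, 48))
b = base + 0.2 * grid + rng.normal(scale=0.5, size=(60, 48))  # treatment shifts the shape
print("permutation p-value:", functional_permutation_test(a, b))
```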
arXiv link: http://arxiv.org/abs/1803.00798v4
Deep Learning for Causal Inference
specifically for causal inference and for estimating individual as well as
average treatment effects. The contribution of this paper is twofold: 1. For
generalized neighbor matching to estimate individual and average treatment
effects, we analyze the use of autoencoders for dimensionality reduction while
maintaining the local neighborhood structure among the data points in the
embedding space. This deep learning based technique is shown to perform better
than simple k nearest neighbor matching for estimating treatment effects,
especially when the data points have several features/covariates but reside in
a low dimensional manifold in high dimensional space. We also observe better
performance than manifold learning methods for neighbor matching. 2. Propensity
score matching is one specific and popular way to perform matching in order to
estimate average and individual treatment effects. We propose the use of deep
neural networks (DNNs) for propensity score matching, and present a network
called PropensityNet for this. This is a generalization of the logistic
regression technique traditionally used to estimate propensity scores and we
show empirically that DNNs perform better than logistic regression at
propensity score matching. Code for both methods will be made available shortly
on Github at: https://github.com/vikas84bf
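For the second contribution, a hedged sketch with generic scikit-learn components (a small feed-forward network standing in for the paper's PropensityNet, plus 1-nearest-neighbour matching on the estimated score; data and names are illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import NearestNeighbors

def nn_propensity_matching_att(X, treat, y, hidden=(32, 16), seed=0):
    """Estimate propensity scores with a small feed-forward network, match each
    treated unit to the nearest control on the estimated score, and return the
    matched average treatment effect on the treated."""
    ps_model = MLPClassifier(hidden_layer_sizes=hidden, max_iter=1000, random_state=seed)
    ps = ps_model.fit(X, treat).predict_proba(X)[:, 1].reshape(-1, 1)
    treated, controls = np.where(treat == 1)[0], np.where(treat == 0)[0]
    nn = NearestNeighbors(n_neighbors=1).fit(ps[controls])
    _, idx = nn.kneighbors(ps[treated])
    matched_controls = controls[idx.ravel()]
    return np.mean(y[treated] - y[matched_controls])

# toy confounded data with a true effect of 1
rng = np.random.default_rng(9)
n = 4000
X = rng.normal(size=(n, 5))
p_true = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))
treat = (rng.random(n) < p_true).astype(int)
y = 1.0 * treat + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
print("matched ATT estimate:", round(nn_propensity_matching_att(X, treat, y), 2))
```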
arXiv link: http://arxiv.org/abs/1803.00149v1
Synthetic Control Methods and Big Data
framework, where the time series of a treated unit is compared to a
counterfactual constructed from a large pool of control units. I provide a
general framework for this setting, tailored to predict the counterfactual by
minimizing a tradeoff between underfitting (bias) and overfitting (variance).
The framework nests recently proposed structural and reduced form machine
learning approaches as special cases. Furthermore, difference-in-differences
with matching and the original synthetic control are restrictive cases of the
framework, in general not minimizing the bias-variance objective. Using
simulation studies I find that machine learning methods outperform traditional
methods when the number of potential controls is large or the treated unit is
substantially different from the controls. Equipped with a toolbox of
approaches, I revisit a study on the effect of economic liberalisation on
economic growth. I find effects for several countries where no effect was found
in the original study. Furthermore, I inspect how a systemically important
bank responds to increasing capital requirements by using a large pool of banks
to estimate the counterfactual. Finally, I assess the effect of a changing
product price on product sales using a novel scanner dataset.
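A compact sketch of the core construction (illustrative data; the optional ridge penalty stands in for the bias-variance tradeoff that the framework tunes, with ridge = 0 corresponding to the classic synthetic control weights):

```python
import numpy as np
from scipy.optimize import minimize

def synthetic_control_weights(y_treated_pre, Y_controls_pre, ridge=0.0):
    """Nonnegative weights summing to one that best reproduce the treated unit's
    pre-treatment path; an optional ridge penalty trades bias against variance
    when the donor pool is large."""
    J = Y_controls_pre.shape[1]

    def objective(w):
        return np.mean((y_treated_pre - Y_controls_pre @ w) ** 2) + ridge * np.sum(w ** 2)

    res = minimize(objective, np.full(J, 1.0 / J),
                   bounds=[(0.0, 1.0)] * J,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
                   method="SLSQP")
    return res.x

# toy example: 1 treated unit, 40 controls, treatment effect of 2 after t = 30
rng = np.random.default_rng(10)
T, T0, J = 50, 30, 40
factors = rng.normal(size=(T, 2))
Y_controls = factors @ rng.normal(size=(2, J)) + rng.normal(scale=0.3, size=(T, J))
true_w = np.zeros(J)
true_w[:3] = [0.5, 0.3, 0.2]
y_treated = Y_controls @ true_w + rng.normal(scale=0.3, size=T)
y_treated[T0:] += 2.0

w = synthetic_control_weights(y_treated[:T0], Y_controls[:T0], ridge=0.01)
effect = y_treated[T0:] - Y_controls[T0:] @ w
print(f"estimated post-treatment effect: {effect.mean():.2f}")
```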
arXiv link: http://arxiv.org/abs/1803.00096v1
Dimensional Analysis in Economics: A Study of the Neoclassical Economic Growth Model
basic principles of Dimensional Analysis in the context of the neoclassical
economic theory, in order to apply such principles to the fundamental relations
that underlay most models of economic growth. In particular, basic instruments
from Dimensional Analysis are used to evaluate the analytical consistency of
the Neoclassical economic growth model. The analysis shows that an adjustment
to the model is required in such a way that the principle of dimensional
homogeneity is satisfied.
arXiv link: http://arxiv.org/abs/1802.10528v1
Partial Identification of Expectations with Interval Data
when the conditioning variable is interval censored. When the number of bins is
small, existing methods often yield minimally informative bounds. We propose
three innovations that make meaningful inference possible in interval data
contexts. First, we prove novel nonparametric bounds for contexts where the
distribution of the censored variable is known. Second, we show that a class of
measures that describe the conditional mean across a fixed interval of the
conditioning space can often be bounded tightly even when the CEF itself
cannot. Third, we show that a constraint on CEF curvature can either tighten
bounds or can substitute for the monotonicity assumption often made in interval
data applications. We derive analytical bounds that use the first two
innovations, and develop a numerical method to calculate bounds under the
third. We show the performance of the method in simulations and then present
two applications. First, we resolve a known problem in the estimation of
mortality as a function of education: because individuals with high school or
less are a smaller and thus more negatively selected group over time, estimates
of their mortality change are likely to be biased. Our method makes it possible
to hold education rank bins constant over time, revealing that current
estimates of rising mortality for less educated women are biased upward in some
cases by a factor of three. Second, we apply the method to the estimation of
intergenerational mobility, where researchers frequently use coarsely measured
education data in the many contexts where matched parent-child income data are
unavailable. Conventional measures like the rank-rank correlation may be
uninformative once interval censoring is taken into account; CEF interval-based
measures of mobility are bounded tightly.
arXiv link: http://arxiv.org/abs/1802.10490v1
On the solution of the variational optimisation in the rational inattention framework
rational inattention framework proposed by Christopher A. Sims. The solution,
in general, does not exist, although it may exist in exceptional cases. I show
that the solution does not exist for the quadratic and the logarithmic
objective functions analysed by Sims (2003, 2006). For a linear-quadratic
objective function a solution can be constructed under restrictions on all but
one of its parameters. This approach is, therefore, unlikely to be applicable
to a wider set of economic models.
arXiv link: http://arxiv.org/abs/1802.09869v2
Identifying the occurrence or non occurrence of cognitive bias in situations resembling the Monty Hall problem
as Bertrand's box paradox and the Monty Hall problem. The practical
significance of that fact for economic decision making is uncertain because a
departure from sound reasoning may, but does not necessarily, result in a
"cognitively biased" outcome different from what sound reasoning would have
produced. Criteria are derived here, applicable to both experimental and
non-experimental situations, for heuristic reasoning in inferential-puzzle
situations to result, or not to result, in cognitive bias. In some
situations, neither of these criteria is satisfied, and whether or not agents'
posterior probability assessments or choices are cognitively biased cannot be
determined.
arXiv link: http://arxiv.org/abs/1802.08935v1
Kernel Estimation for Panel Data with Heterogeneous Dynamics
to examine the degree of heterogeneity across cross-sectional units. We first
estimate the sample mean, autocovariances, and autocorrelations for each unit
and then apply kernel smoothing to compute their density functions. The
dependence of the kernel estimator on bandwidth makes asymptotic bias of very
high order affect the required condition on the relative magnitudes of the
cross-sectional sample size (N) and the time-series length (T). In particular,
it makes the condition on N and T stronger and more complicated than those
typically observed in the long-panel literature without kernel smoothing. We
also consider a split-panel jackknife method to correct bias and construction
of confidence intervals. An empirical application and Monte Carlo simulations
illustrate our procedure in finite samples.
arXiv link: http://arxiv.org/abs/1802.08825v4
Measuring the Demand Effects of Formal and Informal Communication : Evidence from Online Markets for Illicit Drugs
important influence on market demand. I find that consumer demand is
approximately equally influenced by communication on both formal and informal
networks, namely product reviews and community forums. In addition, I find
empirical evidence of a vendor's ability to commit to disclosure dampening the
effect of communication on demand. I also find that product demand is more
responsive to average customer sentiment as the number of messages grows, as
may be expected in a Bayesian updating framework.
arXiv link: http://arxiv.org/abs/1802.08778v1
De-Biased Machine Learning of Global and Local Parameters Using Regularized Riesz Representers
regular (semi-parametric) and non-regular (nonparametric) linear functionals of
the conditional expectation function. Examples of regular functionals include
average treatment effects, policy effects, and derivatives. Examples of
non-regular functionals include average treatment effects, policy effects, and
derivatives conditional on a covariate subvector fixed at a point. We construct
a Neyman orthogonal equation for the target parameter that is approximately
invariant to small perturbations of the nuisance parameters. To achieve this
property, we include the Riesz representer for the functional as an additional
nuisance parameter. Our analysis yields weak “double sparsity robustness”:
either the approximation to the regression or the approximation to the
representer can be “completely dense” as long as the other is sufficiently
“sparse”. Our main results are non-asymptotic and imply asymptotic uniform
validity over large classes of models, translating into honest confidence bands
for both global and local parameters.
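To convey the role of the Riesz representer in a familiar special case (the average treatment effect, for which the representer is the inverse-propensity weight), here is a cross-fitted sketch with generic learners; the paper's construction instead learns a regularized Riesz representer directly and covers local functionals as well.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
from sklearn.model_selection import KFold

def debiased_ate(X, d, y, n_folds=5, seed=0):
    """Cross-fitted, Neyman-orthogonal ATE estimate. For the ATE the Riesz
    representer of the functional is alpha(d, x) = d/e(x) - (1-d)/(1-e(x))."""
    psi = np.zeros(len(y))
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        # outcome regressions m(x, d), fit separately by treatment arm
        m1 = GradientBoostingRegressor(random_state=seed).fit(
            X[train][d[train] == 1], y[train][d[train] == 1])
        m0 = GradientBoostingRegressor(random_state=seed).fit(
            X[train][d[train] == 0], y[train][d[train] == 0])
        e = GradientBoostingClassifier(random_state=seed).fit(X[train], d[train])
        p = np.clip(e.predict_proba(X[test])[:, 1], 0.01, 0.99)
        m1t, m0t = m1.predict(X[test]), m0.predict(X[test])
        alpha = d[test] / p - (1 - d[test]) / (1 - p)        # Riesz representer
        m_dt = np.where(d[test] == 1, m1t, m0t)
        psi[test] = m1t - m0t + alpha * (y[test] - m_dt)     # orthogonal score
    return psi.mean(), psi.std() / np.sqrt(len(psi))

# toy confounded data with true ATE = 1
rng = np.random.default_rng(11)
n = 4000
X = rng.normal(size=(n, 5))
d = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
y = 1.0 * d + X[:, 0] + rng.normal(size=n)
ate, se = debiased_ate(X, d, y)
print(f"ATE = {ate:.2f} (se {se:.2f})")
```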
arXiv link: http://arxiv.org/abs/1802.08667v6
Algorithmic Collusion in Cournot Duopoly Market: Evidence from Experimental Economics
intelligence age. Whether algorithmic collusion is a credible threat remains a
matter of debate. In this paper, we propose an algorithm that can extort its
human rival into colluding in a Cournot duopoly market. In experiments, we show
that the algorithm successfully extorts its human rival and obtains higher
profits in the long run, while the human rival fully colludes with the
algorithm. As a result, social welfare declines rapidly and persistently. Both
in theory and in experiments, our work confirms that algorithmic collusion can
be a credible threat. In application, we hope that the framework, the algorithm
design, and the experimental environment illustrated in this work can serve as
an incubator or a test bed for researchers and policymakers to handle emerging
algorithmic collusion.
arXiv link: http://arxiv.org/abs/1802.08061v1
The Security of the United Kingdom Electricity Imports under Conditions of High European Demand
supply, economic competitiveness and environmental sustainability, referred to
as the energy trilemma. Although there are clear conflicts within the trilemma,
member countries have acted to facilitate a fully integrated European
electricity market. Interconnection and cross-border electricity trade have been
a fundamental part of such market liberalisation. However, it has been
suggested that consumers are exposed to a higher price volatility as a
consequence of interconnection. Furthermore, during times of energy shortages
and high demand, issues of national sovereignty take precedence over
cooperation. In this article, the unique and somewhat peculiar conditions of
early 2017 within France, Germany and the United Kingdom have been studied to
understand how the existing integration arrangements address the energy
trilemma. It is concluded that the dominant interests are economic and national
security; issues of environmental sustainability are neglected or overridden.
Although the optimisation of European electricity generation to achieve a lower
overall carbon emission is possible, such a goal is far from being realised.
Furthermore, it is apparent that the United Kingdom, and other countries,
cannot rely upon imports from other countries during periods of high demand
and/or limited supply.
arXiv link: http://arxiv.org/abs/1802.07457v1
On the iterated estimation of dynamic discrete choice games
parameters in dynamic discrete choice games. We consider K-stage policy
iteration (PI) estimators, where K denotes the number of policy iterations
employed in the estimation. This class nests several estimators proposed in the
literature such as those in Aguirregabiria and Mira (2002, 2007), Pesendorfer
and Schmidt-Dengler (2008), and Pakes et al. (2007). First, we establish that
the K-PML estimator is consistent and asymptotically normal for all K. This
complements findings in Aguirregabiria and Mira (2007), who focus on K=1 and K
large enough to induce convergence of the estimator. Furthermore, we show under
certain conditions that the asymptotic variance of the K-PML estimator can
exhibit arbitrary patterns as a function of K. Second, we establish that the
K-MD estimator is consistent and asymptotically normal for all K. For a
specific weight matrix, the K-MD estimator has the same asymptotic distribution
as the K-PML estimator. Our main result provides an optimal sequence of weight
matrices for the K-MD estimator and shows that the optimally weighted K-MD
estimator has an asymptotic distribution that is invariant to K. The invariance
result is especially unexpected given the findings in Aguirregabiria and Mira
(2007) for K-PML estimators. Our main result implies two new corollaries about
the optimal 1-MD estimator (derived by Pesendorfer and Schmidt-Dengler (2008)).
First, the optimal 1-MD estimator is optimal in the class of K-MD estimators.
In other words, additional policy iterations do not provide asymptotic
efficiency gains relative to the optimal 1-MD estimator. Second, the optimal
1-MD estimator is at least as asymptotically efficient as any K-PML
estimator for all K. Finally, the appendix provides appropriate conditions
under which the optimal 1-MD estimator is asymptotically efficient.
arXiv link: http://arxiv.org/abs/1802.06665v4
Achieving perfect coordination amongst agents in the co-action minority game
expected long-term payoff in the co-action minority game. We argue that the
agents will try to get into a cyclic state, where each of the $(2N +1)$ agent
wins exactly $N$ times in any continuous stretch of $(2N+1)$ days. We propose
and analyse a strategy for reaching such a cyclic state quickly, when any
direct communication between agents is not allowed, and only the publicly
available common information is the record of total number of people choosing
the first restaurant in the past. We determine exactly the average time
required to reach the periodic state for this strategy. We show that it varies
as $(N/\ln 2)\,[1 + \alpha \cos (2 \pi \log_2 N)]$ for large $N$, where the
amplitude $\alpha$ of the leading term in the log-periodic oscillations is
found to be $\frac{8 \pi^2}{(\ln 2)^2} \exp(- 2 \pi^2/\ln 2) \approx
7 \times 10^{-11}$.
arXiv link: http://arxiv.org/abs/1802.06770v2
The dynamic impact of monetary policy on regional housing prices in the US: Evidence based on factor-augmented vector autoregressions
housing prices to monetary policy shocks in the US. We address this issue by
analyzing monthly home price data for metropolitan regions using a
factor-augmented vector autoregression (FAVAR) model. Bayesian model estimation
is based on Gibbs sampling with Normal-Gamma shrinkage priors for the
autoregressive coefficients and factor loadings, while monetary policy shocks
are identified using high-frequency surprises around policy announcements as
external instruments. The empirical results indicate that monetary policy
actions typically have sizeable and significant positive effects on regional
housing prices, revealing differences in magnitude and duration. The largest
effects are observed in regions located in states on both the East and West
Coasts, notably California, Arizona and Florida.
arXiv link: http://arxiv.org/abs/1802.05870v1
Bootstrap-Assisted Unit Root Testing With Piecewise Locally Stationary Errors
accommodate nonstationary errors that can have both smooth and abrupt changes
in second- or higher-order properties. Under this framework, the limiting null
distributions of the conventional unit root test statistics are derived and
shown to contain a number of unknown parameters. To circumvent the difficulty
of direct consistent estimation, we propose to use the dependent wild bootstrap
to approximate the non-pivotal limiting null distributions and provide a
rigorous theoretical justification for bootstrap consistency. The proposed
method is compared through finite sample simulations with the recolored wild
bootstrap procedure, which was developed for errors that follow a
heteroscedastic linear process. Further, a combination of autoregressive sieve
recoloring with the dependent wild bootstrap is shown to perform well. The
validity of the dependent wild bootstrap in a nonstationary setting is
demonstrated for the first time, showing the possibility of extensions to other
inference problems associated with locally stationary processes.
arXiv link: http://arxiv.org/abs/1802.05333v1
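As a rough illustration of the dependent wild bootstrap idea mentioned above
(not the paper's full unit root procedure), the sketch below multiplies a
residual series by Gaussian multipliers whose autocovariance follows a Bartlett
kernel; the block length l, the kernel choice, and the data are illustrative
assumptions.
```python
# Minimal sketch of one dependent-wild-bootstrap draw: residuals are multiplied
# by a dependent Gaussian multiplier process with Bartlett-kernel covariance.
import numpy as np

def dwb_resample(resid, l, rng):
    """One dependent-wild-bootstrap resample of the residual series."""
    n = len(resid)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    sigma = np.clip(1.0 - lags / l, 0.0, None)      # Bartlett kernel covariance
    chol = np.linalg.cholesky(sigma + 1e-10 * np.eye(n))
    w = chol @ rng.standard_normal(n)               # dependent multipliers
    return resid * w

rng = np.random.default_rng(0)
resid = rng.standard_normal(200)                    # placeholder residuals
boot = dwb_resample(resid, l=10, rng=rng)
```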
Analysis of Financial Credit Risk Using Machine Learning
increasing number of companies making expansion overseas to capitalize on
foreign resources, a multinational corporate bankruptcy can disrupt the world's
financial ecosystem. Corporations do not fail instantaneously; objective
measures and rigorous analysis of qualitative (e.g. brand) and quantitative
(e.g. econometric factors) data can help identify a company's financial risk.
Gathering and storage of data about a corporation has become less difficult
with recent advancements in communication and information technologies. The
remaining challenge lies in mining relevant information about a company's
health hidden under the vast amounts of data, and using it to forecast
insolvency so that managers and stakeholders have time to react. In recent
years, machine learning has become a popular field in big data analytics
because of its success in learning complicated models. Methods such as support
vector machines, adaptive boosting, artificial neural networks, and Gaussian
processes can be used for recognizing patterns in the data (with a high degree
of accuracy) that may not be apparent to human analysts. This thesis studied
corporate bankruptcy of manufacturing companies in Korea and Poland using
experts' opinions and financial measures, respectively. Using publicly
available datasets, several machine learning methods were applied to learn the
relationship between the company's current state and its fate in the near
future. Results showed that predictions with accuracy greater than 95% were
achievable using any machine learning technique when informative features like
experts' assessment were used. However, when using purely financial factors to
predict whether or not a company will go bankrupt, the correlation is not as
strong.
arXiv link: http://arxiv.org/abs/1802.05326v1
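The following minimal Python sketch illustrates the kind of classifier
comparison described in this thesis, using scikit-learn on synthetic,
imbalanced data; the actual Korean and Polish datasets, features, and tuning
are not reproduced here.
```python
# Illustrative only: compare two of the classifiers mentioned above (SVM and
# boosting) with cross-validation on synthetic bankruptcy-like data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)          # imbalanced, like bankruptcy data
for name, clf in [("SVM", SVC(kernel="rbf", C=1.0)),
                  ("AdaBoost", AdaBoostClassifier(n_estimators=200, random_state=0))]:
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```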
A General Method for Demand Inversion
which equates the real market share to the market share predicted by a discrete
choice model. The method covers a general class of discrete choice models,
including the pure characteristics model in Berry and Pakes (2007) and the
random coefficient logit model in Berry et al. (1995) (hereafter BLP). The
method transforms the original market share inversion problem to an
unconstrained convex minimization problem, so that any convex programming
algorithm can be used to solve the inversion. Moreover, such results also imply
that the computational complexity of inverting a demand model should be no more
than that of a convex programming problem. In simulation examples, I show the
method outperforms the contraction mapping algorithm in BLP. I also find the
method remains robust in pure characteristics models with near-zero market
shares.
arXiv link: http://arxiv.org/abs/1802.04444v3
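For intuition, the sketch below shows the convex-minimization view of share
inversion in the simplest plain-logit special case (the paper covers far more
general models, including BLP and the pure characteristics model); the
objective, shares, and starting values are illustrative.
```python
# Share inversion as unconstrained convex minimization (plain logit case only).
import numpy as np
from scipy.optimize import minimize

s_obs = np.array([0.2, 0.3, 0.1])        # observed inside-good shares (outside = 0.4)

def objective(delta):
    # f(delta) = log(1 + sum exp(delta)) - s_obs' delta is convex in delta and
    # its gradient equals (predicted shares - observed shares).
    return np.log1p(np.exp(delta).sum()) - s_obs @ delta

res = minimize(objective, x0=np.zeros(3), method="BFGS")
delta_hat = res.x
s_pred = np.exp(delta_hat) / (1.0 + np.exp(delta_hat).sum())
print(delta_hat, s_pred)                  # predicted shares match s_obs at the optimum
```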
Structural Estimation of Behavioral Heterogeneity
with information friction. Profit-maximizing agents switch between trading
strategies in response to dynamic market conditions. Due to noisy private
information about the fundamental value, the agents form different evaluations
about heterogeneous strategies. We exploit a thin set---a small
sub-population---to point-identify this nonlinear model, and estimate the
structural parameters using the extended method of moments. Based on the estimated
parameters, the model produces return time series that emulate the moments of
the real data. These results are robust across different sample periods and
estimation methods.
arXiv link: http://arxiv.org/abs/1802.03735v2
A Time-Varying Network for Cryptocurrencies
yield information on risk propagation and market segmentation. To investigate
these effects, we build a time-varying network for cryptocurrencies, based on
the evolution of return cross-predictability and technological similarities. We
develop a dynamic covariate-assisted spectral clustering method to consistently
estimate the latent community structure of the cryptocurrency network,
accounting for both sets of information. We demonstrate that investors can
achieve better risk diversification by investing in cryptocurrencies from
different communities. A cross-sectional portfolio that implements an
inter-crypto momentum trading strategy earns a 1.08% daily return. By
dissecting the portfolio returns on behavioral factors, we confirm that our
results are not driven by behavioral mechanisms.
arXiv link: http://arxiv.org/abs/1802.03708v8
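A minimal, static sketch of covariate-assisted spectral clustering is given
below: cluster the leading eigenvectors of A + alpha * X X'. The paper's
estimator is dynamic and more elaborate; the tuning parameter alpha and the toy
network and covariates here are assumptions.
```python
# Covariate-assisted spectral clustering on a single network snapshot.
import numpy as np
from sklearn.cluster import KMeans

def ca_spectral_cluster(A, X, k, alpha=1.0, seed=0):
    M = A + alpha * X @ X.T                       # network + covariate similarity
    vals, vecs = np.linalg.eigh(M)
    U = vecs[:, -k:]                              # top-k eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(U)

rng = np.random.default_rng(0)
A = (rng.random((50, 50)) < 0.1).astype(float)
A = np.triu(A, 1); A = A + A.T                    # symmetric toy adjacency matrix
X = rng.standard_normal((50, 3))                  # toy node covariates
labels = ca_spectral_cluster(A, X, k=2)
```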
Long-Term Unemployed hirings: Should targeted or untargeted policies be preferred?
unemployed (i.e. the long-term unemployed) are more effective than generalised
incentives (without a definite target) at increasing hirings of the targeted
group? Are generalised incentives able to influence hirings of the vulnerable
group? Do targeted policies have negative side effects too large to be
acceptable? Even though there is a huge literature on hiring subsidies, these
questions remain unresolved. We try to answer them by comparing the impact of
two similar hiring policies, one oriented towards a target group and one
generalised, implemented in the Italian labour market. We use administrative
data on job contracts and counterfactual analysis methods. The targeted policy
had a positive and significant impact, while the generalised policy did not
have a significant impact on the vulnerable group. Moreover, we conclude that
the targeted policy did not have any indirect negative side effects.
arXiv link: http://arxiv.org/abs/1802.03343v2
The Allen--Uzawa elasticity of substitution for nonhomogeneous production functions
substitution obtained by Uzawa for linear homogeneous functions holds true for
nonhomogeneous functions. It is shown that the criticism of the Allen-Uzawa
elasticity of substitution in the works of Blackorby, Primont, Russell is based
on an incorrect example.
arXiv link: http://arxiv.org/abs/1802.06885v1
Prediction of Shared Bicycle Demand with Wavelet Thresholding
other schedules. We analyzed one month of data from the world's largest
bike-sharing company to elicit behavioral demand cycles, initially using models
from animal tracking, which showed that large customers fit an
Ornstein-Uhlenbeck model with demand peaks at periodicities of 7, 12, and 24
hours and 7 days. Lorenz curves of bicycle demand showed that the majority of
customer usage was infrequent, and that demand cycles from time-series models
would strongly overfit the data, yielding unreliable models. Analysis of
thresholded wavelets for the
space-time tensor of bike-sharing contracts was able to compress the data into
a 56-coefficient model with little loss of information, suggesting that
bike-sharing demand behavior is exceptionally strong and regular. Predicted
demand could be further improved by adjusting for the 'noise' our model filters
out, such as air quality and weather information and demand from infrequent
riders.
arXiv link: http://arxiv.org/abs/1802.02683v1
Random taste heterogeneity in discrete choice models: Flexible nonparametric finite mixture distributions
finite mixture distributions. The support of the distribution is specified as a
high-dimensional grid over the coefficient space, with equal or unequal
intervals between successive points along the same dimension; the location of
each point on the grid and the probability mass at that point are model
parameters that need to be estimated. The framework does not require the
analyst to specify the shape of the distribution prior to model estimation, but
can approximate any multivariate probability distribution function to any
arbitrary degree of accuracy. The grid with unequal intervals, in particular,
offers greater flexibility than existing multivariate nonparametric
specifications, while requiring the estimation of a small number of additional
parameters. An expectation maximization algorithm is developed for the
estimation of these models. Multiple synthetic datasets and a case study on
travel mode choice behavior are used to demonstrate the value of the model
framework and estimation algorithm. Compared to extant models that incorporate
random taste heterogeneity through continuous mixture distributions, the
proposed model provides better out-of-sample predictive ability. Findings
reveal significant differences in willingness to pay measures between the
proposed model and extant specifications. The case study further demonstrates
the ability of the proposed model to endogenously recover patterns of attribute
non-attendance and choice set formation.
arXiv link: http://arxiv.org/abs/1802.02299v1
Forecasting the impact of state pension reforms in post-Brexit England and Wales using microsimulation and deep learning
pension cost dependency ratio for England and Wales from 1991 to 2061,
evaluating the impact of the ongoing state pension reforms and changes in
international migration patterns under different Brexit scenarios. To fully
account for the recently observed volatility in life expectancies, we propose a
mortality rate model based on deep learning techniques, which discovers complex
patterns in the data and extrapolates trends. Our results show that the recent
reforms can effectively stave off the "pension crisis" and put the system back
on a sounder fiscal footing. At the same time, more and more workers can expect
to spend a greater share of their lifespan in retirement, despite the rises in
the eligibility age. The population ageing due to the observed postponement
of death until senectitude often occurs with the compression of morbidity, and
thus will not, perforce, intrinsically strain healthcare costs. To a lesser
degree, the future pension cost dependency ratio will depend on the post-Brexit
relations between the UK and the EU, with "soft" alignment on the free movement
lowering the relative cost of the pension system compared to the "hard" one. In
the long term, however, the ratio has a rising tendency.
arXiv link: http://arxiv.org/abs/1802.09427v2
An Experimental Investigation of Preference Misrepresentation in the Residency Match
truthful preference reporting is considered one of the major successes of
market design research. In this study, we test the degree to which these
procedures succeed in eliminating preference misrepresentation. We administered
an online experiment to 1,714 medical students immediately after their
participation in the medical residency match--a leading field application of
strategy-proof market design. When placed in an analogous, incentivized
matching task, we find that 23% of participants misrepresent their preferences.
We explore the factors that predict preference misrepresentation, including
cognitive ability, strategic positioning, overconfidence, expectations, advice,
and trust. We discuss the implications of this behavior for the design of
allocation mechanisms and the social welfare in markets that use them.
arXiv link: http://arxiv.org/abs/1802.01990v2
Voting patterns in 2016: Exploration using multilevel regression and poststratification (MRP) on pre-election polls
how different population groups voted in the 2012 and 2016 elections. We broke
the data down by demographics and state. We display our findings with a series
of graphs and maps. The R code associated with this project is available at
https://github.com/rtrangucci/mrp_2016_election/.
arXiv link: http://arxiv.org/abs/1802.00842v3
Structural analysis with mixed-frequency data: A MIDAS-SVAR model of US capital flows
The MIDAS-SVAR model allows one to identify structural dynamic links by
exploiting the information contained in variables sampled at different
frequencies. It also
provides a general framework to test homogeneous frequency-based
representations versus mixed-frequency data models. A set of Monte Carlo
experiments suggests that the test performs well both in terms of size and
power. The MIDAS-SVAR is then used to study how monetary policy and financial
market volatility affect the dynamics of gross capital inflows to the US. While
no relation is found when using standard quarterly data, exploiting the
within-quarter variability of the series shows that the effect of an interest
rate shock is greater the longer the time lag between the month of the shock
and the end of the quarter.
arXiv link: http://arxiv.org/abs/1802.00793v1
Are `Water Smart Landscapes' Contagious? An epidemic approach on networks to study peer effects
participation in an incentive based conservation program called `Water Smart
Landscapes' (WSL) in the city of Las Vegas, Nevada. We use 15 years of
geo-coded daily records of WSL program applications and approvals compiled by
the Southern Nevada Water Authority and Clark County Tax Assessors rolls for
home characteristics. We use this data to test whether a spatially mediated
peer effect can be observed in WSL participation likelihood at the household
level. We show that epidemic spreading models provide more flexibility in
modeling assumptions than hazard models, which can also be applied to address
the same questions, and that they offer a mechanism for addressing problems
associated with correlated unobservables. We build networks of neighborhood-based
peers for 16 randomly selected neighborhoods in Las Vegas and test for the
existence of a peer based influence on WSL participation by using a
Susceptible-Exposed-Infected-Recovered epidemic spreading model (SEIR), in
which a home can become infected via autoinfection or through contagion from
its infected neighbors. We show that this type of epidemic model can be
directly recast as an additive-multiplicative hazard model, but not as a purely
multiplicative one. Using both inference and prediction approaches we find
evidence of peer effects in several Las Vegas neighborhoods.
arXiv link: http://arxiv.org/abs/1801.10516v1
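As a toy illustration of the autoinfection-plus-contagion mechanism described
above, the following discrete-time SEIR simulation runs on a random neighbour
network; all rates and the network itself are made-up assumptions, not
quantities estimated in the paper.
```python
# Discrete-time SEIR adoption dynamics on a neighbour network.
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = (rng.random((n, n)) < 0.03).astype(float)
A = np.triu(A, 1); A = A + A.T                      # symmetric neighbour adjacency

state = np.zeros(n, dtype=int)                      # 0=S, 1=E, 2=I, 3=R
state[rng.choice(n, 3, replace=False)] = 2          # seed a few adopters
auto, beta, incub, recov = 0.001, 0.02, 0.2, 0.05   # hypothetical rates

for t in range(150):
    infected = (state == 2).astype(float)
    pressure = auto + beta * (A @ infected)         # autoinfection + neighbour contagion
    expose = (state == 0) & (rng.random(n) < 1 - np.exp(-pressure))
    to_I = (state == 1) & (rng.random(n) < incub)
    to_R = (state == 2) & (rng.random(n) < recov)
    state[expose], state[to_I], state[to_R] = 1, 2, 3
print(np.bincount(state, minlength=4))              # final S, E, I, R counts
```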
How Can We Induce More Women to Competitions?
to participate in it? In this paper, we investigate how social image concerns
affect women's decision to compete. We first construct a theoretical model and
show that participating in a competition, even under affirmative action
policies favoring women, is costly for women under public observability since
it deviates from traditional female gender norms, resulting in women's low
participation in competitive environments. We propose and theoretically show
that introducing prosocial incentives into the competitive environment is
effective and robust to public observability since (i) it draws women who are
intrinsically motivated by prosocial incentives into the competitive
environment and (ii) it makes participating in a competition costless for women
from a social image point of view. We conduct a laboratory experiment where we
randomly manipulate the public observability of decisions to compete and test
our theoretical predictions. The results of the experiment are fairly
consistent with our theoretical predictions. We suggest that when designing
policies to promote gender equality in competitive environments, using
prosocial incentives through company philanthropy or other social
responsibility policies, either as substitutes or as complements to traditional
affirmative action policies, could be promising.
arXiv link: http://arxiv.org/abs/1801.10518v1
Nonseparable Sample Selection Models with Censored Selection Rules
models with censored selection rules. We employ a control function approach and
discuss different objects of interest based on (1) local effects conditional on
the control function, and (2) global effects obtained from integration over
ranges of values of the control function. We derive the conditions for the
identification of these different objects and suggest strategies for
estimation. Moreover, we provide the associated asymptotic theory. These
strategies are illustrated in an empirical investigation of the determinants of
female wages in the United Kingdom.
arXiv link: http://arxiv.org/abs/1801.08961v2
Ordered Kripke Model, Permissibility, and Convergence of Probabilistic Kripke Model
Kripke model, by introducing a linear order on the set of accessible states of
each state. We first show this model can be used to describe the lexicographic
belief hierarchy in epistemic game theory, and perfect rationalizability can be
characterized within this model. Then we show that each ordered Kripke model is
the limit of a sequence of standard probabilistic Kripke models with a modified
(common) belief operator, both in terms of structure and in terms of the
(epsilon-)permissibilities characterized within them.
arXiv link: http://arxiv.org/abs/1801.08767v1
Quantifying Health Shocks Over the Life Cycle
using the order two Markov chain model, rather than the standard order one
model, which is widely used in the literature. A Markov chain of order two is
the minimal framework that is capable of distinguishing those who experience a
certain health expenditure level for the first time from those who have been
experiencing that or other levels for some time. In addition, using the model
we show (2) that the probability of encountering a health shock first decreases
until around age 10 and then increases with age, particularly after age 40, (3)
that health shock distributions among different age groups do not
differ until their percentiles reach the median range, but that above the
median the health shock distributions of older age groups gradually start to
first-order dominate those of younger groups, and (4) that the persistency of
health shocks also shows a U-shape in relation to age.
arXiv link: http://arxiv.org/abs/1801.08746v1
Estimating Heterogeneous Consumer Preferences for Restaurants and Travel Time Using Mobile Location Data
from a sample of several thousand anonymous mobile phone users in the San
Francisco Bay Area. The data is used to identify users' approximate typical
morning location, as well as their choices of lunchtime restaurants. We build a
model where restaurants have latent characteristics (whose distribution may
depend on restaurant observables, such as star ratings, food category, and
price range), each user has preferences for these latent characteristics, and
these preferences are heterogeneous across users. Similarly, each item has
latent characteristics that describe users' willingness to travel to the
restaurant, and each user has individual-specific preferences for those latent
characteristics. Thus, both users' willingness to travel and their base utility
for each restaurant vary across user-restaurant pairs. We use a Bayesian
approach to estimation. To make the estimation computationally feasible, we
rely on variational inference to approximate the posterior distribution, as
well as stochastic gradient descent as a computational approach. Our model
performs better than more standard competing models such as multinomial logit
and nested logit models, in part due to the personalization of the estimates.
We analyze how consumers re-allocate their demand after a restaurant closes to
nearby restaurants versus more distant restaurants with similar
characteristics, and we compare our predictions to actual outcomes. Finally, we
show how the model can be used to analyze counterfactual questions such as what
type of restaurant would attract the most consumers in a given location.
arXiv link: http://arxiv.org/abs/1801.07826v1
Accurate Evaluation of Asset Pricing Under Uncertainty and Ambiguity of Information
have become an attractive research area for investigating and modeling
ambiguous and uncertain information in today's markets. This paper proposes a new
generative uncertainty mechanism based on the Bayesian Inference and
Correntropy (BIC) technique for accurately evaluating asset pricing in markets.
This technique examines the potential processes of risk, ambiguity, and
variations of market information in a controllable manner. We apply the new BIC
technique to a consumption asset-pricing model in which consumption variations
are modeled using a Bayesian network, while observing the dynamics of
asset-pricing phenomena in the data. These dynamics include the
procyclical deviations of price, the countercyclical deviations of equity
premia and equity volatility, the leverage impact and the mean reversion of
excess returns. The key findings reveal that the precise modeling of asset
information can estimate price changes in the market effectively.
arXiv link: http://arxiv.org/abs/1801.06966v2
Evolution of Regional Innovation with Spatial Knowledge Spillovers: Convergence or Divergence?
externality. We explore whether spatial knowledge spillovers among regions
exist, whether spatial knowledge spillovers promote regional innovative
activities, and whether external knowledge spillovers affect the evolution of
regional innovations in the long run. We empirically verify the theoretical
results by applying spatial statistics and an econometric model to the analysis
of panel data for 31 regions in China. An accurate estimate of the range of
knowledge spillovers is achieved and convergence of the regional knowledge
growth rate is found, with clear evidence that developing regions benefit more
from external knowledge spillovers than developed regions.
arXiv link: http://arxiv.org/abs/1801.06936v3
Testing the Number of Regimes in Markov Regime Switching Models
in economics and finance. However, the asymptotic distribution of the
likelihood ratio test statistic for testing the number of regimes in Markov
regime switching models has been an unresolved problem. This paper derives the
asymptotic distribution of the likelihood ratio test statistic for testing the
null hypothesis of $M_0$ regimes against the alternative hypothesis of $M_0 +
1$ regimes for any $M_0 \geq 1$ both under the null hypothesis and under local
alternatives. We show that the contiguous alternatives converge to the null
hypothesis at a rate of $n^{-1/8}$ in regime switching models with normal
density. The asymptotic validity of the parametric bootstrap is also
established.
arXiv link: http://arxiv.org/abs/1801.06862v3
Nonfractional Memory: Filtering, Antipersistence, and Forecasting
to generate long memory due to the existence of efficient algorithms for their
simulation and forecasting. Nonetheless, there is no theoretical argument
linking the fractional difference operator with the presence of long memory in
real data. In this regard, one of the most predominant theoretical explanations
for the presence of long memory is cross-sectional aggregation of persistent
micro units. Yet, the type of processes obtained by cross-sectional aggregation
differs from the one due to fractional differencing. Thus, this paper develops
fast algorithms to generate and forecast long memory by cross-sectional
aggregation. Moreover, it is shown that the antipersistent phenomenon that
arises for negative degrees of memory in the fractional difference literature
is not present for cross-sectionally aggregated processes. Pointedly, while the
autocorrelations for the fractional difference operator are negative for
negative degrees of memory by construction, this restriction does not apply to
the cross-sectional aggregated scheme. We show that this has implications for
long memory tests in the frequency domain, which will be misspecified for
cross-sectionally aggregated processes with negative degrees of memory.
Finally, we assess the forecast performance of high-order $AR$ and $ARFIMA$
models when the long memory series are generated by cross-sectional
aggregation. Our results are of interest to practitioners developing forecasts
of long memory variables like inflation, volatility, and climate data, where
aggregation may be the source of long memory.
arXiv link: http://arxiv.org/abs/1801.06677v1
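The sketch below generates long memory by cross-sectional aggregation of AR(1)
micro units with Beta-distributed persistence, in the spirit of the mechanism
discussed above; it is not the paper's fast algorithm, and the Beta parameters
and sample sizes are arbitrary choices.
```python
# Cross-sectional aggregation of persistent AR(1) micro units.
import numpy as np

rng = np.random.default_rng(0)
N, T = 2000, 1000
phi = np.sqrt(rng.beta(2.0, 1.2, size=N))            # persistent micro coefficients
x = np.zeros(N)
agg = np.empty(T)
for t in range(T):
    x = phi * x + rng.standard_normal(N)             # update each micro AR(1)
    agg[t] = x.mean()                                 # cross-sectional aggregate

# slowly decaying sample autocorrelations indicate long memory
acf = [np.corrcoef(agg[:-k], agg[k:])[0, 1] for k in (1, 10, 50, 100)]
print(acf)
```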
Capital Structure in U.S., a Quantile Regression Approach with Macroeconomic Impacts
empirical determinants of capital structure adjustment in different
macroeconomic states, focusing on and discussing the relative importance of
firm-specific and macroeconomic characteristics from an alternative perspective
in the U.S. This study extends the empirical research on the topic of capital
structure by using a quantile regression method to investigate the behavior of
firm-specific characteristics and macroeconomic variables across all quantiles
of the distribution of leverage (total debt, long-term debt and short-term
debt). Thus, based on a partial adjustment model, we find that long-term and
short-term debt ratios vary in their partial adjustment speeds; the short-term
debt ratio rises while the long-term debt ratio slows down over the same
periods.
arXiv link: http://arxiv.org/abs/1801.06651v1
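For illustration, the following sketch runs leverage quantile regressions at
several quantiles with statsmodels; the variable names and synthetic data are
placeholders, not the study's firm-level dataset or its partial adjustment
specification.
```python
# Quantile regressions of leverage on firm-specific and macro covariates.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "firm_size": rng.standard_normal(n),
    "profitability": rng.standard_normal(n),
    "gdp_growth": rng.standard_normal(n),
})
df["leverage"] = 0.3 + 0.1*df["firm_size"] - 0.2*df["profitability"] \
                 + 0.05*df["gdp_growth"] + 0.2*rng.standard_normal(n)

for q in (0.1, 0.5, 0.9):
    fit = smf.quantreg("leverage ~ firm_size + profitability + gdp_growth", df).fit(q=q)
    print(q, fit.params.round(3).to_dict())
```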
USDA Forecasts: A meta-analysis study
groups of published studies. First, the ones that focus on the evaluation of
the United States Department of Agriculture (USDA) forecasts and second, the
ones that evaluate the market reactions to the USDA forecasts. We investigate
four questions. 1) How do the studies evaluate the accuracy of the USDA
forecasts? 2) How do they evaluate the market reactions to the USDA forecasts?
3) Is there any heterogeneity in the results of the mentioned studies? 4) Is
there any publication bias? Regarding the first question, while some
researchers argue that the forecasts are unbiased, most maintain that they are
biased, inefficient, not optimal, or not rational. Regarding the second
question, while a few studies claim that the forecasts are not newsworthy, most
maintain that they are newsworthy, provide useful information, and cause market
reactions. Regarding the third and fourth questions, based on our findings,
there are some indications that the results of the studies are heterogeneous,
but we did not find sufficient evidence of publication bias.
arXiv link: http://arxiv.org/abs/1801.06575v1
Predicting crypto-currencies using sparse non-Gaussian state space models
variety of different econometric models. To capture salient features commonly
observed in financial time series like rapid changes in the conditional
variance, non-normality of the measurement errors and sharply increasing
trends, we develop a time-varying parameter VAR with t-distributed measurement
errors and stochastic volatility. To control for overparameterization, we rely
on the Bayesian literature on shrinkage priors that enables us to shrink
coefficients associated with irrelevant predictors and/or perform model
specification in a flexible manner. Using around one year of daily data we
perform a real-time forecasting exercise and investigate whether any of the
proposed models is able to outperform the naive random walk benchmark. To
assess the economic relevance of the forecasting gains produced by the proposed
models we moreover run a simple trading exercise.
arXiv link: http://arxiv.org/abs/1801.06373v2
A Dirichlet Process Mixture Model of Discrete Choice
truncated stick-breaking process representation of the Dirichlet process as a
flexible nonparametric mixing distribution. The proposed model is a Dirichlet
process mixture model and accommodates discrete representations of
heterogeneity, like a latent class MNL model. Yet, unlike a latent class MNL
model, the proposed discrete choice model does not require the analyst to fix
the number of mixture components prior to estimation, as the complexity of the
discrete mixing distribution is inferred from the evidence. For posterior
inference in the proposed Dirichlet process mixture model of discrete choice,
we derive an expectation maximisation algorithm. In a simulation study, we
demonstrate that the proposed model framework can flexibly capture
differently-shaped taste parameter distributions. Furthermore, we empirically
validate the model framework in a case study on motorists' route choice
preferences and find that the proposed Dirichlet process mixture model of
discrete choice outperforms a latent class MNL model and mixed MNL models with
common parametric mixing distributions in terms of both in-sample fit and
out-of-sample predictive ability. Compared to extant modelling approaches, the
proposed discrete choice model substantially abbreviates specification
searches, as it relies on less restrictive parametric assumptions and does not
require the analyst to specify the complexity of the discrete mixing
distribution prior to estimation.
arXiv link: http://arxiv.org/abs/1801.06296v1
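A minimal sketch of the truncated stick-breaking construction underlying such
Dirichlet process mixtures is shown below (mixture weights only, not the
paper's full EM algorithm); the concentration parameter and truncation level
are illustrative.
```python
# Draw mixture weights from a truncated stick-breaking process.
import numpy as np

def stick_breaking_weights(alpha, K, rng):
    betas = rng.beta(1.0, alpha, size=K)              # stick-breaking proportions
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    w = betas * remaining
    w[-1] = 1.0 - w[:-1].sum()                        # absorb truncation remainder
    return w

rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=1.5, K=20, rng=rng)
print(weights.round(3), weights.sum())                # weights sum to one
```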
Panel Data Quantile Regression with Grouped Fixed Effects
panel data quantile regression. We assume that the observed individuals come
from a heterogeneous population with a finite number of types. The number of
types and group membership is not assumed to be known in advance and is
estimated by means of a convex optimization problem. We provide conditions
under which group membership is estimated consistently and establish asymptotic
normality of the resulting estimators. Simulations show that the method works
well in finite samples when T is reasonably large. To illustrate the proposed
methodology we study the effects of the adoption of Right-to-Carry concealed
weapon laws on violent crime rates using panel data of 51 U.S. states from 1977
to 2010.
arXiv link: http://arxiv.org/abs/1801.05041v2
Characterizing Assumption of Rationality by Incomplete Information
incomplete information framework. We use the lexicographic model with
incomplete information and show that a belief hierarchy expresses common
assumption of rationality within a complete information framework if and only
if there is a belief hierarchy within the corresponding incomplete information
framework that expresses common full belief in caution, rationality, every good
choice is supported, and prior belief in the original utility functions.
arXiv link: http://arxiv.org/abs/1801.04714v1
Heterogeneous structural breaks in panel data models
allows us to identify heterogeneous structural breaks. We model individual
heterogeneity using a grouped pattern. For each group, we allow common
structural breaks in the coefficients. However, the number, timing, and size of
these breaks can differ across groups. We develop a hybrid estimation procedure
of the grouped fixed effects approach and adaptive group fused Lasso. We show
that our method can consistently identify the latent group structure, detect
structural breaks, and estimate the regression parameters. Monte Carlo results
demonstrate the good performance of the proposed method in finite samples. An
empirical application to the relationship between income and democracy
illustrates the importance of considering heterogeneous structural breaks.
arXiv link: http://arxiv.org/abs/1801.04672v2
Censored Quantile Instrumental Variable Estimation with Stata
independent variable. Chernozhukov et al. (2015) introduced a censored quantile
instrumental variable estimator (CQIV) for use in those applications, which has
been applied by Kowalski (2016), among others. In this article, we introduce a
Stata command, cqiv, that simplifies application of the CQIV estimator in Stata.
We summarize the CQIV estimator and algorithm, we describe the use of the cqiv
command, and we provide empirical examples.
arXiv link: http://arxiv.org/abs/1801.05305v3
Hyper-rational choice theory
pursue goals for increasing their personal interests. In most situations, the
behavior of an actor is not independent of his own and others' behavior. Here,
we present a new concept of rational choice, hyper-rational choice, in which
the actor considers the profit or loss of other actors in addition to his own
personal profit or loss and then chooses an action that is desirable to him. We
use hyper-rational choice to generalize and extend game theory. The results of
this study help to model people's behavior while taking into account
environmental conditions, the kind of interactive behavior, the actor's
valuation of himself and others, and a society's system of beliefs and internal
values. Hyper-rationality helps us understand how human
decision makers behave in interactive decisions.
arXiv link: http://arxiv.org/abs/1801.10520v2
Solving Dynamic Discrete Choice Models: Integrated or Expected Value Function?
estimation literature. Since the structural errors are practically always
continuous and unbounded in nature, researchers often use the expected value
function. The idea to solve for the expected value function made solution more
practical and estimation feasible. However, as we show in this paper, the
expected value function is impractical compared to an alternative: the
integrated (ex ante) value function. We provide brief descriptions of the
inefficacy of the former, and benchmarks on actual problems with varying
cardinality of the state space and number of decisions. Though the two
approaches solve the same problem in theory, the benchmarks support the claim
that the integrated value function is preferred in practice.
arXiv link: http://arxiv.org/abs/1801.03978v1
Assessing the effect of advertising expenditures upon sales: a Bayesian structural time series model
Bayesian structural time series model to explain the relationship between the
advertising expenditures of a country-wide fast-food franchise network and its
weekly sales. Thanks to the flexibility and modularity of the model, it is well
suited to generalization to other markets or situations. Its Bayesian nature
facilitates incorporating a priori information (the manager's views),
which can be updated with relevant data. This aspect of the model will be used
to present a strategy of budget scheduling across time and channels.
arXiv link: http://arxiv.org/abs/1801.03050v3
Implications of macroeconomic volatility in the Euro area
stochastic volatility in the error term to assess the effects of an uncertainty
shock in the Euro area. This allows us to treat macroeconomic uncertainty as a
latent quantity during estimation. Only a limited number of contributions to
the literature estimate uncertainty and its macroeconomic consequences jointly,
and most are based on single country models. We analyze the special case of a
shock restricted to the Euro area, where member states are highly related by
construction. We find a significant decrease in real activity across all
countries over a period of roughly a year following an uncertainty shock.
Moreover, equity prices, short-term interest rates and exports tend to decline,
while unemployment levels increase. Dynamic responses across countries differ
slightly in magnitude and duration, with Ireland, Slovakia and Greece
exhibiting different reactions for some macroeconomic fundamentals.
arXiv link: http://arxiv.org/abs/1801.02925v2
Dynamic Pricing and Energy Management Strategy for EV Charging Stations under Uncertainties
electric vehicle (EV) charging service providers. To set the charging prices,
the service providers face three uncertainties: the volatility of wholesale
electricity price, intermittent renewable energy generation, and
spatial-temporal EV charging demand. The main objective of our work here is to
help charging service providers to improve their total profits while enhancing
customer satisfaction and maintaining power grid stability, taking into account
those uncertainties. We employ a linear regression model to estimate the EV
charging demand at each charging station, and introduce a quantitative measure
for customer satisfaction. Both the greedy algorithm and the dynamic
programming (DP) algorithm are employed to derive the optimal charging prices
and determine how much electricity to purchase from the wholesale market in
each planning horizon. Simulation results show that DP algorithm achieves an
increased profit (up to 9%) compared to the greedy algorithm (the benchmark
algorithm) under certain scenarios. Additionally, we observe that the
integration of low-cost energy storage into the system can not only improve
the profit, but also smooth out the charging price fluctuation, protecting the
end customers from the volatile wholesale market.
arXiv link: http://arxiv.org/abs/1801.02783v1
Revealed Price Preference: Theory and Empirical Analysis
introduce a revealed preference relation over prices. We show that the absence
of cycles in this relation characterizes a consumer who trades off the utility
of consumption against the disutility of expenditure. Our model can be applied
whenever a consumer's demand over a strict subset of all available goods is
being analyzed; it can also be extended to settings with discrete goods and
nonlinear prices. To illustrate its use, we apply our model to a single-agent
data set and to a data set with repeated cross-sections. We develop a novel
test of linear hypotheses on partially identified parameters to estimate the
proportion of the population who are revealed better off due to a price change
in the latter application. This new technique can be used for nonparametric
counterfactual analysis more broadly.
arXiv link: http://arxiv.org/abs/1801.02702v3
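The sketch below encodes one plausible reading of the revealed price preference
relation (prices p_s weakly preferred to p_t when the bundle bought at t costs
no more at p_s, strictly when it costs less) and checks for cycles that contain
a strict comparison using networkx; both the encoding and the data are
assumptions, not the paper's exact test.
```python
# Check acyclicity of a revealed price preference relation (one possible reading).
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
T, K = 8, 3
P = rng.uniform(1, 2, size=(T, K))                  # observed prices
Q = rng.uniform(0, 5, size=(T, K))                  # observed bundles

G = nx.DiGraph()
G.add_nodes_from(range(T))
strict = []
for s in range(T):
    for t in range(T):
        if s != t and P[s] @ Q[t] <= P[t] @ Q[t]:
            G.add_edge(s, t)                         # p_s weakly preferred to p_t
            if P[s] @ Q[t] < P[t] @ Q[t]:
                strict.append((s, t))

# consistency requires no cycle that contains a strict comparison: a strict
# edge lies on a cycle iff its endpoints share a strongly connected component
scc_id = {v: i for i, comp in enumerate(nx.strongly_connected_components(G)) for v in comp}
consistent = all(scc_id[s] != scc_id[t] for s, t in strict)
print("consistent with the model:", consistent)
```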
On a Constructive Theory of Markets
Gambling 'Theory' of Financial Markets for Practitioners and Theorists." It
presents important background for that article --- why gambling is important,
even necessary, for real-world traders --- the reason for the superiority of
the strategic/gambling approach to the competing market ideologies of market
fundamentalism and the scientific approach --- and its potential to uncover
profitable trading systems. Much of this article was drawn from Chapter 1 of
the book "The Strategic Analysis of Financial Markets (in 2 volumes)" World
Scientific, 2017.
arXiv link: http://arxiv.org/abs/1801.02994v1
A Consumer Behavior Based Approach to Multi-Stage EV Charging Station Placement
stations under the scenarios of different electric vehicle (EV) penetration
rates. The EV charging market is modeled as an oligopoly. A consumer behavior
based approach is applied to forecast the charging demand of the charging
stations using a nested logit model. The impacts of both the urban road network
and the power grid network on charging station planning are also considered. At
each planning stage, the optimal station placement strategy is derived through
solving a Bayesian game among the service providers. To investigate the
interplay of the travel pattern, the consumer behavior, urban road network,
power grid network, and the charging station placement, a simulation platform
(The EV Virtual City 1.0) is developed using Java on Repast. We conduct a case
study in the San Pedro District of Los Angeles by importing the geographic and
demographic data of that region into the platform. The simulation results
demonstrate a strong consistency between the charging station placement and the
traffic flow of EVs. The results also reveal an interesting phenomenon that
service providers prefer clustering instead of spatial separation in this
oligopoly market.
arXiv link: http://arxiv.org/abs/1801.02135v1
Placement of EV Charging Stations --- Balancing Benefits among Multiple Entities
(EV) charging stations with incremental EV penetration rates. A nested logit
model is employed to analyze the charging preference of the individual consumer
(EV owner), and predict the aggregated charging demand at the charging
stations. The EV charging industry is modeled as an oligopoly where the entire
market is dominated by a few charging service providers (oligopolists). At the
beginning of each planning stage, an optimal placement policy for each service
provider is obtained through analyzing strategic interactions in a Bayesian
game. To derive the optimal placement policy, we consider both the
transportation network graph and the electric power network graph. A simulation
software --- The EV Virtual City 1.0 --- is developed using Java to investigate
the interactions among the consumers (EV owner), the transportation network
graph, the electric power network graph, and the charging stations. Through a
series of experiments using the geographic and demographic data from the San
Pedro District of Los Angeles, we show that the charging station
placement is highly consistent with the heatmap of the traffic flow. In
addition, we observe a spatial economic phenomenon that service providers
prefer clustering instead of separation in the EV charging market.
arXiv link: http://arxiv.org/abs/1801.02129v1
Stochastic Dynamic Pricing for EV Charging Stations with Renewables Integration and Energy Storage
management policy for electric vehicle (EV) charging service providers. In the
presence of renewable energy integration and energy storage system, EV charging
service providers must deal with multiple uncertainties --- charging demand
volatility, inherent intermittency of renewable energy generation, and
wholesale electricity price fluctuation. The motivation behind our work is to
offer guidelines for charging service providers to determine proper charging
prices and manage electricity to balance the competing objectives of improving
profitability, enhancing customer satisfaction, and reducing impact on power
grid in spite of these uncertainties. We propose a new metric to assess the
impact on power grid without solving complete power flow equations. To protect
service providers from severe financial losses, a safeguard of profit is
incorporated in the model. Two algorithms --- stochastic dynamic programming
(SDP) algorithm and greedy algorithm (benchmark algorithm) --- are applied to
derive the pricing and electricity procurement policy. A Pareto front of the
multiobjective optimization is derived. Simulation results show that using SDP
algorithm can achieve up to 7% profit gain over using greedy algorithm.
Additionally, we observe that the charging service provider is able to reshape
spatial-temporal charging demands to reduce the impact on power grid via
pricing signals.
arXiv link: http://arxiv.org/abs/1801.02128v1
Why Markets are Inefficient: A Gambling "Theory" of Financial Markets For Practitioners and Theorists
Analysis of Financial Markets (SAFM) theory, that explains the operation of
financial markets using the analytical perspective of an enlightened gambler.
The gambler understands that all opportunities for superior performance arise
from suboptimal decisions by humans, but understands also that knowledge of
human decision making alone is not enough to understand market behavior --- one
must still model how those decisions lead to market prices. Thus are there
three parts to the model: gambling theory, human decision making, and strategic
problem solving. A new theory is necessary because at this writing in 2017,
there is no theory of financial markets acceptable to both practitioners and
theorists. Theorists' efficient market theory, for example, cannot explain
bubbles and crashes nor the exceptional returns of famous investors and
speculators such as Warren Buffett and George Soros. At the same time, a new
theory must be sufficiently quantitative, explain market "anomalies" and
provide predictions in order to satisfy theorists. It is hoped that the SAFM
framework will meet these requirements.
arXiv link: http://arxiv.org/abs/1801.01948v1
Does it Pay to Buy the Pot in the Canadian 6/49 Lotto? Implications for Lottery Design
few government sponsored lotteries that has the potential for a favorable
strategy we call "buying the pot." By buying the pot we mean that a syndicate
buys each ticket in the lottery, ensuring that it holds a jackpot winner. We
assume that the other bettors independently buy small numbers of tickets. This
paper presents (1) a formula for the syndicate's expected return, (2)
conditions under which buying the pot produces a significant positive expected
return, and (3) the implications of these findings for lottery design.
arXiv link: http://arxiv.org/abs/1801.02959v1
A Method for Winning at Lotteries
purely mechanical strategy to achieve expected returns of 10% to 25% in an
equiprobable lottery with no take and no carryover pool. We prove that an
optimal strategy (Nash equilibrium) in a game between the syndicate and other
players consists of betting one of each ticket (the "trump ticket"), and extend
that result to proportional ticket selection in non-equiprobable lotteries. The
strategy can be adjusted to accommodate lottery taxes and carryover pools. No
"irrationality" need be involved for the strategy to succeed --- it requires
only that a large group of non-syndicate bettors each choose a few tickets
independently.
arXiv link: http://arxiv.org/abs/1801.02958v1
SABCEMM-A Simulator for Agent-Based Computational Economic Market Models
Computational Economic Market Models) for agent-based computational economic
market (ABCEM) models. Our simulation tool is implemented in C++ and we can
easily run ABCEM models with several million agents. The object-oriented
software design enables the isolated implementation of building blocks for
ABCEM models, such as agent types and market mechanisms. The user can design
and compare ABCEM models in a unified environment by recombining existing
building blocks using the XML-based SABCEMM configuration file. We introduce an
abstract ABCEM model class which our simulation tool is built upon.
Furthermore, we present the software architecture as well as computational
aspects of SABCEMM. Here, we focus on the efficiency of SABCEMM with respect to
the run time of our simulations. We show the great impact of different random
number generators on the run time of ABCEM models. The code and documentation
are published on GitHub at https://github.com/SABCEMM/SABCEMM, such that all
results can be reproduced by the reader.
arXiv link: http://arxiv.org/abs/1801.01811v2
Dynamic and granular loss reserving with copulae
the past years. To meet all future claims rising from policies, it is requisite
to quantify the outstanding loss liabilities. Loss reserving methods based on
aggregated data from run-off triangles are predominantly used to calculate the
claims reserves. Conventional reserving techniques have several disadvantages:
loss of information about the policy and the claim's development due to the
aggregation; zero or negative cells in the triangle; a usually small number of
observations in the triangle; only a few observations for recent accident
years; and sensitivity to the most recent paid claims.
To overcome these dilemmas, granular loss reserving methods for individual
claim-by-claim data will be derived. Reserve estimation is a crucial part of
the risk valuation process, which is now a front-burner issue in economics. Since
there is a growing demand for prediction of total reserves for different types
of claims or even multiple lines of business, a time-varying copula framework
for granular reserving will be established.
arXiv link: http://arxiv.org/abs/1801.01792v1
Comparing the Forecasting Performances of Linear Models for Electricity Prices with High RES Penetration
frequentist versus Bayesian autoregressive and vector autoregressive
specifications, for hourly day-ahead electricity prices, both with and without
renewable energy sources. The accuracy of point and density forecasts is
inspected in four main European markets (Germany, Denmark, Italy and Spain)
characterized by different levels of renewable energy power generation. Our
results show that the Bayesian VAR specifications with exogenous variables
dominate other multivariate and univariate specifications, in terms of both
point and density forecasting.
arXiv link: http://arxiv.org/abs/1801.01093v3
A New Wald Test for Hypothesis Testing Based on MCMC outputs
is proposed for hypothesis testing. The new statistic can be explained as MCMC
version of Wald test and has several important advantages that make it very
convenient in practical applications. First, it is well-defined under improper
prior distributions and avoids the Jeffreys-Lindley paradox. Second, its
asymptotic distribution can be shown to follow the $\chi^2$ distribution, so
that the threshold values can be easily calibrated from this distribution.
Third, its statistical error can be derived using the Markov chain Monte Carlo
(MCMC) approach. Fourth, and most importantly, it is based only on MCMC samples
drawn from the posterior distribution. Hence, it is simply a by-product of the
posterior output and very easy to compute. In addition, when prior information
is available, finite sample theory is derived for the proposed test statistic.
Finally, the usefulness of the test is
illustrated with several applications to latent variable models widely used in
economics and finance.
arXiv link: http://arxiv.org/abs/1801.00973v1
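The following sketch conveys the general idea of a Wald-type statistic built
from posterior MCMC output: scale the posterior mean of the tested parameters
by the posterior covariance and compare against a chi-square reference. This is
a paraphrase of the construction, not the paper's exact statistic, and the
draws stand in for real MCMC output.
```python
# Wald-type statistic computed from (stand-in) posterior draws.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
draws = rng.multivariate_normal([0.3, -0.1], [[0.04, 0.01], [0.01, 0.09]], size=5000)
# 'draws' mimics MCMC samples of the parameters under test (H0: both equal zero)

theta_bar = draws.mean(axis=0)                        # posterior mean
V = np.cov(draws, rowvar=False)                       # posterior covariance
wald = float(theta_bar @ np.linalg.solve(V, theta_bar))
p_value = 1.0 - chi2.cdf(wald, df=draws.shape[1])
print(wald, p_value)
```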
Complexity Theory, Game Theory, and Economics: The Barbados Lectures
Theory, Game Theory, and Economics," taught at the Bellairs Research Institute
of McGill University, Holetown, Barbados, February 19--23, 2017, as the 29th
McGill Invitational Workshop on Computational Complexity.
The goal of this mini-course is twofold: (i) to explain how complexity theory
has helped illuminate several barriers in economics and game theory; and (ii)
to illustrate how game-theoretic questions have led to new and interesting
complexity theory, including several recent breakthroughs. It consists of two
five-lecture sequences: the Solar Lectures, focusing on the communication and
computational complexity of computing equilibria; and the Lunar Lectures,
focusing on applications of complexity theory in game theory and economics. No
background in game theory is assumed.
arXiv link: http://arxiv.org/abs/1801.00734v3
Resource Abundance and Life Expectancy
since 1960 on life expectancy in nations that were resource-poor prior to the
discoveries. The previous literature explains the relation between nations'
wealth and life expectancy, but it has been silent on the impacts of resource
discoveries on life expectancy. We attempt to fill this gap in this study. An
important advantage of this study is that, as previous researchers have argued,
resource discovery can be treated as an exogenous variable. We use longitudinal
data from 1960 to 2014 and apply three modern empirical methods, namely
Difference-in-Differences, Event Studies, and the Synthetic Control approach,
to investigate the main research question: how do resource discoveries affect
life expectancy? The findings show that resource
discoveries in Ecuador, Yemen, Oman, and Equatorial Guinea have positive and
significant impacts on life expectancy, but the effects for the European
countries are mostly negative.
arXiv link: http://arxiv.org/abs/1801.00369v1
Estimation and Inference of Treatment Effects with $L_2$-Boosting in High-Dimensional Settings
many controls or instrumental variables, making it essential to choose an
appropriate approach to variable selection. In this paper, we provide results
for valid inference after post- or orthogonal $L_2$-Boosting is used for
variable selection. We consider treatment effects after selecting among many
control variables and instrumental variable models with potentially many
instruments. To achieve this, we establish new results for the rate of
convergence of iterated post-$L_2$-Boosting and orthogonal $L_2$-Boosting in a
high-dimensional setting similar to Lasso, i.e., under approximate sparsity
without assuming the beta-min condition. These results are extended to the 2SLS
framework and valid inference is provided for treatment effect analysis. We
give extensive simulation results for the proposed methods and compare them
with Lasso. In an empirical application, we construct efficient IVs with our
proposed methods to estimate the effect of pre-merger overlap of bank branch
networks in the US on the post-merger stock returns of the acquirer bank.
arXiv link: http://arxiv.org/abs/1801.00364v2
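A minimal sketch of componentwise L2-Boosting for variable selection appears
below; the post- and orthogonal variants analyzed in the paper, and the
treatment-effect and IV steps, are not reproduced, and the shrinkage and step
count are arbitrary tuning choices.
```python
# Componentwise L2-Boosting: repeatedly fit the best single covariate to the
# current residual and take a small step in that direction.
import numpy as np

def l2_boost(X, y, steps=200, nu=0.1):
    n, p = X.shape
    beta = np.zeros(p)
    resid = y - y.mean()
    for _ in range(steps):
        coefs = X.T @ resid / (X**2).sum(axis=0)     # per-covariate OLS coefficients
        sse = ((resid[:, None] - X * coefs)**2).sum(axis=0)
        j = int(np.argmin(sse))                      # best-fitting covariate
        beta[j] += nu * coefs[j]
        resid -= nu * coefs[j] * X[:, j]
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = 2*X[:, 0] - 1.5*X[:, 3] + rng.standard_normal(200)
beta_hat = l2_boost(X, y)
print(np.flatnonzero(np.abs(beta_hat) > 0.1))        # selected covariates
```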
Confidence set for group membership
group assignments in grouped panel models. It covers the true group memberships
jointly for all units with pre-specified probability and is constructed by
inverting many simultaneous unit-specific one-sided tests for group membership.
We justify our approach under $N, T \to \infty$ asymptotics using tools from
high-dimensional statistics, some of which we extend in this paper. We provide
Monte Carlo evidence that the confidence set has adequate coverage in finite
samples. An empirical application illustrates the use of our confidence set.
arXiv link: http://arxiv.org/abs/1801.00332v6
Debiased Machine Learning of Set-Identified Linear Models
boundary (i.e., support function) where the selection among a very large number
of covariates is based on modern regularized tools. I characterize the boundary
using a semiparametric moment equation. Combining Neyman-orthogonality and
sample splitting ideas, I construct a root-N consistent, uniformly
asymptotically Gaussian estimator of the boundary and propose a multiplier
bootstrap procedure to conduct inference. I apply this result to the partially
linear model, the partially linear IV model and the average partial derivative
with an interval-valued outcome.
arXiv link: http://arxiv.org/abs/1712.10024v6
Variational Bayes Estimation of Discrete-Margined Copula Models with Application to Time Series
with discrete, or a combination of discrete and continuous, margins. The method
is based on a variational approximation to a tractable augmented posterior, and
is faster than previous likelihood-based approaches. We use it to estimate
drawable vine copulas for univariate and multivariate Markov ordinal and mixed
time series. These have dimension $rT$, where $T$ is the number of observations
and $r$ is the number of series, and are difficult to estimate using previous
methods. The vine pair-copulas are carefully selected to allow for
heteroskedasticity, which is a feature of most ordinal time series data. When
combined with flexible margins, the resulting time series models also allow for
other common features of ordinal data, such as zero inflation, multiple modes
and under- or over-dispersion. Using six example series, we illustrate both the
flexibility of the time series copula models, and the efficacy of the
variational Bayes estimator for copulas of up to 792 dimensions and 60
parameters. This far exceeds the size and complexity of copula models for
discrete data that can be estimated using previous methods.
arXiv link: http://arxiv.org/abs/1712.09150v2
An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls
control methods for policy evaluation. We recast the causal inference problem
as a counterfactual prediction and a structural breaks testing problem. This
allows us to exploit insights from conformal prediction and structural breaks
testing to develop permutation inference procedures that accommodate modern
high-dimensional estimators, are valid under weak and easy-to-verify
conditions, and are provably robust against misspecification. Our methods work
in conjunction with many different approaches for predicting counterfactual
mean outcomes in the absence of the policy intervention. Examples include
synthetic controls, difference-in-differences, factor and matrix completion
models, and (fused) time series panel data models. Our approach demonstrates an
excellent small-sample performance in simulations and is taken to a data
application where we re-evaluate the consequences of decriminalizing indoor
prostitution. Open-source software for implementing our conformal inference
methods is available.
arXiv link: http://arxiv.org/abs/1712.09089v10
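A stylized sketch of the permutation p-value behind this type of procedure, under the null of no policy effect: fit any counterfactual predictor on pre-treatment data, form residuals over the whole sample, and compare the post-treatment residual block against all cyclic shifts. The block-shift scheme, the absolute-mean test statistic, and the plain OLS-on-donors predictor below are assumptions for illustration, not the paper's recommended configuration.

    # Stylized conformal/permutation p-value for a policy effect (illustrative only).
    import numpy as np

    def conformal_pvalue(y_treated, Y_donors, T0):
        """y_treated: (T,) outcome of the treated unit; Y_donors: (T, J) donor outcomes;
        T0: number of pre-treatment periods. Returns a permutation p-value for the
        null of no treatment effect in periods T0..T-1."""
        T = len(y_treated)
        # counterfactual predictor fit on pre-treatment data (here: plain OLS on donors)
        w, *_ = np.linalg.lstsq(Y_donors[:T0], y_treated[:T0], rcond=None)
        resid = y_treated - Y_donors @ w          # residuals over the full sample

        def stat(u):                              # test statistic on a residual block
            return np.mean(np.abs(u))

        s_obs = stat(resid[T0:])
        # compare against the statistic on every cyclic shift of the residual sequence
        shifts = [stat(np.roll(resid, k)[T0:]) for k in range(T)]
        return np.mean([s >= s_obs for s in shifts])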
Simultaneous Confidence Intervals for High-dimensional Linear Models with Many Endogenous Variables
important role in recent econometric literature. In this work we allow for
models with many endogenous variables and many instrumental variables to
achieve identification. Because of the high-dimensionality in the second stage,
constructing honest confidence regions with asymptotically correct coverage is
non-trivial. Our main contribution is to propose estimators and confidence
regions that achieve this. The approach relies on moment conditions that have
an additional orthogonality property with respect to nuisance parameters.
Moreover, estimation of the high-dimensional nuisance parameters is carried out via
new pivotal procedures. In order to achieve simultaneously valid confidence
regions we use a multiplier bootstrap procedure to compute critical values and
establish its validity.
arXiv link: http://arxiv.org/abs/1712.08102v4
On Long Memory Origins and Forecast Horizons
the fractional difference operator. We argue that the most cited theoretical
arguments for the presence of long memory do not imply the fractional
difference operator, and assess the performance of the autoregressive
fractionally integrated moving average $(ARFIMA)$ model when forecasting series
with long memory generated by nonfractional processes. We find that high-order
autoregressive $(AR)$ models produce forecast performance similar or superior
to that of $ARFIMA$ models at short horizons. Nonetheless, as the forecast horizon
increases, the $ARFIMA$ models tend to dominate in forecast performance. Hence,
$ARFIMA$ models are well suited for forecasts of long memory processes
regardless of the long memory generating mechanism, particularly for medium and
long forecast horizons. Additionally, we analyse the forecasting performance of
the heterogeneous autoregressive ($HAR$) model which imposes restrictions on
high-order $AR$ models. We find that the structure imposed by the $HAR$ model
produces better long horizon forecasts than $AR$ models of the same order, at
the price of inferior short horizon forecasts in some cases. Our results have
implications for, among others, Climate Econometrics and Financial Econometrics
models dealing with long memory series at different forecast horizons. We show
in an example that while a short memory autoregressive moving average $(ARMA)$
model gives the best performance when forecasting the Realized Variance of the
S&P 500 up to a month ahead, the $ARFIMA$ model gives the best performance for
longer forecast horizons.
arXiv link: http://arxiv.org/abs/1712.08057v1
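For reference, the fractional difference operator in the $ARFIMA$ model expands as $(1-L)^d = \sum_{j\ge 0}\pi_j L^j$ with $\pi_0 = 1$ and $\pi_j = \pi_{j-1}(j-1-d)/j$. A short sketch of these weights and of a truncated filter (the truncation length is an arbitrary choice):

    # Fractional differencing weights of (1 - L)^d and a truncated filter (sketch).
    import numpy as np

    def fracdiff_weights(d, n_lags):
        """pi_0 = 1, pi_j = pi_{j-1} * (j - 1 - d) / j: coefficients of (1 - L)^d."""
        w = np.empty(n_lags + 1)
        w[0] = 1.0
        for j in range(1, n_lags + 1):
            w[j] = w[j - 1] * (j - 1 - d) / j
        return w

    def fracdiff(x, d, n_lags=100):
        """Apply the truncated fractional difference filter to a 1-D series."""
        w = fracdiff_weights(d, n_lags)
        x = np.asarray(x, dtype=float)
        out = np.full_like(x, np.nan)
        for t in range(n_lags, len(x)):
            out[t] = w @ x[t - n_lags:t + 1][::-1]   # sum_j w[j] * x[t - j]
        return out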
Cointegration in functional autoregressive processes
processes with a unit root of finite type, where $H$ is an infinite
dimensional separable Hilbert space, and derives a generalization of the
Granger-Johansen Representation Theorem valid for any integration order
$d=1,2,\dots$. An existence theorem shows that the solution of an AR with a
unit root of finite type is necessarily integrated of some finite integer $d$
and displays a common trends representation with a finite number of common
stochastic trends of the type of (cumulated) bilateral random walks and an
infinite dimensional cointegrating space. A characterization theorem clarifies
the connections between the structure of the AR operators and $(i)$ the order
of integration, $(ii)$ the structure of the attractor space and the
cointegrating space, $(iii)$ the expression of the cointegrating relations, and
$(iv)$ the Triangular representation of the process. Except for the fact that
the number of cointegrating relations that are integrated of order 0 is
infinite, the representation of $H$-valued ARs with a unit root of
finite type coincides with that of usual finite dimensional VARs, which
corresponds to the special case $H=R^p$.
arXiv link: http://arxiv.org/abs/1712.07522v2
Transformation Models in High-Dimensions
econometricians. In many applications, the dependent variable is transformed so
that homogeneity or normal distribution of the error holds. In this paper, we
analyze transformation models in a high-dimensional setting, where the set of
potential covariates is large. We propose an estimator for the transformation
parameter and we show that it is asymptotically normally distributed using an
orthogonalized moment condition where the nuisance functions depend on the
target parameter. In a simulation study, we show that the proposed estimator
works well in small samples. A common practice in labor economics is to
transform wage with the log-function. In this study, we test if this
transformation holds in CPS data from the United States.
arXiv link: http://arxiv.org/abs/1712.07364v1
Towards a General Large Sample Theory for Regularized Estimators
estimators are pervasive in estimation problems wherein "plug-in" type
estimators are either ill-defined or ill-behaved. Within this framework, we
derive, under primitive conditions, consistency and a generalization of the
asymptotic linearity property. We also provide data-driven methods for choosing
tuning parameters that, under some conditions, achieve the aforementioned
properties. We illustrate the scope of our approach by presenting a wide range
of applications.
arXiv link: http://arxiv.org/abs/1712.07248v4
Assessment Voting in Large Electorates
applied to binary decisions in democratic societies. In the first round, a
randomly-selected number of citizens cast their vote on one of the two
alternatives at hand, thereby irrevocably exercising their right to vote. In
the second round, after the results of the first round have been published, the
remaining citizens decide whether to vote for one alternative or to abstain.
The votes from both rounds are aggregated, and the final outcome is obtained by
applying the majority rule, with ties being broken by fair randomization.
Within a costly voting framework, we show that large electorates will choose
the preferred alternative of the majority with high probability, and that
average costs will be low. This result is in contrast with the literature on
one-round voting, which predicts either higher voting costs (when voting is
compulsory) or decisions that often do not represent the preferences of the
majority (when voting is voluntary).
arXiv link: http://arxiv.org/abs/1712.05470v2
Quasi-Oracle Estimation of Heterogeneous Treatment Effects
many statistical challenges, such as personalized medicine and optimal resource
allocation. In this paper, we develop a general class of two-step algorithms
for heterogeneous treatment effect estimation in observational studies. We
first estimate marginal effects and treatment propensities in order to form an
objective function that isolates the causal component of the signal. Then, we
optimize this data-adaptive objective function. Our approach has several
advantages over existing methods. From a practical perspective, our method is
flexible and easy to use: In both steps, we can use any loss-minimization
method, e.g., penalized regression, deep neural networks, or boosting;
moreover, these methods can be fine-tuned by cross validation. Meanwhile, in
the case of penalized kernel regression, we show that our method has a
quasi-oracle property: Even if the pilot estimates for marginal effects and
treatment propensities are not particularly accurate, we achieve the same error
bounds as an oracle who has a priori knowledge of these two nuisance
components. We implement variants of our approach based on penalized
regression, kernel ridge regression, and boosting in a variety of simulation
setups, and find promising performance relative to existing baselines.
arXiv link: http://arxiv.org/abs/1712.04912v4
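A minimal sketch of the two-step idea above for a linear effect function tau(x) = x'beta: cross-fit the marginal outcome m(x) and the treatment propensity e(x) with any learners, then minimize the residual-on-residual objective, which for a linear tau reduces to a least-squares problem. The random-forest nuisance learners and five folds are illustrative choices, not the paper's recommended configuration.

    # Two-step heterogeneous treatment effect sketch (linear tau), illustrative only.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
    from sklearn.model_selection import cross_val_predict

    def linear_two_step_effect(X, w, y):
        """Minimize sum_i [(y_i - m(x_i)) - (w_i - e(x_i)) * x_i' beta]^2 over beta."""
        m_hat = cross_val_predict(RandomForestRegressor(), X, y, cv=5)
        e_hat = cross_val_predict(RandomForestClassifier(), X, w, cv=5,
                                  method="predict_proba")[:, 1]
        y_res = y - m_hat                      # outcome residual
        w_res = w - e_hat                      # treatment residual
        Z = X * w_res[:, None]                 # (w_i - e_i) * x_i
        beta, *_ = np.linalg.lstsq(Z, y_res, rcond=None)
        return beta                            # tau(x) is approximately x @ beta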
Fisher-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India
heterogeneous effects in randomized experiments. These key features include
best linear predictors of the effects using machine learning proxies, average
effects sorted by impact groups, and average characteristics of most and least
impacted units. The approach is valid in high dimensional settings, where the
effects are proxied (but not necessarily consistently estimated) by predictive
and causal machine learning methods. We post-process these proxies into
estimates of the key features. Our approach is generic, it can be used in
conjunction with penalized methods, neural networks, random forests, boosted
trees, and ensemble methods, both predictive and causal. Estimation and
inference are based on repeated data splitting to avoid overfitting and achieve
validity. We use quantile aggregation of the results across many potential
splits, in particular taking medians of p-values and medians and other
quantiles of confidence intervals. We show that quantile aggregation lowers
estimation risks over a single split procedure, and establish its principal
inferential properties. Finally, our analysis reveals ways to build provably
better machine learning proxies through causal learning: we can use the
objective functions that we develop to construct the best linear predictors of
the effects, to obtain better machine learning proxies in the initial step. We
illustrate the use of both inferential tools and causal learners with a
randomized field experiment that evaluates a combination of nudges to stimulate
demand for immunization in India.
arXiv link: http://arxiv.org/abs/1712.04802v8
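A small sketch of the repeated-splitting aggregation described above: compute an estimate, a p-value and a confidence interval on each random split, then report medians across splits, doubling the median p-value (with per-split intervals computed at a correspondingly more conservative level) in the spirit of the paper's quantile-aggregation adjustment. The per-split inference routine is left abstract here.

    # Quantile aggregation of inference across many data splits (sketch).
    import numpy as np

    def aggregate_splits(split_inference, data, n_splits=100, seed=0):
        """split_inference(data, rng) should return (estimate, pvalue, ci_lo, ci_hi)
        computed on one random half-split; this routine only aggregates its output."""
        rng = np.random.default_rng(seed)
        results = np.array([split_inference(data, rng) for _ in range(n_splits)])
        est, pval, lo, hi = results.T
        return {
            "estimate": np.median(est),
            # adjusted p-value: twice the median of the per-split p-values, capped at 1
            "p_value": min(1.0, 2.0 * np.median(pval)),
            "ci": (np.median(lo), np.median(hi)),
        }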
Finite-Sample Optimal Estimation and Inference on Average Treatment Effects Under Unconfoundedness
unconfoundedness conditional on the realizations of the treatment variable and
covariates. Given nonparametric smoothness and/or shape restrictions on the
conditional mean of the outcome variable, we derive estimators and confidence
intervals (CIs) that are optimal in finite samples when the regression errors
are normal with known variance. In contrast to conventional CIs, our CIs use a
larger critical value that explicitly takes into account the potential bias of
the estimator. When the error distribution is unknown, feasible versions of our
CIs are valid asymptotically, even when $\sqrt{n}$-inference is not possible
due to lack of overlap, or low smoothness of the conditional mean. We also
derive the minimum smoothness conditions on the conditional mean that are
necessary for $\sqrt{n}$-inference. When the conditional mean is restricted to
be Lipschitz with a large enough bound on the Lipschitz constant, the optimal
estimator reduces to a matching estimator with the number of matches set to
one. We illustrate our methods in an application to the National Supported Work
Demonstration.
arXiv link: http://arxiv.org/abs/1712.04594v5
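As a point of reference for the matching result above, a bare-bones one-nearest-neighbor matching estimator of the ATE (no bias correction, Euclidean distance on standardized covariates); this is the textbook estimator the abstract refers to, not the paper's optimal procedure or its honest CIs.

    # One-nearest-neighbor matching estimator of the ATE (sketch, no bias correction).
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def matching_ate(X, d, y):
        """X: covariates, d: 0/1 treatment, y: outcomes. Each unit is matched to its
        nearest neighbor in the opposite treatment arm."""
        X = (X - X.mean(axis=0)) / X.std(axis=0)
        treated, control = d == 1, d == 0
        nn_c = NearestNeighbors(n_neighbors=1).fit(X[control])
        nn_t = NearestNeighbors(n_neighbors=1).fit(X[treated])
        # imputed counterfactual outcome for every unit
        y_hat = np.empty_like(y, dtype=float)
        y_hat[treated] = y[control][nn_c.kneighbors(X[treated])[1][:, 0]]
        y_hat[control] = y[treated][nn_t.kneighbors(X[control])[1][:, 0]]
        effects = np.where(d == 1, y - y_hat, y_hat - y)
        return effects.mean()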
The Calculus of Democratization and Development
Answers", we seek to carry out a study in following the description in the
'Questions for Further Study.' To that end, we studied 33 countries in the
Sub-Saharan Africa region, who all went through an election which should signal
a "step-up" for their democracy, one in which previously homogenous regimes
transfer power to an opposition party that fairly won the election. After doing
so, liberal-democracy indicators and democracy indicators were evaluated in the
five years prior to and after the election took place, and over that ten-year
period, we examine the data for trends. If we see positive or negative trends
over this time horizon, we are able to conclude that it was the recent increase
in the quality of their democracy which led to it. Having investigated examples
of this in depth, there seem to be three main archetypes which drive the
results. Countries with positive results to their democracy from the election
have generally positive effects on their development, countries with more
"plateau" like results also did well, but countries for whom the descent to
authoritarianism was continued by this election found more negative results.
arXiv link: http://arxiv.org/abs/1712.04117v1
Set Identified Dynamic Economies and Robustness to Misspecification
to misspecification of the mechanism generating frictions. Economies with
frictions are treated as perturbations of a frictionless economy that are
consistent with a variety of mechanisms. We derive a representation for the law
of motion for such economies and we characterize parameter set identification.
We derive a link from model aggregate predictions to distributional information
contained in qualitative survey data and specify conditions under which the
identified set is refined. The latter is used to semi-parametrically estimate
distortions due to frictions in macroeconomic variables. Based on these
estimates, we propose a novel test for complete models. Using consumer and
business survey data collected by the European Commission, we apply our method
to estimate distortions due to financial frictions in the Spanish economy. We
investigate the implications of these estimates for the adequacy of the
standard model of financial frictions SW-BGG (Smets and Wouters (2007),
Bernanke, Gertler, and Gilchrist (1999)).
arXiv link: http://arxiv.org/abs/1712.03675v2
RNN-based counterfactual prediction, with an application to homestead policy and public schooling
intervention on an outcome over time. We train recurrent neural networks (RNNs)
on the history of control unit outcomes to learn a useful representation for
predicting future outcomes. The learned representation of control units is then
applied to the treated units for predicting counterfactual outcomes. RNNs are
specifically structured to exploit temporal dependencies in panel data, and are
able to learn negative and nonlinear interactions between control unit
outcomes. We apply the method to the problem of estimating the long-run impact
of U.S. homestead policy on public school spending.
arXiv link: http://arxiv.org/abs/1712.03553v7
A Random Attention Model
when attention is not only limited but also random. In contrast to earlier
approaches, we introduce a Random Attention Model (RAM) where we abstain from
any particular attention formation, and instead consider a large class of
nonparametric random attention rules. Our model imposes one intuitive
condition, termed Monotonic Attention, which captures the idea that each
consideration set competes for the decision-maker's attention. We then develop
revealed preference theory within RAM and obtain precise testable implications
for observable choice probabilities. Based on these theoretical findings, we
propose econometric methods for identification, estimation, and inference of
the decision maker's preferences. To illustrate the applicability of our
results and their concrete empirical content in specific settings, we also
develop revealed preference theory and accompanying econometric methods under
additional nonparametric assumptions on the consideration set for binary choice
problems. Finally, we provide general purpose software implementation of our
estimation and inference results, and showcase their performance using
simulations.
arXiv link: http://arxiv.org/abs/1712.03448v3
Aggregating Google Trends: Multivariate Testing and Analysis
Previous studies have utilized Google Trends web search data for economic
forecasting. We expand this work by providing algorithms to combine and
aggregate search volume data, so that the resulting data is both consistent
over time and consistent between data series. We give a brand equity example,
where Google Trends is used to analyze shopping data for 100 top ranked brands
and these data are used to nowcast economic variables. We describe the
importance of out of sample prediction and show how principal component
analysis (PCA) can be used to improve the signal to noise ratio and prevent
overfitting in nowcasting models. We give a finance example, where exploratory
data analysis and classification is used to analyze the relationship between
Google Trends searches and stock prices.
arXiv link: http://arxiv.org/abs/1712.03152v2
On Metropolis Growth
consumed energy of metropolis cities which are hybrid complex systems
comprising social networks, engineering systems, agricultural output, economic
activity and energy components. We abstract a city in terms of two fundamental
variables; $s$ resource cells (of unit area) that represent energy-consuming
geographic or spatial zones (e.g. land, housing or infrastructure etc.) and a
population comprising $n$ mobile units that can migrate between these cells. We
show that with a constant metropolis area (fixed $s$), the variance and entropy
of consumed energy initially increase with $n$, reach a maximum and then
eventually diminish to zero as saturation is reached. These metrics are
indicators of the spatial mobility of the population. Under certain situations,
the variance is bounded as a quadratic function of the mean consumed energy of
the metropolis. However, when population and metropolis area are endogenous,
growth in the latter is arrested when $n \leq \frac{s}{2}\log(s)$ due to
diminished population density. Conversely, the population growth reaches
equilibrium when $n\geq {s}n$ or equivalently when the aggregate of both
over-populated and under-populated areas is large. Moreover, we also draw the
relationship between our approach and multi-scalar information, when economic
dependency between a metropolis's sub-regions is based on the entropy of
consumed energy. Finally, if the city's economic size (domestic product etc.)
is proportional to the consumed energy, then for a constant population density,
we show that the economy scales linearly with the surface area (or $s$).
arXiv link: http://arxiv.org/abs/1712.02937v2
Online Red Packets: A Large-scale Empirical Study of Gift Giving on WeChat
as monetary gifts in Asian countries for thousands of years. In recent years,
online red packets have become widespread in China through the WeChat platform.
Exploiting a unique dataset consisting of 61 million group red packets and
seven million users, we conduct a large-scale, data-driven study to understand
the spread of red packets and the effect of red packets on group activity. We
find that the cash flows between provinces are largely consistent with
provincial GDP rankings, e.g., red packets are sent from users in the south to
those in the north. By distinguishing spontaneous from reciprocal red packets,
we reveal the behavioral patterns in sending red packets: males, seniors, and
people with more in-group friends are more inclined to spontaneously send red
packets, while red packets from females, youths, and people with fewer in-group
friends are more reciprocal. Furthermore, we use propensity score matching to
study the external effects of red packets on group dynamics. We show that red
packets increase group participation and strengthen in-group relationships,
which partly explain the benefits and motivations for sending red packets.
arXiv link: http://arxiv.org/abs/1712.02926v1
On monitoring development indicators using high resolution satellite images
socio-economic indicators from daytime satellite imagery. The diverse set of
indicators are often not intuitively related to observable features in
satellite images, and are not even always well correlated with each other. Our
predictive tool is more accurate than using night light as a proxy, and can be
used to predict missing data, smooth out noise in surveys, monitor development
progress of a region, and flag potential anomalies. Finally, we use predicted
variables to do robustness analysis of a regression study of high rate of
stunting in India.
arXiv link: http://arxiv.org/abs/1712.02282v3
Determination of Pareto exponents in economic models driven by Markov multiplicative processes
distribution of sizes in a dynamic economic system in which units experience
random multiplicative shocks and are occasionally reset. Each unit has a
Markov-switching type which influences their growth rate and reset probability.
We show that the size distribution has a Pareto upper tail, with exponent equal
to the unique positive solution to an equation involving the spectral radius of
a certain matrix-valued function. Under a non-lattice condition on growth
rates, an eigenvector associated with the Pareto exponent provides the
distribution of types in the upper tail of the size distribution.
arXiv link: http://arxiv.org/abs/1712.01431v5
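A hedged numerical sketch of the type of fixed-point computation the abstract above describes: for a stylized specification in which a unit of type $j$ survives reset with probability $1-q_j$, draws a lognormal growth factor with parameters $(\mu_j, \sigma_j)$, and switches type according to a transition matrix $P$, one searches for the $z > 0$ at which the spectral radius of $M(z)_{ij} = P_{ij}(1-q_j)\exp(\mu_j z + \tfrac12\sigma_j^2 z^2)$ equals one. The specific form of $M(z)$ and all parameter values here are assumptions for illustration, not taken from the paper.

    # Find a Pareto tail exponent as the root of spectral_radius(M(z)) = 1 (stylized).
    import numpy as np
    from scipy.optimize import brentq

    P = np.array([[0.9, 0.1],       # type transition probabilities (illustrative)
                  [0.2, 0.8]])
    q = np.array([0.05, 0.02])      # reset probabilities by type (illustrative)
    mu = np.array([-0.02, 0.01])    # mean log growth by type (illustrative)
    sigma = np.array([0.15, 0.25])  # volatility of log growth by type (illustrative)

    def spectral_radius_minus_one(z):
        # entry (i, j): P[i, j] * (1 - q[j]) * E[G_j^z] for lognormal growth factors
        M = P * (1 - q) * np.exp(mu * z + 0.5 * (sigma * z) ** 2)
        return np.max(np.abs(np.linalg.eigvals(M))) - 1.0

    zeta = brentq(spectral_radius_minus_one, 1e-6, 20.0)  # unique positive root, if any
    print("Pareto tail exponent:", zeta)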
The Effect of Partisanship and Political Advertising on Close Family Ties
studies public institutions and political processes, ignoring private effects
including strained family ties. Using anonymized smartphone-location data and
precinct-level voting, we show that Thanksgiving dinners attended by
opposing-party precinct residents were 30-50 minutes shorter than same-party
dinners. This decline from a mean of 257 minutes survives extensive spatial and
demographic controls. Dinner reductions in 2016 tripled for travelers from
media markets with heavy political advertising --- an effect not observed in
2015 --- implying a relationship to election-related behavior. Effects appear
asymmetric: while fewer Democratic-precinct residents traveled in 2016 than
2015, political differences shortened Thanksgiving dinners more among
Republican-precinct residents. Nationwide, 34 million person-hours of
cross-partisan Thanksgiving discourse were lost in 2016 to partisan effects.
arXiv link: http://arxiv.org/abs/1711.10602v2
Identification of and correction for publication bias
selective publication leads to biased estimates and distorted inference. This
paper proposes two approaches for identifying the conditional probability of
publication as a function of a study's results, the first based on systematic
replication studies and the second based on meta-studies. For known conditional
publication probabilities, we propose median-unbiased estimators and associated
confidence sets that correct for selective publication. We apply our methods to
recent large-scale replication studies in experimental economics and
psychology, and to meta-studies of the effects of minimum wages and de-worming
programs.
arXiv link: http://arxiv.org/abs/1711.10527v1
Constructive Identification of Heterogeneous Elasticities in the Cobb-Douglas Production Function
Cobb-Douglas production function. The identification is constructive with
closed-form formulas for the elasticity with respect to each input for each
firm. We propose that the flexible input cost ratio plays the role of a control
function under "non-collinear heterogeneity" between elasticities with respect
to two flexible inputs. The ex ante flexible input cost share can be used to
identify the elasticities with respect to flexible inputs for each firm. The
elasticities with respect to labor and capital can be subsequently identified
for each firm under the timing assumption admitting the functional
independence.
arXiv link: http://arxiv.org/abs/1711.10031v1
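For intuition only (a textbook special case, not the paper's full argument with two flexible inputs and non-collinear heterogeneity): if firm $j$ is a price taker with Cobb-Douglas technology and $M$ is a flexible input chosen after prices are observed, the first-order condition ties the firm-specific elasticity to an observable cost share,
\[
  Y_j = A_j L_j^{\alpha_j} K_j^{\beta_j} M_j^{\gamma_j}, \qquad
  \max_{M_j}\ p_Y Y_j - p_M M_j \;\Longrightarrow\;
  \gamma_j\,\frac{p_Y Y_j}{M_j} = p_M \;\Longrightarrow\;
  \gamma_j = \frac{p_M M_j}{p_Y Y_j},
\]
so the output elasticity with respect to the flexible input equals its expenditure share in revenue, firm by firm.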
Forecasting of a Hierarchical Functional Time Series on Example of Macromodel for Day and Night Air Pollution in Silesia Region: A Critical Overview
of the hierarchy of its components, e.g., in modelling trade accounts related
to foreign exchange or in optimizing regional air protection policy.
The problem of reconciling forecasts obtained at different levels of a
hierarchy, i.e., bringing together forecasts produced independently at each
level, has been addressed many times in the statistical and econometric
literature.
This paper deals with this issue in the case of a hierarchical functional time
series. We present and critically discuss the state of the art and indicate
opportunities for applying these methods to a particular environmental
protection problem. We critically compare the best predictor known from the
literature with our own original proposal. Within the paper we study a
macromodel describing day and night air pollution in the Silesia region divided
into five subregions.
arXiv link: http://arxiv.org/abs/1712.03797v1
The Research on the Stagnant Development of Shantou Special Economic Zone Under Reform and Opening-Up Policy
Zone under Reform and Opening-Up Policy from 1980 through 2016 with a focus on
policy making issues and their influence on the local economy. This paper is
divided into two parts, 1980 to 1991 and 1992 to 2016, in accordance with the
separation of the original Shantou District into three cities: Shantou,
Chaozhou and Jieyang at the end of 1991. This study analyzes the policy making
issues in the separation of the original Shantou District, the influence of the
policy on Shantou's economy after the separation, the possibility of merging
the three cities into one big new economic district in the future, and the
reasons that have led to the stagnant development of Shantou in the past 20
years. This paper uses longitudinal statistical analysis to study these
economic problems, applying non-parametric statistics through generalized
additive models and time series forecasting methods. The paper is authored
solely by Bowen Cai, a graduate student in the PhD program in Applied and
Computational Mathematics and Statistics at the University of Notre Dame with a
concentration in big data analysis.
arXiv link: http://arxiv.org/abs/1711.08877v1
Estimation Considerations in Contextual Bandits
outcome model as well as the exploration method used, particularly in the
presence of rich heterogeneity or complex outcome models, which can lead to
difficult estimation problems along the path of learning. We study a
consideration for the exploration vs. exploitation framework that does not
arise in multi-armed bandits but is crucial in contextual bandits; the way
exploration and exploitation is conducted in the present affects the bias and
variance in the potential outcome model estimation in subsequent stages of
learning. We develop parametric and non-parametric contextual bandits that
integrate balancing methods from the causal inference literature in their
estimation to make it less prone to problems of estimation bias. We provide the
first regret bound analyses for contextual bandits with balancing in the domain
of linear contextual bandits that match the state of the art regret bounds. We
demonstrate the strong practical advantage of balanced contextual bandits on a
large number of supervised learning datasets and on a synthetic example that
simulates model mis-specification and prejudice in the initial training data.
Additionally, we develop contextual bandits with simpler assignment policies by
leveraging sparse model estimation methods from the econometrics literature and
demonstrate empirically that in the early stages they can improve the rate of
learning and decrease regret.
arXiv link: http://arxiv.org/abs/1711.07077v4
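A small sketch of the balancing idea in estimation (my own toy version, not the authors' algorithm): run an epsilon-greedy linear contextual bandit, record the probability with which each arm was assigned, and fit each arm's outcome model by inverse-probability-weighted ridge regression so that early, unbalanced assignments do not bias later estimates. The epsilon schedule and ridge penalty are arbitrary choices.

    # Epsilon-greedy linear contextual bandit with IPW-weighted outcome models (sketch).
    import numpy as np
    from sklearn.linear_model import Ridge

    def run_bandit(contexts, reward_fn, n_arms, eps=0.1, seed=0):
        rng = np.random.default_rng(seed)
        logs = {a: {"X": [], "y": [], "p": []} for a in range(n_arms)}
        models = {a: None for a in range(n_arms)}
        for x in contexts:
            preds = [models[a].predict(x[None])[0] if models[a] else 0.0
                     for a in range(n_arms)]
            greedy = int(np.argmax(preds))
            arm = int(rng.integers(n_arms)) if rng.random() < eps else greedy
            # assignment probability of the chosen arm under epsilon-greedy
            p = eps / n_arms + (1 - eps) * (arm == greedy)
            logs[arm]["X"].append(x)
            logs[arm]["y"].append(reward_fn(x, arm))
            logs[arm]["p"].append(p)
            # refit the chosen arm's model with inverse-propensity weights (balancing)
            d = logs[arm]
            models[arm] = Ridge(alpha=1.0).fit(
                np.array(d["X"]), np.array(d["y"]),
                sample_weight=1.0 / np.array(d["p"]))
        return models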
Robust Synthetic Control
comparative case studies. Like the classical method, we present an algorithm to
estimate the unobservable counterfactual of a treatment unit. A distinguishing
feature of our algorithm is that of de-noising the data matrix via singular
value thresholding, which renders our approach robust in multiple facets: it
automatically identifies a good subset of donors, overcomes the challenges of
missing data, and continues to work well in settings where covariate
information may not be provided. To begin, we establish the condition under
which the fundamental assumption in synthetic control-like approaches holds,
i.e. when the linear relationship between the treatment unit and the donor pool
prevails in both the pre- and post-intervention periods. We provide the first
finite sample analysis for a broader class of models, the Latent Variable
Model, in contrast to Factor Models previously considered in the literature.
Further, we show that our de-noising procedure accurately imputes missing
entries, producing a consistent estimator of the underlying signal matrix
provided $p = \Omega( T^{-1 + \zeta})$ for some $\zeta > 0$; here, $p$ is the
fraction of observed data and $T$ is the time interval of interest. Under the
same setting, we prove that the mean-squared-error (MSE) in our prediction
estimation scales as $O(\sigma^2/p + 1/T)$, where $\sigma^2$ is the
noise variance. Using a data aggregation method, we show that the MSE can be
made as small as $O(T^{-1/2+\gamma})$ for any $\gamma \in (0, 1/2)$, leading to
a consistent estimator. We also introduce a Bayesian framework to quantify the
model uncertainty through posterior probabilities. Our experiments, using both
real-world and synthetic datasets, demonstrate that our robust generalization
yields an improvement over the classical synthetic control method.
arXiv link: http://arxiv.org/abs/1711.06940v1
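A bare-bones sketch of the two steps described above: hard-threshold the singular values of the donor matrix to de-noise it, then learn a linear combination of the de-noised donors on pre-treatment data and project it forward. The threshold rule and the plain least-squares second step are illustrative simplifications of the paper's algorithm.

    # De-noise the donor matrix by singular value thresholding, then fit the treated unit.
    import numpy as np

    def robust_synth(y_treated_pre, Y_donors, T0, sv_keep=None):
        """Y_donors: (T, J) donor outcomes over all periods; y_treated_pre: (T0,)
        pre-treatment outcomes of the treated unit. Returns the predicted
        counterfactual path of the treated unit over all T periods."""
        U, s, Vt = np.linalg.svd(Y_donors, full_matrices=False)
        if sv_keep is None:                    # crude rule: keep "large" singular values
            sv_keep = int(np.sum(s >= 0.1 * s[0]))
        M_hat = U[:, :sv_keep] @ np.diag(s[:sv_keep]) @ Vt[:sv_keep]    # de-noised donors
        w, *_ = np.linalg.lstsq(M_hat[:T0], y_treated_pre, rcond=None)  # pre-period fit
        return M_hat @ w                       # counterfactual over all periods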
Calibration of Distributionally Robust Empirical Optimization Models
problems with smooth $\phi$-divergence penalties and smooth concave objective
functions, and develop a theory for data-driven calibration of the non-negative
"robustness parameter" $\delta$ that controls the size of the deviations from
the nominal model. Building on the intuition that robust optimization reduces
the sensitivity of the expected reward to errors in the model by controlling
the spread of the reward distribution, we show that the first-order benefit of
a "little bit of robustness" (i.e., $\delta$ small, positive) is a significant
reduction in the variance of the out-of-sample reward while the corresponding
impact on the mean is almost an order of magnitude smaller. One implication is
that substantial variance (sensitivity) reduction is possible at little cost if
the robustness parameter is properly calibrated. To this end, we introduce the
notion of a robust mean-variance frontier to select the robustness parameter
and show that it can be approximated using resampling methods like the
bootstrap. Our examples show that robust solutions resulting from "open loop"
calibration methods (e.g., selecting a 90% confidence level regardless of
the data and objective function) can be very conservative out-of-sample, while
those corresponding to the robustness parameter that optimizes an estimate of
the out-of-sample expected reward (e.g., via the bootstrap) with no regard for
the variance are often insufficiently robust.
arXiv link: http://arxiv.org/abs/1711.06565v2
Economic Complexity Unfolded: Interpretable Model for the Productive Structure of Economies
productive structure of an economy. It resides on the premise of hidden
capabilities - fundamental endowments underlying the productive structure. In
general, measuring the capabilities behind economic complexity directly is
difficult, and indirect measures have been suggested which exploit the fact
that the presence of the capabilities is expressed in a country's mix of
products. We complement these studies by introducing a probabilistic framework
which leverages Bayesian non-parametric techniques to extract the dominant
features behind the comparative advantage in exported products. Based on
economic evidence and trade data, we place a restricted Indian Buffet Process
on the distribution of countries' capability endowment, appealing to a culinary
metaphor to model the process of capability acquisition. The approach comes
with a unique level of interpretability, as it produces a concise and
economically plausible description of the instantiated capabilities.
arXiv link: http://arxiv.org/abs/1711.07327v2
Improved Density and Distribution Function Estimation
restrictions, kernel density and distribution function estimators with implied
generalised empirical likelihood probabilities as weights achieve a reduction
in variance due to the systematic use of this extra information. The particular
interest here is the estimation of densities or distributions of (generalised)
residuals in semi-parametric models defined by a finite number of moment
restrictions. Such estimates are of great practical interest, being potentially
of use for diagnostic purposes, including tests of parametric assumptions on an
error distribution, goodness-of-fit tests or tests of overidentifying moment
restrictions. The paper gives conditions for the consistency and describes the
asymptotic mean squared error properties of the kernel density and distribution
estimators proposed in the paper. A simulation study evaluates the small sample
performance of these estimators. Supplements provide analytic examples to
illustrate situations where kernel weighting provides a reduction in variance
together with proofs of the results in the paper.
arXiv link: http://arxiv.org/abs/1711.04793v2
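A short sketch of the estimator class discussed above: a kernel density estimator in which the uniform weights $1/n$ are replaced by probability weights, for example the implied generalised empirical likelihood probabilities from a moment restriction. The Gaussian kernel and Silverman-type bandwidth are arbitrary defaults, and the weights argument is just a placeholder for whatever probabilities one supplies.

    # Weighted kernel density estimator (sketch): uniform 1/n weights replaced by pi_i.
    import numpy as np

    def weighted_kde(x_grid, data, weights=None, bandwidth=None):
        """f_hat(x) = sum_i pi_i * K_h(x - X_i) with a Gaussian kernel.
        weights should be nonnegative and sum to one (e.g., GEL-implied probabilities)."""
        x_grid = np.asarray(x_grid, dtype=float)
        data = np.asarray(data, dtype=float)
        n = len(data)
        if weights is None:
            weights = np.full(n, 1.0 / n)
        if bandwidth is None:                   # Silverman's rule-of-thumb default
            bandwidth = 1.06 * data.std() * n ** (-1 / 5)
        u = (x_grid[:, None] - data[None, :]) / bandwidth
        kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
        return kernel @ weights / bandwidth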
Uniform Inference for Characteristic Effects of Large Continuous-Time Linear Models
where the structural parameter depends on a set of characteristics, whose
effects are of interest. The leading example is the linear factor model in
financial economics where factor betas depend on observed characteristics such
as firm specific instruments and macroeconomic variables, and their effects
pick up long-run time-varying beta fluctuations. We specify the factor betas as
the sum of characteristic effects and an orthogonal idiosyncratic parameter
that captures high-frequency movements. It is often the case that researchers
do not know whether or not the latter exists, or its strengths, and thus the
inference about the characteristic effects should be valid uniformly over a
broad class of data generating processes for idiosyncratic parameters. We
construct our estimation and inference in a two-step continuous-time GMM
framework. It is found that the limiting distribution of the estimated
characteristic effects has a discontinuity when the variance of the
idiosyncratic parameter is near the boundary (zero), which makes the usual
"plug-in" method using the estimated asymptotic variance only valid pointwise
and may produce either over- or under-coverage. We show that
the uniformity can be achieved by cross-sectional bootstrap. Our procedure
allows both known and estimated factors, and also features a bias correction
for the effect of estimating unknown factors.
arXiv link: http://arxiv.org/abs/1711.04392v2
How fragile are information cascades?
cascades. That is, when agents make decisions based on their private
information, as well as observing the actions of those before them, then it
might be rational to ignore their private signal and imitate the action of
previous individuals. If the individuals are choosing between a right and a
wrong state, and the initial actions are wrong, then the whole cascade will be
wrong. This issue is due to the fact that cascades can be based on very little
information.
We show that if agents occasionally disregard the actions of others and base
their action only on their private information, then wrong cascades can be
avoided. Moreover, we study the optimal asymptotic rate at which the error
probability at time $t$ can go to zero. The optimal policy is for the player at
time $t$ to follow their private information with probability $p_{t} = c/t$,
leading to a learning rate of $c'/t$, where the constants $c$ and $c'$ are
explicit.
arXiv link: http://arxiv.org/abs/1711.04024v2
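A stylized simulation of the mechanism above (a toy version, not the paper's model in full detail): the true state is binary, private signals are correct with probability $q > 1/2$, and the agent at time $t$ follows her own signal with probability $p_t = c/t$ and otherwise copies the majority of past actions; the error frequency can then be tracked across Monte Carlo runs.

    # Toy simulation: occasional reliance on private signals breaks wrong cascades.
    import numpy as np

    def simulate_error_rate(T=2000, q=0.7, c=5.0, n_runs=500, seed=0):
        rng = np.random.default_rng(seed)
        errors = np.zeros(T)
        for _ in range(n_runs):
            theta = 1                                  # true state (without loss of generality)
            actions = []
            for t in range(1, T + 1):
                signal = theta if rng.random() < q else 1 - theta
                if rng.random() < min(1.0, c / t) or not actions:
                    action = signal                    # follow the private signal
                else:
                    action = int(np.mean(actions) >= 0.5)   # copy the majority so far
                actions.append(action)
                errors[t - 1] += (action != theta)
        return errors / n_runs                         # error frequency at each time t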
Testing for observation-dependent regime switching in mixture autoregressive models
specified either as constants (`mixture models') or are governed by a
finite-state Markov chain (`Markov switching models') are long-standing
problems that have also attracted recent interest. This paper considers testing
for regime switching when the regime switching probabilities are time-varying
and depend on observed data (`observation-dependent regime switching').
Specifically, we consider the likelihood ratio test for observation-dependent
regime switching in mixture autoregressive models. The testing problem is
highly nonstandard, involving unidentified nuisance parameters under the null,
parameters on the boundary, singular information matrices, and higher-order
approximations of the log-likelihood. We derive the asymptotic null
distribution of the likelihood ratio test statistic in a general mixture
autoregressive setting using high-level conditions that allow for various forms
of dependence of the regime switching probabilities on past observations, and
we illustrate the theory using two particular mixture autoregressive models.
The likelihood ratio test has a nonstandard asymptotic distribution that can
easily be simulated, and Monte Carlo studies show the test to have satisfactory
finite sample size and power properties.
arXiv link: http://arxiv.org/abs/1711.03959v1
SHOPPER: A Probabilistic Model of Consumer Choice with Substitutes and Complements
SHOPPER uses interpretable components to model the forces that drive how a
customer chooses products; in particular, we designed SHOPPER to capture how
items interact with other items. We develop an efficient posterior inference
algorithm to estimate these forces from large-scale data, and we analyze a
large dataset from a major chain grocery store. We are interested in answering
counterfactual queries about changes in prices. We found that SHOPPER provides
accurate predictions even under price interventions, and that it helps identify
complementary and substitutable pairs of products.
arXiv link: http://arxiv.org/abs/1711.03560v3
Measuring Price Discovery between Nearby and Deferred Contracts in Storable and Non-Storable Commodity Futures Markets
the speed at which they process information is of value in understanding the
pricing discovery process. Using price discovery measures, including Putnins
(2013) information leadership share and intraday data, we quantify the
proportional contribution of price discovery between nearby and deferred
contracts in the corn and live cattle futures markets. Price discovery is more
systematic in the corn than in the live cattle market. On average, nearby
contracts lead all deferred contracts in price discovery in the corn market,
but have a relatively less dominant role in the live cattle market. In both
markets, the nearby contract loses dominance when its relative volume share
dips below 50%, which occurs about 2-3 weeks before expiration in corn and 5-6
weeks before expiration in live cattle. Regression results indicate that the
share of price discovery is most closely linked to trading volume but is also
affected, to far less degree, by time to expiration, backwardation, USDA
announcements and market crashes. The effects of these other factors vary
between the markets, which likely reflects the difference in storability as well
as other market-related characteristics.
arXiv link: http://arxiv.org/abs/1711.03506v1
Identification and Estimation of Spillover Effects in Randomized Experiments
experiments where units' outcomes may depend on the treatment assignments of
other units within a group. I show that the commonly-used reduced-form
linear-in-means regression identifies a weighted sum of spillover effects with
some negative weights, and that the difference in means between treated and
controls identifies a combination of direct and spillover effects entering with
different signs. I propose nonparametric estimators for average direct and
spillover effects that overcome these issues and are consistent and
asymptotically normal under a precise relationship between the number of
parameters of interest, the total sample size and the treatment assignment
mechanism. These findings are illustrated using data from a conditional cash
transfer program and with simulations. The empirical results reveal the
potential pitfalls of failing to flexibly account for spillover effects in
policy evaluation: the estimated difference in means and the reduced-form
linear-in-means coefficients are all close to zero and statistically
insignificant, whereas the nonparametric estimators I propose reveal large,
nonlinear and significant spillover effects.
arXiv link: http://arxiv.org/abs/1711.02745v8
In search of a new economic model determined by logistic growth
economic growth models within the framework of the Lie group theory. We propose
a new growth model based on the assumption of logistic growth in factors. It is
employed to derive new production functions and introduce a new notion of wage
share. In the process it is shown that the new functions compare reasonably
well against relevant economic data. The corresponding problem of maximization
of profit under conditions of perfect competition is solved with the aid of one
of these functions. In addition, it is explained in reasonably rigorous
mathematical terms why Bowley's law no longer holds true in post-1960 data.
arXiv link: http://arxiv.org/abs/1711.02625v5
Semiparametric Estimation of Structural Functions in Nonseparable Triangular Models
provide a theoretically appealing framework for the modelling of complex
structural relationships. However, they are not commonly used in practice due
to the need for exogenous variables with large support for identification, the
curse of dimensionality in estimation, and the lack of inferential tools. This
paper introduces two classes of semiparametric nonseparable triangular models
that address these limitations. They are based on distribution and quantile
regression modelling of the reduced form conditional distributions of the
endogenous variables. We show that average, distribution and quantile
structural functions are identified in these systems through a control function
approach that does not require a large support condition. We propose a
computationally attractive three-stage procedure to estimate the structural
functions where the first two stages consist of quantile or distribution
regressions. We provide asymptotic theory and uniform inference methods for
each stage. In particular, we derive functional central limit theorems and
bootstrap functional central limit theorems for the distribution regression
estimators of the structural functions. These results establish the validity of
the bootstrap for three-stage estimators of structural functions, and lead to
simple inference algorithms. We illustrate the implementation and applicability
of all our methods with numerical simulations and an empirical application to
demand analysis.
arXiv link: http://arxiv.org/abs/1711.02184v3
Identifying the Effects of a Program Offer with an Application to Head Start
heterogeneity in both choice sets and preferences to evaluate the average
effects of a program offer. I show how to exploit the model structure to define
parameters capturing these effects and then computationally characterize their
identified sets under instrumental variable variation in choice sets. I
illustrate these tools by analyzing the effects of providing an offer to the
Head Start preschool program using data from the Head Start Impact Study. I
find that such a policy affects a large number of children who take up the
offer, and that they subsequently have positive effects on test scores. These
effects arise from children who do not have any preschool as an outside option.
A cost-benefit analysis reveals that the earning benefits associated with the
test score gains can be large and outweigh the net costs associated with offer
take up.
arXiv link: http://arxiv.org/abs/1711.02048v6
Equity in Startups
and economic growth. An important feature of the startup phenomenon has been
the wealth created through equity in startups to all stakeholders. These
include the startup founders, the investors, and also the employees through the
stock-option mechanism and universities through licenses of intellectual
property. In the employee group, the allocation to important managers like the
chief executive, vice-presidents and other officers, and independent board
members is also analyzed. This report analyzes how equity was allocated in more
than 400 startups, most of which had filed for an initial public offering. The
author has the ambition of informing a general audience about best practice in
equity split, in particular in Silicon Valley, the central place for startup
innovation.
arXiv link: http://arxiv.org/abs/1711.00661v1
Startups and Stanford University
and economic growth. Silicon Valley has been the place where the startup
phenomenon was the most obvious and Stanford University was a major component
of that success. Companies such as Google, Yahoo, Sun Microsystems, Cisco,
Hewlett Packard had very strong links with Stanford, but even these very famous
success stories cannot fully describe the richness and diversity of the
Stanford entrepreneurial activity. This report explores the dynamics of more
than 5000 companies founded by Stanford University alumni and staff, through
their value creation, their field of activities, their growth patterns and
more. The report also explores some features of the founders of these companies
such as their academic background or the number of years between their Stanford
experience and their company creation.
arXiv link: http://arxiv.org/abs/1711.00644v1
Sophisticated and small versus simple and sizeable: When does it pay off to introduce drifting coefficients in Bayesian VARs?
time-varying parameter VAR framework via thorough predictive exercises for the
Euro Area, the United Kingdom and the United States. It turns out that
sophisticated dynamics through drifting coefficients are important in small
data sets, while simpler models tend to perform better in sizeable data sets.
To combine the best of both worlds, novel shrinkage priors help to mitigate the
curse of dimensionality, resulting in competitive forecasts for all scenarios
considered. Furthermore, we discuss dynamic model selection to improve upon the
best performing individual model for each point in time.
arXiv link: http://arxiv.org/abs/1711.00564v4
Orthogonal Machine Learning: Power and Limitations
parameters of interest even when high-dimensional or nonparametric nuisance
parameters are estimated at an $n^{-1/4}$ rate. The key is to employ
Neyman-orthogonal moment equations which are first-order insensitive to
perturbations in the nuisance parameters. We show that the $n^{-1/4}$
requirement can be improved to $n^{-1/(2k+2)}$ by employing a $k$-th order
notion of orthogonality that grants robustness to more complex or
higher-dimensional nuisance parameters. In the partially linear regression
setting popular in causal inference, we show that we can construct second-order
orthogonal moments if and only if the treatment residual is not normally
distributed. Our proof relies on Stein's lemma and may be of independent
interest. We conclude by demonstrating the robustness benefits of an explicit
doubly-orthogonal estimation procedure for treatment effect.
arXiv link: http://arxiv.org/abs/1711.00342v6
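For reference, first-order Neyman orthogonality of a moment function $\psi(W;\theta,\eta)$ at $(\theta_0,\eta_0)$ requires the derivative in the nuisance direction to vanish, and the $k$-th order notion discussed above strengthens this to higher derivatives:
\[
  \left.\frac{\partial^{m}}{\partial r^{m}}\,
  E\!\left[\psi\bigl(W;\theta_0,\eta_0 + r(\eta-\eta_0)\bigr)\right]\right|_{r=0} = 0,
  \qquad m = 1,\dots,k .
\]
In the partially linear model $Y = D\theta_0 + g(X) + \varepsilon$ with $D = m(X) + v$, the familiar first-order orthogonal moment is $\psi = \bigl(Y - D\theta - g(X)\bigr)\bigl(D - m(X)\bigr)$.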
On some further properties and application of Weibull-R family of distributions
distributions (Alzaghal, Ghosh and Alzaatreh (2016)). We derive some new
structural properties of the Weibull-R family of distributions. We provide
various characterizations of the family via conditional moments, some functions
of order statistics and via record values.
arXiv link: http://arxiv.org/abs/1711.00171v1
Macroeconomics and FinTech: Uncovering Latent Macroeconomic Effects on Peer-to-Peer Lending
trend that is displacing traditional retail banking. Studies on P2P lending
have focused on predicting individual interest rates or default probabilities.
However, the relationship between aggregated P2P interest rates and the general
economy will be of interest to investors and borrowers as the P2P credit market
matures. We show that the variation in P2P interest rates across grade types
is determined by three macroeconomic latent factors formed by Canonical
Correlation Analysis (CCA) - macro default, investor uncertainty, and the
fundamental value of the market. However, the variation in P2P interest rates
across term types cannot be explained by the general economy.
arXiv link: http://arxiv.org/abs/1710.11283v1
Nonparametric Identification in Index Models of Link Formation
index and a degree heterogeneity index. We provide nonparametric identification
results in a single large network setting for the potentially nonparametric
homophily effect function, the realizations of unobserved individual fixed
effects and the unknown distribution of idiosyncratic pairwise shocks, up to
normalization, for each possible true value of the unknown parameters. We
propose a novel form of scale normalization on an arbitrary interquantile
range, which is not only theoretically robust but also proves particularly
convenient for the identification analysis, as quantiles provide direct
linkages between the observable conditional probabilities and the unknown index
values. We then use an inductive "in-fill and out-expansion" algorithm to
establish our main results, and consider extensions to more general settings
that allow nonseparable dependence between homophily and degree heterogeneity,
as well as certain extents of network sparsity and weaker assumptions on the
support of unobserved heterogeneity. As a byproduct, we also propose a concept
called "modeling equivalence" as a refinement of "observational equivalence",
and use it to provide a formal discussion about normalization, identification
and their interplay with counterfactuals.
arXiv link: http://arxiv.org/abs/1710.11230v5
Artificial Intelligence as Structural Estimation: Economic Interpretations of Deep Blue, Bonanza, and AlphaGo
number of tasks, but understanding and explaining AI remain challenging. This
paper clarifies the connections between machine-learning algorithms to develop
AIs and the econometrics of dynamic structural models through the case studies
of three famous game AIs. Chess-playing Deep Blue is a calibrated value
function, whereas shogi-playing Bonanza is an estimated value function via
Rust's (1987) nested fixed-point method. AlphaGo's "supervised-learning policy
network" is a deep neural network implementation of Hotz and Miller's (1993)
conditional choice probability estimation; its "reinforcement-learning value
network" is equivalent to Hotz, Miller, Sanders, and Smith's (1994) conditional
choice simulation method. Relaxing these AIs' implicit econometric assumptions
would improve their structural interpretability.
arXiv link: http://arxiv.org/abs/1710.10967v3
Matrix Completion Methods for Causal Panel Data Models
panel data, where some units are exposed to a treatment during some periods and
the goal is estimating counterfactual (untreated) outcomes for the treated
unit/period combinations. We propose a class of matrix completion estimators
that uses the observed elements of the matrix of control outcomes corresponding
to untreated unit/periods to impute the "missing" elements of the control
outcome matrix, corresponding to treated units/periods. This leads to a matrix
that well-approximates the original (incomplete) matrix, but has lower
complexity according to the nuclear norm for matrices. We generalize results
from the matrix completion literature by allowing the patterns of missing data
to have a time series dependency structure that is common in social science
applications. We present novel insights concerning the connections between the
matrix completion literature, the literature on interactive fixed effects
models and the literatures on program evaluation under unconfoundedness and
synthetic control methods. We show that all these estimators can be viewed as
focusing on the same objective function. They differ only in the way they
deal with identification, in some cases solely through regularization (our
proposed nuclear norm matrix completion estimator) and in other cases primarily
through imposing hard restrictions (the unconfoundedness and synthetic control
approaches). The proposed method outperforms unconfoundedness-based or
synthetic control estimators in simulations based on real data.
arXiv link: http://arxiv.org/abs/1710.10251v5
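A compact sketch of nuclear-norm-regularized matrix completion via iterative soft-thresholding of singular values ("soft-impute"); the regularization level and convergence tolerance are illustrative, and this is a generic routine rather than the paper's estimator with fixed effects and time-dependent missingness patterns.

    # Soft-impute: nuclear-norm regularized completion of a partially observed matrix.
    import numpy as np

    def soft_impute(Y, observed, lam, n_iter=500, tol=1e-6):
        """Y: (N, T) matrix with arbitrary values where observed == False;
        observed: boolean mask of observed entries; lam: soft-threshold level."""
        L = np.where(observed, Y, 0.0)              # initial fill for missing entries
        for _ in range(n_iter):
            filled = np.where(observed, Y, L)       # keep observed, impute the rest
            U, s, Vt = np.linalg.svd(filled, full_matrices=False)
            s_thr = np.maximum(s - lam, 0.0)        # soft-threshold the singular values
            L_new = (U * s_thr) @ Vt
            if np.linalg.norm(L_new - L) <= tol * max(1.0, np.linalg.norm(L)):
                return L_new
            L = L_new
        return L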
Shape-Constrained Density Estimation via Optimal Transport
sufficiently strong constraint, $\log$-concavity being a common example, has
the effect of restoring consistency without requiring additional parameters.
Since many results in economics require densities to satisfy a regularity
condition, these estimators are also attractive for the structural estimation
of economic models. In all of the examples of regularity conditions provided by
Bagnoli and Bergstrom (2005) and Ewerhart (2013), $\log$-concavity is
sufficient to ensure that the density satisfies the required conditions.
However, in many cases $\log$-concavity is far from necessary, and it has the
unfortunate side effect of ruling out sub-exponential tail behavior.
In this paper, we use optimal transport to formulate a shape constrained
density estimator. We initially describe the estimator using a $\rho$-concavity
constraint. In this setting we provide results on consistency, asymptotic
distribution, convexity of the optimization problem defining the estimator, and
formulate a test for the null hypothesis that the population density satisfies
a shape constraint. Afterward, we provide sufficient conditions for these
results to hold using an arbitrary shape constraint. This generalization is
used to explore whether the California Department of Transportation's decision
to award construction contracts with the use of a first price auction is cost
minimizing. We estimate the marginal costs of construction firms subject to
Myerson's (1981) regularity condition, which is a requirement for the first
price reverse auction to be cost minimizing. The proposed test fails to reject
that the regularity condition is satisfied.
arXiv link: http://arxiv.org/abs/1710.09069v2
Asymptotic Distribution and Simultaneous Confidence Bands for Ratios of Quantile Functions
used in medical research to compare treatment and control groups or in
economics to compare various economic variables when repeated cross-sectional
data are available. Inspired by the so-called growth incidence curves
introduced in poverty research, we argue that the ratio of quantile functions
is a more appropriate and informative tool to compare two distributions. We
present an estimator for the ratio of quantile functions and develop
corresponding simultaneous confidence bands, which allow one to assess the significance
of certain features of the quantile functions ratio. Derived simultaneous
confidence bands rely on the asymptotic distribution of the quantile functions
ratio and do not require re-sampling techniques. The performance of the
simultaneous confidence bands is demonstrated in simulations. Analysis of the
expenditure data from Uganda in years 1999, 2002 and 2005 illustrates the
relevance of our approach.
arXiv link: http://arxiv.org/abs/1710.09009v1
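A plug-in estimate of the quantile-function ratio is straightforward to compute; the sketch below does so for two simulated expenditure-like samples. The lognormal data and quantile grid are illustrative assumptions, and the simultaneous confidence bands derived in the paper above are not reproduced here.

```python
import numpy as np

def quantile_ratio(sample_num, sample_den, taus):
    """Plug-in estimator of the ratio of quantile functions
    Q_num(tau) / Q_den(tau) at the requested quantile levels."""
    q_num = np.quantile(sample_num, taus)
    q_den = np.quantile(sample_den, taus)
    return q_num / q_den

rng = np.random.default_rng(1)
expend_later = np.exp(rng.normal(loc=6.3, scale=0.8, size=3000))   # later cross-section
expend_early = np.exp(rng.normal(loc=6.0, scale=0.7, size=3000))   # earlier cross-section
taus = np.linspace(0.05, 0.95, 19)
ratio = quantile_ratio(expend_later, expend_early, taus)
for t, r in zip(taus[::3], ratio[::3]):
    print(f"tau={t:4.2f}  Q_later/Q_early = {r:5.3f}")
```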
Calibrated Projection in MATLAB: Users' Manual
to construct confidence intervals proposed by Kaido, Molinari and Stoye (2017).
This manual provides details on how to use the package for inference on
projections of partially identified parameters. It also explains how to use the
MATLAB functions we developed to compute confidence intervals on solutions of
nonlinear optimization problems with estimated constraints.
arXiv link: http://arxiv.org/abs/1710.09707v1
Calibration of Machine Learning Classifiers for Probability of Default Modelling
probability of default. The validation of such predictive models is based both
on ranking ability and on calibration (i.e., how accurately the probabilities
output by the model map to the observed probabilities). In this study we cover
the current best practices regarding calibration for binary classification, and
explore how different approaches yield different results on real world credit
scoring data. The limitations of evaluating credit scoring models using only
ranking-ability metrics are explored. A benchmark is run on 18 real-world
datasets, and results compared. The calibration techniques used are Platt
Scaling and Isotonic Regression. Also, different machine learning models are
used: Logistic Regression, Random Forest Classifiers, and Gradient Boosting
Classifiers. Results show that when the dataset is treated as a time series,
re-calibration with Isotonic Regression improves long-term calibration more
than the alternative methods. With re-calibration, the non-parametric models
outperform Logistic Regression on Brier score loss.
arXiv link: http://arxiv.org/abs/1710.08901v1
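The sketch below illustrates the two re-calibration techniques mentioned above, Platt scaling and isotonic regression, applied to out-of-sample probabilities from a random forest on synthetic imbalanced data (not the credit-scoring benchmark datasets). The train/calibration/test split and model settings are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5,
                                            random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
p_cal = rf.predict_proba(X_cal)[:, 1]
p_te = rf.predict_proba(X_te)[:, 1]

# Platt scaling: a logistic regression on the raw scores.
platt = LogisticRegression().fit(p_cal.reshape(-1, 1), y_cal)
p_platt = platt.predict_proba(p_te.reshape(-1, 1))[:, 1]

# Isotonic regression: a monotone, non-parametric re-calibration map.
iso = IsotonicRegression(out_of_bounds="clip").fit(p_cal, y_cal)
p_iso = iso.predict(p_te)

print("Brier, raw RF:  ", round(brier_score_loss(y_te, p_te), 4))
print("Brier, Platt:   ", round(brier_score_loss(y_te, p_platt), 4))
print("Brier, isotonic:", round(brier_score_loss(y_te, p_iso), 4))
```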
Propensity score matching for multiple treatment levels: A CODA-based contribution
multiple treatment levels under the strong unconfoundedness assumption with the
help of the Aitchison distance proposed in the field of compositional data
analysis (CODA).
arXiv link: http://arxiv.org/abs/1710.08558v1
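For readers unfamiliar with the Aitchison geometry, the sketch below computes the Aitchison distance between vectors of generalized propensity scores via the centered log-ratio (clr) transform. The toy score vectors are assumptions for illustration, and the matching step itself is omitted.

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform of a composition (positive, sums to 1)."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    """Aitchison distance between two compositions: the Euclidean distance
    between their clr transforms."""
    return np.linalg.norm(clr(x) - clr(y))

# Generalized propensity scores for three treatment levels (toy values).
p_i = np.array([0.20, 0.50, 0.30])
p_j = np.array([0.25, 0.45, 0.30])
p_k = np.array([0.60, 0.30, 0.10])
print("d(i, j) =", round(aitchison_distance(p_i, p_j), 4))
print("d(i, k) =", round(aitchison_distance(p_i, p_k), 4))
```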
Existence in Multidimensional Screening with General Nonlinear Preferences
for the multidimensional screening problem with general nonlinear preferences.
We first formulate the principal's problem as a maximization problem with
$G$-convexity constraints and then use $G$-convex analysis to prove existence.
arXiv link: http://arxiv.org/abs/1710.08549v2
Electricity Market Theory Based on Continuous Time Commodity Model
re-examine the pricing theories applied in electricity market design. The
theory of spot pricing is the basis of electricity market design in many
countries, but it has two major drawbacks: first, it is still based on the
traditional hourly scheduling/dispatch model, ignoring the crucial time
continuity in electric power production and consumption and not treating
inter-temporal constraints seriously; second, it assumes that electricity
products are homogeneous within the same dispatch period, and so it cannot
distinguish base, intermediate and peak power, which have obviously different
technical and economic characteristics. To overcome these shortcomings, this
paper presents a continuous time commodity model of electricity, including spot
pricing model and load duration model. The market optimization models under the
two pricing mechanisms are established with the Riemann and Lebesgue integrals,
respectively, and the functional optimization problems are solved with the
Euler-Lagrange equation to obtain the market equilibria. The feasibility of
pricing according to load duration is proved by strict mathematical derivation.
Simulation results show that load duration pricing can correctly identify and
value different attributes of generators, reduce the total electricity
purchasing cost, and distribute profits among the power plants more equitably.
The theory and methods proposed in this paper will provide new ideas and
theoretical foundation for the development of electric power markets.
arXiv link: http://arxiv.org/abs/1710.07918v1
Modal Regression using Kernel Density Estimation: a Review
estimation. Modal regression is an alternative approach for investigating
relationship between a response variable and its covariates. Specifically,
modal regression summarizes the interactions between the response variable and
covariates using the conditional mode or local modes. We first describe the
underlying model of modal regression and its estimators based on kernel density
estimation. We then review the asymptotic properties of the estimators and
strategies for choosing the smoothing bandwidth. We also discuss useful
algorithms and similar alternative approaches for modal regression, and propose
future directions in this field.
arXiv link: http://arxiv.org/abs/1710.07004v2
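A minimal kernel-based version of the conditional-mode idea: estimate the conditional density of y given x = x0 with product Gaussian kernels and maximize it over a grid of y values. The bandwidths, grid, and simulated design below are illustrative assumptions rather than the tuning rules reviewed in the paper.

```python
import numpy as np

def conditional_mode(x, y, x0, hx=0.3, hy=0.3, y_grid=None):
    """Estimate the conditional mode of y given x = x0 by maximizing a
    kernel estimate of the conditional density over a grid of y values."""
    if y_grid is None:
        y_grid = np.linspace(y.min(), y.max(), 400)
    wx = np.exp(-0.5 * ((x - x0) / hx) ** 2)          # kernel weights in x
    dens = (wx[None, :] *
            np.exp(-0.5 * ((y_grid[:, None] - y[None, :]) / hy) ** 2)).sum(axis=1)
    return y_grid[np.argmax(dens)]

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, size=2000)
y = np.sin(2 * x) + 0.3 * rng.standard_normal(2000)   # unimodal conditional law
for x0 in (-1.0, 0.0, 1.0):
    print(f"x0={x0:+.1f}  conditional mode ~= {conditional_mode(x, y, x0):+.3f}"
          f"   (sin(2*x0) = {np.sin(2 * x0):+.3f})")
```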
Minimax Linear Estimation at a Boundary Point
unknown function at a boundary point of its domain in a Gaussian white noise
model under the restriction that the first-order derivative of the unknown
function is Lipschitz continuous (the second-order H\"{o}lder class). The
result is then applied to construct the minimax optimal estimator for the
regression discontinuity design model, where the parameter of interest involves
function values at boundary points.
arXiv link: http://arxiv.org/abs/1710.06809v1
Revenue-based Attribution Modeling for Online Advertising
quantify how revenue should be attributed to online advertising inputs. We
adopt and further develop the relative importance method, which is based on
regression models that have been extensively studied and utilized to
investigate the relationship between advertising efforts and market reaction
(revenue). The relative importance method aims at decomposing and allocating
marginal contributions to the coefficient of determination (R^2) of regression
models as attribution values. In particular, we adopt two alternative
submethods to perform this decomposition: dominance analysis and relative
weight analysis. Moreover, we demonstrate an extension of the decomposition
methods from the standard linear model to the additive model. We claim that our new
approaches are more flexible and accurate in modeling the underlying
relationship and calculating the attribution values. We use simulation examples
to demonstrate the superior performance of our new approaches over traditional
methods. We further illustrate the value of our proposed approaches using a
real advertising campaign dataset.
arXiv link: http://arxiv.org/abs/1710.06561v1
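Dominance analysis and relative weight analysis both allocate R^2 across correlated regressors. The sketch below computes the closely related Shapley (LMG) decomposition of R^2, whose shares sum to the full-model R^2; the simulated "advertising channels" and sample size are assumptions, and this is a generic linear-model decomposition, not the additive-model extension proposed above.

```python
import numpy as np
from itertools import combinations
from math import factorial

def r_squared(X, y, cols):
    """R^2 of an OLS regression of y on an intercept plus X[:, cols]."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1.0 - resid.var() / y.var()

def shapley_r2(X, y):
    """Shapley (LMG) decomposition of R^2 across regressors: average the
    marginal R^2 gain of each regressor over all orderings."""
    p = X.shape[1]
    shares = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            w = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                gain = r_squared(X, y, list(S) + [j]) - r_squared(X, y, list(S))
                shares[j] += w * gain
    return shares

rng = np.random.default_rng(3)
n = 2000
X = rng.standard_normal((n, 3))
X[:, 1] = 0.6 * X[:, 0] + 0.8 * X[:, 1]          # correlated "advertising channels"
revenue = 1.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * X[:, 2] + rng.standard_normal(n)
shares = shapley_r2(X, revenue)
print("attribution shares:", np.round(shares, 4), " sum =", round(shares.sum(), 4))
print("total R^2:         ", round(r_squared(X, revenue, range(3)), 4))
```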
Inference on Auctions with Weak Assumptions on Information
question of inference on auction fundamentals (e.g. valuation distributions,
welfare measures) under weak assumptions on information structure. The question
is important as it allows us to learn about the valuation distribution in a
robust way, i.e., without assuming that a particular information structure
holds across observations. We leverage the recent contributions of
Bergemann and Morris (2013) in the robust mechanism design literature that exploit the
link between Bayesian Correlated Equilibria and Bayesian Nash Equilibria in
incomplete information games to construct an econometrics framework for
learning about auction fundamentals using observed data on bids. We showcase
our construction of identified sets in private value and common value auctions.
Our approach for constructing these sets inherits the computational simplicity
of solving for correlated equilibria: checking whether a particular valuation
distribution belongs to the identified set is as simple as determining whether
a {\it linear} program is feasible. A similar linear program can be used to
construct the identified set on various welfare measures and counterfactual
objects. For inference and to summarize statistical uncertainty, we propose
novel finite sample methods using tail inequalities that are used to construct
confidence regions on sets. We also highlight methods based on Bayesian
bootstrap and subsampling. A set of Monte Carlo experiments show adequate
finite sample properties of our inference procedures. We illustrate our methods
using data from OCS auctions.
arXiv link: http://arxiv.org/abs/1710.03830v2
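The computational point, that membership in the identified set reduces to LP feasibility, can be illustrated generically. The sketch below checks whether any probability vector on a finite support satisfies a set of linear moment restrictions; the support and the single mean restriction are toy assumptions standing in for the Bayesian-correlated-equilibrium constraints used in the paper.

```python
import numpy as np
from scipy.optimize import linprog

def in_identified_set(A_eq, b_eq):
    """Check whether some probability vector p >= 0, sum(p) = 1 satisfies
    the linear restrictions A_eq @ p = b_eq, via an LP feasibility problem."""
    n = A_eq.shape[1]
    A = np.vstack([A_eq, np.ones((1, n))])
    b = np.append(b_eq, 1.0)
    res = linprog(c=np.zeros(n), A_eq=A, b_eq=b,
                  bounds=[(0, None)] * n, method="highs")
    return res.status == 0   # status 0: an optimal (hence feasible) point was found

# Toy example: distributions on {0, 1, 2} with a restricted mean.
support = np.array([0.0, 1.0, 2.0])
print(in_identified_set(support[None, :], np.array([1.3])))   # mean 1.3: feasible
print(in_identified_set(support[None, :], np.array([2.5])))   # mean 2.5: infeasible
```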
A Unified Approach on the Local Power of Panel Unit Root Tests
asymptotic power for panel unit root tests, which is one of the most important
issues in nonstationary panel data literature. Two most widely used panel unit
root tests known as Levin-Lin-Chu (LLC, Levin, Lin and Chu (2002)) and
Im-Pesaran-Shin (IPS, Im, Pesaran and Shin (2003)) tests are systematically
studied for various situations to illustrate our method. Our approach is
characteristic function based, and can be used directly in deriving the moments
of the asymptotic distributions of these test statistics under the null and the
local-to-unity alternatives. For the LLC test, the approach provides an
alternative way to obtain the results that can be derived by the existing
method. For the IPS test, new results are obtained, filling a gap in the
literature where few results exist because the IPS test is non-admissible.
Moreover, our approach has the advantage in deriving Edgeworth expansions of
these tests, which are also given in the paper. The simulations are presented
to illustrate our theoretical findings.
arXiv link: http://arxiv.org/abs/1710.02944v1
Forecasting Across Time Series Databases using Recurrent Neural Networks on Groups of Similar Series: A Clustering Approach
containing large quantities of similar time series are available. Forecasting
time series in these domains with traditional univariate forecasting procedures
leaves great potential for producing accurate forecasts untapped. Recurrent
neural networks (RNNs), and in particular Long Short-Term Memory (LSTM)
networks, have proven recently that they are able to outperform
state-of-the-art univariate time series forecasting methods in this context
when trained across all available time series. However, if the time series
database is heterogeneous, accuracy may degrade, so that on the way towards
fully automatic forecasting methods in this space, a notion of similarity
between the time series needs to be built into the methods. To this end, we
present a prediction model that can be used with different types of RNN models
on subgroups of similar time series, which are identified by time series
clustering techniques. We assess our proposed methodology using LSTM networks,
a widely popular RNN variant. Our method achieves competitive results on
benchmarking datasets under competition evaluation procedures. In particular,
in terms of mean sMAPE accuracy, it consistently outperforms the baseline LSTM
model and outperforms all other methods on the CIF2016 forecasting competition
dataset.
arXiv link: http://arxiv.org/abs/1710.03222v2
When Should You Adjust Standard Errors for Clustering?
associated standard errors that account for "clustering" of units, where
clusters are defined by factors such as geography. Clustering adjustments are
typically motivated by the concern that unobserved components of outcomes for
units within clusters are correlated. However, this motivation does not provide
guidance about questions such as: (i) Why should we adjust standard errors for
clustering in some situations but not others? How can we justify the common
practice of clustering in observational studies but not randomized experiments,
or clustering by state but not by gender? (ii) Why is conventional clustering a
potentially conservative "all-or-nothing" adjustment, and are there alternative
methods that respond to data and are less conservative? (iii) In what settings
does the choice of whether and how to cluster make a difference? We address
these questions using a framework of sampling and design inference. We argue
that clustering can be needed to address sampling issues if sampling follows a
two-stage process where, in the first stage, a subset of clusters is sampled
from a population of clusters, and, in the second stage, units are sampled from
the sampled clusters. Then, clustered standard errors account for the existence
of clusters in the population that we do not see in the sample. Clustering can
be needed to account for design issues if treatment assignment is correlated
with membership in a cluster. We propose new variance estimators to deal with
intermediate settings where conventional cluster standard errors are
unnecessarily conservative and robust standard errors are too small.
arXiv link: http://arxiv.org/abs/1710.02926v4
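For reference, the conventional "all-or-nothing" adjustment discussed above is the Liang-Zeger cluster-robust sandwich estimator; a minimal numpy implementation is sketched below on simulated data with a cluster-level shock. The finite-sample correction and the data-generating process are common but illustrative choices, and the paper's proposed alternative variance estimators are not implemented here.

```python
import numpy as np

def cluster_robust_se(X, y, cluster):
    """OLS with Liang-Zeger cluster-robust (sandwich) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    meat = np.zeros((k, k))
    for g in np.unique(cluster):
        idx = cluster == g
        score = X[idx].T @ u[idx]              # within-cluster score sum
        meat += np.outer(score, score)
    G = len(np.unique(cluster))
    dof_adj = G / (G - 1) * (n - 1) / (n - k)  # common finite-sample correction
    V = dof_adj * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

rng = np.random.default_rng(4)
G, m = 50, 20                                  # 50 clusters of 20 units
cluster = np.repeat(np.arange(G), m)
alpha = rng.standard_normal(G)[cluster]        # cluster-level shock in outcomes
x = rng.standard_normal(G)[cluster] + rng.standard_normal(G * m)   # partly cluster-level regressor
y = 1.0 + 0.5 * x + alpha + rng.standard_normal(G * m)
X = np.column_stack([np.ones(G * m), x])
beta, se = cluster_robust_se(X, y, cluster)
print("beta:", np.round(beta, 3), " clustered SE:", np.round(se, 3))
```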
A Note on Gale, Kuhn, and Tucker's Reductions of Zero-Sum Games
packaging some strategies with respect to a probability distribution on them.
In terms of value, they gave conditions for a desirable reduction. We show that
a probability distribution for a desirable reduction relies on optimal
strategies in the original game. Also, we correct an improper example they gave
to show that the converse of a theorem does not hold.
arXiv link: http://arxiv.org/abs/1710.02326v1
Finite Time Identification in Unstable Linear Systems
well-studied problem in the literature, both in the low and high-dimensional
settings. However, there are hardly any results for the unstable case,
especially regarding finite time bounds. For this setting, classical results on
least-squares estimation of the dynamics parameters are not applicable and
therefore new concepts and technical approaches need to be developed to address
the issue. Unstable linear systems arise in key real applications in control
theory, econometrics, and finance. This study establishes finite time bounds
for the identification error of the least-squares estimates for a fairly large
class of heavy-tailed noise distributions, and transition matrices of such
systems. The results relate the time length (samples) required for estimation
to a function of the problem dimension and key characteristics of the true
underlying transition matrix and the noise distribution. To establish them,
appropriate concentration inequalities for random matrices and for sequences of
martingale differences are leveraged.
arXiv link: http://arxiv.org/abs/1710.01852v2
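The object being studied is the ordinary least-squares estimate of the transition matrix from a single trajectory; a minimal sketch follows, with one explosive mode included. The dimension, horizon, and Gaussian noise are illustrative assumptions, and the finite-time error bounds themselves are not reproduced by this simulation.

```python
import numpy as np

def ls_transition_matrix(X):
    """Least-squares estimate of A in x_{t+1} = A x_t + noise,
    from a single trajectory X of shape (T+1, d)."""
    X0, X1 = X[:-1], X[1:]
    # A_hat = argmin_A sum_t ||x_{t+1} - A x_t||^2 = (X1' X0)(X0' X0)^{-1}
    return np.linalg.solve(X0.T @ X0, X0.T @ X1).T

rng = np.random.default_rng(5)
d, T = 3, 200
A = np.diag([1.05, 0.7, 0.4])                  # one unstable (explosive) mode
A[0, 1] = 0.2
X = np.zeros((T + 1, d))
for t in range(T):
    X[t + 1] = A @ X[t] + rng.standard_normal(d)
A_hat = ls_transition_matrix(X)
print("max abs estimation error:", np.abs(A_hat - A).max())
```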
Rate-Optimal Estimation of the Intercept in a Semiparametric Sample-Selection Model
model in cases where the outcome variable is observed subject to a selection
rule. The intercept is often of inherent interest in this context; for example,
in a program evaluation context, the difference between the intercepts in
outcome equations for participants and non-participants can be interpreted as
the difference in average outcomes of participants and their counterfactual
average outcomes if they had chosen not to participate. The new estimator can
under mild conditions exhibit a rate of convergence in probability equal to
$n^{-p/(2p+1)}$, where $p\ge 2$ is an integer that indexes the strength of
certain smoothness assumptions. This rate of convergence is shown in this
context to be the optimal rate of convergence for estimation of the intercept
parameter in terms of a minimax criterion. The new estimator, unlike other
proposals in the literature, is under mild conditions consistent and
asymptotically normal with a rate of convergence that is the same regardless of
the degree to which selection depends on unobservables in the outcome equation.
Simulation evidence and an empirical example are included.
arXiv link: http://arxiv.org/abs/1710.01423v3
A Justification of Conditional Confidence Intervals
conditional means or variances, parameter uncertainty has to be taken into
account. Attempts to incorporate parameter uncertainty are typically based on
the unrealistic assumption of observing two independent processes, where one is
used for parameter estimation, and the other for conditioning upon. Such
unrealistic foundation raises the question whether these intervals are
theoretically justified in a realistic setting. This paper presents an
asymptotic justification for this type of intervals that does not require such
an unrealistic assumption, but relies on a sample-split approach instead. By
showing that our sample-split intervals coincide asymptotically with the
standard intervals, we provide a novel, and realistic, justification for
confidence intervals of conditional objects. The analysis is carried out for a
rich class of time series models.
arXiv link: http://arxiv.org/abs/1710.00643v2
A Note on the Multi-Agent Contracts in Continuous Time
decision-making problem with asymmetric information. In this paper, we extend
the single-agent dynamic incentive contract model in continuous-time to a
multi-agent scheme in finite horizon and allow the terminal reward to be
dependent on the history of actions and incentives. We first derive a set of
sufficient conditions for the existence of optimal contracts in the most
general setting and conditions under which they form a Nash equilibrium. Then
we show that the principal's problem can be converted to solving a
Hamilton-Jacobi-Bellman (HJB) equation requiring a static Nash equilibrium.
Finally, we provide a framework for solving this problem via partial
differential equations (PDEs) derived from backward stochastic differential
equations (BSDEs).
arXiv link: http://arxiv.org/abs/1710.00377v2
Heterogeneous Employment Effects of Job Search Programmes: A Machine Learning Approach
programmes for unemployed workers. To investigate possibly heterogeneous
employment effects, we combine non-experimental causal empirical models with
Lasso-type estimators. The empirical analyses are based on rich administrative
data from Swiss social security records. We find considerable heterogeneities
only during the first six months after the start of training. Consistent with
previous results of the literature, unemployed persons with fewer employment
opportunities profit more from participating in these programmes. Furthermore,
we also document heterogeneous employment effects by residence status. Finally,
we show the potential of easy-to-implement programme participation rules for
improving average employment effects of these active labour market programmes.
arXiv link: http://arxiv.org/abs/1709.10279v2
Inference for VARs Identified with Sign Restrictions
autoregressions (SVARs) by imposing sign restrictions on the responses of a
subset of the endogenous variables to a particular structural shock
(sign-restricted SVARs). Most methods that have been used to construct
pointwise coverage bands for impulse responses of sign-restricted SVARs are
justified only from a Bayesian perspective. This paper demonstrates how to
formulate the inference problem for sign-restricted SVARs within a
moment-inequality framework. In particular, it develops methods of constructing
confidence bands for impulse response functions of sign-restricted SVARs that
are valid from a frequentist perspective. The paper also provides a comparison
of frequentist and Bayesian coverage bands in the context of an empirical
application - the former can be substantially wider than the latter.
arXiv link: http://arxiv.org/abs/1709.10196v2
Forecasting with Dynamic Panel Data Models
series using cross sectional information in panel data. We construct point
predictors using Tweedie's formula for the posterior mean of heterogeneous
coefficients under a correlated random effects distribution. This formula
utilizes cross-sectional information to transform the unit-specific (quasi)
maximum likelihood estimator into an approximation of the posterior mean under
a prior distribution that equals the population distribution of the random
coefficients. We show that the risk of a predictor based on a non-parametric
estimate of the Tweedie correction is asymptotically equivalent to the risk of
a predictor that treats the correlated-random-effects distribution as known
(ratio-optimality). Our empirical Bayes predictor performs well compared to
various competitors in a Monte Carlo study. In an empirical application we use
the predictor to forecast revenues for a large panel of bank holding companies
and compare forecasts that condition on actual and severely adverse
macroeconomic conditions.
arXiv link: http://arxiv.org/abs/1709.10193v1
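The core of the construction is Tweedie's formula: add to each unit-specific estimate the noise variance times the score of the estimated cross-sectional marginal density. A minimal Gaussian-kernel sketch is below; the known homoskedastic noise variance, the bandwidth rule, and the simulated random coefficients are assumptions, not the paper's implementation.

```python
import numpy as np

def tweedie_posterior_mean(theta_hat, sigma2, bw=None):
    """Nonparametric Tweedie correction: shrink unit-specific estimates
    theta_hat (assumed ~ theta_i + N(0, sigma2)) toward the cross-sectional
    distribution using the score of a Gaussian-kernel density estimate."""
    n = len(theta_hat)
    if bw is None:
        bw = 1.06 * theta_hat.std() * n ** (-1 / 5)   # Silverman-type rule
    diff = theta_hat[:, None] - theta_hat[None, :]
    K = np.exp(-0.5 * (diff / bw) ** 2)
    m = K.sum(axis=1)                                 # proportional to marginal density
    m_prime = (-(diff / bw ** 2) * K).sum(axis=1)     # derivative, same constant
    return theta_hat + sigma2 * m_prime / m

rng = np.random.default_rng(6)
n, sigma2 = 500, 0.25
theta = rng.normal(loc=1.0, scale=0.5, size=n)        # heterogeneous coefficients
theta_hat = theta + rng.normal(scale=np.sqrt(sigma2), size=n)
post = tweedie_posterior_mean(theta_hat, sigma2)
print("MSE raw     :", round(np.mean((theta_hat - theta) ** 2), 4))
print("MSE Tweedie :", round(np.mean((post - theta) ** 2), 4))
```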
Estimation of Graphical Models using the $L_{1,2}$ Norm
of dependence among agents. A widely-used estimator is the Graphical Lasso
(GLASSO), which amounts to a maximum likelihood estimation regularized using
the $L_{1,1}$ matrix norm on the precision matrix $\Omega$. The $L_{1,1}$ norm
is a lasso penalty that controls for sparsity, or the number of zeros in
$\Omega$. We propose a new estimator called Structured Graphical Lasso
(SGLASSO) that uses the $L_{1,2}$ mixed norm. The use of the $L_{1,2}$ penalty
controls for the structure of the sparsity in $\Omega$. We show that when the
network size is fixed, SGLASSO is asymptotically equivalent to an infeasible
GLASSO problem which prioritizes the sparsity-recovery of high-degree nodes.
Monte Carlo simulation shows that SGLASSO outperforms GLASSO in terms of
estimating the overall precision matrix and in terms of estimating the
structure of the graphical model. In an empirical illustration using a classic
firms' investment dataset, we obtain a network of firms' dependence that
exhibits the core-periphery structure, with General Motors, General Electric
and U.S. Steel forming the core group of firms.
arXiv link: http://arxiv.org/abs/1709.10038v2
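To see why the mixed norm encodes structure, the sketch below compares the $L_{1,1}$ and $L_{1,2}$ penalties on two precision matrices with the same number and size of off-diagonal nonzeros, one spread across nodes and one concentrated on a single "hub" node; the hub configuration receives a smaller $L_{1,2}$ penalty. The toy matrices are assumptions for illustration only, and no estimation is performed.

```python
import numpy as np

def l11_penalty(Omega):
    """GLASSO penalty: sum of absolute values of the off-diagonal entries."""
    off = Omega - np.diag(np.diag(Omega))
    return np.abs(off).sum()

def l12_penalty(Omega):
    """SGLASSO-style L_{1,2} mixed norm: sum over rows of the Euclidean
    norm of the off-diagonal entries, sensitive to row-wise grouping."""
    off = Omega - np.diag(np.diag(Omega))
    return np.linalg.norm(off, axis=1).sum()

spread = np.eye(4)
spread[0, 1] = spread[1, 0] = spread[2, 3] = spread[3, 2] = 0.5
hub = np.eye(4)
hub[0, 1] = hub[1, 0] = hub[0, 2] = hub[2, 0] = 0.5
for name, Om in (("spread", spread), ("hub", hub)):
    print(f"{name:6s}  L11 = {l11_penalty(Om):.3f}   L12 = {l12_penalty(Om):.3f}")
```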
Estimation of Peer Effects in Endogenous Social Networks: Control Function Approach
in which the peer group, defined by a social network, is endogenous in the
outcome equation for peer effects. Endogeneity is due to unobservable
individual characteristics that influence both link formation in the network
and the outcome of interest. We propose two estimators of the peer effect
equation that control for the endogeneity of the social connections using a
control function approach. We leave the functional form of the control function
unspecified and treat it as unknown. To estimate the model, we use a sieve
semiparametric approach, and we establish asymptotics of the semiparametric
estimator.
arXiv link: http://arxiv.org/abs/1709.10024v3
Quasi-random Monte Carlo application in CGE systematic sensitivity analysis
be assessed by conducting a Systematic Sensitivity Analysis. Different methods
have been used in the literature for SSA of CGE models such as Gaussian
Quadrature and Monte Carlo methods. This paper explores the use of Quasi-random
Monte Carlo methods based on the Halton and Sobol' sequences as means to
improve the efficiency over regular Monte Carlo SSA, thus reducing the
computational requirements of the SSA. The findings suggest that by using
low-discrepancy sequences, the number of simulations required by the regular MC
SSA methods can be notably reduced, hence lowering the computational time
required for SSA of CGE models.
arXiv link: http://arxiv.org/abs/1709.09755v1
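Below is a minimal comparison of plain Monte Carlo against Halton and scrambled Sobol' sequences (via scipy.stats.qmc) for a smooth toy integrand with a known value; the integrand and sample size are illustrative assumptions standing in for draws of shocked CGE parameters.

```python
import numpy as np
from scipy.stats import qmc

def integrate(points):
    """Toy 'model output': sample mean of a smooth function of the draws."""
    x, y = points[:, 0], points[:, 1]
    return np.mean(np.exp(-x) * np.sin(np.pi * y))

n, d, seed = 1024, 2, 0
mc = np.random.default_rng(seed).random((n, d))                      # plain Monte Carlo
halton = qmc.Halton(d=d, seed=seed).random(n)                        # low-discrepancy
sobol = qmc.Sobol(d=d, scramble=True, seed=seed).random_base2(m=10)  # 2^10 points

truth = (1 - np.exp(-1.0)) * 2 / np.pi        # exact integral on [0, 1]^2
for name, pts in [("MC", mc), ("Halton", halton), ("Sobol", sobol)]:
    print(f"{name:6s} error: {abs(integrate(pts) - truth):.5f}")
```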
Inference for Impulse Responses under Model Uncertainty
responses are constructed by estimating VAR models in levels - ignoring
cointegration rank uncertainty. We investigate the consequences of ignoring
this uncertainty. We adapt several methods for handling model uncertainty and
highlight their shortcomings. We propose a new method -
Weighted-Inference-by-Model-Plausibility (WIMP) - that takes rank uncertainty
into account in a data-driven way. In simulations the WIMP outperforms all
other methods considered, delivering intervals that are robust to rank
uncertainty, yet not overly conservative. We also study potential ramifications
of rank uncertainty on applied macroeconomic analysis by re-assessing the
effects of fiscal policy shocks.
arXiv link: http://arxiv.org/abs/1709.09583v3
Identification of hedonic equilibrium and nonseparable simultaneous equations
nonparametrically identified in hedonic equilibrium models, where products are
differentiated along more than one dimension and agents are characterized by
several dimensions of unobserved heterogeneity. With products differentiated
along a quality index and agents characterized by scalar unobserved
heterogeneity, single crossing conditions on preferences and technology provide
identifying restrictions in Ekeland, Heckman and Nesheim (2004) and Heckman,
Matzkin and Nesheim (2010). We develop similar shape restrictions in the
multi-attribute case. These shape restrictions, which are based on optimal
transport theory and generalized convexity, allow us to identify preferences
for goods differentiated along multiple dimensions, from the observation of a
single market. We thereby derive nonparametric identification results for
nonseparable simultaneous equations and multi-attribute hedonic equilibrium
models with (possibly) multiple dimensions of unobserved heterogeneity. One of
our results is a proof of absolute continuity of the distribution of
endogenously traded qualities, which is of independent interest.
arXiv link: http://arxiv.org/abs/1709.09570v6
Zero-rating of Content and its Effect on the Quality of Service in the Internet
on whether or not monetary interactions should be regulated between content and
access providers. Among the several topics discussed, `differential pricing'
has recently received attention due to `zero-rating' platforms proposed by some
service providers. In the differential pricing scheme, Internet Service
Providers (ISPs) can exempt content from certain CPs from data access charges
(zero-rating), while content from other CPs receives no exemption. This allows the
possibility for Content Providers (CPs) to make `sponsorship' agreements to
zero-rate their content and attract more user traffic. In this paper, we study
the effect of differential pricing on various players in the Internet. We first
consider a model with a monopolistic ISP and multiple CPs where users select
CPs based on the quality of service (QoS) and data access charges. We show that
in a differential pricing regime: 1) a CP offering low QoS can, through
sponsorship, have a higher surplus than a CP offering better QoS; and 2) overall QoS
(mean delay) for end users can degrade under differential pricing schemes. In
the oligopolistic market with multiple ISPs, users tend to select the ISP with
the lowest charges, resulting in the same type of conclusions as in the
monopolistic market. We then study how differential pricing affects the revenue of ISPs.
arXiv link: http://arxiv.org/abs/1709.09334v2
Sharp bounds and testability of a Roy model of STEM major choices
essential features, namely sector specific unobserved heterogeneity and
self-selection on the basis of potential outcomes. We characterize sharp bounds
on the joint distribution of potential outcomes and testable implications of
the Roy self-selection model under an instrumental constraint on the joint
distribution of potential outcomes we call stochastically monotone instrumental
variable (SMIV). We show that testing the Roy model selection is equivalent to
testing stochastic monotonicity of observed outcomes relative to the
instrument. We apply our sharp bounds to the derivation of a measure of
departure from Roy self-selection to identify values of observable
characteristics that induce the most costly misallocation of talent and sector
and are therefore prime targets for intervention. Special emphasis is put on
the case of binary outcomes, which has received little attention in the
literature to date. For richer sets of outcomes, we emphasize the distinction
between pointwise sharp bounds and functional sharp bounds, and its importance,
when constructing sharp bounds on functional features, such as inequality
measures. We analyze a Roy model of college major choice in Canada and Germany
within this framework, and we take a new look at the under-representation of
women in STEM.
arXiv link: http://arxiv.org/abs/1709.09284v2
Discrete Choice and Rational Inattention: a General Equivalence Result
rational inattention models. Matejka and McKay (2015, AER) showed that when
information costs are modelled using the Shannon entropy function, the
resulting choice probabilities in the rational inattention model take the
multinomial logit form. By exploiting convex-analytic properties of the
discrete choice model, we show that when information costs are modelled using a
class of generalized entropy functions, the choice probabilities in any
rational inattention model are observationally equivalent to some additive
random utility discrete choice model and vice versa. Thus any additive random
utility model can be given an interpretation in terms of boundedly rational
behavior. This includes empirically relevant specifications such as the probit
and nested logit models.
arXiv link: http://arxiv.org/abs/1709.09117v1
Inference on Estimators defined by Mathematical Programming
programming problems, focusing on the important special cases of linear
programming (LP) and quadratic programming (QP). In these settings, the
coefficients in both the objective function and the constraints of the
mathematical programming problem may be estimated from data and hence involve
sampling error. Our inference approach exploits the characterization of the
solutions to these programming problems by complementarity conditions; by doing
so, we can transform the problem of doing inference on the solution of a
constrained optimization problem (a non-standard inference problem) into one
involving inference based on a set of inequalities with pre-estimated
coefficients, which is much better understood. We evaluate the performance of
our procedure in several Monte Carlo simulations and an empirical application
to the classic portfolio selection problem in finance.
arXiv link: http://arxiv.org/abs/1709.09115v1
Bounds On Treatment Effects On Transitions
transition probabilities. We show that even under random assignment only the
instantaneous average treatment effect is point identified. Since treated and
control units drop out at different rates, randomization only ensures the
comparability of treatment and controls at the time of randomization, so that
long-run average treatment effects are not point identified. Instead we derive
informative bounds on these average treatment effects. Our bounds do not impose
(semi)parametric restrictions, for example, proportional hazards. We also
explore various assumptions such as monotone treatment response, common shocks
and positively correlated outcomes that tighten the bounds.
arXiv link: http://arxiv.org/abs/1709.08981v1
Fixed Effect Estimation of Large T Panel Data Models
models for long panels, where the number of time periods is relatively large.
We focus on semiparametric models with unobserved individual and time effects,
where the distribution of the outcome variable conditional on covariates and
unobserved effects is specified parametrically, while the distribution of the
unobserved effects is left unrestricted. Compared to existing reviews on long
panels (Arellano and Hahn 2007; a section in Arellano and Bonhomme 2011) we
discuss models with both individual and time effects, split-panel Jackknife
bias corrections, unbalanced panels, distribution and quantile effects, and
other extensions. Understanding and correcting the incidental parameter bias
caused by the estimation of many fixed effects is our main focus, and the
unifying theme is that the order of this bias is given by the simple formula
p/n for all models discussed, with p the number of estimated parameters and n
the total sample size.
arXiv link: http://arxiv.org/abs/1709.08980v2
Counterparty Credit Limits: The Impact of a Risk-Mitigation Measure on Everyday Trading
institution to cap its maximum possible exposure to a specified counterparty.
CCLs help institutions to mitigate counterparty credit risk via selective
diversification of their exposures. In this paper, we analyze how CCLs impact
the prices that institutions pay for their trades during everyday trading. We
study a high-quality data set from a large electronic trading platform in the
foreign exchange spot market, which enables institutions to apply CCLs. We find
empirically that CCLs had little impact on the vast majority of trades in this
data. We also study the impact of CCLs using a new model of trading. By
simulating our model with different underlying CCL networks, we highlight that
CCLs can have a major impact in some situations.
arXiv link: http://arxiv.org/abs/1709.08238v3
Is completeness necessary? Estimation in nonidentified linear models
high-dimensional or nonparametric machine learning methods, where the
identification of structural parameters is often challenging and untestable. In
linear settings, this identification hinges on the completeness condition,
which requires the nonsingularity of a high-dimensional matrix or operator and
may fail for finite samples or even at the population level. Regularized
estimators provide a solution by enabling consistent estimation of structural
or average structural functions, sometimes even under identification failure.
We show that the asymptotic distribution in these cases can be nonstandard. We
develop a comprehensive theory of regularized estimators, which include methods
such as high-dimensional ridge regularization, gradient descent, and principal
component analysis (PCA). The results are illustrated for high-dimensional and
nonparametric instrumental variable regressions and are supported through
simulation experiments.
arXiv link: http://arxiv.org/abs/1709.03473v5
Principal Components and Regularized Estimation of Factor Models
consistently estimated by the method of principal components, and principal
components can be constructed by iterative least squares regressions. Replacing
least squares with ridge regressions turns out to have the effect of shrinking
the singular values of the common component and possibly reducing its rank. The
method is used in the machine learning literature to recover low-rank matrices.
We study the procedure from the perspective of estimating a minimum-rank
approximate factor model. We show that the constrained factor estimates are
biased but can be more efficient in terms of mean-squared errors. Rank
consideration suggests a data-dependent penalty for selecting the number of
factors. The new criterion is more conservative in cases when the nominal
number of factors is inflated by the presence of weak factors or large
measurement noise. The framework is extended to incorporate a priori linear
constraints on the loadings. We provide asymptotic results that can be used to
test economic hypotheses.
arXiv link: http://arxiv.org/abs/1708.08137v2
Bias Reduction in Instrumental Variable Estimation through First-Stage Shrinkage
first-stage fit is poor. I show that better first-stage prediction can
alleviate this bias. In a two-stage linear regression model with Normal noise,
I consider shrinkage in the estimation of the first-stage instrumental variable
coefficients. For at least four instrumental variables and a single endogenous
regressor, I establish that the standard 2SLS estimator is dominated with
respect to bias. The dominating IV estimator applies James-Stein type shrinkage
in a first-stage high-dimensional Normal-means problem followed by a
control-function approach in the second stage. It preserves invariances of the
structural instrumental variable equations.
arXiv link: http://arxiv.org/abs/1708.06443v2
Unbiased Shrinkage Estimation
we care only about some parameters of a model, I show that we can reduce
variance without incurring bias if we have additional information about the
distribution of covariates. In a linear regression model with homoscedastic
Normal noise, I consider shrinkage estimation of the nuisance parameters
associated with control variables. For at least three control variables and
exogenous treatment, I establish that the standard least-squares estimator is
dominated with respect to squared-error loss in the treatment effect even among
unbiased estimators and even when the target parameter is low-dimensional. I
construct the dominating estimator by a variant of James-Stein shrinkage in a
high-dimensional Normal-means problem. It can be interpreted as an invariant
generalized Bayes estimator with an uninformative (improper) Jeffreys prior in
the target parameter.
arXiv link: http://arxiv.org/abs/1708.06436v2
Comparing distributions by multiple testing across quantiles or CDF values
quantiles or values there is a statistically significant difference. This
provides more information than the binary "reject" or "do not reject" decision
of a global goodness-of-fit test. Framing our question as multiple testing
across the continuum of quantiles $\tau\in(0,1)$ or values $r\in\mathbb{R}$, we
show that the Kolmogorov--Smirnov test (interpreted as a multiple testing
procedure) achieves strong control of the familywise error rate. However, its
well-known flaw of low sensitivity in the tails remains. We provide an
alternative method that retains such strong control of familywise error rate
while also having even sensitivity, i.e., equal pointwise type I error rates at
each of $n\to\infty$ order statistics across the distribution. Our one-sample
method computes instantly, using our new formula that also instantly computes
goodness-of-fit $p$-values and uniform confidence bands. To improve power, we
also propose stepdown and pre-test procedures that maintain control of the
asymptotic familywise error rate. One-sample and two-sample cases are
considered, as well as extensions to regression discontinuity designs and
conditional distributions. Simulations, empirical examples, and code are
provided.
arXiv link: http://arxiv.org/abs/1708.04658v1
Identification of Treatment Effects under Conditional Partial Independence
commonly used but nonrefutable assumption. We derive identified sets for
various treatment effect parameters under nonparametric deviations from this
conditional independence assumption. These deviations are defined via a
conditional treatment assignment probability, which makes it straightforward to
interpret. Our results can be used to assess the robustness of empirical
conclusions obtained under the baseline conditional independence assumption.
arXiv link: http://arxiv.org/abs/1707.09563v1
Econométrie et Machine Learning
a predictive model, for a variable of interest, using explanatory variables (or
features). However, these two fields developed in parallel, thus creating two
different cultures, to paraphrase Breiman (2001). The first was to build
probabilistic models to describe economic phenomena. The second uses algorithms
that will learn from their mistakes, with the aim, most often to classify
(sounds, images, etc.). Recently, however, learning models have proven to be
more effective than traditional econometric techniques (at the price of reduced
explanatory power), and above all, they can handle much larger data sets.
In this context, it becomes necessary for econometricians to understand what
these two cultures are, what separates them and especially what brings them
closer together, in order to appropriate the tools developed by the statistical
learning community and integrate them into econometric models.
arXiv link: http://arxiv.org/abs/1708.06992v2
Smoothed GMM for quantile models
parameters identified by general conditional quantile restrictions, under much
weaker assumptions than previously seen in the literature. This includes
instrumental variables nonlinear quantile regression as a special case. More
specifically, we consider a set of unconditional moments implied by the
conditional quantile restrictions, providing conditions for local
identification. Since estimators based on the sample moments are generally
impossible to compute numerically in practice, we study feasible estimators
based on smoothed sample moments. We propose a method of moments estimator for
exactly identified models, as well as a generalized method of moments estimator
for over-identified models. We establish consistency and asymptotic normality
of both estimators under general conditions that allow for weakly dependent
data and nonlinear structural models. Simulations illustrate the finite-sample
properties of the methods. Our in-depth empirical application concerns the
consumption Euler equation derived from quantile utility maximization.
Advantages of the quantile Euler equation include robustness to fat tails,
decoupling of risk attitude from the elasticity of intertemporal substitution,
and log-linearization without any approximation error. For the four countries
we examine, the quantile estimates of discount factor and elasticity of
intertemporal substitution are economically reasonable for a range of quantiles
above the median, even when two-stage least squares estimates are not
reasonable.
arXiv link: http://arxiv.org/abs/1707.03436v2
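The key ingredient is replacing the indicator in the quantile moment condition with a smooth kernel CDF. The sketch below builds such smoothed instrument moments and minimizes a GMM-type objective for a just-identified median (tau = 0.5) model; the bandwidth, simulated design, identity weighting matrix, and Nelder-Mead optimizer are illustrative assumptions rather than the paper's recommended choices.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def smoothed_moments(beta, y, X, Z, tau, h):
    """Smoothed sample moments for the conditional quantile restriction
    P(y <= X beta | Z) = tau: the indicator 1{y <= X beta} is replaced by
    a normal-CDF kernel with bandwidth h."""
    ind_smooth = norm.cdf((X @ beta - y) / h)
    return Z.T @ (ind_smooth - tau) / len(y)

def gmm_objective(beta, y, X, Z, tau, h, W):
    g = smoothed_moments(beta, y, X, Z, tau, h)
    return g @ W @ g

rng = np.random.default_rng(7)
n, tau, h = 2000, 0.5, 0.3
z = rng.standard_normal(n)
u = 0.6 * rng.standard_normal(n)
x = 0.8 * z + u                                   # regressor correlated with the error
e = 0.5 * u + 0.8 * rng.standard_normal(n)        # median-zero given the instrument z
y = 1.0 + 1.5 * x + e
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
W = np.eye(2)                                     # immaterial: exactly identified
res = minimize(gmm_objective, x0=np.zeros(2), args=(y, X, Z, tau, h, W),
               method="Nelder-Mead")
print("smoothed method-of-moments estimate at tau=0.5:", np.round(res.x, 3))
```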
Machine-Learning Tests for Effects on Multiple Outcomes
off-the-shelf methods from the computer-science field of machine learning to
create a "discovery engine" for data from randomized controlled trials (RCTs).
The applied problem we seek to solve is that economists invest vast resources
into carrying out RCTs, including the collection of a rich set of candidate
outcome measures. But given concerns about inference in the presence of
multiple testing, economists usually wind up exploring just a small subset of
the hypotheses that the available data could be used to test. This prevents us
from extracting as much information as possible from each RCT, which in turn
impairs our ability to develop new theories or strengthen the design of policy
interventions. Our proposed solution combines the basic intuition of reverse
regression, where the dependent variable of interest now becomes treatment
assignment itself, with methods from machine learning that use the data
themselves to flexibly identify whether there is any function of the outcomes
that predicts (or has signal about) treatment group status. This leads to
correctly-sized tests with appropriate $p$-values, which also have the
important virtue of being easy to implement in practice. One open challenge
that remains with our work is how to meaningfully interpret the signal that
these methods find.
arXiv link: http://arxiv.org/abs/1707.01473v2
Nonseparable Multinomial Choice Models in Cross-Section and Panel Data
choices among discrete alternatives. We analyze identification of binary and
multinomial choice models when the choice utilities are nonseparable in
observed attributes and multidimensional unobserved heterogeneity with
cross-section and panel data. We show that derivatives of choice probabilities
with respect to continuous attributes are weighted averages of utility
derivatives in cross-section models with exogenous heterogeneity. In the
special case of random coefficient models with an independent additive effect,
we further characterize that the probability derivative at zero is proportional
to the population mean of the coefficients. We extend the identification
results to models with endogenous heterogeneity using either a control function
or panel data. In time stationary panel models with two periods, we find that
differences over time of derivatives of choice probabilities identify utility
derivatives "on the diagonal," i.e. when the observed attributes take the same
values in the two periods. We also show that time stationarity does not
identify structural derivatives "off the diagonal" both in continuous and
multinomial choice panel models.
arXiv link: http://arxiv.org/abs/1706.08418v2
On Heckits, LATE, and Numerical Equivalence
functional form assumptions. We study parametric estimators of the local
average treatment effect (LATE) derived from a widely used class of latent
threshold crossing models and show they yield LATE estimates algebraically
equivalent to the instrumental variables (IV) estimator. Our leading example is
Heckman's (1979) two-step ("Heckit") control function estimator which, with
two-sided non-compliance, can be used to compute estimates of a variety of
causal parameters. Equivalence with IV is established for a semi-parametric
family of control function estimators and shown to hold at interior solutions
for a class of maximum likelihood estimators. Our results suggest differences
between structural and IV estimates often stem from disagreements about the
target parameter rather than from functional form assumptions per se. In cases
where equivalence fails, reporting structural estimates of LATE alongside IV
provides a simple means of assessing the credibility of structural
extrapolation exercises.
arXiv link: http://arxiv.org/abs/1706.05982v4
Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Estimation of Stochastic Volatility Models
depends on actual parameter values in terms of sampling efficiency. While draws
from the posterior utilizing the standard centered parameterization break down
when the volatility of volatility parameter in the latent state equation is
small, non-centered versions of the model show deficiencies for highly
persistent latent variable series. The novel approach of
ancillarity-sufficiency interweaving has recently been shown to aid in
overcoming these issues for a broad class of multilevel models. In this paper,
we demonstrate how such an interweaving strategy can be applied to stochastic
volatility models in order to greatly improve sampling efficiency for all
parameters and throughout the entire parameter range. Moreover, this method of
"combining best of different worlds" allows for inference for parameter
constellations that have previously been infeasible to estimate without the
need to select a particular parameterization beforehand.
arXiv link: http://arxiv.org/abs/1706.05280v1
Sampling-based vs. Design-based Uncertainty in Regression Analysis
based on data for all 50 states in the United States or on data for all visits
to a website. What is the interpretation of the estimated parameters and the
standard errors? In practice, researchers typically assume that the sample is
randomly drawn from a large population of interest and report standard errors
that are designed to capture sampling variation. This is common even in
applications where it is difficult to articulate what that population of
interest is, and how it differs from the sample. In this article, we explore an
alternative approach to inference, which is partly design-based. In a
design-based setting, the values of some of the regressors can be manipulated,
perhaps through a policy intervention. Design-based uncertainty emanates from
lack of knowledge about the values that the regression outcome would have taken
under alternative interventions. We derive standard errors that account for
design-based uncertainty instead of, or in addition to, sampling-based
uncertainty. We show that our standard errors in general are smaller than the
usual infinite-population sampling-based standard errors and provide conditions
under which they coincide.
arXiv link: http://arxiv.org/abs/1706.01778v2
Optimal sequential treatment allocation
sequentially. We study a problem in which the policy maker is not only
interested in the expected cumulative welfare but is also concerned about the
uncertainty/risk of the treatment outcomes. At the outset, the total number of
treatment assignments to be made may even be unknown. A sequential treatment
policy which attains the minimax optimal regret is proposed. We also
demonstrate that the expected number of suboptimal treatments only grows slowly
in the number of treatments. Finally, we study a setting where outcomes are
only observed with delay.
arXiv link: http://arxiv.org/abs/1705.09952v4
Inference on Breakdown Frontiers
between the set of assumptions which lead to a specific conclusion and those
which do not. In a potential outcomes model with a binary treatment, we
consider two conclusions: First, that ATE is at least a specific value (e.g.,
nonnegative) and second that the proportion of units who benefit from treatment
is at least a specific value (e.g., at least 50%). For these conclusions, we
derive the breakdown frontier for two kinds of assumptions: one which indexes
relaxations of the baseline random assignment of treatment assumption, and one
which indexes relaxations of the baseline rank invariance assumption. These
classes of assumptions nest both the point identifying assumptions of random
assignment and rank invariance and the opposite end of no constraints on
treatment selection or the dependence structure between potential outcomes.
This frontier provides a quantitative measure of robustness of conclusions to
relaxations of the baseline point identifying assumptions. We derive
$\sqrt{N}$-consistent sample analog estimators for these frontiers. We then
provide two asymptotically valid bootstrap procedures for constructing lower
uniform confidence bands for the breakdown frontier. As a measure of
robustness, estimated breakdown frontiers and their corresponding confidence
bands can be presented alongside traditional point estimates and confidence
intervals obtained under point identifying assumptions. We illustrate this
approach in an empirical application to the effect of child soldiering on
wages. We find that sufficiently weak conclusions are robust to simultaneous
failures of rank invariance and random assignment, while some stronger
conclusions are fairly robust to failures of rank invariance but not
necessarily to relaxations of random assignment.
arXiv link: http://arxiv.org/abs/1705.04765v3
Are Unobservables Separable?
unobservables are additively separable, especially when the former are
endogenous. This is done because it is widely recognized that identification
and estimation challenges arise when interactions between the two are allowed
for. Starting from a nonseparable IV model, where the instrumental variable is
independent of unobservables, we develop a novel nonparametric test of
separability of unobservables. The large-sample distribution of the test
statistics is nonstandard and relies on a novel Donsker-type central limit
theorem for the empirical distribution of nonparametric IV residuals, which may
be of independent interest. Using a dataset drawn from the 2015 US Consumer
Expenditure Survey, we find that the test rejects the separability in Engel
curves for most of the commodities.
arXiv link: http://arxiv.org/abs/1705.01654v4
Optimal Invariant Tests in an Instrumental Variables Regression With Heteroskedastic and Autocorrelated Errors
to derive an invariant test for the causal structural parameter. Contrary to
popular belief, we show that there exist model symmetries when equation errors
are heteroskedastic and autocorrelated (HAC). Our theory is consistent with
existing results for the homoskedastic model (Andrews, Moreira, and Stock
(2006) and Chamberlain (2007)). We use these symmetries to propose the
conditional integrated likelihood (CIL) test for the causality parameter in the
over-identified model. Theoretical and numerical findings show that the CIL
test performs well compared to other tests in terms of power and
implementation. We recommend that practitioners use the Anderson-Rubin (AR)
test in the just-identified model, and the CIL test in the over-identified
model.
arXiv link: http://arxiv.org/abs/1705.00231v2
Bootstrap-Based Inference for Cube Root Asymptotics
M-estimators exhibiting a Chernoff (1964)-type limiting distribution. For
estimators of this kind, the standard nonparametric bootstrap is inconsistent.
The method proposed herein is based on the nonparametric bootstrap, but
restores consistency by altering the shape of the criterion function defining
the estimator whose distribution we seek to approximate. This modification
leads to a generic and easy-to-implement resampling method for inference that
is conceptually distinct from other available distributional approximations. We
illustrate the applicability of our results with four examples in econometrics
and machine learning.
arXiv link: http://arxiv.org/abs/1704.08066v3
Sparse Bayesian vector autoregressions in huge dimensions
stochastic volatility that is capable of handling vast dimensional information
sets. Three features are introduced to permit reliable estimation of the model.
First, we assume that the reduced-form errors in the VAR feature a factor
stochastic volatility structure, allowing for conditional equation-by-equation
estimation. Second, we apply recently developed global-local shrinkage priors
to the VAR coefficients to cure the curse of dimensionality. Third, we utilize
recent innovations to efficiently sample from high-dimensional multivariate
Gaussian distributions. This makes simulation-based fully Bayesian inference
feasible when the dimensionality is large but the time series length is
moderate. We demonstrate the merits of our approach in an extensive simulation
study and apply the model to US macroeconomic data to evaluate its forecasting
capabilities.
arXiv link: http://arxiv.org/abs/1704.03239v3
Tests for qualitative features in the random coefficients model
that allows for unobserved heterogeneity in the population by modeling the
regression coefficients as random variables. Given data from this model, the
statistical challenge is to recover information about the joint density of the
random coefficients which is a multivariate and ill-posed problem. Because of
the curse of dimensionality and the ill-posedness, pointwise nonparametric
estimation of the joint density is difficult and suffers from slow convergence
rates. Larger features, such as an increase of the density along some direction
or a well-accentuated mode can, however, be much easier detected from data by
means of statistical tests. In this article, we follow this strategy and
construct tests and confidence statements for qualitative features of the joint
density, such as increases, decreases and modes. We propose a multiple testing
approach based on aggregating single tests which are designed to extract shape
information on fixed scales and directions. Using recent tools for Gaussian
approximations of multivariate empirical processes, we derive expressions for
the critical value. We apply our method to simulated and real data.
arXiv link: http://arxiv.org/abs/1704.01066v3
Model Selection for Explosive Models
AIC, BIC, HQIC) for distinguishing between the unit root model and the various
kinds of explosive models. The explosive models include the local-to-unit-root
model, the mildly explosive model and the regular explosive model. Initial
conditions with different order of magnitude are considered. Both the OLS
estimator and the indirect inference estimator are studied. It is found that
BIC and HQIC, but not AIC, consistently select the unit root model when data
come from the unit root model. When data come from the local-to-unit-root
model, both BIC and HQIC select the wrong model with probability approaching 1
while AIC has a positive probability of selecting the right model in the limit.
When data come from the regular explosive model or from the mildly explosive
model in the form of $1+n^{\alpha }/n$ with $\alpha \in (0,1)$, all three
information criteria consistently select the true model. Indirect inference
estimation can increase or decrease the probability for information criteria to
select the right model asymptotically relative to OLS, depending on the
information criteria and the true model. Simulation results confirm our
asymptotic results in finite sample.
arXiv link: http://arxiv.org/abs/1703.02720v1
$L_2$Boosting for Economic Applications
number of parameters $p$ is high compared to the number of observations $n$ or
even larger, are available for applied researchers. Boosting algorithms
represent one of the major advances in machine learning and statistics in
recent years and are suitable for the analysis of such data sets. While Lasso
has been applied very successfully for high-dimensional data sets in Economics,
boosting has been underutilized in this field, although it has been proven very
powerful in fields like Biostatistics and Pattern Recognition. We attribute
this to missing theoretical results for boosting. The goal of this paper is to
fill this gap and show that boosting is a competitive method for inference of a
treatment effect or instrumental variable (IV) estimation in a high-dimensional
setting. First, we present the $L_2$Boosting with componentwise least squares
algorithm and variants which are tailored for regression problems which are the
workhorse for most Econometric problems. Then we show how $L_2$Boosting can be
used for estimation of treatment effects and IV estimation. We highlight the
methods and illustrate them with simulations and empirical examples. For
further results and technical details we refer to Luo and Spindler (2016, 2017)
and to the online supplement of the paper.
arXiv link: http://arxiv.org/abs/1702.03244v1
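A compact numpy version of $L_2$Boosting with componentwise least squares: each step regresses the current residual on the single covariate that most reduces the residual sum of squares and takes a small step in that coordinate. The learning rate, number of steps, and sparse simulated design are illustrative assumptions; for the treatment-effect and IV uses, see the paper and its references.

```python
import numpy as np

def l2_boost(X, y, n_steps=300, nu=0.1):
    """L2Boosting with componentwise least squares on (approximately
    centered) covariates X and outcome y."""
    n, p = X.shape
    beta = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_steps):
        coefs = X.T @ resid / col_ss                      # univariate OLS per column
        sse = ((resid[:, None] - X * coefs) ** 2).sum(axis=0)
        j = np.argmin(sse)                                # best single covariate
        beta[j] += nu * coefs[j]
        resid -= nu * coefs[j] * X[:, j]
    return intercept, beta

rng = np.random.default_rng(8)
n, p = 200, 50                                            # p comparable to n
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.standard_normal(n)
intercept, beta_hat = l2_boost(X, y)
print("largest estimated |coefficients|:", np.round(np.sort(np.abs(beta_hat))[-5:], 2))
```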
Policy Learning with Observational Data
treatment assignment policy that satisfies application-specific constraints,
such as budget, fairness, simplicity, or other functional form constraints. For
example, policies may be restricted to take the form of decision trees based on
a limited set of easily observable individual characteristics. We propose a new
approach to this problem motivated by the theory of semiparametrically
efficient estimation. Our method can be used to optimize either binary
treatments or infinitesimal nudges to continuous treatments, and can leverage
observational data where causal effects are identified using a variety of
strategies, including selection on observables and instrumental variables.
Given a doubly robust estimator of the causal effect of assigning everyone to
treatment, we develop an algorithm for choosing whom to treat, and establish
strong guarantees for the asymptotic utilitarian regret of the resulting
policy.
arXiv link: http://arxiv.org/abs/1702.02896v6
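One common way to operationalize this idea is to form doubly robust (AIPW) scores for the effect of treating each unit and then learn a constrained rule by weighted classification of the score's sign. The sketch below uses logistic/linear nuisance fits without cross-fitting and a depth-2 sklearn decision tree as a stand-in for exact policy-tree search; all of these are illustrative simplifications rather than the authors' algorithm.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(9)
n = 5000
X = rng.standard_normal((n, 3))
e = 1 / (1 + np.exp(-0.5 * X[:, 0]))                    # true propensity score
W = rng.binomial(1, e)
tau = 1.0 * X[:, 1]                                     # heterogeneous treatment effect
Y = X[:, 0] + tau * W + rng.standard_normal(n)

# Nuisance estimates (no cross-fitting here, purely for illustration).
e_hat = LogisticRegression().fit(X, W).predict_proba(X)[:, 1]
mu1 = LinearRegression().fit(X[W == 1], Y[W == 1]).predict(X)
mu0 = LinearRegression().fit(X[W == 0], Y[W == 0]).predict(X)

# Doubly robust (AIPW) score for the gain from treating each unit.
gamma = (mu1 - mu0
         + W * (Y - mu1) / e_hat
         - (1 - W) * (Y - mu0) / (1 - e_hat))

# Learn a depth-2 tree policy: classify sign(gamma), weighting by |gamma|.
policy = DecisionTreeClassifier(max_depth=2, random_state=0)
policy.fit(X, (gamma > 0).astype(int), sample_weight=np.abs(gamma))
treat = policy.predict(X).astype(bool)
print("share treated under learned policy:", round(treat.mean(), 3))
print("estimated policy value vs. treating no one:",
      round(float(np.mean(gamma * treat)), 3))
```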
Estimating Average Treatment Effects: Supplementary Analyses and Remaining Challenges
effects under unconfounded treatment assignment in settings with a fixed number
of covariates. More recently attention has focused on settings with a large
number of covariates. In this paper we extend lessons from the earlier
literature to this new setting. We propose that in addition to reporting point
estimates and standard errors, researchers report results from a number of
supplementary analyses to assist in assessing the credibility of their
estimates.
arXiv link: http://arxiv.org/abs/1702.01250v1
Representation of I(1) and I(2) autoregressive Hilbertian processes
vector autoregressive processes to accommodate processes that take values in an
arbitrary complex separable Hilbert space. This more general setting is of
central relevance for statistical applications involving functional time
series. We first obtain a range of necessary and sufficient conditions for a
pole in the inverse of a holomorphic index-zero Fredholm operator pencil to be
of first or second order. Those conditions form the basis for our development
of I(1) and I(2) representations of autoregressive Hilbertian processes.
Cointegrating and attractor subspaces are characterized in terms of the
behavior of the autoregressive operator pencil in a neighborhood of one.
arXiv link: http://arxiv.org/abs/1701.08149v4