# Methodology and Data Sources

## General Principles

1. All data sources are documented with provenance, access date, and version.
2. No external data is fetched or incorporated without explicit user approval.
3. Raw data is preserved unchanged in `data/raw/`; all transformations are scripted and reproducible.
4. Final analytical outputs are produced in R (numbered scripts `00_run_all.R` through `05_figures.R`) and rendered via Quarto for independent verification and replication.
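Principle 3 can be checked mechanically. The sketch below is written in Python for illustration, although the project pipeline itself is in R. It records a SHA-256 digest for every file under a raw-data directory and later reports any file that was added, removed, or modified. Where the manifest is stored and when the check runs are left open as assumptions. None of these function names come from the project.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(raw_dir: Path) -> dict[str, str]:
    """Map each file under raw_dir (relative POSIX path) to its digest."""
    return {
        p.relative_to(raw_dir).as_posix(): sha256_of(p)
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(raw_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files added, removed, or modified since the manifest was built."""
    current = build_manifest(raw_dir)
    keys = set(current) | set(manifest)
    return sorted(k for k in keys if current.get(k) != manifest.get(k))
```

Building the manifest once when data are first saved to `data/raw/` and re-running `changed_files()` at the start of each pipeline run would reveal any accidental edit to raw inputs.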

---

## World Bank Instrument Types in This Dataset

The dataset combines recommendations from two distinct World Bank instrument families. Their differences are fundamental to interpreting the analysis.

### PFR / PER (Public Finance Review / Public Expenditure Review)

| Dimension | Detail |
|-----------|--------|
| **Category** | AAA (Advisory and Analytical Activities) — non-lending |
| **Purpose** | Diagnostic analysis of public finances: revenue systems, expenditure efficiency, fiscal sustainability, tax policy gaps |
| **Recommendations** | Findings, options, and recommendations; government is free to accept, modify, or ignore |
| **Bindingness** | Non-binding; purely advisory. No financial consequence for non-adoption |
| **Frequency** | Irregular; typically once every 5–10 years per country; sectoral PFRs may be more frequent |
| **Naming** | PER is the earlier name; PFR is the current designation after a World Bank renaming |
| **Also includes** | Economic Updates, Economic Monitors, and other analytical products coded as "PFR" in this dataset |

### DPO / DPF (Development Policy Operation / Development Policy Financing)

| Dimension | Detail |
|-----------|--------|
| **Category** | Lending instrument — financial operation |
| **Purpose** | Disburse budget support in exchange for government implementation of agreed policy reforms |
| **Conditionality** | Reforms are codified as "prior actions" — specific, verifiable policy steps the government must complete before each tranche is disbursed |
| **Bindingness** | Binding in operational terms: disbursement is contingent on completion of the prior actions |
| **Frequency** | Series-based (typically 2–3 tranches over 2–4 years); may be repeated in successive series |
| **Sub-types** | DPL (Development Policy Loan), DPL/PRSC (Poverty Reduction Support Credit), DPF (current umbrella term) |

### Why This Distinction Matters

PFRs function as the **diagnostic upstream layer**: they identify tax policy weaknesses and propose reform options. DPOs represent the **operational downstream layer**: selected recommendations are translated into time-bound prior actions with financial incentives attached. A recommendation appearing only in a PFR carries no enforcement mechanism; the same recommendation reappearing as a DPO prior action signals formal government commitment to act. Tracking PFR→DPO migration is therefore a useful, if indirect, indicator of reform momentum; whether revenue actually responds is assessed separately in Segment 2.

**Implication for this analysis:** Counts of recommendations are not equivalent across instruments. A DPO prior action represents a binding reform commitment; a PFR recommendation represents analytical advice. Aggregating them gives the full scope of World Bank engagement, but instrument-disaggregated analysis (Task 2D) is essential for interpretation.
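One way to operationalise PFR→DPO tracking is to link rows that share a country, tax category, and primary direction. The Python sketch below uses a hypothetical matching rule: identical `Country`, `Unified_Tax_Category`, and `Direction_Primary`, with the DPO year not earlier than the PFR year. It also assumes that the non-overlap `Source` values are the literal strings `"DPO"` and `"PFR"`. Only `"Both"` is documented. Because DPO years are fiscal years and PFR years are calendar years, the year comparison is accurate only to about one year.

```python
from collections import defaultdict


def _match_key(row):
    """Hypothetical key linking a PFR recommendation to a DPO prior action."""
    return (row["Country"], row["Unified_Tax_Category"], row["Direction_Primary"])


def find_migrations(rows):
    """Return (PFR Entry_ID, DPO Entry_ID) pairs that plausibly show migration.

    A PFR row is paired with every DPO row on the same key whose FY_Year is
    >= the PFR's year. Rows tagged Source == "Both" are returned paired with
    themselves, since they already appear in both instruments.
    """
    dpo_by_key = defaultdict(list)
    for r in rows:
        if r["Source"] == "DPO":  # assumed label for DPO-only rows
            dpo_by_key[_match_key(r)].append(r)

    pairs = []
    for r in rows:
        if r["Source"] == "Both":
            pairs.append((r["Entry_ID"], r["Entry_ID"]))
        elif r["Source"] == "PFR":  # assumed label for PFR-only rows
            for d in dpo_by_key[_match_key(r)]:
                if d["FY_Year"] >= r["FY_Year"]:
                    pairs.append((r["Entry_ID"], d["Entry_ID"]))
    return pairs
```

Exact matching on `Direction_Primary` is deliberately strict. A looser rule, such as matching on country and category only, would find more links but would also pair unrelated reforms.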

---

## Segment 1: Tax Advisory History

**Objective:** Map World Bank taxation advisory interventions in EAP countries over the past 20 years. Identify what the Bank has recommended, how focus has shifted over time, and where gaps exist in thematic and geographic coverage.

**Data Sources:**
1. **DPO Prior Actions (FY2004–FY2024):** 59 entries (D01–D59) from the OPCS Development Policy Actions Database (DPAD), filtered to EAP region and theme code 114 (Public Administration — Tax Policy). Extracted and classified by user.
2. **PFR/PER Recommendations (2015–2026):** 66 entries (P01–P66) from World Bank PFRs, PERs, Economic Updates, and Economic Monitors covering EAP countries. Extracted and classified by user.
3. **Overlap:** 5 entries appear in both instruments (tagged Source = "Both"). Total unique rows: 125.
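Because 59 + 66 = 125, the stated total implies that each of the 5 overlapping recommendations occupies a single row carrying either a D or a P identifier and is not duplicated. A small consistency check along those lines could guard against rows being dropped or duplicated during re-extraction. It is sketched here in Python, and the function name is hypothetical.

```python
from collections import Counter


def check_segment1_counts(entry_ids, sources):
    """Sanity-check Segment 1 row counts against the documented totals.

    Assumes Entry_IDs are 'D01'..'D59' (DPO prior actions) and 'P01'..'P66'
    (PFR/PER recommendations), one row per ID, and that Source holds the
    literal value "Both" for overlapping entries.
    Returns a list of human-readable problems; empty means all checks pass.
    """
    problems = []
    prefixes = Counter(eid[0] for eid in entry_ids)
    for prefix, n in {"D": 59, "P": 66}.items():
        if prefixes[prefix] != n:
            problems.append(f"{prefix}-rows: expected {n}, found {prefixes[prefix]}")
    if len(set(entry_ids)) != len(entry_ids):
        problems.append("duplicate Entry_ID values")
    if len(entry_ids) != 125:
        problems.append(f"total rows: expected 125, found {len(entry_ids)}")
    n_both = sum(1 for s in sources if s == "Both")
    if n_both != 5:
        problems.append(f'Source == "Both": expected 5, found {n_both}')
    return problems
```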

**Countries (17 + 2 multi-country groupings):**
Indonesia, Philippines, Vietnam, Thailand, Malaysia, Myanmar, Cambodia, Lao PDR, Mongolia, Papua New Guinea, Samoa, Tonga, Fiji, Solomon Islands, Kiribati, Tuvalu, Vanuatu; plus EAP Region (multi-country) and Pacific Islands 9 (multi-country PFR).

**Taxonomy — Tax Categories (10):**
VAT/GST | CIT General | CIT Incentives | CIT International | PIT General | Capital Income Tax | Property Tax | Excise Tax | Wealth Tax (zero entries) | Other (SME tax, extractives, subnational, tax administration, GAAR)

**Taxonomy — Direction (8 primary, collapsed from 21 raw values):**
Increase Rate | Decrease Rate | Expand Base | Narrow Base | Introduce New Tax | Rationalize/Simplify | Improve Administration | Remove Tax

**Direction Collapsing Rule:** Compound directions (e.g., "Increase Rate / Expand Base") are mapped to the first-listed (primary) direction. Original compound values are preserved in the raw dataset for reference.
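The first-listed rule is straightforward to script, but the primary category `Rationalize/Simplify` itself contains a slash. The Python sketch below therefore assumes that compound values separate their components with a space-padded `" / "`, as in the example above. It raises an error for any value it cannot map, so unexpected spellings among the 21 raw values surface instead of being silently miscoded.

```python
PRIMARY_DIRECTIONS = {
    "Increase Rate", "Decrease Rate", "Expand Base", "Narrow Base",
    "Introduce New Tax", "Rationalize/Simplify", "Improve Administration",
    "Remove Tax",
}


def collapse_direction(raw: str) -> str:
    """Map a raw Direction value to Direction_Primary (first-listed rule).

    Assumes compound values use ' / ' (spaces around the slash) as the
    separator; splitting on a bare '/' would break 'Rationalize/Simplify'.
    """
    primary = raw.split(" / ")[0].strip()
    if primary not in PRIMARY_DIRECTIONS:
        raise ValueError(f"unmapped direction: {raw!r}")
    return primary
```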

**Key Fields:** Entry_ID, Source, FY_Year, Country, Sub_Region, Income_Group, Instrument_Type, Document_Title, Unified_Tax_Category, Tax_Subcategory, Direction, Direction_Primary (collapsed), Rate_Change_Detail, Base_Change, SEZ_Flag, Recommendation_Summary, Estimated_Fiscal_Impact

**Year field caveat:** DPO entries use World Bank Fiscal Year (July–June); PFR entries use calendar year of publication.
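When DPO and PFR years need to be compared, the fiscal-year convention can be made explicit. A World Bank fiscal year runs from 1 July to 30 June and is named for the calendar year in which it ends. A PFR dated only by calendar year Y could therefore fall in FY Y (January to June) or FY Y+1 (July to December). A minimal Python sketch follows, with illustrative function names:

```python
from datetime import date


def wb_fiscal_year(d: date) -> int:
    """World Bank fiscal year for a date (FY2024 = 1 Jul 2023 to 30 Jun 2024)."""
    return d.year + 1 if d.month >= 7 else d.year


def possible_fiscal_years(calendar_year: int) -> tuple[int, int]:
    """Fiscal years a document dated only by calendar year could fall in."""
    return (calendar_year, calendar_year + 1)
```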

---

## Segment 2: Taxation Trends

**Objective:** Analyze actual tax revenue outcomes for EAP countries over the recommendation period. Compare advisory activity (Segment 1) with revenue performance to assess implementation and identify gaps.

**Analytical tasks (Agent 3 — Revenue Statistics):**
- Pull tax revenue data from WDI and IMF GFS (subject to user approval)
- Compute tax-to-GDP ratios and component trends per country (VAT, CIT, PIT, excise, property)
- Assess whether revenue from a given tax category changed following a recommendation
- Flag implementation gaps (recommendation made, revenue did not respond)
- Flag organic improvements (revenue improved without a recommendation)
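The last three tasks reduce to a before/after comparison of tax-to-GDP ratios around a reference year. The Python sketch below shows one hypothetical way to make that comparison. The three-year window, the 0.5 percentage-point threshold, and the exclusion of the reference year itself are illustrative choices, not project settings. For country-category pairs without a recommendation, the caller must choose a reference year. A comparison of this kind is descriptive: it does not show that a recommendation caused the change.

```python
def tax_to_gdp(revenue: float, gdp: float) -> float:
    """Tax-to-GDP ratio in percent; revenue and GDP in the same units."""
    return 100.0 * revenue / gdp


def change_around(ratio_by_year, year, window=3):
    """Mean ratio over year+1..year+window minus mean over year-window..year-1.

    ratio_by_year maps year -> tax-to-GDP ratio (% of GDP). The reference
    year itself is excluded. Returns None if any needed year is missing.
    """
    pre = [ratio_by_year.get(y) for y in range(year - window, year)]
    post = [ratio_by_year.get(y) for y in range(year + 1, year + window + 1)]
    if None in pre or None in post:
        return None
    return sum(post) / window - sum(pre) / window


def flag(change, had_recommendation, threshold=0.5):
    """Label a change in percentage points; threshold is a hypothetical cut-off."""
    if change is None:
        return "insufficient data"
    improved = change >= threshold
    if had_recommendation:
        return "responded" if improved else "implementation gap"
    return "organic improvement" if improved else "no change"
```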

**Data Sources:**
- *Pending user approval to fetch external data*

---

## Segment 3: Inequality (WID)

**Objective:** Document inequality patterns across the 17 EAP economies and an 83-country comparison set, both for personal income (top 10% / top 1% / bottom 50% shares, Gini) and for the capital-vs-labour factor split. The segment is descriptive and correlational; it does NOT attempt causal identification of tax → inequality.

**Data Sources:**
- World Inequality Database (https://wid.world/), accessed via the `wid` R package (Tier-1-direct exemption analogous to `wbstats`). Approved as P13.
- Pre-tax national income shares: top 10%, top 1%, bottom 50%, Gini (`sptinc992j`, `gptinc992j`).
- Factor shares: capital share and labour share of national income (codes verified against WID metadata at fetch time).
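As we understand the WID codebook, WID variable codes follow a fixed layout. The first letter gives the series type (`s` = share, `g` = Gini), the next five letters the income concept (`ptinc` = pre-tax national income), three digits the age group (`992` = adults aged 20 and over), and the final letter the population unit (`j` = equal-split adults). Shares are further indexed by a percentile group. The Python sketch below decodes a code and lists the percentile groups for the three shares used here. The function name is hypothetical.

```python
# Percentile groups for the income shares reported in Segment 3.
PERCENTILES = {"top 10%": "p90p100", "top 1%": "p99p100", "bottom 50%": "p0p50"}


def parse_wid_code(code: str) -> dict:
    """Split a 10-character WID variable code into its four components."""
    if len(code) != 10 or not code[6:9].isdigit():
        raise ValueError(f"not a WID variable code: {code!r}")
    return {
        "type": code[0],       # e.g. 's' share, 'g' Gini
        "concept": code[1:6],  # e.g. 'ptinc' pre-tax national income
        "age": code[6:9],      # e.g. '992' adults 20+
        "pop": code[9],        # e.g. 'j' equal-split adults
    }
```

To our knowledge, these percentile strings are the values that the `perc` argument of `download_wid()` in the `wid` R package expects. Confirm this against the package documentation before relying on it.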

**Time period:** 1995–2023 (long-run trend window). For joins with Segment 2 (2004–2024), the series are subset to the overlapping years, 2004–2023.

---

## Segment 4: Tax Mix for the Future

**Objective:** Apply Stochastic Frontier Analysis (SFA) to estimate each EAP economy's tax-effort frontier and capacity gap; project revenue gains under three 2030 reform scenarios; produce a country-by-country priority list of tax-policy reforms; and close with a cross-cutting synthesis section that integrates Segments 1–3.
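In the usual log-linear stochastic frontier, `ln(T/Y) = x'β + v - u` with inefficiency `u ≥ 0`. Tax effort is then `TE = exp(-u) = actual / frontier`, and the capacity gap is `frontier - actual` in percentage points of GDP. The Python sketch below shows only this post-estimation arithmetic. It assumes an inefficiency estimate `u_hat` has already been obtained from an SFA fit, for example with the R `frontier` package. It also simplifies by using `exp(-u_hat)`, whereas efficiency scores are often computed as the conditional expectation `E[exp(-u) | ε]`. The scenario shares are hypothetical placeholders, since the document does not specify the three 2030 scenarios.

```python
import math


def tax_capacity(actual_ratio: float, u_hat: float) -> float:
    """Frontier tax-to-GDP ratio implied by TE = exp(-u_hat) = actual / frontier."""
    if u_hat < 0:
        raise ValueError("u_hat must be non-negative")
    return actual_ratio / math.exp(-u_hat)


def capacity_gap(actual_ratio: float, u_hat: float) -> float:
    """Gap between frontier and actual, in percentage points of GDP."""
    return tax_capacity(actual_ratio, u_hat) - actual_ratio


def scenario_gains(actual_ratio: float, u_hat: float, shares: dict) -> dict:
    """Revenue gain (pp of GDP) if each scenario closes a share of the gap.

    `shares` maps hypothetical scenario names to the fraction of the gap
    assumed closed by 2030.
    """
    gap = capacity_gap(actual_ratio, u_hat)
    return {name: s * gap for name, s in shares.items()}
```

For instance, an economy collecting 12% of GDP with TE = 0.8 has an implied frontier of 15% of GDP and a 3-point gap. A scenario closing a quarter of that gap would add about 0.75 points.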

**Data Sources:**
- All Segment 2 fetched data (P1–P5, P10).
- New: WDI structural controls (P11 EAP-17, P12 donor pool 76) — agriculture share, urbanisation, trade openness, natural resource rents, population.
- All Segment 3 inequality data (P13).
