# SEC Reference Data Fields vs. openfunds Data Model

## 1. Overview of SEC Structured Data Sources

The SEC provides **four distinct structured data sources** that contain reference data for US-registered funds. Each covers different aspects:

| Source | Form | Content | Format | Granularity |
|--------|------|---------|--------|-------------|
| **Series/Class CSV** | — | Identity & identifiers | CSV/XML | Trust → Series → Class |
| **XBRL Risk/Return** | N-1A (485BPOS, 497K) | Prospectus-derived structured data | XBRL → flat files | Series & Class level |
| **N-PORT Data Sets** | NPORT-P | Portfolio holdings & fund financials | XML → flat files | Series & Holding level |
| **Submissions API** | — | Filing history metadata | JSON | Entity (CIK) level |

---

## 2. Complete Field Inventory by SEC Source

### 2.1 Series/Class Reference CSV

This is the **identity backbone** — maps the hierarchy of trust → fund → share class.

| Field | Description | openfunds Equivalent |
|-------|-------------|---------------------|
| `Reporting File Number` | 811-XXXXX Investment Co. Act number | — (no direct equivalent) |
| `CIK Number` | 10-digit SEC entity identifier | — (SEC-specific) |
| `Entity Name` | Trust/investment company name | OFST005010 Umbrella Name |
| `Entity Org Type` | Organization type code | OFST160100 Legal Form |
| `Series ID` | S###### fund series identifier | — (SEC-specific) |
| `Series Name` | Fund name | OFST010110 Legal Fund Name Only |
| `Class ID` | C###### share class identifier | — (SEC-specific) |
| `Class Name` | Share class name (e.g. "Admiral Shares") | OFST020060 Full Share Class Name |
| `Class Ticker` | Exchange ticker symbol | OFST020020 Bloomberg Code (partial) |
| `Address_1, Address_2, City, State, Zip Code` | Registrant address | — |

**Coverage**: ~15,000+ investment company trusts, ~50,000+ series, ~100,000+ classes.

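
The trust → series → class hierarchy described in Section 2.1 can be rebuilt from the flat rows with a small grouping step. The following is a minimal sketch, not a reference implementation: the column names are taken from the field table above and may not match the real file's header exactly, the sample rows are invented, and `build_hierarchy` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample rows; the real file carries more columns
# (addresses, reporting file number) and its header may be spelled differently.
SAMPLE_CSV = """CIK Number,Entity Name,Series ID,Series Name,Class ID,Class Name,Class Ticker
0000000001,Example Trust,S000000001,Example Equity Fund,C000000001,Investor Shares,EXEQX
0000000001,Example Trust,S000000001,Example Equity Fund,C000000002,Admiral Shares,EXEAX
0000000001,Example Trust,S000000002,Example Bond Fund,C000000003,Investor Shares,
"""


def build_hierarchy(csv_text):
    """Group flat rows into {cik: {"name", "series": {series_id: {"name", "classes"}}}}."""
    trusts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        trust = trusts.setdefault(
            row["CIK Number"], {"name": row["Entity Name"], "series": {}}
        )
        series = trust["series"].setdefault(
            row["Series ID"], {"name": row["Series Name"], "classes": []}
        )
        series["classes"].append(
            {
                "class_id": row["Class ID"],
                "name": row["Class Name"],
                # Classes without a ticker have an empty cell; store None instead.
                "ticker": row["Class Ticker"] or None,
            }
        )
    return trusts


hierarchy = build_hierarchy(SAMPLE_CSV)
for cik, trust in hierarchy.items():
    for series_id, series in trust["series"].items():
        print(cik, trust["name"], series_id, series["name"], len(series["classes"]))
```

All values are kept as strings on purpose, so the leading zeros of the 10-digit CIK survive the round trip.
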
### 2.2 XBRL Risk/Return Summary (from Prospectus — the richest source)

This dataset is **extracted from prospectus XBRL filings** and is the closest to what openfunds covers. It contains the items that the prospectus risk/return summary discloses in tagged, structured form.

#### A. Fund Identity & Structure

| XBRL Element | Description | Data Type | openfunds Equivalent |
|-------------|-------------|-----------|---------------------|
| `RiskReturnHeading` | Prospectus section heading | Text | — |
| `ObjectiveHeading` | Heading of objectives section | Text | — |
| `ObjectivePrimaryTextBlock` | Investment objective narrative | Text Block | OFST010300 Investment Objective |
| `ObjectiveSecondaryTextBlock` | Additional objective detail | Text Block | OFST010300 Investment Objective |
| `StrategyHeading` | Heading of strategy section | Text | — |
| `StrategyNarrativeTextBlock` | Principal investment strategies | Text Block | — (no single openfunds equivalent) |

#### B. Fee & Expense Data (Shareholder Fees — paid directly by investor)

| XBRL Element | Description | Data Type | openfunds Equivalent |
|-------------|-------------|-----------|---------------------|
| `MaximumSalesChargeImposedOnPurchasesOverOfferingPrice` | Front-end load | Ratio | OFST451320 Max Subscription Fee In Favour Of Distributor |
| `MaximumDeferredSalesChargeOverOfferingPrice` | Back-end load (CDSC) | Ratio | OFST451391 Contingent Deferred Sales Charge Exit Fee |
| `MaximumDeferredSalesChargeOverOther` | CDSC on other basis | Ratio | OFST451392 Contingent Deferred Sales Charge Upfront Fee |
| `MaximumSalesChargeOnReinvestedDividendsAndDistributions` | Load on reinvested dividends | Ratio | — |
| `RedemptionFeeOverRedemption` | Redemption fee (% of amount) | Ratio | OFST451440 Max Redemption Fee In Favour Of Fund |
| `RedemptionFee` | Redemption fee (flat $) | Monetary | OFST451439 Min Redemption Fee In Favour Of Fund |
| `ExchangeFeeOverRedemption` | Exchange fee (% of amount) | Ratio | — |
| `ExchangeFee` | Exchange fee (flat $) | Monetary | — |
| `MaximumAccountFeeOverAssets` | Account maintenance fee (%) | Ratio | — |
| `MaximumAccountFee` | Account maintenance fee ($) | Monetary | — |
| `MaximumCumulativeSalesChargeOverOfferingPrice` | Cumulative max sales charge | Ratio | — |

#### C. Annual Fund Operating Expenses (ongoing costs deducted from fund assets)

| XBRL Element | Description | Data Type | openfunds Equivalent |
|-------------|-------------|-----------|---------------------|
| `ManagementFeesOverAssets` | Management fee | Ratio | OFST452010 Management Fee Maximum |
| `DistributionAndService12b1FeesOverAssets` | 12b-1 distribution fee | Ratio | OFST454165 Distribution Fee Maximum |
| `Component1OtherExpensesOverAssets` | Other expense component 1 | Ratio | — |
| `Component2OtherExpensesOverAssets` | Other expense component 2 | Ratio | — |
| `Component3OtherExpensesOverAssets` | Other expense component 3 | Ratio | — |
| `OtherExpensesOverAssets` | Total other expenses | Ratio | — |
| `AcquiredFundFeesAndExpensesOverAssets` | Acquired fund fees (fund-of-funds) | Ratio | — |
| `ExpensesOverAssets` | **Total Annual Fund Operating Expenses** | Ratio | OFST452100 TER Excluding Performance Fee |
| `FeeWaiverOrReimbursementOverAssets` | Fee waiver/reimbursement | Ratio | — |
| `NetExpensesOverAssets` | Net expenses after waivers | Ratio | OFST452200 Ongoing Charges |

#### D. Expense Example (hypothetical cost projections)

| XBRL Element | Description | Data Type | openfunds Equivalent |
|-------------|-------------|-----------|---------------------|
| `ExpenseExampleYear01` | Cost for $10K after 1 year | Monetary | — |
| `ExpenseExampleYear03` | Cost for $10K after 3 years | Monetary | — |
| `ExpenseExampleYear05` | Cost for $10K after 5 years | Monetary | — |
| `ExpenseExampleYear10` | Cost for $10K after 10 years | Monetary | — |
| `ExpenseExampleNoRedemptionYear01` | Cost if no redemption, 1 year | Monetary | — |
| `ExpenseExampleNoRedemptionYear03` | Cost if no redemption, 3 years | Monetary | — |
| `ExpenseExampleNoRedemptionYear05` | Cost if no redemption, 5 years | Monetary | — |
| `ExpenseExampleNoRedemptionYear10` | Cost if no redemption, 10 years | Monetary | — |

#### E. Performance Data

| XBRL Element | Description | Data Type | openfunds Equivalent |
|-------------|-------------|-----------|---------------------|
| `AnnualReturn[YYYY]` | Annual return for calendar year | Ratio | OFDY025000-range (Performance data) |
| `BarChartHighestQuarterlyReturn` | Best quarter return | Ratio | — |
| `BarChartLowestQuarterlyReturn` | Worst quarter return | Ratio | — |
| `BarChartHighestQuarterlyReturnDate` | Date of best quarter | Date | — |
| `BarChartLowestQuarterlyReturnDate` | Date of worst quarter | Date | — |
| `AverageAnnualReturnYear01` | Average annual return, 1 year | Ratio | OFDY025000-range |
| `AverageAnnualReturnYear05` | Average annual return, 5 years | Ratio | OFDY025000-range |
| `AverageAnnualReturnYear10` | Average annual return, 10 years | Ratio | OFDY025000-range |
| `AverageAnnualReturnSinceInception` | Return since inception | Ratio | — |
| `AverageAnnualReturnInceptionDate` | Inception date | Date | OFST020560 Share Class Launch Date |
| Performance dimensions: Before Taxes, After Taxes on Distributions, After Taxes on Distributions and Sales | | | — |

#### F. Risk Disclosures

| XBRL Element | Description | Data Type | openfunds Equivalent |
|-------------|-------------|-----------|---------------------|
| `RiskHeading` | Risk section heading | Text | — |
| `RiskNarrativeTextBlock` | Principal risks narrative | Text Block | — |
| `RiskLoseMoney` | "You may lose money" statement | String | — |
| `RiskMoneyMarketFundMayImposeFeesOrSuspendSales` | MMF gate/fee risk | Boolean | — |
| `RiskMoneyMarketFundPriceFluctuates` | MMF NAV fluctuation risk | Boolean | — |
| `BarChartAndPerformanceTableHeading` | Performance section heading | Text | — |
| `PerformanceNarrativeTextBlock` | Performance context narrative | Text Block | — |

#### G. Portfolio Turnover

| XBRL Element | Description | Data Type | openfunds Equivalent |
|-------------|-------------|-----------|---------------------|
| `PortfolioTurnoverHeading` | Section heading | Text | — |
| `PortfolioTurnoverTextBlock` | Turnover narrative | Text Block | — |
| `PortfolioTurnoverRate` | Turnover rate (%) | Ratio | OFRE000025-range (Fund Ratios) |

### 2.3 N-PORT Data Sets (Portfolio Holdings — quarterly)

This provides **dynamic portfolio data** not typically in openfunds static fields.

#### A. Fund-Level Information (`FUND_REPORTED_INFO`)

| Field | Description | openfunds Equivalent |
|-------|-------------|---------------------|
| `SERIES_NAME` | Fund name | OFST010110 Legal Fund Name Only |
| `SERIES_ID` | SEC series identifier | — |
| `SERIES_LEI` | LEI of the fund series | OFST010030 LEI Of Fund |
| `TOTAL_ASSETS` | Total assets (USD) | OFDY000010-range (AuM/TNA) |
| `TOTAL_LIABILITIES` | Total liabilities | — |
| `NET_ASSETS` | Net assets (TNA) | OFDY000010-range |
| `SALES_FLOW_MON1/2/3` | Monthly inflows | — |
| `REDEMPTION_FLOW_MON1/2/3` | Monthly outflows | — |
| Credit spread sensitivities (3m,1y,5y,10y,30y) | Risk measures | — |

#### B. Interest Rate Risk (`INTEREST_RATE_RISK`)

| Field | Description | openfunds Equivalent |
|-------|-------------|---------------------|
| `CURRENCY_CODE` | Currency of exposure | OFST010410 Fund Currency |
| `INTRST_RATE_CHANGE_*_DV01` | DV01 by maturity bucket | — |
| `INTRST_RATE_CHANGE_*_DV100` | Impact of 100bp shift | — |

#### C. Monthly Returns (`MONTHLY_TOTAL_RETURN`)

| Field | Description | openfunds Equivalent |
|-------|-------------|---------------------|
| `CLASS_ID` | Share class identifier | — |
| `MONTHLY_TOTAL_RETURN1/2/3` | Monthly returns per class | OFDY025000-range |

#### D. Portfolio Holdings (`FUND_REPORTED_HOLDING`)

| Field | Description | openfunds Equivalent |
|-------|-------------|---------------------|
| `ISSUER_NAME` | Holding issuer name | OFPH-range (Portfolio Holdings) |
| `ISSUER_LEI` | LEI of issuer | OFPH-range |
| `ISSUER_TITLE` | Security title/description | OFPH-range |
| `ISSUER_CUSIP` | CUSIP of holding | OFPH-range |
| `BALANCE` | Position size | OFPH-range |
| `UNIT` | Shares/principal/other | OFPH-range |
| `CURRENCY_CODE` | **Currency of holding** | OFPH-range |
| `CURRENCY_VALUE` | Value in reporting currency | OFPH-range |
| `EXCHANGE_RATE` | FX rate applied | — |
| `PERCENTAGE` | % of net assets | OFPH-range |
| `PAYOFF_PROFILE` | Long/Short/N/A | OFPH-range |
| `ASSET_CAT` | **Asset type classification** | OFST350000 MiFID Securities Classification (concept) |
| `ISSUER_TYPE` | Corporate/Government/etc. | — |
| `INVESTMENT_COUNTRY` | **Country of issuer (ISO)** | OFPH-range |
| `IS_RESTRICTED_SECURITY` | Restricted security flag | — |
| `FAIR_VALUE_LEVEL` | Fair value hierarchy (1/2/3) | — |

#### E. Holding Identifiers (`IDENTIFIERS`)

These fields identify the securities a fund holds, not the fund's own share classes, so the openfunds IDs below match by concept only.

| Field | Description | openfunds Equivalent |
|-------|-------------|---------------------|
| `IDENTIFIER_ISIN` | **ISIN** | OFST020000 ISIN |
| `IDENTIFIER_TICKER` | Ticker | — |
| `OTHER_IDENTIFIER` | SEDOL, etc. | OFST020040 SEDOL |

---

## 3. Mapping: What openfunds Fields CAN Be Found in SEC Data?

### Fully Available (structured, machine-readable)

| openfunds Category | openfunds OF-ID | openfunds Field | SEC Source | SEC Field |
|--------------------|-----------------|-----------------|-----------|-----------|
| **Key Fact: Company** | OFST001000 | Fund Group Name | Series/Class CSV | Entity Name |
| **Key Fact: Umbrella** | OFST005010 | Umbrella Name | Series/Class CSV | Entity Name |
| **Key Fact: Fund** | OFST010030 | LEI Of Fund | N-PORT | SERIES_LEI |
| | OFST010110 | Legal Fund Name Only | Series/Class CSV + N-PORT | Series Name |
| | OFST010300 | Investment Objective | XBRL R/R | ObjectivePrimaryTextBlock |
| | OFST010410 | Fund Currency | N-PORT | CURRENCY_CODE (inferred) |
| **Key Fact: Share Class** | OFST020000 | ISIN | N-PORT Holdings | IDENTIFIER_ISIN (held securities only) |
| | OFST020005 | CUSIP | N-PORT Holdings | ISSUER_CUSIP (held securities only) |
| | OFST020040 | SEDOL | N-PORT Holdings | OTHER_IDENTIFIER (held securities only) |
| | OFST020060 | Full Share Class Name | Series/Class CSV | Class Name |
| **Classification** | OFST350000 | Securities Classification | N-PORT | ASSET_CAT |
| **Fees** | OFST451320 | Max Subscription Fee (Distributor) | XBRL R/R | MaximumSalesChargeImposedOnPurchasesOverOfferingPrice |
| | OFST451391 | CDSC Exit Fee | XBRL R/R | MaximumDeferredSalesChargeOverOfferingPrice |
| | OFST451440 | Max Redemption Fee | XBRL R/R | RedemptionFeeOverRedemption |
| | OFST452010 | Management Fee Maximum | XBRL R/R | ManagementFeesOverAssets |
| | OFST452100 | TER Excl. Performance Fee | XBRL R/R | ExpensesOverAssets |
| | OFST452200 | Ongoing Charges | XBRL R/R | NetExpensesOverAssets |
| | OFST454165 | Distribution Fee Maximum | XBRL R/R | DistributionAndService12b1FeesOverAssets |
| **Performance** | OFDY025xxx | Return periods | XBRL R/R | AverageAnnualReturnYear01/05/10 |
| **Dynamic: AuM** | OFDY000xxx | TNA / AuM | N-PORT | NET_ASSETS, TOTAL_ASSETS |

### Partially Available (derivable from prospectus text, not structured)

These fields exist in the **text of the prospectus**, or can be inferred from other filing data, but are NOT tagged directly in the SEC structured datasets. Most would need to be extracted by an LLM — which is exactly the use case:

| openfunds OF-ID | openfunds Field | Where in Prospectus |
|-----------------|-----------------|---------------------|
| OFST010420 | Open-ended Or Closed-ended Fund Structure | Registration form type implies this (N-1A = open-end) |
| OFST010440 | Fiscal Year End | Mentioned in prospectus text, in Submissions JSON |
| OFST010500 | Is Fund Of Funds | Inferred from `AcquiredFundFeesAndExpensesOverAssets > 0` |
| OFST010580 | Is ETF | Inferred from form type or share class structure |
| OFST010720 | Is Passive Fund | Strategy narrative mentions "index" tracking |
| OFST010730 | Management Approach Type | Strategy narrative (active/passive/enhanced) |
| OFST020300 | Valuation Frequency | Prospectus "Pricing of Fund Shares" section |
| OFST020400 | Distribution Policy | Prospectus "Dividends and Distributions" section |
| OFST020540 | Share Class Currency | Inferred; US funds typically USD |
| OFST020558 | Subscription Period Start Date | Only for closed-end or interval funds |
| OFST400xxx | Minimum Investment | Prospectus "Purchase and Sale of Fund Shares" |
| OFST451027 | Has Performance Fee | Prospectus fee table |
| OFST451100 | Hurdle Rate | Prospectus fee table |
| OFST013000 | Prospectus Date | Filing date in submissions API |

### NOT Available in SEC Data

These openfunds fields are **European/international-specific** or **distribution-channel-specific** and have no SEC equivalent:

| openfunds Category | Examples |
|--------------------|---------|
| UCITS/AIFMD fields | OFST160100 Legal Form (SICAV/FCP), OFST011200 Is UCITS With Leveraged Benchmark |
| European regulatory | OFST350100 EFAMA EFC Category, OFST010075 CSSF Code |
| Distribution-specific | OFST453151 Is Trailer Fee Clean, OFST451305 Applied Subscription Fee |
| MiFID/PRIIPs/KID | OFEM-range (MiFID Template), OFEP-range (PRIIPs Template) |
| ESG/Sustainability | OFST820xxx, OFEE-range (EU sustainability regulation specific) |
| Country registrations | OFST6000XX (country-specific registration fields) |
| Solvency II | OFST500xxx |
| Swiss/German/UK specific | OFST700xxx |

---

## 4. Summary: Are Asset Class, Currencies, Fees, and Risk Data in the SEC Datasets?

| Data Category | In SEC Structured Data? | Source | Notes |
|--------------|------------------------|--------|-------|
| **Asset Class** | **YES** | N-PORT `ASSET_CAT` field | Values: equity, debt, derivative, etc. |
| **Currencies** | **YES** | N-PORT `CURRENCY_CODE` per holding; interest rate risk by currency | Per-holding currency + fund-level |
| **Fees (sales loads)** | **YES** | XBRL R/R | Front-end load, back-end load, redemption fee |
| **Fees (operating expenses)** | **YES** | XBRL R/R | Management fee, 12b-1, TER, net expense ratio |
| **Risk data (narrative)** | **YES** | XBRL R/R | Principal risks text block |
| **Risk data (quantitative)** | **YES** | N-PORT | DV01, credit spread sensitivity, VaR |
| **Performance** | **YES** | XBRL R/R + N-PORT | Annual returns, avg annual returns, monthly returns |
| **Investment Objective** | **YES** | XBRL R/R | Full text of objective |
| **Strategy** | **YES** | XBRL R/R | Full text of principal strategies |
| **Portfolio Turnover** | **YES** | XBRL R/R | Turnover rate |
| **Portfolio Holdings** | **YES** | N-PORT | Security-level: name, CUSIP, ISIN, country, asset type, value |
| **Country of Issuer** | **YES** | N-PORT `INVESTMENT_COUNTRY` | ISO country code per holding |
| **Minimum Investment** | **PARTIAL** | In prospectus text, not structured | LLM extraction target |
| **Distribution Policy** | **PARTIAL** | In prospectus text, not structured | LLM extraction target |
| **ESG/Sustainability** | **NO** | Not in SEC structured data | European regulation specific |
| **UCITS Classification** | **NO** | N/A for US funds | European regulation specific |

---

## 5. Implication for LLM Training Dataset

The SEC provides an excellent foundation for your LLM training dataset:

### Ground Truth (structured) — available directly from SEC:

- Fee tables (management fee, expense ratio, loads, 12b-1)
- Performance data (1yr, 5yr, 10yr returns)
- Investment objective text
- Principal risks text
- Portfolio turnover rate
- Total net assets
- Fund/class identifiers (CIK, Series ID, Class ID, Ticker)

### Extraction Targets (in prospectus text, to be derived by LLM):

- Minimum initial investment amounts
- Distribution frequency and policy
- Share class currency
- Open/closed-end structure
- Active vs. passive management
- Benchmark index name
- Tax status information
- Purchase/redemption cut-off times
- Settlement cycle

This creates a natural supervised learning setup: the XBRL structured data serves as **labels/ground truth**, and the prospectus HTML/text serves as **input**, enabling the LLM to learn the mapping from legal language to structured reference data.
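
The supervised setup described in Section 5 can be sketched as a function that pairs prospectus text with openfunds-keyed labels. This is an illustrative sketch under stated assumptions: the element-to-OF-ID mapping is copied from the Section 3 table, `make_training_example` and the record layout are hypothetical, the sample facts are invented, and ratio values are assumed to arrive as decimal fractions.

```python
import json

# XBRL Risk/Return element -> openfunds OF-ID, copied from the Section 3 mapping table.
XBRL_TO_OPENFUNDS = {
    "ObjectivePrimaryTextBlock": "OFST010300",
    "MaximumSalesChargeImposedOnPurchasesOverOfferingPrice": "OFST451320",
    "MaximumDeferredSalesChargeOverOfferingPrice": "OFST451391",
    "RedemptionFeeOverRedemption": "OFST451440",
    "ManagementFeesOverAssets": "OFST452010",
    "ExpensesOverAssets": "OFST452100",
    "NetExpensesOverAssets": "OFST452200",
    "DistributionAndService12b1FeesOverAssets": "OFST454165",
}


def make_training_example(prospectus_text, xbrl_facts):
    """Pair prospectus text (model input) with openfunds-keyed labels (ground truth)."""
    labels = {
        XBRL_TO_OPENFUNDS[element]: value
        for element, value in xbrl_facts.items()
        # Skip elements without an openfunds equivalent and facts with no value.
        if element in XBRL_TO_OPENFUNDS and value is not None
    }
    return {"input": prospectus_text, "labels": labels}


# Hypothetical facts for one share class (assumed decimal fractions: 0.0015 = 0.15%).
facts = {
    "ManagementFeesOverAssets": 0.0015,
    "ExpensesOverAssets": 0.0020,
    "NetExpensesOverAssets": 0.0018,
    "ExpenseExampleYear01": 18,  # no openfunds equivalent, so it is dropped
}

example = make_training_example("The Fund seeks long-term capital growth ...", facts)
print(json.dumps(example, indent=2))
```

Keying the labels by OF-ID rather than by XBRL element name keeps the training target in the openfunds vocabulary, so the same record format could later hold the text-only extraction targets listed above.
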