June 24, 2026 · Shipping strategy

What to Centralize and How: A Field-Level Guide to Logistics Data for Shippers and 3PLs

Yuzhao (David) Zhang Transportation strategy · Greater Toronto Area

What you'll get from this article

Know exactly which fields to collect for each dataset. Not just "shipment data" — the specific fields in the shipment record, the invoice line record, and the rate record that make freight cost questions answerable.
Understand how to link invoices to shipments in practice. The matching key problem is where most data projects stall. Here is a priority order for linking, and what to do when the obvious key is missing.
Clean up reference data before it breaks the analysis. The same carrier appearing under four different names across five source files kills every query. Here is what to normalize and how.
Build exception flags that surface real cost problems. Specific rules and thresholds — not "flag variances," but what condition, what threshold, and what action each flag should drive.
See what the finished master record looks like. A working field structure that connects the shipment, the invoice, and the rate agreement in one row.
Understand what 3PLs need to add. Additional fields for margin visibility and customer billing that shippers do not need but 3PLs cannot skip.

The decision to centralize logistics data is easy. The harder question is: exactly what fields go in each dataset, how do you connect them, and what do you build on top? This article answers those questions at the level of the work itself.

If you are looking for the case for why centralizing logistics data matters in the first place, start with Your Freight Invoices Are Too Expensive — But First, Centralize the Data. This article picks up where that one ends: you have decided to build the dataset. Now what, specifically, does that mean?

The shipment record: the backbone everything else connects to

Every other dataset links back to the shipment. If the shipment record is incomplete or inconsistently structured, every downstream analysis inherits the problem. Before collecting anything else, get the shipment record right.

The shipment record answers the most basic question in freight cost analysis: what actually moved? Without it, the invoice is a charge with no context. With it, every charge can be matched to a specific movement, a specific carrier, a specific customer, and a specific lane.

Field	What it captures	Why it matters
Shipment ID	Your internal reference for this movement	Primary key — every other record links to this
Order number	The customer order that generated the shipment	Connects freight cost back to the sale and customer
Carrier PRO / tracking number	The carrier's own reference for the shipment	Required to match against carrier invoices and tracking events
Bill of lading number	The shipping document reference	Fallback matching key when PRO is missing from invoice
Carrier name (normalized)	The carrier used for this movement	Must be consistent across all records — see reference data section
Service level code	Standard ground, expedited, LTL, etc.	Required to validate that the billed service matches what was tendered
Origin postal code / city	Where the shipment was picked up	Lane definition and zone calculation
Destination postal code / city	Where the shipment was delivered	Lane definition, zone calculation, and residential flag
Location type (destination)	Commercial, residential, limited access	Determines which accessorial surcharges are expected vs unexpected
Customer ID	Which customer this shipment belongs to	Required to analyze freight cost by customer
Tendered weight	The weight recorded at the warehouse when the shipment was booked	Baseline for spotting carrier reweigh adjustments on the invoice
Number of pieces / pallets	Piece count or pallet count	LTL pricing is often per-hundredweight; piece count affects minimum charges
Freight class (LTL)	NMFC class used at booking	Baseline for spotting carrier reclass adjustments; see LTL reweigh and reclass charges explained
Scheduled pickup date	The date the carrier was expected to pick up	Identifies late pickups that may trigger re-pickup or delay fees
Actual pickup date	The date the carrier actually picked up	Identifies missed pickup windows
Actual delivery date	The date the shipment was delivered	On-time delivery tracking; fuel surcharge effective date calculation
Quoted freight cost	The rate that was shown at booking	Foundation for quote-to-invoice variance tracking

A common gap: teams collect shipment header data (carrier, origin, destination, dates) but skip tendered weight, freight class, and quoted cost. Those three fields are required for any meaningful audit. Without tendered weight, you cannot detect a reweigh. Without freight class, you cannot detect a reclass. Without quoted cost, invoice variance is invisible.

The invoice record: line level is not optional

Invoice data collected as monthly totals — even by carrier — cannot support freight cost analysis. The cost story lives inside individual charge lines, not in the summary. A carrier invoice for $420 tells you nothing. A carrier invoice showing $280 base freight, $62 fuel surcharge, $48 liftgate delivery, and $30 appointment fee tells you four separate things, each with a different explanation and a different owner.

The invoice record requires one row per charge line, not one row per invoice. If your source data arrives as invoice-level totals, push back before accepting it. The line-level breakdown is not more data for its own sake — it is the only level at which accessorial patterns, rate discrepancies, and billing errors are visible.

Field	What it captures	Notes
Invoice number	The carrier's invoice reference	Required for duplicate detection — same invoice number from the same carrier on two records is a flag
Invoice date	The date the carrier issued the invoice	Used to match the correct fuel surcharge table and rate agreement effective period
Carrier PRO / tracking number	The carrier's shipment reference on this invoice	Primary link back to the shipment record
Charge type	Base freight, fuel surcharge, liftgate, appointment, reweigh adjustment, residential, re-delivery, etc.	Must be categorized consistently — not as "miscellaneous" or "other"
Charge amount	The dollar amount for this specific charge line	One amount per charge type per row
Billed weight	The weight the carrier used to calculate the freight charge	Compare against tendered weight in shipment record to detect reweigh
Billed freight class (LTL)	The class the carrier used for rating	Compare against tendered class in shipment record to detect reclass
Carrier name (normalized)	Which carrier issued this invoice	Must match the normalized carrier name in the shipment record exactly
Invoice currency	CAD, USD, or other	Required for accurate cost comparison across carriers on cross-border lanes
Credit or adjustment flag	Whether this line is a charge or a credit	Credits need to be matched to the original invoice they correct

One recurring problem: carrier invoices often arrive with charge types labeled inconsistently. One carrier calls it "fuel surcharge," another calls it "energy surcharge," a third buries it inside "accessorial fees." Before analysis is possible, these need to be mapped to a consistent internal charge type taxonomy. Building that mapping table once — carrier by carrier — saves hours of confusion every month going forward.

The rate record: what should have been charged

Without rate data, invoice data can only tell you what was charged. It cannot tell you whether what was charged was correct. Rate data is the baseline that turns every invoice line into a specific question: was this calculated correctly against the agreement that was in place when this shipment moved?

Rate agreements are typically stored as carrier PDFs or spreadsheet exports. The work of centralization is translating those documents into queryable records with effective dates so the right rate row applies to the right shipment.

Field	What it captures	Notes
Carrier name (normalized)	Which carrier this rate applies to	Must match exactly across all records
Service level	Standard ground, expedited, LTL standard, etc.	Rates vary by service — a single carrier may have five different rate tables
Origin zone or postal range	Where the shipment originates	Rate tables are often zone-based, not lane-specific
Destination zone or postal range	Where the shipment is going	Zone-to-zone matrix is the most common format
Weight break	The weight range this rate row covers	Rates typically step down per-unit as weight increases
Base rate	The agreed linehaul rate for this zone/weight combination	May be expressed as a flat amount or a rate per 100 lbs (cwt)
Discount percentage (if applicable)	The negotiated discount applied to the carrier's published tariff	Many contracts are structured as "X% off tariff" rather than absolute rates
Minimum charge	The floor below which no shipment is billed	Light shipments on short lanes frequently hit the minimum — this needs to be modeled, not ignored
Fuel surcharge table reference	Which fuel index and rate table governs fuel for this agreement	Fuel surcharge is recalculated weekly or monthly — the table and effective period must be captured
Accessorial rate by type	The agreed rate for each accessorial charge	Liftgate, appointment, residential, wait time, limited access — each needs its own agreed rate row
Effective date (start)	The date this rate came into effect	Critical — without this, the wrong rate row is applied to historical shipments
Effective date (end)	The date this rate was superseded	Open-ended for the current agreement; filled when a new agreement replaces it

The effective date pair is the most commonly skipped field and the one that causes the most analysis errors. A rate that changed in March should not be applied to January shipments. If the effective dates are not captured, every historical comparison is potentially wrong.

The linking problem: how to connect invoices to shipments

The most common place a data centralization effort stalls is the match between invoice and shipment. The two records come from different systems, use different reference numbers, and were created by different teams at different moments in time. Getting them to connect reliably is not automatic.

Use this priority order when matching.

Priority	Matching key	Reliability	Common problems
1	Carrier PRO number — match between invoice PRO and shipment PRO	High	PRO not captured in TMS at booking; PRO assigned after pickup and not updated in source system
2	Bill of lading number — match between invoice BOL and shipment BOL	Medium–high	BOL number formats vary; some carriers use the customer's BOL, some assign their own
3	Order number — match between invoice reference field and order number in shipment record	Medium	Not all carriers include order number on invoice; format may differ (leading zeros, prefixes)
4	Composite key — carrier + destination + pickup date + weight range	Low–medium	Multiple shipments to same destination on same date create ambiguous matches
Unmatched	Invoice with no shipment match after all keys exhausted	—	Flag for manual review — may be duplicate invoice, ghost shipment, or data entry error

Every invoice that cannot be matched to a shipment after all four options are exhausted should be flagged before payment. A carrier invoice with no corresponding shipment in your system is either a billing error, a duplicate, or a shipment that was not captured in the source data. None of those should be paid without investigation.

The match rate — the percentage of invoices that link to a shipment record — is itself a useful metric. A match rate below 90 percent in the first month of data work is common. Below 75 percent means there is a source data problem worth investigating before building anything on top.

Reference data: the foundation that makes everything else work

Reference data is the set of standard values that allow records from different sources to be compared. When two records use different values for the same thing — "UPS" versus "United Parcel Service" versus "UPS Ground," or "residential" versus "resi" versus "home delivery" — every query that groups by carrier or location type produces wrong results.

Reference data cleanup is unglamorous work. It is also the work that determines whether every analysis that follows produces numbers you can trust.

Four reference categories need standardization before analysis is possible.

Carrier names. Pull every unique value in the carrier field across all source files. Map each variant to a single canonical name. Common problems: "FedEx Freight" and "FedEx LTL" and "Federal Express" appearing as three separate carriers; regional carriers appearing under both their own name and the name of a parent company they were acquired by. Build a mapping table — raw value on one side, canonical name on the other — and apply it every time data is loaded.

Service level codes. Different systems use different labels for the same service. Your TMS may call it "Ground Residential." The carrier's invoice may call it "Home Delivery." Your rate agreement may define it as "SVC-R." All three need to resolve to the same internal code. Without this, service-level analysis produces nonsense.

Location types. The residential vs commercial distinction drives a significant share of accessorial charges. The location type field in the shipment record needs to be a controlled value — not free text — and it needs to be accurate at booking, not corrected after the carrier charges the surcharge. Common values: Commercial Dock, Residential, Limited Access, Trade Show, Construction Site, Government Facility. If this field is empty or inconsistently populated, residential surcharge analysis is impossible.

Customer and location identifiers. The same delivery address may appear as "Costco Warehouse — Mississauga" in one system and "Costco — 2150 Argentia Road" in another. If location identifiers are not standardized, accessorial analysis by delivery location does not work. Build a location master — canonical name, address, postal code, location type — and use the location ID rather than free-text address fields as the analytical key.

Additional fields for 3PLs: the margin layer

A shipper needs to know what freight cost and whether it was correct. A third-party logistics provider needs to know all of that plus what was billed to the customer for the same movement — and whether carrier cost plus margin is running above or below what was charged.

Without the customer billing fields alongside the carrier cost fields, a 3PL can only see cost, not profitability. Accounts that look active can be running at a loss, and the problem is only visible after the accounting cycle closes.

Field	What it captures	Why it matters for 3PLs
Customer billing rate	The rate the 3PL charged the customer for this movement	Revenue side of the margin calculation
Customer billing amount (total)	What the customer was actually invoiced for this shipment	Including any accessorials passed through to the customer
Accessorial pass-through flag	Whether each accessorial was billed to the customer or absorbed by the 3PL	Identifies which charge types are being systematically absorbed and at what cost
Carrier cost (total for shipment)	What the carrier charged the 3PL for this movement	Cost side of the margin calculation
Gross margin per shipment	Customer billing amount minus carrier cost	Calculated field — the core profitability metric at the shipment level
Customer account ID	Which 3PL customer this shipment belongs to	Required to roll up margin by account and surface accounts running below target
Customer rate agreement effective date	Which customer billing agreement was in effect at time of shipment	Required to validate that the customer was billed under the correct rate agreement

The most common 3PL margin problem I see: a carrier implements a rate increase or adds a new accessorial, the 3PL's carrier cost goes up, but the customer billing rate is not updated because nobody connected the carrier invoice changes to the customer rate review process. Margin on those accounts compresses silently until someone runs the numbers. With the margin fields in the dataset, that compression is visible in real time rather than after the quarter closes.

Exception flags: what to build and what thresholds to use

Exception flags are where the connected dataset becomes action. A flag is a rule applied to the data that surfaces a record requiring review. The goal is not to flag everything — that creates noise. The goal is to flag the specific conditions that represent real cost problems, billing errors, or operational failures.

The following flags cover the majority of recoverable freight cost issues. Each one needs a defined condition, a threshold, and a clear next step.

Flag	Condition	Suggested threshold	Next step
Duplicate invoice	Same invoice number from the same carrier appears more than once	Any duplicate — no tolerance	Hold second instance; request carrier confirmation before payment
Unmatched invoice	Invoice has no matching shipment record after all four matching keys are exhausted	Any unmatched — no tolerance	Hold for manual review; do not pay until shipment is confirmed
Rate discrepancy	Billed base rate does not match the calculated rate from the agreed rate file for this carrier, zone, weight, and effective date	Variance greater than $10 or 3% of base charge, whichever is larger	Flag for dispute; document the correct rate and the actual billed rate
Carrier reweigh	Billed weight on invoice is higher than tendered weight on shipment record	Any positive difference; prioritize review where weight difference exceeds 10% or $25	Request carrier reweigh documentation; compare against warehouse scale record
Carrier reclass (LTL)	Billed freight class on invoice does not match tendered freight class on shipment record	Any mismatch	Verify correct NMFC classification; dispute if tendered class was correct
Unexpected residential surcharge	Residential delivery charge billed for a location flagged as commercial in the shipment record	Any mismatch	Confirm delivery address type; dispute if address is confirmed commercial
Quote-to-invoice variance	Total invoiced amount is higher than quoted amount on shipment record	Variance greater than $30 or 10% of quoted amount	Identify which charge types created the variance; classify as expected, explainable, or disputed
Accessorial concentration	The same accessorial charge type appears more than N times for the same delivery location in a rolling 90-day window	Five or more occurrences of the same charge type at the same location	Review whether the charge reflects a permanent location characteristic that should be flagged at booking
3PL margin below threshold	Gross margin per shipment (customer billing minus carrier cost) falls below target	Below agreed margin floor for that customer account or lane	Review whether carrier cost increase was passed through to customer billing

A few practical notes on thresholds. Start conservative — flag fewer things with higher confidence, not everything with low confidence. A team that receives 300 flags per week and investigates none of them has the same outcome as a team with no flags. The threshold should match the investigation capacity available. Adjust over time as the team develops a sense of which flags are producing genuine recoveries and which are noise.

What the master record looks like

Once the shipment record, invoice record, and rate record are connected and the exception flags are applied, the result is a master record — one row per invoice charge line, linked to the shipment, with the rate baseline and exception status alongside it.

This is what the structure looks like in practice.

Category	Key fields in this row
Shipment identity	Shipment ID, order number, carrier PRO, BOL number
Carrier and service	Carrier name (normalized), service level code
Lane	Origin postal code, destination postal code, destination location type
Customer	Customer ID, delivery location ID
Shipment facts	Tendered weight, freight class, piece count, pickup date, delivery date, quoted amount
Invoice line	Invoice number, invoice date, charge type (normalized), charge amount, billed weight, billed class
Rate baseline	Expected rate for this charge type, variance amount, variance percentage
Exception flags	Duplicate flag, unmatched flag, rate discrepancy flag, reweigh flag, reclass flag, residential mismatch flag, quote variance flag
3PL fields (if applicable)	Customer billing amount, carrier cost, gross margin, accessorial pass-through flag

One row in this structure contains everything needed to answer whether a charge was correct, whether it was expected, and who should act on it. A query that filters for any row where a flag is active gives the exception queue. A query that groups by carrier and charge type gives the cost breakdown. A query that groups by customer and location gives the accessorial pattern by delivery point.

The four-week build: specific tasks and deliverables

The following plan assumes you are starting from scratch — carrier invoices in email or accounting software, shipment data in a TMS or spreadsheet, rate agreements in PDFs. Adjust based on what source data you already have in accessible form.

Week 1 — Source inventory and invoice extraction. List every carrier, every source system, and every data format. For each carrier, identify how invoices arrive: email attachment, portal download, EDI file, or accounting system export. Pull three months of invoices from each carrier. For each invoice, extract to a spreadsheet with one row per charge line. Build the charge type mapping table — what each carrier calls each charge type, mapped to your internal taxonomy. Deliverable: a flat file per carrier, three months of invoice lines, charge types normalized.

Week 2 — Shipment record extraction and invoice matching. Export shipment records for the same three-month window from your TMS, WMS, or order management system. Confirm which fields are populated and which are missing. Attempt the invoice-to-shipment match using the priority key order above. Calculate your match rate. For unmatched invoices, document the reason: missing PRO in TMS, no order reference on invoice, or no shipment record at all. Deliverable: matched invoice-shipment pairs, unmatched invoice list with reason codes, initial match rate metric.

Week 3 — Rate data and reference cleanup. Translate carrier rate agreements into a queryable rate table with effective dates. Build the carrier name mapping table and apply it to all three source files. Standardize service level codes and location type values. Link each matched shipment-invoice pair to the applicable rate row and calculate the expected charge. Flag rows where the billed amount does not match the expected amount above the threshold. Deliverable: rate baseline applied to all matched records, rate discrepancy flags active, reference data normalized.

Week 4 — Exception view and first review cycle. Apply all exception flags from the table above. Generate the exception queue — all flagged rows, sorted by variance amount. Run the first review cycle: for each flagged record, classify as confirmed error (dispute), operational explanation (document and accept), or requires more information. For the accessorial concentration flag, identify which delivery locations appear most frequently and whether the charge type reflects a permanent location characteristic that should be pre-flagged at booking. Deliverable: first live exception queue, first set of dispute candidates, first view of accessorial patterns by location.

What not to build before the data is clean

Two things are consistently attempted before the data is ready and produce results that waste more time than they save.

Dashboards on dirty data. A chart showing freight cost by carrier built on inconsistent carrier names will show the same carrier split across three bars with three different labels. The bar chart looks finished. The numbers are wrong. Clean the carrier name reference table first, then build the visualization.

AI or automation on unlinked records. Any tool that reads invoice data to detect anomalies, suggest disputes, or flag duplicates requires the invoice to be linked to the shipment. An AI that reads an invoice line in isolation cannot tell whether the liftgate charge was expected or unexpected — it does not know whether the delivery location was flagged as commercial. Connect the records first. The automation or analysis layer comes after.

For more on how logistics data reaches shippers in the first place — and why different carriers deliver it in different formats — see Why is logistics data so hard to get? EDI vs API explained.

The dataset is the foundation; the analysis comes after

The work described in this article is not glamorous. It is field mapping, reference table building, key matching, and exception rule writing. It does not produce a visible output in week one. It does not immediately answer the question of why freight cost went up last month.

But it is the work that makes every other effort possible. Without the specific fields in the shipment record, reweigh disputes are guesswork. Without the invoice at the line level, accessorial patterns are invisible. Without the rate record with effective dates, the audit compares charges to the wrong baseline. Without the reference data normalized, every query groups incorrectly. Without the exception flags, the review cycle is a manual search through thousands of rows.

Four weeks of careful data work produces a connected dataset that can be queried, audited, and acted on. It does not require a new system. It requires knowing what fields to collect, how to connect them, and what rules to apply on top.

That is what freight cost control actually looks like at the data level.

The questions are easy to ask. It is the data structure underneath that determines whether they can be answered.

If you are building this dataset and want a second opinion on structure, field choices, or exception logic, feel free to reach out at dzhang@stridedata.com.

— Yuzhao (David) Zhang