What you'll get from this article
- Know exactly which fields to collect for each dataset. Not just "shipment data" — the specific fields in the shipment record, the invoice line record, and the rate record that make freight cost questions answerable.
- Understand how to link invoices to shipments in practice. The matching key problem is where most data projects stall. Here is a priority order for linking, and what to do when the obvious key is missing.
- Clean up reference data before it breaks the analysis. The same carrier appearing under four different names across five source files kills every query. Here is what to normalize and how.
- Build exception flags that surface real cost problems. Specific rules and thresholds — not "flag variances," but what condition, what threshold, and what action each flag should drive.
- See what the finished master record looks like. A working field structure that connects the shipment, the invoice, and the rate agreement in one row.
- Understand what 3PLs need to add. Additional fields for margin visibility and customer billing that shippers do not need but 3PLs cannot skip.
The decision to centralize logistics data is easy. The harder question is: exactly what fields go in each dataset, how do you connect them, and what do you build on top? This article answers those questions at the level of the work itself.
If you are looking for the case for why centralizing logistics data matters in the first place, start with Your Freight Invoices Are Too Expensive — But First, Centralize the Data. This article picks up where that one ends: you have decided to build the dataset. Now what, specifically, does that mean?
The shipment record: the backbone everything else connects to
Every other dataset links back to the shipment. If the shipment record is incomplete or inconsistently structured, every downstream analysis inherits the problem. Before collecting anything else, get the shipment record right.
The shipment record answers the most basic question in freight cost analysis: what actually moved? Without it, the invoice is a charge with no context. With it, every charge can be matched to a specific movement, a specific carrier, a specific customer, and a specific lane.
| Field | What it captures | Why it matters |
|---|---|---|
| Shipment ID | Your internal reference for this movement | Primary key — every other record links to this |
| Order number | The customer order that generated the shipment | Connects freight cost back to the sale and customer |
| Carrier PRO / tracking number | The carrier's own reference for the shipment | Required to match against carrier invoices and tracking events |
| Bill of lading number | The shipping document reference | Fallback matching key when PRO is missing from invoice |
| Carrier name (normalized) | The carrier used for this movement | Must be consistent across all records — see reference data section |
| Service level code | Standard ground, expedited, LTL, etc. | Required to validate that the billed service matches what was tendered |
| Origin postal code / city | Where the shipment was picked up | Lane definition and zone calculation |
| Destination postal code / city | Where the shipment was delivered | Lane definition, zone calculation, and residential flag |
| Location type (destination) | Commercial, residential, limited access | Determines which accessorial surcharges are expected vs unexpected |
| Customer ID | Which customer this shipment belongs to | Required to analyze freight cost by customer |
| Tendered weight | The weight recorded at the warehouse when the shipment was booked | Baseline for spotting carrier reweigh adjustments on the invoice |
| Number of pieces / pallets | Piece count or pallet count | LTL pricing is often per-hundredweight; piece count affects minimum charges |
| Freight class (LTL) | NMFC class used at booking | Baseline for spotting carrier reclass adjustments; see LTL reweigh and reclass charges explained |
| Scheduled pickup date | The date the carrier was expected to pick up | Identifies late pickups that may trigger re-pickup or delay fees |
| Actual pickup date | The date the carrier actually picked up | Identifies missed pickup windows |
| Actual delivery date | The date the shipment was delivered | On-time delivery tracking; fuel surcharge effective date calculation |
| Quoted freight cost | The rate that was shown at booking | Foundation for quote-to-invoice variance tracking |
A common gap: teams collect shipment header data (carrier, origin, destination, dates) but skip tendered weight, freight class, and quoted cost. Those three fields are required for any meaningful audit. Without tendered weight, you cannot detect a reweigh. Without freight class, you cannot detect a reclass. Without quoted cost, invoice variance is invisible.
The invoice record: line level is not optional
Invoice data collected as monthly totals — even by carrier — cannot support freight cost analysis. The cost story lives inside individual charge lines, not in the summary. A carrier invoice for $420 tells you nothing. A carrier invoice showing $280 base freight, $62 fuel surcharge, $48 liftgate delivery, and $30 appointment fee tells you four separate things, each with a different explanation and a different owner.
The invoice record requires one row per charge line, not one row per invoice. If your source data arrives as invoice-level totals, push back before accepting it. The line-level breakdown is not more data for its own sake — it is the only level at which accessorial patterns, rate discrepancies, and billing errors are visible.
| Field | What it captures | Notes |
|---|---|---|
| Invoice number | The carrier's invoice reference | Required for duplicate detection — same invoice number from the same carrier on two records is a flag |
| Invoice date | The date the carrier issued the invoice | Used to match the correct fuel surcharge table and rate agreement effective period |
| Carrier PRO / tracking number | The carrier's shipment reference on this invoice | Primary link back to the shipment record |
| Charge type | Base freight, fuel surcharge, liftgate, appointment, reweigh adjustment, residential, re-delivery, etc. | Must be categorized consistently — not as "miscellaneous" or "other" |
| Charge amount | The dollar amount for this specific charge line | One amount per charge type per row |
| Billed weight | The weight the carrier used to calculate the freight charge | Compare against tendered weight in shipment record to detect reweigh |
| Billed freight class (LTL) | The class the carrier used for rating | Compare against tendered class in shipment record to detect reclass |
| Carrier name (normalized) | Which carrier issued this invoice | Must match the normalized carrier name in the shipment record exactly |
| Invoice currency | CAD, USD, or other | Required for accurate cost comparison across carriers on cross-border lanes |
| Credit or adjustment flag | Whether this line is a charge or a credit | Credits need to be matched to the original invoice they correct |
One recurring problem: carrier invoices often arrive with charge types labeled inconsistently. One carrier calls it "fuel surcharge," another calls it "energy surcharge," a third buries it inside "accessorial fees." Before analysis is possible, these need to be mapped to a consistent internal charge type taxonomy. Building that mapping table once — carrier by carrier — saves hours of confusion every month going forward.
The rate record: what should have been charged
Without rate data, invoice data can only tell you what was charged. It cannot tell you whether what was charged was correct. Rate data is the baseline that turns every invoice line into a specific question: was this calculated correctly against the agreement that was in place when this shipment moved?
Rate agreements are typically stored as carrier PDFs or spreadsheet exports. The work of centralization is translating those documents into queryable records with effective dates so the right rate row applies to the right shipment.
| Field | What it captures | Notes |
|---|---|---|
| Carrier name (normalized) | Which carrier this rate applies to | Must match exactly across all records |
| Service level | Standard ground, expedited, LTL standard, etc. | Rates vary by service — a single carrier may have five different rate tables |
| Origin zone or postal range | Where the shipment originates | Rate tables are often zone-based, not lane-specific |
| Destination zone or postal range | Where the shipment is going | Zone-to-zone matrix is the most common format |
| Weight break | The weight range this rate row covers | Rates typically step down per-unit as weight increases |
| Base rate | The agreed linehaul rate for this zone/weight combination | May be expressed as a flat amount or a rate per 100 lbs (cwt) |
| Discount percentage (if applicable) | The negotiated discount applied to the carrier's published tariff | Many contracts are structured as "X% off tariff" rather than absolute rates |
| Minimum charge | The floor below which no shipment is billed | Light shipments on short lanes frequently hit the minimum — this needs to be modeled, not ignored |
| Fuel surcharge table reference | Which fuel index and rate table governs fuel for this agreement | Fuel surcharge is recalculated weekly or monthly — the table and effective period must be captured |
| Accessorial rate by type | The agreed rate for each accessorial charge | Liftgate, appointment, residential, wait time, limited access — each needs its own agreed rate row |
| Effective date (start) | The date this rate came into effect | Critical — without this, the wrong rate row is applied to historical shipments |
| Effective date (end) | The date this rate was superseded | Open-ended for the current agreement; filled when a new agreement replaces it |
The effective date pair is the most commonly skipped field and the one that causes the most analysis errors. A rate that changed in March should not be applied to January shipments. If the effective dates are not captured, every historical comparison is potentially wrong.
The linking problem: how to connect invoices to shipments
The most common place a data centralization effort stalls is the match between invoice and shipment. The two records come from different systems, use different reference numbers, and were created by different teams at different moments in time. Getting them to connect reliably is not automatic.
Use this priority order when matching.
| Priority | Matching key | Reliability | Common problems |
|---|---|---|---|
| 1 | Carrier PRO number — match between invoice PRO and shipment PRO | High | PRO not captured in TMS at booking; PRO assigned after pickup and not updated in source system |
| 2 | Bill of lading number — match between invoice BOL and shipment BOL | Medium–high | BOL number formats vary; some carriers use the customer's BOL, some assign their own |
| 3 | Order number — match between invoice reference field and order number in shipment record | Medium | Not all carriers include order number on invoice; format may differ (leading zeros, prefixes) |
| 4 | Composite key — carrier + destination + pickup date + weight range | Low–medium | Multiple shipments to same destination on same date create ambiguous matches |
| Unmatched | Invoice with no shipment match after all keys exhausted | — | Flag for manual review — may be duplicate invoice, ghost shipment, or data entry error |
Every invoice that cannot be matched to a shipment after all four options are exhausted should be flagged before payment. A carrier invoice with no corresponding shipment in your system is either a billing error, a duplicate, or a shipment that was not captured in the source data. None of those should be paid without investigation.
The match rate — the percentage of invoices that link to a shipment record — is itself a useful metric. A match rate below 90 percent in the first month of data work is common. Below 75 percent means there is a source data problem worth investigating before building anything on top.
Reference data: the foundation that makes everything else work
Reference data is the set of standard values that allow records from different sources to be compared. When two records use different values for the same thing — "UPS" versus "United Parcel Service" versus "UPS Ground," or "residential" versus "resi" versus "home delivery" — every query that groups by carrier or location type produces wrong results.
Reference data cleanup is unglamorous work. It is also the work that determines whether every analysis that follows produces numbers you can trust.
Four reference categories need standardization before analysis is possible.
Carrier names. Pull every unique value in the carrier field across all source files. Map each variant to a single canonical name. Common problems: "FedEx Freight" and "FedEx LTL" and "Federal Express" appearing as three separate carriers; regional carriers appearing under both their own name and the name of a parent company they were acquired by. Build a mapping table — raw value on one side, canonical name on the other — and apply it every time data is loaded.
Service level codes. Different systems use different labels for the same service. Your TMS may call it "Ground Residential." The carrier's invoice may call it "Home Delivery." Your rate agreement may define it as "SVC-R." All three need to resolve to the same internal code. Without this, service-level analysis produces nonsense.
Location types. The residential vs commercial distinction drives a significant share of accessorial charges. The location type field in the shipment record needs to be a controlled value — not free text — and it needs to be accurate at booking, not corrected after the carrier charges the surcharge. Common values: Commercial Dock, Residential, Limited Access, Trade Show, Construction Site, Government Facility. If this field is empty or inconsistently populated, residential surcharge analysis is impossible.
Customer and location identifiers. The same delivery address may appear as "Costco Warehouse — Mississauga" in one system and "Costco — 2150 Argentia Road" in another. If location identifiers are not standardized, accessorial analysis by delivery location does not work. Build a location master — canonical name, address, postal code, location type — and use the location ID rather than free-text address fields as the analytical key.
Additional fields for 3PLs: the margin layer
A shipper needs to know what freight cost and whether it was correct. A third-party logistics provider needs to know all of that plus what was billed to the customer for the same movement — and whether carrier cost plus margin is running above or below what was charged.
Without the customer billing fields alongside the carrier cost fields, a 3PL can only see cost, not profitability. Accounts that look active can be running at a loss, and the problem is only visible after the accounting cycle closes.
| Field | What it captures | Why it matters for 3PLs |
|---|---|---|
| Customer billing rate | The rate the 3PL charged the customer for this movement | Revenue side of the margin calculation |
| Customer billing amount (total) | What the customer was actually invoiced for this shipment | Including any accessorials passed through to the customer |
| Accessorial pass-through flag | Whether each accessorial was billed to the customer or absorbed by the 3PL | Identifies which charge types are being systematically absorbed and at what cost |
| Carrier cost (total for shipment) | What the carrier charged the 3PL for this movement | Cost side of the margin calculation |
| Gross margin per shipment | Customer billing amount minus carrier cost | Calculated field — the core profitability metric at the shipment level |
| Customer account ID | Which 3PL customer this shipment belongs to | Required to roll up margin by account and surface accounts running below target |
| Customer rate agreement effective date | Which customer billing agreement was in effect at time of shipment | Required to validate that the customer was billed under the correct rate agreement |
The most common 3PL margin problem I see: a carrier implements a rate increase or adds a new accessorial, the 3PL's carrier cost goes up, but the customer billing rate is not updated because nobody connected the carrier invoice changes to the customer rate review process. Margin on those accounts compresses silently until someone runs the numbers. With the margin fields in the dataset, that compression is visible in real time rather than after the quarter closes.
Exception flags: what to build and what thresholds to use
Exception flags are where the connected dataset becomes action. A flag is a rule applied to the data that surfaces a record requiring review. The goal is not to flag everything — that creates noise. The goal is to flag the specific conditions that represent real cost problems, billing errors, or operational failures.
The following flags cover the majority of recoverable freight cost issues. Each one needs a defined condition, a threshold, and a clear next step.
| Flag | Condition | Suggested threshold | Next step |
|---|---|---|---|
| Duplicate invoice | Same invoice number from the same carrier appears more than once | Any duplicate — no tolerance | Hold second instance; request carrier confirmation before payment |
| Unmatched invoice | Invoice has no matching shipment record after all four matching keys are exhausted | Any unmatched — no tolerance | Hold for manual review; do not pay until shipment is confirmed |
| Rate discrepancy | Billed base rate does not match the calculated rate from the agreed rate file for this carrier, zone, weight, and effective date | Variance greater than $10 or 3% of base charge, whichever is larger | Flag for dispute; document the correct rate and the actual billed rate |
| Carrier reweigh | Billed weight on invoice is higher than tendered weight on shipment record | Any positive difference; prioritize review where weight difference exceeds 10% or $25 | Request carrier reweigh documentation; compare against warehouse scale record |
| Carrier reclass (LTL) | Billed freight class on invoice does not match tendered freight class on shipment record | Any mismatch | Verify correct NMFC classification; dispute if tendered class was correct |
| Unexpected residential surcharge | Residential delivery charge billed for a location flagged as commercial in the shipment record | Any mismatch | Confirm delivery address type; dispute if address is confirmed commercial |
| Quote-to-invoice variance | Total invoiced amount is higher than quoted amount on shipment record | Variance greater than $30 or 10% of quoted amount | Identify which charge types created the variance; classify as expected, explainable, or disputed |
| Accessorial concentration | The same accessorial charge type appears more than N times for the same delivery location in a rolling 90-day window | Five or more occurrences of the same charge type at the same location | Review whether the charge reflects a permanent location characteristic that should be flagged at booking |
| 3PL margin below threshold | Gross margin per shipment (customer billing minus carrier cost) falls below target | Below agreed margin floor for that customer account or lane | Review whether carrier cost increase was passed through to customer billing |
A few practical notes on thresholds. Start conservative — flag fewer things with higher confidence, not everything with low confidence. A team that receives 300 flags per week and investigates none of them has the same outcome as a team with no flags. The threshold should match the investigation capacity available. Adjust over time as the team develops a sense of which flags are producing genuine recoveries and which are noise.
What the master record looks like
Once the shipment record, invoice record, and rate record are connected and the exception flags are applied, the result is a master record — one row per invoice charge line, linked to the shipment, with the rate baseline and exception status alongside it.
This is what the structure looks like in practice.
| Category | Key fields in this row |
|---|---|
| Shipment identity | Shipment ID, order number, carrier PRO, BOL number |
| Carrier and service | Carrier name (normalized), service level code |
| Lane | Origin postal code, destination postal code, destination location type |
| Customer | Customer ID, delivery location ID |
| Shipment facts | Tendered weight, freight class, piece count, pickup date, delivery date, quoted amount |
| Invoice line | Invoice number, invoice date, charge type (normalized), charge amount, billed weight, billed class |
| Rate baseline | Expected rate for this charge type, variance amount, variance percentage |
| Exception flags | Duplicate flag, unmatched flag, rate discrepancy flag, reweigh flag, reclass flag, residential mismatch flag, quote variance flag |
| 3PL fields (if applicable) | Customer billing amount, carrier cost, gross margin, accessorial pass-through flag |
One row in this structure contains everything needed to answer whether a charge was correct, whether it was expected, and who should act on it. A query that filters for any row where a flag is active gives the exception queue. A query that groups by carrier and charge type gives the cost breakdown. A query that groups by customer and location gives the accessorial pattern by delivery point.
The four-week build: specific tasks and deliverables
The following plan assumes you are starting from scratch — carrier invoices in email or accounting software, shipment data in a TMS or spreadsheet, rate agreements in PDFs. Adjust based on what source data you already have in accessible form.
Week 1 — Source inventory and invoice extraction. List every carrier, every source system, and every data format. For each carrier, identify how invoices arrive: email attachment, portal download, EDI file, or accounting system export. Pull three months of invoices from each carrier. For each invoice, extract to a spreadsheet with one row per charge line. Build the charge type mapping table — what each carrier calls each charge type, mapped to your internal taxonomy. Deliverable: a flat file per carrier, three months of invoice lines, charge types normalized.
Week 2 — Shipment record extraction and invoice matching. Export shipment records for the same three-month window from your TMS, WMS, or order management system. Confirm which fields are populated and which are missing. Attempt the invoice-to-shipment match using the priority key order above. Calculate your match rate. For unmatched invoices, document the reason: missing PRO in TMS, no order reference on invoice, or no shipment record at all. Deliverable: matched invoice-shipment pairs, unmatched invoice list with reason codes, initial match rate metric.
Week 3 — Rate data and reference cleanup. Translate carrier rate agreements into a queryable rate table with effective dates. Build the carrier name mapping table and apply it to all three source files. Standardize service level codes and location type values. Link each matched shipment-invoice pair to the applicable rate row and calculate the expected charge. Flag rows where the billed amount does not match the expected amount above the threshold. Deliverable: rate baseline applied to all matched records, rate discrepancy flags active, reference data normalized.
Week 4 — Exception view and first review cycle. Apply all exception flags from the table above. Generate the exception queue — all flagged rows, sorted by variance amount. Run the first review cycle: for each flagged record, classify as confirmed error (dispute), operational explanation (document and accept), or requires more information. For the accessorial concentration flag, identify which delivery locations appear most frequently and whether the charge type reflects a permanent location characteristic that should be pre-flagged at booking. Deliverable: first live exception queue, first set of dispute candidates, first view of accessorial patterns by location.
What not to build before the data is clean
Two things are consistently attempted before the data is ready and produce results that waste more time than they save.
Dashboards on dirty data. A chart showing freight cost by carrier built on inconsistent carrier names will show the same carrier split across three bars with three different labels. The bar chart looks finished. The numbers are wrong. Clean the carrier name reference table first, then build the visualization.
AI or automation on unlinked records. Any tool that reads invoice data to detect anomalies, suggest disputes, or flag duplicates requires the invoice to be linked to the shipment. An AI that reads an invoice line in isolation cannot tell whether the liftgate charge was expected or unexpected — it does not know whether the delivery location was flagged as commercial. Connect the records first. The automation or analysis layer comes after.
For more on how logistics data reaches shippers in the first place — and why different carriers deliver it in different formats — see Why is logistics data so hard to get? EDI vs API explained.
The dataset is the foundation; the analysis comes after
The work described in this article is not glamorous. It is field mapping, reference table building, key matching, and exception rule writing. It does not produce a visible output in week one. It does not immediately answer the question of why freight cost went up last month.
But it is the work that makes every other effort possible. Without the specific fields in the shipment record, reweigh disputes are guesswork. Without the invoice at the line level, accessorial patterns are invisible. Without the rate record with effective dates, the audit compares charges to the wrong baseline. Without the reference data normalized, every query groups incorrectly. Without the exception flags, the review cycle is a manual search through thousands of rows.
Four weeks of careful data work produces a connected dataset that can be queried, audited, and acted on. It does not require a new system. It requires knowing what fields to collect, how to connect them, and what rules to apply on top.
That is what freight cost control actually looks like at the data level.
The questions are easy to ask. It is the data structure underneath that determines whether they can be answered.
If you are building this dataset and want a second opinion on structure, field choices, or exception logic, feel free to reach out at dzhang@stridedata.com.