One of the most common problems in managing evidence is dealing with large PDFs that are actually bundles of multiple evidence files: dozens or hundreds of documents all in one large PDF file.
MasterFile automatically splits (breaks apart) large PDFs, regardless of file size, into the multiple logical documents these aggregate PDFs contain. That makes it easy to:
- Review and extract key information, facts and issues, or apply work product to a single section of a document.
- Produce specific documents.
- Avoid navigating PDFs with large page counts and unrelated documents within them by dealing with each logical document on its own in MasterFile’s review platform.
Dozens of products let you split a large PDF file into smaller PDFs based on a number of criteria: bookmarks, number of pages, file size and so on. That’s all well and good, but for a litigator or a legal nurse consultant, simply extracting pages does nothing to help manage the evidence, key facts and extracts within each document inside the large PDF file now on your screen.
Let MasterFile split and load large PDF files for you automatically, so you can focus on what matters most – case analysis and case strategy.
Where do these irritating PDF files come from?
Common examples of large PDFs are medical records, production disclosures, FOIA responses and responses from the Court. Each of these bundles needs to be broken apart into its constituent logical documents, and each document classified and correctly filed with the case evidence by date, author, summary, document type (e.g. expert report, case law, invoice, ruling) and so forth. MasterFile’s Express Load splits such PDF files into their logical documents, one document per PDF, on-the-fly as the original PDF is being added to your case evidence.
You can split PDF files in one of two ways.
- By bookmarks. Many digitally combined PDFs (such as those created with Adobe Acrobat) include bookmarks, either the name of the original file from which the PDF was made, or a description. MasterFile uses these as logical breaks at which to split a PDF. A bookmark is also used as a short summary for its related document. Only top-level bookmarks are used; nested bookmarks are ignored.
- By the starting page number of each contained document. That’s explained below.
Option 1 — Split PDF files on-the-fly via bookmarks
- Simply navigate from Express Load to the folder containing the PDF to split.
- Select it, and enter one of these options on how to interpret the PDF’s bookmarks:
  - No bookmark numbers. Processing splits the entire file on each bookmark.
  - Starting or ending bookmark numbers. If both are specified, just the documents between the starting and ending bookmarks are split out and loaded.
- Click OK.
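To make the bookmark logic concrete, here is a minimal Python sketch of how top-level bookmarks could be turned into page ranges. It is not MasterFile's code. The function name `bookmark_ranges`, the 1-based bookmark numbers and the inclusive start/end behaviour are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def bookmark_ranges(
    bookmarks: List[Tuple[str, int]],
    total_pages: int,
    first: Optional[int] = None,
    last: Optional[int] = None,
) -> List[Tuple[str, int, int]]:
    """Return (summary, start_page, end_page) for each selected bookmark.

    bookmarks: top-level (title, start_page) pairs, 1-based, in page order.
    first/last: optional 1-based bookmark numbers, assumed inclusive.
    """
    ranges = []
    for i, (title, start) in enumerate(bookmarks):
        # A document runs to the page before the next bookmark,
        # or to the end of the PDF for the last bookmark.
        if i + 1 < len(bookmarks):
            end = bookmarks[i + 1][1] - 1
        else:
            end = total_pages
        ranges.append((title, start, end))

    lo = (first or 1) - 1
    hi = last if last is not None else len(ranges)
    return ranges[lo:hi]


# Illustrative, made-up bookmarks for a 10-page bundle.
marks = [("Letter", 1), ("Invoice", 4), ("Report", 6)]
everything = bookmark_ranges(marks, 10)
subset = bookmark_ranges(marks, 10, first=2, last=3)
```

With no bookmark numbers, every bookmark yields one document. With `first=2, last=3`, only the second and third documents are selected, and each bookmark title doubles as the document summary.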
Option 2 – Split PDF files on-the-fly via page number
This option lets you split PDF files using an Excel spreadsheet or CSV file to control which pages the logical documents start and end on. Starting and ending page numbers for each contained document are entered on consecutive rows in their respective columns. MasterFile uses the spreadsheet to determine where and how to split the PDF. Bookmarks are ignored in this option.
Why use an Excel or CSV file to split a large PDF?
The Excel file is actually a pseudo load file, which means you can also include any other metadata you like up front rather than updating it during review, including each document’s date, sender/author, document type, the issues it pertains to, and a summary or description. Excel’s cell copy commands make short work of any common information.
Note that the CSV and XLSX files should have exactly the same filename as the PDF you are splitting.
Here’s a screenshot of a simple Excel sheet to split the same FOIA document being split via bookmarks in the short animation above.
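In text form, a page-number load file might look something like the sample in the sketch below. The column names (`StartPage`, `EndPage`, `Date`, `Author`, `DocType`, `Summary`) and the sample rows are hypothetical, since the exact layout MasterFile expects is not spelled out here. The sketch also assumes that "the same filename" means the same base name with a `.csv` or `.xlsx` extension.

```python
import csv
import io
from pathlib import Path

# Made-up sample rows; column names are assumptions, not MasterFile's layout.
SAMPLE_CSV = """StartPage,EndPage,Date,Author,DocType,Summary
1,3,2019-04-02,J. Smith,Letter,Cover letter
4,9,2019-04-10,Agency,Report,Inspection report
"""


def load_file_candidates(pdf_path: str) -> list:
    """Load-file names that share the PDF's base name."""
    p = Path(pdf_path)
    return [p.with_suffix(".csv").name, p.with_suffix(".xlsx").name]


def read_load_rows(csv_text: str) -> list:
    """Parse rows and check that each page range is sensible."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        start, end = int(row["StartPage"]), int(row["EndPage"])
        if start < 1 or end < start:
            raise ValueError(f"bad page range {start}-{end}")
        # Keep the metadata columns, but store page numbers as integers.
        rows.append({**row, "StartPage": start, "EndPage": end})
    return rows


names = load_file_candidates("FOIA Response.pdf")
rows = read_load_rows(SAMPLE_CSV)
```

Validating the page ranges before loading is a cheap way to catch typing slips such as an end page that falls before its start page.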
Splitting a PDF using bookmarks is useful if the PDF already has descriptive bookmarks and each bookmark marks the start of a logical document in the PDF. You can of course add bookmarks yourself and then load the PDF. When splitting with bookmarks, however, the bookmark is the only metadata loaded (as the document summary), so use short, meaningful summaries as bookmarks for each document within the large PDF.
Whether bookmarks exist or you are adding them, PDF products can and do introduce odd characters and invisible line breaks into bookmarks. For example:
- Acrobat itself can add a square box character like this ▯ to bookmarks in some cases.
- Bookmarks can’t span two or more lines, yet Acrobat and other products let you add carriage returns or new line characters, or add these themselves. These characters are invisible, so the bookmark looks like one long line. The only way to spot such line breaks is to copy and paste the entire bookmark into Notepad, remove the breaks and odd characters, and then replace the bookmark’s text with the cleaned version.
Any of the above will cause an import to fail, and they cannot be detected in advance. We therefore always recommend that you check the characters in each bookmark and test load your PDF in a new database before the final import.
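If you have a way to export bookmark text, the manual Notepad check could be approximated with a short script run outside MasterFile. The Python sketch below is only an illustration. The set of characters it treats as suspect (control and format characters, line and paragraph separators, box-like geometric shapes, and the Unicode replacement character) is an assumption about what causes trouble, not a list confirmed by MasterFile.

```python
import unicodedata


def is_suspect(ch: str) -> bool:
    """True for characters that are invisible or look like a stray box."""
    cat = unicodedata.category(ch)
    return (
        cat in ("Cc", "Cf", "Zl", "Zp")   # control, format, line/paragraph separators
        or 0x25A0 <= ord(ch) <= 0x25FF    # geometric shapes, including the box U+25AF
        or ch == "\ufffd"                 # Unicode replacement character
    )


def find_suspects(text: str) -> list:
    """Return (index, code point) for every suspect character."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if is_suspect(ch)]


def clean_bookmark(text: str) -> str:
    """Replace suspect characters with spaces, then collapse whitespace."""
    spaced = "".join(" " if is_suspect(ch) else ch for ch in text)
    return " ".join(spaced.split())


dirty = "Expert report\r\n2019 \u25af"
cleaned = clean_bookmark(dirty)
```

Suspect characters are replaced with a space rather than simply deleted so that words on either side of a hidden line break do not run together. Note that format characters can be legitimate in some scripts, so review the flagged positions before replacing a bookmark's text.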
What happens to the original PDF?
If you want the original PDF, load it into MasterFile like any other document without the split function. However, we’ve found that’s rarely needed in practice — although each split document references its original page number range, it’s the split out document that matters, forms part of the case chronology, and is what you will need to produce in future, etc.
Can I split PDFs arbitrarily?
Yes. With Option 2, enter the page numbers to split at and a PDF of that many pages will be created. You might use this to make separate PDF files of individual pages, PDFs of multiple pages with specific page counts per PDF file, or simply to split a large PDF file into several smaller ones with an equal number of pages in each. You can also use the same technique to select pages or a page range for each split PDF. Note that PDFs will also be created for the ‘excluded’ page ranges; simply delete them from MasterFile after the load process ends.
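The page ranges for an arbitrary split are easy to work out by hand, but for a long PDF a few lines of Python can generate them for pasting into the spreadsheet. This is a sketch under stated assumptions: the function names are hypothetical, pages are 1-based and inclusive, and the "excluded" ranges are taken to be whatever pages fall outside the ranges you selected.

```python
def equal_page_ranges(total_pages: int, pages_per_file: int) -> list:
    """Split 1..total_pages into consecutive ranges of a fixed size."""
    if total_pages < 1 or pages_per_file < 1:
        raise ValueError("page counts must be positive")
    return [
        (start, min(start + pages_per_file - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_file)
    ]


def excluded_ranges(selected: list, total_pages: int) -> list:
    """Return the page ranges not covered by the selected ranges."""
    gaps = []
    next_page = 1
    for start, end in sorted(selected):
        if start > next_page:
            gaps.append((next_page, start - 1))
        next_page = max(next_page, end + 1)
    if next_page <= total_pages:
        gaps.append((next_page, total_pages))
    return gaps


chunks = equal_page_ranges(10, 4)
leftovers = excluded_ranges([(3, 5), (9, 9)], 12)
```

Here a 10-page PDF split four pages at a time gives the ranges 1-4, 5-8 and 9-10. Selecting pages 3-5 and page 9 from a 12-page PDF leaves 1-2, 6-8 and 10-12 as the leftover ranges you would delete after loading.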
What numbering is applied to split PDF pages?
When a split PDF is loaded, the default assumption is that it is a separate document whose page numbering relates to itself, not to the large PDF file MasterFile extracted the pages from. Page numbering therefore starts at 1.
However, MasterFile lets you manipulate starting numbers to coincide with specific situations. For example, suppose the page range of a split PDF of 3 pages was 223, 224 and 225 in the large PDF file. But those three pages’ actual printed page numbers are 6, 7 and 8. You can set the extraction numbering to be 6, 7 and 8 rather than 1, 2 and 3. The original page range of 223 – 225 is also preserved.
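The renumbering in this example is simple offset arithmetic, sketched below in Python. The function name `page_labels` and its parameters are hypothetical and only illustrate the mapping; they are not a MasterFile setting.

```python
def page_labels(orig_start: int, orig_end: int, first_label: int = 1) -> dict:
    """Map each original page number to its number in the split PDF."""
    return {
        orig: first_label + (orig - orig_start)
        for orig in range(orig_start, orig_end + 1)
    }


default_numbering = page_labels(223, 225)       # numbering starts at 1
printed_numbering = page_labels(223, 225, 6)    # numbering starts at 6
```

With the default, original pages 223, 224 and 225 become pages 1, 2 and 3. With a starting number of 6 they become 6, 7 and 8, while the dictionary keys keep the original range of 223 to 225.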
Be more effective with MasterFile – a true, small-firm platform for simple document management through complex litigation that replaces CaseMap, Concordance, & Relativity, etc.