Paul Rayius

The Relevance of Metadata in Accessible PDFs

This is the second in an eight-part article series.

STEP 2: METADATA

What is Metadata in a PDF File?

The term PDF metadata refers to searchable fields within the document’s properties that identify what the document is about. Ideally, PDF metadata should be as accurate and specific as possible so that the document is easier to find through search.

Some of the components that make up metadata are:

  • Title
  • Author
  • Subject
  • Keywords
  • Document Language

Let’s break down each of these items, delineate them as needed and identify the accessibility concerns around them.

To access the metadata for Title, Author, Subject and Keywords, open the PDF in Adobe Acrobat Pro and go to File > Properties > Description.

PDF Metadata: Title

The Title, which is not the same thing as the file name, should succinctly and accurately describe what the document is about. For accessibility, the title metadata is a requirement.

If we titled this article “The Migration Patterns of the Peregrine Falcon,” and then started going on about metadata, that wouldn’t be very helpful, would it? It’s not nice to frustrate the people you serve. Always make sure the title correctly reflects the subject matter of the PDF.

Accessibility requirements mandate that the PDF Title, rather than the File Name, be displayed when the document is opened. Let’s face it, some documents have very cryptic file names with dates, versions, editors’ initials and all kinds of other things that aren’t terribly informative to the end user. Descriptive PDF Titles solve this problem.
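To illustrate the difference between a Title and a File Name, here is a minimal Python sketch that flags Title values that are empty or that simply repeat the file name. The function name `title_looks_like_file_name` and the heuristic itself are assumptions made for this example. They are not part of Acrobat, CommonLook or any accessibility standard.

```python
from pathlib import PurePath


def title_looks_like_file_name(title: str, file_name: str) -> bool:
    """Return True if the Title is empty or just repeats the file name.

    Hypothetical heuristic, for illustration only.
    """
    cleaned = title.strip().lower()
    if not cleaned:
        return True  # an empty Title is never descriptive
    path = PurePath(file_name.lower())
    # Catches both "rpt_v3_final.pdf" and "rpt_v3_final" used as a Title.
    return cleaned in (path.name, path.stem)


print(title_looks_like_file_name("2021-03_rpt_v3_PR.pdf", "2021-03_rpt_v3_PR.pdf"))
print(title_looks_like_file_name(
    "The Relevance of Metadata in Accessible PDFs", "2021-03_rpt_v3_PR.pdf"))
```

A check like this can only catch the obvious cases. Whether a Title accurately describes the document still needs a human reader.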

In addition, if your organization needs to make your PDFs conform with the HHS standard (U.S. Dept. of Health and Human Services) there are other requirements regarding the Title, specifically the types of characters that are (or are not) allowed. For more details, check out the HHS standard.

PDF Metadata: Author

Author metadata is all about who wrote the document. According to the HHS standard, however, the Author should be the division, office, etc., where the document came from, rather than a specific person’s name. The other standards aren’t this exacting regarding the document Author. In fact, while the HHS standard says that there should be an Author, the other standards don’t require that the Author’s information be included in the metadata. (This same rule applies to the Subject and Keywords, which we’ll discuss next.)

PDF Metadata: Subject

While the Title says what the document is about, the Subject can be useful to provide more specific details. HHS doesn’t put as many restrictions on the Subject as it does on the Title.

Pro Tip: If you need to conform with the HHS standard, and you don’t know what to use for the Subject, you can use the Title again in this field.

PDF Metadata: Keywords

Keywords play a crucial role in making the document easier to find when searching online. If you’re not sure what to use for the keywords, perhaps consider things like important words from the Title, Subject and some of the headings in the document (at least the main chapter/section headings).

For example, Keywords for this article might be: PDF Metadata; Metadata and PDF Accessibility; CommonLook; Title; Author; Subject; PDF Keywords; Peregrine Falcon Migration (Ha! Just kidding.).

Pro Tip: When you’re entering Keywords into the metadata, separate them with semicolons as opposed to commas.
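If you assemble Keywords in a script before pasting them into the metadata, the semicolon convention can be sketched as follows. This is a small Python example under stated assumptions: the helper names `format_keywords` and `parse_keywords` are made up for illustration, and dropping blank and duplicate entries is our own choice, not something a standard requires.

```python
def format_keywords(keywords: list[str]) -> str:
    """Join keywords into one semicolon-separated string."""
    seen = set()
    cleaned = []
    for keyword in keywords:
        text = keyword.strip()
        # Skip blank entries and case-insensitive duplicates.
        if text and text.lower() not in seen:
            seen.add(text.lower())
            cleaned.append(text)
    return "; ".join(cleaned)


def parse_keywords(value: str) -> list[str]:
    """Split a semicolon-separated Keywords string back into a list."""
    return [part.strip() for part in value.split(";") if part.strip()]


print(format_keywords(["PDF Metadata", "CommonLook", " Title ", "commonlook"]))
# prints: PDF Metadata; CommonLook; Title
```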

PDF Metadata: Language

To access the metadata for Language, open the PDF in Adobe Acrobat Pro and go to File > Properties > Advanced.

Finally, a critical piece of the metadata is the document’s Language. Stating the document’s language is required by all accessibility standards, and it needs to be set accurately so that assistive technology reads the PDF using the language that the words are written in. Imagine how confusing it would be to listen to a document written in English when the document’s Language is set to German!
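The English-versus-German mix-up can be expressed as a simple comparison. The Python sketch below assumes that language values look like `en`, `en-US` or `de-DE`, and that someone who knows the document supplies the expected language. The function names are hypothetical, and the code does not detect the language of the text by itself.

```python
def primary_language(tag: str) -> str:
    """Return the primary part of a language tag, e.g. "en" for "en-US"."""
    return tag.strip().replace("_", "-").split("-")[0].lower()


def language_matches(declared: str, expected: str) -> bool:
    """Compare the declared document Language with the language of the text."""
    declared_primary = primary_language(declared)
    # An empty declared Language never matches.
    return bool(declared_primary) and declared_primary == primary_language(expected)


print(language_matches("en-US", "en"))  # English document, English setting
print(language_matches("de-DE", "en"))  # English document, German setting
```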

Dealing with Metadata While Remediating PDFs

When it comes to addressing the metadata during PDF remediation, if you’re using Adobe Acrobat, it will give you failures if you don’t have a Title for your document, if the document is set to display the File Name, or if there isn’t a Language assigned. If there is a Title that is set to be displayed and there’s a language assigned, these things will automatically pass the Acrobat check.
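The three conditions described above can be restated as a short Python sketch. This is not Acrobat’s actual checker, only an illustration of the rules as this article describes them. The `PdfMetadata` class, its field names and the failure messages are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PdfMetadata:
    """Hypothetical container for the settings discussed in this article."""
    title: str = ""
    language: str = ""
    display_title: bool = False  # True shows the Title, False shows the File Name


def basic_metadata_failures(meta: PdfMetadata) -> list[str]:
    """List the failures described above; an empty list means a pass."""
    failures = []
    if not meta.title.strip():
        failures.append("Title is missing")
    if not meta.display_title:
        failures.append("Document is set to display the File Name")
    if not meta.language.strip():
        failures.append("Language is not assigned")
    return failures


print(basic_metadata_failures(PdfMetadata(title="Annual Report", language="en-US")))
# prints: ['Document is set to display the File Name']
```

Note that an empty list only means the settings are present. It says nothing about whether the Title or Language is accurate.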

When using CommonLook PDF, you’ll be prompted to address all metadata concerns. As with Acrobat, your document will receive a Failure grade if you neglect to add or display the Title or if the Language is not set. Fortunately, the Fix Wizard makes these things a breeze to correct!

Though Acrobat will give you a Pass if the document has those settings correct, CommonLook will ask you to go one step further and verify their accuracy. If you notice a mistake, CommonLook makes it easy to add or edit all of the metadata, including the Subject, Author and Keywords.

The HHS standard is one of the most rigorous when it comes to metadata. Fortunately, this standard is built into CommonLook PDF. If you need to conform to HHS, CommonLook PDF will prompt you to verify and correct any concerns.

How Assistive Technology Handles Metadata

Even though the only real metadata requirements are concerned with a document’s Title and Language, the PDF/UA standard (ISO 14289-1) clearly states that assistive technology shall make available the metadata in the document, if it’s there.

Post-Script:

In a PDF, the metadata includes additional information such as the software that was used to create or produce the PDF, when it was created, and, if applicable, when it was last edited. However, you really don’t have any control over that, and it doesn’t impact the accessibility of the PDF, which is why we didn’t discuss it in this article. But to be “technically” accurate regarding metadata, we felt the need to at least mention it. If you’re the type of person who stays to read the credits at the end of a movie, now you know.

Read the Entire Series Here

Step 1: Tags & Reading Order

Step 2: Metadata

Step 3: Color & Contrast