Indexing PDF content in SharePoint

Written By: Hesham Saad -- 11/24/2010

Categories: Configurations, Design, Document Management, MOSS 2007, SharePoint 2010, SharePoint Foundation 2010, WSS2, WSS3

Problem

Many SharePoint portals require that content from PDF documents be available in SharePoint's search results.

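By default, Windows SharePoint Services (WSS) 2.0/3.0, Microsoft SharePoint Foundation 2010, SharePoint Portal Server 2003, Microsoft Office SharePoint Server (MOSS) 2007, and even Microsoft SharePoint Server 2010 cannot index the contents of PDF documents, because none of them ships with a PDF IFilter (the component the indexer uses to extract text from a file type).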

Solution

To fix this, follow the steps below for your version of SharePoint:

Indexing PDF Documents in WSS 2.0 and SharePoint Portal Server 2003 Libraries:

=> Download the free Adobe PDF IFilter 5.0 from the following Adobe web site. (Later versions, such as 6.0, can also be used. Note that there are separate versions for 32-bit (x86) and 64-bit (x64) machines.)

http://www.adobe.com/support/downloads/detail.jsp?ftpID=1276

=> On each index server in your farm, stop the IIS Admin service, and then install and register the IFilter:

  • Start > Administrative Tools > Services.
  • Right Click "IIS Admin Service" > Stop.
  • Run the Adobe PDF IFilter installer to install the IFilter on the index server.
  • Register the Adobe PDF IFilter:
    1. Click Start > Run.
    2. Type "cmd" and click OK (or press Enter).
    3. Change to the IFilter installation directory by typing cd "<drive>:\Program Files\Adobe\PDF IFilter 5.0" and pressing Enter.
    4. Type "regsvr32.exe pdffilt.dll" and press Enter.
    5. Wait for the message confirming that the registration succeeded, and then click "OK".
    6. Type "exit" and press Enter to close the command prompt.
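
The registration steps 1 to 6 can also be done with a single command. The following is a minimal Python sketch, assuming Python is available on the index server and that the IFilter was installed to the default 5.0 folder shown above (the `ifilter_dir` argument and both function names are hypothetical, so adjust them to your drive and version). It runs a dry run by default; it does not stop IIS Admin or run the installer, so do those first.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumed install folder of Adobe PDF IFilter 5.0; change the drive letter if needed.
DEFAULT_IFILTER_DIR = r"C:\Program Files\Adobe\PDF IFilter 5.0"


def registration_command(ifilter_dir=DEFAULT_IFILTER_DIR):
    """Build the regsvr32 argument list that registers pdffilt.dll.

    The full DLL path is passed as one argument, so no "cd" step is needed
    and the space in "Program Files" is quoted automatically.
    """
    dll_path = PureWindowsPath(ifilter_dir) / "pdffilt.dll"
    # /s suppresses regsvr32's message box; success is reported via the exit code.
    return ["regsvr32.exe", "/s", str(dll_path)]


def register_ifilter(ifilter_dir=DEFAULT_IFILTER_DIR, execute=False):
    """Print the command (dry run), or run it when execute=True (Windows only)."""
    cmd = registration_command(ifilter_dir)
    if execute:
        # check=True raises CalledProcessError if regsvr32 exits with an error.
        subprocess.run(cmd, check=True)
    else:
        print("Would run:", subprocess.list2cmdline(cmd))
    return cmd
```

For example, `register_ifilter(r"D:\Program Files\Adobe\PDF IFilter 5.0", execute=True)` from an elevated prompt. Using `/s` trades the confirmation dialog for an exit code, which is easier to check when the same step is repeated on several index servers.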

=> Download the PDF icon image (96x96 pixels) from the following location and save it to the local hard disk as "pdf16.gif":

http://www.adobe.com/misc/linking.html

(Note that the icon file must be named "pdf16.gif" rather than a custom file name; otherwise the correct icon will not appear in search results in SharePoint Portal Server 2003. This is a bug in SharePoint Portal Server 2003 that was fixed in MOSS 2007 and Microsoft SharePoint Server 2010. However, a workaround described in a Microsoft KB article lets you name the icon file as you like.)

=> Copy the "pdf16.gif" icon image to the following location: "Drive:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\60\Template\Images".

=> Open the "DOCICON.xml" file in "Drive:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\60\Template\XML", add the following entry for the ".pdf" extension, and then save and close the file:
<Mapping Key="pdf" Value="pdf16.gif" />
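
Editing DOCICON.xml by hand makes it easy to add a duplicate or malformed entry. The following Python sketch adds the mapping with the standard library's `xml.etree.ElementTree` and updates an existing "pdf" entry instead of duplicating it. It assumes the file has a root `<DocIcons>` element with a `<ByExtension>` child that holds the per-extension `<Mapping>` entries, so check your copy first. The function name `add_extension_mapping` is hypothetical. ElementTree re-serializes the document (for example, it drops the XML declaration), so keep a backup and compare the result before replacing the original file.

```python
import xml.etree.ElementTree as ET


def add_extension_mapping(xml_text, ext, icon):
    """Return DOCICON.xml text containing <Mapping Key=ext Value=icon />.

    Assumes the <DocIcons><ByExtension>...</ByExtension></DocIcons> layout.
    If a mapping for ext already exists (case-insensitive), only its Value
    is updated, so running this twice does not create duplicates.
    """
    # Keep XML comments inside the root element (requires Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no <ByExtension> element found")
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == ext.lower():
            mapping.set("Value", icon)
            break
    else:
        # No existing entry: append a new one at the end of <ByExtension>.
        ET.SubElement(by_ext, "Mapping", {"Key": ext, "Value": icon})
    return ET.tostring(root, encoding="unicode")
```

For example, `add_extension_mapping(text, "pdf", "pdf16.gif")` for WSS 2.0 / SharePoint Portal Server 2003, or with "icopdf.gif" for the MOSS 2007 icon name used later in this tip.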
=> Reboot the index server(s), restart all SharePoint services, or run "iisreset".

=> Add the ".pdf" file type to the content index:
  1. Connect to the portal site > Site Settings.
  2. Configure Search and Indexing (Under: "Search Settings & Indexed Content") > Include File Types (Under: "General Content Settings and Indexing Status").
  3. New File Type (On: "Specify File Types to Include").
  4. On the "Add File Type" page, type "pdf" in the File extension box and click OK.
=> Update the content indexes for portal and non-portal content:
  1. Site Settings > Configure Search and Indexing (Under: "Search Settings and Indexed Content").
  2. Manage Content Indexes (Under: "Content Indexes") > Click the down arrow next to the name of the index that you want to update and click "Start Full Update".

Indexing PDF Documents in WSS 3.0 and MOSS 2007 Libraries:

=> Download the free Adobe PDF IFilter 6.0 from the following Adobe web site:

(Later versions, such as 9.0, can also be used. Note that there are separate versions for 32-bit (x86) and 64-bit (x64) machines.)

http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611

=> On each index server in your farm, stop the IIS Admin service:
  1. Start > Administrative Tools > Services.
  2. Right Click "IIS Admin Service" > Stop.
=> Run the Adobe PDF IFilter installer to install the IFilter on each index server.

=> Download the PDF icon image (17x17 pixels) from the following location and save it to the local hard disk as "icopdf.gif":

http://www.adobe.com/misc/linking.html

=> Copy the "icopdf.gif" icon image to the following location:

"Drive:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images".

=> Open the "DOCICON.xml" file in "Drive:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\XML", add the following entry for the ".pdf" extension, and then save and close the file:
<Mapping Key="pdf" Value="icopdf.gif" />
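
A common reason for a missing PDF icon is a mapping that names a file which was never copied to the Images folder. The following Python sketch checks that the "pdf" entry in DOCICON.xml points to an existing icon. It assumes the "XML" and "Images" folders sit directly under the "...\12\Template" folder, as in the paths above; both function names are hypothetical, and the comparison logic is kept separate from the file access so it can be tried on any machine.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def pdf_icon_status(docicon_xml, image_files):
    """Check the pdf entry in DOCICON.xml against the available icon files.

    docicon_xml: contents of DOCICON.xml (str or bytes).
    image_files: names of the files in the Template\\Images folder.
    """
    # Windows file names are case-insensitive, so compare in lower case.
    available = {name.lower() for name in image_files}
    root = ET.fromstring(docicon_xml)
    for mapping in root.iter("Mapping"):
        if mapping.get("Key", "").lower() == "pdf":
            icon = mapping.get("Value", "")
            if icon.lower() in available:
                return f"OK: pdf -> {icon}"
            return f"MISSING: {icon} is not in the Images folder"
    return "NO MAPPING: DOCICON.xml has no entry for pdf"


def check_template_dir(template_dir):
    """Run the check against a real ...\\12\\Template folder."""
    template = Path(template_dir)
    # Reading bytes lets the XML parser honour the file's own encoding/BOM.
    xml_bytes = (template / "XML" / "DOCICON.xml").read_bytes()
    images = [p.name for p in (template / "Images").iterdir()]
    return pdf_icon_status(xml_bytes, images)
```

For example, `check_template_dir(r"C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template")` on each web front-end server.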
=> Reboot the index server(s), restart all SharePoint services, or run "iisreset".

=> In Central Administration, open the Search Settings of the Shared Services Provider (SSP) and add "pdf" as a new file type.

=> Start a full crawl from the SSP (Shared Services Provider) search administration in Central Administration.

(Note: MOSS should index PDF documents once the IFilter is installed, but some users find that PDF documents still do not appear in MOSS 2007 search results. Microsoft provides the following fix for this in MOSS 2007:)
  1. Add the following registry entry, and then set the registry entry value to pdf:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\GUID\Gather\Search\Extensions\ExtensionList\38

    (where GUID is the ID of your search application). To do this, follow these steps:
    • Click Start, click Run, type regedit, and then click OK.

    • Locate and then click the following registry subkey:

      HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\GUID\Gather\Search\Extensions\ExtensionList.

    • On the Edit menu, point to New, click String Value, type 38, and then press Enter.
    • Right-click the registry entry that you created, click Modify, type pdf in the Value data box, and then click OK.

  2. Verify that the following two registry subkeys are present and contain the values listed. Note that these subkeys and their values are created when you install the Adobe PDF IFilter on the server.

    • HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
      This registry subkey must contain the following registry entry:
      • Name: Default, Type: REG_MULTI_SZ, Data: {4C904448-74A9-11D0-AF6E-00C04FD8DC02}
    • HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\Filters\.pdf
      This registry subkey must contain the following registry entries:
      • Name: Default, Type: REG_SZ, Data: (value not set)
      • Name: Extension, Type: REG_SZ, Data: pdf
      • Name: FileTypeBucket, Type: REG_DWORD, Data: 0x00000001 (1)
      • Name: MimeTypes, Type: REG_SZ, Data: application/pdf

  3. Stop and then start the Windows SharePoint Services Search service:
    1. Click Start > Run, type "cmd", and click OK.
    2. Stop the Windows SharePoint Services Search service. You can do this in Central Administration, with the stsadm command, or by typing "net stop spsearch" at the command prompt and pressing Enter.
    3. Start the Windows SharePoint Services Search service the same way, for example by typing "net start spsearch" at the command prompt and pressing Enter.
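
Steps 1 and 2 above can also be scripted. The following Python sketch uses the standard `winreg` module (Windows only, run from an elevated prompt) to create the "38" string value and then compares the two IFilter subkeys with the values listed in step 2. The search application GUID must be supplied by you (`app_guid` is a placeholder), and all function names are hypothetical. The comparison logic is a pure function so it can be checked without touching the registry. Back up the registry before changing it, and restart the search service afterwards as in step 3.

```python
BASE = r"SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search"

# Values the Adobe PDF IFilter installer is expected to create (step 2).
# Maps subkey -> {value name: expected data}; "" is the (Default) value.
# The unset (Default) of Setup\Filters\.pdf is deliberately not checked.
EXPECTED = {
    BASE + r"\Setup\ContentIndexCommon\Filters\Extension\.pdf": {
        "": ["{4C904448-74A9-11D0-AF6E-00C04FD8DC02}"],  # REG_MULTI_SZ reads as a list
    },
    BASE + r"\Setup\Filters\.pdf": {
        "Extension": "pdf",
        "FileTypeBucket": 1,  # REG_DWORD reads as an int
        "MimeTypes": "application/pdf",
    },
}


def extension_list_key(app_guid):
    """Subkey holding the ExtensionList entries of one search application."""
    return BASE + "\\Applications\\" + app_guid + r"\Gather\Search\Extensions\ExtensionList"


def find_mismatches(actual):
    """Compare {subkey: {name: data}} read from the registry with EXPECTED."""
    problems = []
    for subkey, values in EXPECTED.items():
        found = actual.get(subkey)
        if found is None:
            problems.append(f"missing subkey: {subkey}")
            continue
        for name, data in values.items():
            if found.get(name) != data:
                label = name or "(Default)"
                problems.append(f"{subkey}: {label} is {found.get(name)!r}, expected {data!r}")
    return problems


def apply_and_verify(app_guid):
    """Create ExtensionList\\38 = "pdf", then check the IFilter subkeys."""
    import winreg  # imported here so the helpers above also load on non-Windows systems

    hklm = winreg.HKEY_LOCAL_MACHINE
    # KEY_WOW64_64KEY avoids redirection to WOW6432Node under 32-bit Python.
    write_access = winreg.KEY_SET_VALUE | winreg.KEY_WOW64_64KEY
    with winreg.OpenKey(hklm, extension_list_key(app_guid), 0, write_access) as key:
        winreg.SetValueEx(key, "38", 0, winreg.REG_SZ, "pdf")

    actual = {}
    read_access = winreg.KEY_READ | winreg.KEY_WOW64_64KEY
    for subkey, values in EXPECTED.items():
        try:
            with winreg.OpenKey(hklm, subkey, 0, read_access) as key:
                actual[subkey] = {}
                for name in values:
                    try:
                        actual[subkey][name] = winreg.QueryValueEx(key, name)[0]
                    except FileNotFoundError:
                        actual[subkey][name] = None
        except FileNotFoundError:
            pass  # reported as "missing subkey" by find_mismatches
    return find_mismatches(actual)
```

An empty list from `apply_and_verify(...)` means both subkeys match the values in step 2.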

Indexing PDF Documents in SharePoint Foundation 2010 and Microsoft SharePoint Server 2010 Libraries:

  1. Download the free 64-bit Adobe PDF IFilter 9.0 from the following Adobe web site and install it on each server in the farm that runs the crawl component:

    http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025

  2. Download the PDF icon image (17x17 pixels) from the following location and save it to the local hard disk as "pdf16.gif":

    http://www.adobe.com/misc/linking.html

  3. Copy the "pdf16.gif" icon image to the following location:

    "Drive:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\Template\Images".
  4. Open the "DOCICON.xml" file in the following location:

    "Drive:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\Template\XML"

    Add an entry for the ".pdf" extension, and then save and close the file:

    <Mapping Key="pdf" Value="pdf16.gif" />

  5. Add "pdf" as a file type on the File Types page of the Search Service Application.
  6. Click Start > Run, type regedit, and click OK.
  7. Navigate to the following location:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\ContentIndexCommon\Filters\Extension
  8. Right-click the Extension key, click New > Key, and name the new key ".pdf".
  9. Set the (Default) value of the new key to the following GUID: {E8978DA6-047F-4E3D-9C78-CDBE46041603}
  10. Restart the SharePoint Server Search 14 service, and then reboot the SharePoint servers in the farm.
  11. Run a full crawl so that PDF content appears in the search results.
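
When several servers need the same change, the registry edit in steps 7 to 9 can be captured in a .reg file and imported on each server (for example with `reg import` or by double-clicking the file). The Python sketch below generates that file's text. It assumes the (Default) value is a plain REG_SZ string, which is what the `@="..."` line of the .reg format creates; the function name `pdf_filter_reg` is hypothetical.

```python
PDF_FILTER_GUID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"
EXTENSION_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0"
    r"\Search\Setup\ContentIndexCommon\Filters\Extension"
)


def pdf_filter_reg(guid=PDF_FILTER_GUID):
    """Return .reg file text that sets Extension\\.pdf (Default) to the IFilter GUID."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        # Importing creates the .pdf key if it does not exist yet.
        f"[{EXTENSION_KEY}\\.pdf]",
        # "@" is .reg syntax for the key's (Default) value; a quoted string is REG_SZ.
        f'@="{guid}"',
        "",
    ]
    # .reg files use Windows (CRLF) line endings.
    return "\r\n".join(lines)
```

To save it, write with `open("pdf-ifilter.reg", "w", encoding="utf-16", newline="")`: "utf-16" adds the byte-order mark that version 5.00 files normally carry, and `newline=""` stops Python on Windows from turning the "\r\n" endings into "\r\r\n". Then continue with steps 10 and 11 on each server.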

Copyright (c) 2010-2017 Edgewood Solutions, LLC All rights reserved
Some names and products listed are the registered trademarks of their respective owners.