baiduandtencentapi's Introduction

Overview

  • xpdf converts PDF content into plain text that PHP can read
  • antiword converts .doc content (note: not .docx) into plain text that PHP can read

1. xpdf installation guide

Reference: http://www.cnblogs.com/yinhutaxue/p/Yihoo.html

1.1 Packages to download

    (1) xpdfbin-linux-3.04.tar.gz
    (2) xpdf-chinese-simplified.tar.gz

1.2 Steps on Linux

  cd /usr/local                                     # assumes the tarball was downloaded here
  tar zxvf xpdfbin-linux-3.04.tar.gz -C /usr/local
  cd /usr/local/xpdfbin-linux-3.04
  cat INSTALL                                       # read the bundled install notes
  cd bin32/                                         # use bin64/ instead on a 64-bit system
  cp ./* /usr/local/bin/
  cd ../doc/
  mkdir -p /usr/local/man/man1
  mkdir -p /usr/local/man/man5
  cp *.1 /usr/local/man/man1
  cp *.5 /usr/local/man/man5
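The steps above copy from bin32/, which only suits 32-bit x86 systems. The shell sketch below picks the directory from uname -m instead. It assumes the tarball ships both bin32/ and bin64/; the function name xpdf_bin_dir is hypothetical.

```shell
# Hypothetical helper: map a machine architecture to the xpdfbin directory.
# Assumption: xpdfbin-linux-3.04 contains bin32/ (x86) and bin64/ (x86-64).
xpdf_bin_dir() {
  arch=${1:-$(uname -m)}
  case "$arch" in
    x86_64|amd64) echo bin64 ;;
    i386|i486|i586|i686) echo bin32 ;;
    *)
      echo "no prebuilt xpdf binaries for: $arch" >&2
      return 1
      ;;
  esac
}

# Usage, run from /usr/local/xpdfbin-linux-3.04. Nothing is copied on an
# unsupported architecture, because && stops on the non-zero status:
#   d=$(xpdf_bin_dir) && cp "$d"/* /usr/local/bin/
```

Checking the status before cp matters: an empty directory name would turn "$d"/* into /*.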

Next, install the Simplified Chinese language pack:

  mkdir -p /usr/local/etc
  cp sample-xpdfrc /usr/local/etc/xpdfrc                  # still in the doc/ directory
  tar zxvf xpdf-chinese-simplified.tar.gz -C /usr/local   # adjust the path to where the tarball was downloaded
  cd /usr/local/xpdf-chinese-simplified
  mkdir -p /usr/local/share/xpdf/chinese-simplified
  cp -r Adobe-GB1.cidToUnicode ISO-2022-CN.unicodeMap EUC-CN.unicodeMap GBK.unicodeMap CMap /usr/local/share/xpdf/chinese-simplified/

Calling from the shell (W020151204630497494614.pdf has already been downloaded to the current directory):

  pdftotext W020151204630497494614.pdf                     # no Chinese encoding specified: output is garbled
  pdftotext -layout -enc GBK W020151204630497494614.pdf    # GBK output, no garbled characters
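Copying the files is not enough on its own: pdftotext locates the language pack through entries in xpdfrc, so without them -enc GBK has no GBK map to use. The pack normally ships an add-to-xpdfrc file with these lines, and its README is the authoritative source. As a sketch, assuming the install directory used above, equivalent entries can be generated with the hypothetical function below.

```shell
# Hypothetical helper: print xpdfrc entries for the Simplified Chinese pack.
# Assumption: the files were copied to the directory used in the steps above
# (override by passing another directory as $1).
xpdf_chinese_rc() {
  dir=${1:-/usr/local/share/xpdf/chinese-simplified}
  printf 'cidToUnicode Adobe-GB1 %s/Adobe-GB1.cidToUnicode\n' "$dir"
  # One unicodeMap line per text encoding that pdftotext -enc can use.
  for enc in ISO-2022-CN EUC-CN GBK; do
    printf 'unicodeMap %s %s/%s.unicodeMap\n' "$enc" "$dir" "$enc"
  done
  printf 'cMapDir Adobe-GB1 %s/CMap\n' "$dir"
  printf 'toUnicodeDir %s/CMap\n' "$dir"
}

# Usage (as root), appending to the config copied from sample-xpdfrc:
#   xpdf_chinese_rc >> /usr/local/etc/xpdfrc
```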

pdftotext notes

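  • If the output is garbled, remove the textEncoding UTF-8 option from the xpdf configuration file (xpdfrc).
  • Remove shell_exec from the disable_functions list in php.ini.
  • Check pdftotext's permissions: the user Apache or Nginx (PHP) runs as needs execute permission on pdftotext and write permission on the directory the .txt file is written to.
  • PHP call example: shell_exec('/usr/local/bin/pdftotext ' . escapeshellarg($filename))
  • After the command runs, a file with the same name and a .txt extension is created automatically next to the PDF.
  • If the call above does not work, try: pdftotext -layout -enc GBK W020151204630497494614.pdf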
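The automatically generated .txt file follows a simple naming rule. The sketch below mirrors, as we understand it from the xpdf sources, how pdftotext derives the default output name when none is given: a trailing .pdf or .PDF is replaced by .txt, any other name simply gets .txt appended, and the PDF's directory is kept. That directory is the one the web server user must be able to write to. The function name is hypothetical.

```shell
# Hypothetical helper: compute the file pdftotext writes when called with
# only the PDF argument. Only the exact suffixes ".pdf" and ".PDF" are
# stripped; the directory part of the path is left unchanged.
pdftotext_default_output() {
  case "$1" in
    *.pdf|*.PDF) printf '%s.txt\n' "${1%.*}" ;;
    *) printf '%s.txt\n' "$1" ;;
  esac
}
```

For example, PHP code that reads the result back after shell_exec could look for the file at the path this function prints.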

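Rather than letting pdftotext create a .txt file and reading it back, PHP can take the text straight from shell_exec's return value: pdftotext writes to standard output when the output file name is -, and since GBK output is what avoids the garbling described above, it can be converted to UTF-8 with iconv. A minimal sketch, assuming pdftotext at /usr/local/bin/pdftotext and an iconv that supports GBK; the function name pdf_to_utf8 and the PDFTOTEXT override variable are hypothetical.

```shell
# Hypothetical wrapper: print a PDF's text as UTF-8 on stdout.
# PDFTOTEXT (assumed, optional) overrides the pdftotext command to run.
pdf_to_utf8() {
  pdf=$1
  if [ ! -r "$pdf" ]; then
    echo "cannot read: $pdf" >&2
    return 1
  fi
  # "-" sends pdftotext's output to stdout; iconv re-encodes GBK as UTF-8.
  # The pipeline's exit status is iconv's, so a pdftotext failure shows up
  # as empty output rather than as a non-zero status.
  "${PDFTOTEXT:-/usr/local/bin/pdftotext}" -layout -enc GBK "$pdf" - | iconv -f GBK -t UTF-8
}

# Usage:
#   pdf_to_utf8 W020151204630497494614.pdf > W020151204630497494614.txt
```

Saved as a script (the name is up to you), this can be called from shell_exec with the file name passed through escapeshellarg, and no .txt file or writable directory is needed.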

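2. antiword installation

Download and unpack the source, change into its directory and run make && make install. The install goes into /root/ (binaries in /root/bin, mapping files in /root/.antiword), so only root can run antiword. Copy the files into /usr instead so they are easy to call: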

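  cp /root/bin/*antiword /usr/local/bin/
  mkdir -p /usr/share/antiword
  cp -R /root/.antiword/* /usr/share/antiword/
  chmod 755 /usr/local/bin/*antiword     # read and execute for all users, including the web server; 777 would let anyone overwrite the binary
  chmod 755 /usr/share/antiword/*
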
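The copy to /usr/share/antiword matters because of where antiword looks for its mapping files. According to the antiword man page it searches $ANTIWORDHOME first, then ~/.antiword, then /usr/share/antiword, so the web server user, whose home is not /root, only finds the files in the last location. The sketch below reproduces that search order to check which mapping file a given user would get; the function name is hypothetical, and the order is taken from the man page, not verified against the source.

```shell
# Hypothetical helper: print the first readable copy of a mapping file
# (default UTF-8.txt) in antiword's documented search order.
find_antiword_map() {
  map=${1:-UTF-8.txt}
  # ${ANTIWORDHOME:+...} adds the directory only when the variable is set.
  for dir in ${ANTIWORDHOME:+"$ANTIWORDHOME"} "$HOME/.antiword" /usr/share/antiword; do
    if [ -r "$dir/$map" ]; then
      printf '%s\n' "$dir/$map"
      return 0
    fi
  done
  echo "mapping file not found: $map" >&2
  return 1
}
```

Run it as the web server user (www-data, nginx or similar, depending on the distribution) to confirm that /usr/share/antiword/UTF-8.txt is the file found.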
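  • Make sure shell_exec is not listed in disable_functions in php.ini.
  • On the PHP side: $content = shell_exec('/usr/local/bin/antiword -m UTF-8.txt ' . escapeshellarg($filename));
  • Make sure the user Apache or Nginx (PHP) runs as has execute permission on /usr/local/bin/antiword.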
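Since antiword only reads the old binary .doc format (not .docx), it can help to reject other files before calling it, and checking the first bytes is more reliable than trusting the extension. A .doc file is an OLE2 compound document starting with the bytes D0 CF 11 E0 A1 B1 1A E1, while a .docx is a ZIP archive starting with "PK" followed by bytes 03 04. A sketch using the hypothetical function name doc_container; other OLE2 files such as .xls share the same signature, so this is a first filter rather than proof.

```shell
# Hypothetical helper: classify a file by its first 8 bytes.
# Prints "ole2" (binary .doc and similar), "zip" (.docx and other ZIPs)
# or "unknown" (anything else, including unreadable files).
doc_container() {
  magic=$(head -c 8 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    d0cf11e0a1b11ae1) echo ole2 ;;
    504b0304*) echo zip ;;
    *) echo unknown ;;
  esac
}

# Usage: only hand OLE2 files to antiword.
#   [ "$(doc_container "$f")" = ole2 ] && antiword -m UTF-8.txt "$f"
```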

Contributors

edwardzhou28
