深海游弋的鱼 – Page 20

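Running a program as another user from a root shell on Ubuntu

At work, some programs need to start at boot but should not run as root. In that case rc.local, which runs as root, has to switch to another user in the shell before launching the program. The commands are: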

$su -c 'command' - user

$sudo -u jetty ./nexus start

Example

Add the following to /etc/rc.local, before the exit 0 line:

#vim /etc/rc.local
su - jetty -c "/data/nexus/nexus-2.12.0-01/bin/nexus start"
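The rc.local entry can be wrapped in a small guard so that a missing account produces a clear message instead of a cryptic boot-time failure. This is a minimal sketch: the helper name run_as is hypothetical and not part of the original setup, while the jetty user and the nexus path are the ones from the example.

```shell
#!/bin/sh
# run_as USER COMMAND...: run COMMAND as USER through a login shell.
# "run_as" is a hypothetical helper name used only for illustration.
run_as() {
    target_user=$1
    shift
    # id(1) exits non-zero when the account does not exist.
    if ! id "$target_user" >/dev/null 2>&1; then
        echo "run_as: no such user: $target_user" >&2
        return 1
    fi
    # su switches without a password only when called by root,
    # which is the case inside rc.local.
    su - "$target_user" -c "$*"
}

# Intended use inside /etc/rc.local, before "exit 0":
# run_as jetty /data/nexus/nexus-2.12.0-01/bin/nexus start
```

The arguments are joined with spaces before being handed to su, so a command that needs its own quoting is better passed as a single quoted string.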

References

Running a program as another user from root in a shell

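sort + awk + uniq: finding the 10 most frequent words in a file

Using Linux commands or a shell script: the file words.txt holds English words, one per line (words may repeat). Find the 10 words that occur most often in the file.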

$cat words.txt | sort | uniq -c | sort -k1,1nr | head -10

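This mainly tests the use of sort and uniq. A brief explanation of each part follows; see the man pages for full details of the commands and their options:
sort: sorts the words so that identical words end up on adjacent lines
uniq -c: collapses adjacent identical lines and prefixes each with the number of times it occurred
sort -k1,1nr: sorts on the first field, numerically, in reverse order
head -10: keeps the first 10 lines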
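To see the pipeline at work on known input, here is a small sketch. The function name top_words and the sample data are assumptions made for illustration; awk is added only to print each result as "word count".

```shell
#!/bin/sh
# top_words FILE [N]: print the N most frequent lines of FILE as "word count".
# "top_words" is a hypothetical name; FILE holds one word per line.
top_words() {
    sort "$1" | uniq -c | sort -k1,1nr | head -n "${2:-10}" | awk '{ print $2, $1 }'
}

# Build a small sample file and ask for the two most frequent words.
sample=$(mktemp)
printf '%s\n' apple banana apple cherry banana apple > "$sample"
top_words "$sample" 2
rm -f "$sample"
```

With this sample the two lines printed are "apple 3" and "banana 2". Passing the file name to sort directly also avoids the extra cat process.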

References

sort + awk + uniq: finding the 10 most frequent words in a file

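Debugging bash scripts with bashdb on Ubuntu 15.04/18.04

bashdb is a debugger for bash. It is similar to GDB and supports setting breakpoints, single-stepping, inspecting variables and many other functions for shell scripts.

Installing bashdb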

$ sudo apt-get install bashdb

On macOS, install it with Homebrew:

$ brew install bashdb

There are two ways to start a debugging session.

1. Start it by passing an option to bash. This suits scripts that read $0 and expect it to be the name of the script itself.

$ bash --debugger xx.sh

2. Launch the script directly with bashdb. This suits ordinary scripts.

$ bashdb xx.sh

The steps above work on releases before Ubuntu 18.04. Ubuntu 18.04 does not ship this package, so bashdb has to be built and installed by hand with commands like the following:

$ wget https://netix.dl.sourceforge.net/project/bashdb/bashdb/4.4-0.94/bashdb-4.4-0.94.tar.gz

# This site is very slow for users in mainland China; the file can also be downloaded from this blog
#$ wget https://www.mobibrw.com/wp-content/uploads/2016/02/bashdb-4.4-0.94.tar.gz

$ tar -xvf bashdb-*.tar.gz

$ cd bashdb-*/

$ ./configure

$ make

$ sudo make install

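Common debugging commands

Listing and searching code:
l  list the 10 lines following the current line
-  list the 10 lines before the line being executed
.  return to the line being executed
w  list the code around the line being executed
/pat/  search forward for pat
Debugger control:
h  help
help command  show detailed information about a command
q  quit bashdb
x expression  evaluate an arithmetic expression and print its value
!! shell-command  run a shell command (!!, a space, then the command and its arguments)
Controlling script execution:
n  execute the next statement, stepping over function calls and treating the function as a black box
s n  single-step n times, stepping into functions
b n  set a breakpoint at line n
d n  remove the breakpoint at line n
c n  continue until line n
R  restart
finish  run until the current function or sourced file returns
cond n expr  attach the condition expr to breakpoint n (conditional breakpoint)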
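A short script is handy for trying these commands. The sketch below is an assumed practice target: the file name demo.sh and the function sum_to are made up for illustration, and the comments only suggest where the commands listed above could be tried.

```shell
#!/bin/bash
# demo.sh: a practice target for bashdb (hypothetical file name).
# Start a session with:  bashdb demo.sh

# sum_to N: print 1 + 2 + ... + N.
sum_to() {
    n=$1
    total=0
    i=1
    while [ "$i" -le "$n" ]; do
        total=$((total + i))   # a breakpoint here (b <line number>) stops once per iteration
        i=$((i + 1))
    done
    echo "$total"
}

# At this call, "s" steps into sum_to while "n" steps over it.
sum_to 5
```

Run normally, the script prints 15. Inside the debugger, a command such as x total can be tried at the breakpoint to watch the running sum.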

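Installing Apache Solr 4.10.4 on Ubuntu 15.10 and 12.04

Apache Solr is a high-performance full-text search server written in Java and built on Lucene. It is a standalone, enterprise-grade search application server, and many companies use it as an open-source search service. In outline, documents are added to a search collection as XML over HTTP, and the collection is queried over HTTP as well, with the response returned as XML or JSON. Its main features include efficient and flexible caching, vertical search, highlighting of search results, index replication for higher availability, a powerful data schema for defining fields, types and text analysis, and a web-based administration interface.

The highest Apache Solr version that can currently be integrated with Apache Nutch 2.3.1 is 4.10.4. The installation steps on Ubuntu 15.10 and 12.04 are as follows:

1. Install Java and set JAVA_HOME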

$sudo apt-get install openjdk-7-jre

$sudo apt-get install openjdk-7-jdk

$export JAVA_HOME=$(readlink -f `which java` | xargs dirname | xargs dirname | xargs dirname)

It is advisable to add the JAVA_HOME variable to the system-wide environment; it can also be set in ~/.bashrc.

$sudo vim /etc/profile

Append the following to the end of the file:

export JAVA_HOME=$(readlink -f `which java` | xargs dirname | xargs dirname | xargs dirname)

Then reboot the machine.
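The export line works by resolving the java symlink chain and then stripping the last three path components. Here is a sketch of the same idea as a function, using nested dirname calls instead of xargs. The function name java_home_from is hypothetical, and the sample path assumes the OpenJDK 7 layout on 64-bit Ubuntu.

```shell
#!/bin/sh
# java_home_from PATH: strip the last three components of a resolved java path,
# e.g. <jdk>/jre/bin/java -> <jdk>. Hypothetical helper name.
java_home_from() {
    dirname "$(dirname "$(dirname "$1")")"
}

# Sample resolved path (assumed layout of openjdk-7 on amd64).
java_home_from /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
# prints /usr/lib/jvm/java-7-openjdk-amd64

# On a real system the argument would come from:
#   readlink -f "$(which java)"
```

If the resolved path has a different depth, for example a JDK without the jre directory, three dirname calls would go one level too far, so it is worth printing the result once before relying on it.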

2. Download Apache Solr 4.10.4

$cd ~

$wget http://archive.apache.org/dist/lucene/solr/4.10.4/solr-4.10.4.tgz

3. Extract to the target directory and create a symbolic link

$sudo tar -zxvf solr-4.10.4.tgz -C /var/opt/

$sudo ln -s /var/opt/solr-4.10.4/ /var/opt/apache-solr

Add the SOLR_HOME variable to the system-wide environment; it can also be set in ~/.bashrc.

$sudo vim /etc/profile

Append to the end of the file:

export SOLR_HOME=/var/opt/apache-solr

Then reboot the machine.

4. Start Apache Solr on port 9876

$sudo -E java -Djetty.home=${SOLR_HOME}/example -Djetty.logs=/tmp -Dsolr.solr.home=${SOLR_HOME}/example/solr -Djetty.port=9876 -jar ${SOLR_HOME}/example/start.jar

5. Open the page in a browser to check that startup succeeded

Visit http://localhost:9876/solr/ in a browser.

If the following page appears, the configuration succeeded.

[Image: Apache_Solr_4_10_4]

References

Single-machine integration, configuration and installation of Nutch 2.3 + HBase 0.94 + Solr 4.10.3

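Standalone installation and configuration of Apache HBase on Ubuntu 15.10 and 12.04

Apache HBase is a distributed, column-oriented open-source database. The technology derives from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase originated as a subproject of the Apache Hadoop project. Unlike a typical relational database, HBase is suited to storing unstructured data, and it is column-based rather than row-based.

1. Install Java and set JAVA_HOME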

$sudo apt-get install openjdk-7-jre

$sudo apt-get install openjdk-7-jdk

$export JAVA_HOME=$(readlink -f `which java` | xargs dirname | xargs dirname | xargs dirname)

It is advisable to add the JAVA_HOME variable to the system-wide environment; it can also be set in ~/.bashrc.

$sudo vim /etc/profile

Append the following to the end of the file:

export JAVA_HOME=$(readlink -f `which java` | xargs dirname | xargs dirname | xargs dirname)

Then reboot the machine.

2. Download and configure Apache HBase

$wget http://apache.opencas.org/hbase/1.1.3/hbase-1.1.3-bin.tar.gz

$sudo tar -zxvf hbase-1.1.3-bin.tar.gz  -C /var/opt

$sudo ln -s /var/opt/hbase-1.1.3/ /var/opt/apache-hbase

Add the HBASE_HOME variable to the system-wide environment; it can also be set in ~/.bashrc.

$sudo vim /etc/profile

Append the following to the end of the file:

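export HBASE_HOME=/var/opt/apache-hbase

Then reboot the machine.

3. Edit `conf/hbase-site.xml` to configure the data storage directory

Add the content below. This step is optional: without it, the data is stored under the tmp temporary directory and is lost on reboot. For simple tests there is no need to bother with the configuration file.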

$sudo vim $HBASE_HOME/conf/hbase-site.xml

Inside the configuration element, add the settings for the data storage directories:

<configuration>
	<property>
		<name>hbase.rootdir</name>
		<value>file:///home/hduser/HBASE/hbase</value>
	</property>
	<property>
		<name>hbase.zookeeper.property.dataDir</name>
		<value>/home/hduser/HBASE/zookeeper</value>
	</property>
</configuration>

4. Starting and stopping Apache HBase

Start

sudo -E $HBASE_HOME/bin/start-hbase.sh

Stop

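sudo -E $HBASE_HOME/bin/stop-hbase.sh

Note: when running through sudo, always pass the -E option; otherwise JAVA_HOME will be reported as not found. For security reasons sudo does not, by default, pass the caller's environment variables on to the child process.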
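Because a missing variable only shows up as a confusing startup failure, a start script can check its environment first. This is a minimal sketch built around a hypothetical helper named require_env; only the variable names JAVA_HOME and HBASE_HOME come from the steps above.

```shell
#!/bin/sh
# require_env NAME: succeed only if the variable NAME is set and non-empty.
# Hypothetical helper; NAME must be a plain variable name.
require_env() {
    # Indirect lookup through eval keeps this usable in plain POSIX sh.
    eval "require_env_value=\${$1:-}"
    if [ -z "$require_env_value" ]; then
        echo "missing environment variable: $1" >&2
        return 1
    fi
}

# Guard the start command. When this runs through sudo, "sudo -E" is what
# keeps the caller's variables visible at this point.
if require_env JAVA_HOME && require_env HBASE_HOME; then
    echo "environment looks complete"
    # "$HBASE_HOME"/bin/start-hbase.sh
else
    echo "environment incomplete, not starting HBase" >&2
fi
```

The start command itself is left commented out so the check can be tried safely on any machine.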

5. References

Installing Apache HBase on Ubuntu for Standalone Mode
Installing Nutch 2 + HBase + Solr 4

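Installing Apache Nutch 2.3.1 on Ubuntu 15.10 and 12.04 and integrating it with Apache Solr 4.10.4

Apache Nutch is an open-source web crawler written in Java. With it we can automatically discover the hyperlinks in web pages, which greatly reduces maintenance work such as checking for broken links or creating a copy of every visited page for searching. That is where Apache Solr comes in. Apache Solr is an open-source full-text search framework; with it we can search the pages that Apache Nutch has visited.

Apache Nutch already supports Apache Solr well, which greatly simplifies integrating the two. It also removes the old dependence on Apache Tomcat to run the legacy Nutch web application and on Apache Lucene for indexing.

The 2.x line is currently released as source only; no prebuilt binaries are provided, so users have to build it themselves.

First install Apache Solr by following "Installing Apache Solr 4.10.4 on Ubuntu 15.10 and 12.04". (Note that Apache Nutch currently supports Apache Solr only up to 4.10.4; experiments confirmed that later versions do not work, so do not install anything newer.)

Then install HBase by following "Standalone installation and configuration of Apache HBase on Ubuntu 15.10 and 12.04".

After that, continue with the steps below.

1. Install Java and set JAVA_HOME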

$sudo apt-get install openjdk-7-jre

$sudo apt-get install openjdk-7-jdk

$export JAVA_HOME=$(readlink -f `which java` | xargs dirname | xargs dirname | xargs dirname)

It is advisable to add the JAVA_HOME variable to the system-wide environment; it can also be set in ~/.bashrc.

$sudo vim /etc/profile

Append the following to the end of the file:

export JAVA_HOME=$(readlink -f `which java` | xargs dirname | xargs dirname | xargs dirname)

Then reboot the machine.

2. Download and install Nutch

$wget http://apache.opencas.org/nutch/2.3.1/apache-nutch-2.3.1-src.tar.gz

$sudo tar -zxvf apache-nutch-2.3.1-src.tar.gz  -C /var/opt

$sudo ln -s /var/opt/apache-nutch-2.3.1/ /var/opt/apache-nutch

3. Build Nutch

Install ant

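$sudo apt-get install ant

Configure the datastore Nutch will use. Because we are integrating with Apache Solr, Solr support has to be enabled when Nutch is built. (In testing so far, org.apache.gora.solr.store.SolrStore could not be made to work as the storage backend; only HBase works as the storage backend.)

1. Edit ivy/ivy.xml to enable the backend storage modules. Several can be enabled at once; which one is actually used is set in conf/nutch-site.xml.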

$sudo vim /var/opt/apache-nutch/ivy/ivy.xml

Find the following section:

<!-- Uncomment this to use SQL as Gora backend. It should be noted that the 
gora-sql 0.1.1-incubating artifact is NOT compatable with gora-core 0.3. Users should 
downgrade to gora-core 0.2.1 in order to use SQL as a backend however this is not suggested. -->
<!--
<dependency org="org.apache.gora" name="gora-sql" rev="0.1.1-incubating" conf="*->default" />
-->
<!-- Uncomment this to use MySQL as database with SQL as Gora store. -->
<!--
<dependency org="mysql" name="mysql-connector-java" rev="5.1.18" conf="*->default"/> 
-->
<!-- Uncomment this to use HBase as Gora backend. -->
<!--     
<dependency org="org.apache.gora" name="gora-hbase" rev="0.6.1" conf="*->default" /> 
-->
<!-- Uncomment this to use Accumulo as Gora backend. -->
<!--
<dependency org="org.apache.gora" name="gora-accumulo" rev="0.6.1" conf="*->default" />
-->
<!-- Uncomment this to use Cassandra as Gora backend. -->
<!-- 
<dependency org="org.apache.gora" name="gora-cassandra" rev="0.6.1" conf="*->default" />
-->
<!-- Uncomment this to use MongoDB as Gora backend. -->
<!--
<dependency org="org.apache.gora" name="gora-mongodb" rev="0.6.1" conf="*->default" />
-->
<!-- Uncomment this to use Solr as Gora backend. -->
<!--
<dependency org="org.apache.gora" name="gora-solr" rev="0.6.1" conf="*->default" />
-->
<!-- The gora-compiler is used within the 'ant generate-gora-src' target to compile
the Gora .avsc files within ./src/gora 
-->

Within it, locate this part:

<!-- Uncomment this to use HBase as Gora backend. -->
<!--     
<dependency org="org.apache.gora" name="gora-hbase" rev="0.6.1" conf="*->default" /> 
-->

Remove the comment markers around the dependency.

To work around a bug in gora-hbase 0.6.1, also add the following line directly below the dependency you just uncommented:

<dependency org="org.apache.hbase" name="hbase-common" rev="0.98.8-hadoop2" conf="*->default" />

2. Configure conf/nutch-site.xml to select the backend storage module.

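$sudo vim /var/opt/apache-nutch/conf/nutch-site.xml

In the configuration element, specify the storage backend class. Solr is used here for indexing, but because SolrStore could not be made to work as a storage backend, org.apache.gora.hbase.store.HBaseStore is specified; the exact string matches the gora.datastore.default setting in conf/gora.properties. The "http.agent.name" property must also be set, otherwise the run fails with an error saying that "http.agent.name" is not set. Set "plugin.includes" as well, otherwise building the Solr index at the end reports "No IndexWriters activated - check your configuration".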

<configuration>
	<property>
		<name>http.agent.name</name>
		<value>MyNutchSpider</value>
	</property>
	<property>
		<name>storage.data.store.class</name>
		<value>org.apache.gora.hbase.store.HBaseStore</value>
		<description>Default class for storing data</description>
	</property>
	<property>
	  <name>plugin.includes</name>
	  <value>protocol-httpclient|urlfilter-regex|index-(basic|more)|query-(basic|site|url|lang)|indexer-solr|nutch-extensionpoints|protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|more|metadata)</value>
	</property>
</configuration>

3. Edit conf/gora.properties to enable the required storage backend. On a single machine the defaults are usually fine. The Apache Solr listening port should be set to match the port used on your own machine.

$sudo vim /var/opt/apache-nutch/conf/gora.properties

Find:

#########################
# HBaseStore properties #
#########################
# HBase requires that the Configuration has a valid "hbase.zookeeper.quorum"
# property. It should be included within hbase-site.xml on the classpath. When
# this property is omitted, it expects Zookeeper to run on localhost:2181.

# To greatly improve scan performance, increase the hbase-site Configuration
# property "hbase.client.scanner.caching". This sets the number of rows to grab
# per request.

# HBase autoflushing. Enabling autoflush decreases write performance. 
# Available since Gora 0.2. Defaults to disabled.
# hbase.client.autoflush.default=false

# HBase client cache that improves the scan in HBase (default 0)
# gora.datastore.scanner.caching=1000

Add the following as the last line of this section:

gora.datastore.default=org.apache.gora.hbase.store.HBaseStore

4. Change the Maven repository address configured for ivy by editing ivy/ivysettings.xml.

$sudo vim /var/opt/apache-nutch/ivy/ivysettings.xml

Find the following code:

<property name="repo.maven.org"
   value="http://repo1.maven.org/maven2/"
   override="false"/>

Replace the default Maven central repository address http://repo1.maven.org/maven2/ with the mirror provided by OSC in China: http://maven.oschina.net/content/groups/public/ .
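The same substitution can be made without opening an editor. The sketch below runs sed on a scratch copy of the property so that nothing real is modified; the function name use_mirror is hypothetical, and the two URLs are the ones named above.

```shell
#!/bin/sh
# use_mirror FILE: print FILE with the Maven central URL replaced by the mirror URL.
# Hypothetical helper; "|" is the sed delimiter because the URLs contain "/".
use_mirror() {
    sed 's|http://repo1.maven.org/maven2/|http://maven.oschina.net/content/groups/public/|' "$1"
}

# Demonstrate on a scratch file holding the property from ivysettings.xml.
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
<property name="repo.maven.org"
   value="http://repo1.maven.org/maven2/"
   override="false"/>
EOF
use_mirror "$scratch"
rm -f "$scratch"
```

The function writes to standard output rather than editing in place, so the result can be reviewed before it replaces the real ivysettings.xml.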

Build Nutch, which also downloads the JARs it depends on

$cd /var/opt/apache-nutch

$sudo ant runtime

4. Add the environment variable ${NUTCH_RUNTIME_HOME} for the Nutch installation directory

$sudo vim /etc/profile

Append the following to the end of the file:

export NUTCH_RUNTIME_HOME=/var/opt/apache-nutch/runtime/local

Then reboot the machine.

5. Verify the Nutch installation

Run "${NUTCH_RUNTIME_HOME}/bin/nutch". If you see the following output, the installation is correct:

Usage: nutch COMMAND
where COMMAND is one of:
 inject		inject new urls into the database
 hostinject     creates or updates an existing host table from a text file
 generate 	generate new batches to fetch from crawl db
 fetch 		fetch URLs marked during generate
 parse 		parse URLs marked during fetch
 updatedb 	update web table after parsing
 updatehostdb   update host table after parsing
 readdb 	read/dump records from page database
 readhostdb     display entries from the hostDB
 index          run the plugin-based indexer on parsed batches
 elasticindex   run the elasticsearch indexer - DEPRECATED use the index command instead
 solrindex 	run the solr indexer on parsed batches - DEPRECATED use the index command instead
 solrdedup 	remove duplicates from solr
 solrclean      remove HTTP 301 and 404 documents from solr - DEPRECATED use the clean command instead
 clean          remove HTTP 301 and 404 documents and duplicates from indexing backends configured via plugins
 parsechecker   check the parser for a given url
 indexchecker   check the indexing filters for a given url
 plugin 	load a plugin and run one of its classes main()
 nutchserver    run a (local) Nutch server on a user defined port
 webapp         run a local Nutch web application
 junit         	runs the given JUnit test
 or
 CLASSNAME 	run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

Some troubleshooting tips:

If you see "Permission denied", run the following command:

$chmod +x ${NUTCH_RUNTIME_HOME}/bin/nutch

If you see a message that JAVA_HOME is not set, set the JAVA_HOME environment variable. On a Mac, you can run the following command or add it to ~/.bashrc:

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home

6. Configure Apache Solr

1. Back up the configuration file that will be modified

$sudo cp ${NUTCH_RUNTIME_HOME}/conf/schema.xml ${NUTCH_RUNTIME_HOME}/conf/schema.xml.old

2. Copy schema.xml from the Nutch runtime directory into the Solr directory set up earlier.

$sudo cp ${NUTCH_RUNTIME_HOME}/conf/schema.xml ${SOLR_HOME}/example/solr/collection1/conf/

$sudo cp ${NUTCH_RUNTIME_HOME}/conf/gora-solr-host-schema.xml ${SOLR_HOME}/example/solr/collection1/conf/gora-solr-schema.xml

3. Restart Apache Solr

$sudo -E java -Djetty.home=${SOLR_HOME}/example -Djetty.logs=/tmp -Dsolr.solr.home=${SOLR_HOME}/example/solr -Djetty.port=9876 -jar ${SOLR_HOME}/example/start.jar

7. Crawl your first site

Add the URLs to crawl (using my own site as the example)

$cd ~
$mkdir ~/urls
$vim ~/urls/seed.txt

Add the address to crawl to seed.txt: http://www.mobibrw.com/

Start HBase

$sudo -E $HBASE_HOME/bin/start-hbase.sh

Crawl the pages with the following commands:

$cd ~

$sudo -E ${NUTCH_RUNTIME_HOME}/bin/crawl ~/urls/ StoreCrawl http://localhost:9876/solr/collection1 2

- ~/urls is the directory holding the seed URLs, that is, the addresses of the sites to crawl.
- StoreCrawl is the root directory for stored data (in Nutch 2.x it is the crawlId, which creates a table in HBase prefixed with the crawlId, for example StoreCrawl_Webpage).
- "http://localhost:9876/solr/collection1" is the Apache Solr URL. Note that the URL shown in the browser is "http://localhost:9876/solr/#/collection1", but the URL Nutch uploads data to must not contain "#"; otherwise it reports "Expected mime type application/octet-stream but got text/html".
- 2 is numberOfRounds, the number of iterations, that is, how many link levels from the seed pages should be crawled.
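Since a stray "#" in the Solr URL only fails late in the crawl, it can be caught before the crawl starts. This is a sketch built around a hypothetical function named check_solr_url; the URL is the one used above, and the crawl command is left commented out.

```shell
#!/bin/sh
# check_solr_url URL: reject the browser-style admin URL that contains "#".
# Hypothetical helper name.
check_solr_url() {
    case $1 in
        *'#'*)
            echo "Solr URL must not contain '#': $1" >&2
            return 1
            ;;
    esac
    return 0
}

solr_url=http://localhost:9876/solr/collection1
if check_solr_url "$solr_url"; then
    echo "OK to crawl with $solr_url"
    # sudo -E "${NUTCH_RUNTIME_HOME}"/bin/crawl ~/urls/ StoreCrawl "$solr_url" 2
fi
```

The check is purely textual; it does not confirm that Solr is actually listening at that address.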

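Note: when running through sudo, always pass the -E option; otherwise JAVA_HOME will be reported as not found. For security reasons sudo does not, by default, pass the caller's environment variables on to the child process.

After the run completes, no failure messages should have appeared.

If the run fails, detailed error information can be found in ${NUTCH_RUNTIME_HOME}/logs/hadoop.log.

Once indexing has finished, a query in Apache Solr returns results like those in the screenshot:

[Image: SolrNutch]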

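Installing Apache Solr 5.4.1 on Ubuntu 15.10 and 12.04

At the time of writing, the latest Apache Solr release is 5.4.1. The installation steps on Ubuntu 15.10 and 12.04 are as follows:

1. Download Apache Solr 5.4.1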

$cd ~

$wget http://apache.opencas.org/lucene/solr/5.4.1/solr-5.4.1.tgz

2. Extract the service installation script

$tar -zxvf solr-5.4.1.tgz solr-5.4.1/bin/install_solr_service.sh --strip-components=2

3. Run the installation script

$sudo bash ./install_solr_service.sh solr-5.4.1.tgz

4. Check that the service was installed correctly

$sudo service solr status

If the installation succeeded, output like the following appears:

● solr.service - LSB: Controls Apache Solr as a Service
   Loaded: loaded (/etc/init.d/solr)
   Active: active (exited) since 日 2016-01-24 20:51:13 CST; 22s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 5035 ExecStart=/etc/init.d/solr start (code=exited, status=0/SUCCESS)

5. Create a Solr instance (core). Several can be created; here we create only one

$sudo su - solr -c "/opt/solr/bin/solr create -c solr_default -n data_driven_schema_configs"

Note the output of the create command:

Copying configuration to new core instance directory:
/var/solr/data/solr_default

Creating new core 'solr_default' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=solr_default&instanceDir=solr_default

{
  "responseHeader":{
    "status":0,
    "QTime":754},
  "core":"solr_default"}

This means the instance is reached at http://localhost:8983/solr rather than on port 8080. Port 8983 is the default for Apache Solr when it is installed as a standalone service, so it does not clash with a Tomcat 7 that already occupies port 8080. Keep this in mind when building URLs.

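[Image: ApacheSolr]

6. Configure the Solr instance just created

[Image: ApacheSolrCoreSeletor]

As shown above, choose "Core Selector".

[Image: ApacheSolrCoreSeletorDocuments]

On this page, the "Document Type" drop-down selects the kind of submission (file, XML, JSON and so on), and the "Submit Document" button submits the content to be analysed.

References:
How To Install Apache Solr In Ubuntu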

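Building your own PPA package on Ubuntu 15.10

Personal Package Archives (PPA) is a service provided by Ubuntu's Launchpad site. It lets individual users upload source code, which Launchpad builds and publishes as binary packages in an APT/Synaptic repository that other users can install and update from. Every user and team on Launchpad can have one or more PPAs.

1. Install the software needed for packaging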

$ sudo apt-get install packaging-dev

2. Create your own GPG key

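$ gpg --gen-key

Then press Enter at each prompt and type y when asked to confirm. In the last step, type randomly on the keyboard for a while so that the system can gather entropy. The whole process is shown in the screenshots: [Images: gpg, gpg2]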

gpg: 正在检查信任度数据库
gpg: 需要 3 份勉强信任和 1 份完全信任，PGP 信任模型
gpg: 深度：0 有效性：  1 已签名：  0 信任度：0-，0q，0n，0m，0f，1u
pub   2048R/47EDFAD4 2016-01-23
      密钥指纹 = 043D A507 281E D8F8 6D79  9806 34F5 C3F3 47ED FAD4
uid                  LongSky (LongSky) <wangqiang1588@sina.com>
sub   2048R/5A23BF98 2016-01-23

Note that the key ID we need is 47EDFAD4.
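The key ID is the part after the slash on the pub line. Here is a sketch of extracting it with awk, assuming the listing format shown above (`pub 2048R/<ID> <date>`); newer GnuPG versions print a different layout, so treat this as illustrative. The function name key_id_from_pub is hypothetical.

```shell
#!/bin/sh
# key_id_from_pub: read gpg listing lines on stdin and print the ID
# from the first "pub" line. Hypothetical helper name.
# Expects lines such as "pub   2048R/47EDFAD4 2016-01-23".
key_id_from_pub() {
    awk '$1 == "pub" { split($2, parts, "/"); print parts[2]; exit }'
}

printf '%s\n' 'pub   2048R/47EDFAD4 2016-01-23' 'sub   2048R/5A23BF98 2016-01-23' | key_id_from_pub
# prints 47EDFAD4
```

On a real system the input would come from a command such as gpg --fingerprint with your email address, as used later in this post.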

Upload the public part of the key to a key server so that developers around the world can use your key to verify your identity and your files.

$ gpg --send-keys --keyserver keyserver.ubuntu.com <KEY ID>

For our own key, the command is:

$ gpg --send-keys --keyserver keyserver.ubuntu.com 47EDFAD4

3. Create your SSH key

$ ssh-keygen -t rsa

4. Create a pbuilder environment (it lets developers build PPA packages locally)

pbuilder-dist <release> create
where <release> is for example raring, saucy, trusty or in the case of Debian maybe sid.
This will take a while as it will download all the necessary packages for a “minimal installation”. 
These will be cached though.

My system is Ubuntu 15.10 (Wily Werewolf), so I run the following command:

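$ pbuilder-dist wily create

This step takes quite a while because it installs the complete build toolchain, so be patient.

5. Create a Launchpad account

An account can be registered on the Launchpad website.
Upload the GPG key to Launchpad. Use the following command to view your GPG key: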

$ gpg --fingerprint wangqiang1588@sina.com

This produces output like the following:

pub   2048R/47EDFAD4 2016-01-23
      密钥指纹 = 043D A507 281E D8F8 6D79  9806 34F5 C3F3 47ED FAD4
uid                  LongSky (LongSky) <wangqiang1588@sina.com>
sub   2048R/5A23BF98 2016-01-23

Run the following command to submit your key to the Ubuntu key server:

$ gpg --send-keys --keyserver keyserver.ubuntu.com 47EDFAD4

Log in to your personal key management page

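Copy the key fingerprint into the text box and click "Import Key".
If the import succeeds, the system sends an email to your mailbox to verify the imported key.
The body of the email is encrypted and has to be decrypted first. It looks like this: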
BEGIN PGP MESSAGE
Version: GnuPG v1.4.3 (GNU/Linux)
hQIOA0THhKozD+K5EAf9F3PcOL2iU6onH2YsvB6IKDXNxbK0NBVy6ppxcNq8hoTe
cuHvzWLFfh1ehhSNe1V6xpuFnt5sJoeA4qEEOxez3HmY80tKIKMPLyhC/8JiSIW9
fwuxj4C0F6pdyrpvGbQAzfPEFk/P1AtIHXm4WLXduhBT7YEpmUk/I4A/KlSrKoiP
J5vBtbroUyp2jvIhDUmY7ToU+ifrDe3+VP1ZzSEJzOOXec9oPbcbvf5NptXA7Hbp
S0ElBAcLjKpAu7VKotCwFZIsVXDHT/mxf2qm88bGIrlXS5uTzvmyhQps1KmyNiCz
I0i5kSVvHZWyVZ+8FrROLqYAqqnEIMg9hUnbFAervgf/YiYs0xxWLYf9e14eoMZA
ranGT72q/JHmBNBYenOijaquFNi1TH5J8Udtt2RfdyRUlmGilxRvtIYL8gpnuNpS
+GHOoBWUN2f4nawaDeqgrf6Nt3qQWWLO4iJPgieejFP2FP6zkLme1t7dXo+z1ary
EZuxSLtKIWkOFEZ8Gcn02hBgOhJZucnkF6BmVW9dr1C4QEAmGM631uqfsp5PapAn
yjHbEU1L2R9i7vPtJNRr6ubFLWg1Yhfv63ByxSx/WQHMMqlrbL+moXBGED3L2hM8
7FP9eapBRgmS+Bda9ArcGMUElTOkWoUYIOPyLOYmo15LvbxHOVaXjn7+fDgr2S1J
R9LArwHycmdKKelRww+ZvylHIfq8xy10atRQIYawchh9A1myXD1TlWbrrIkodQJF
iEpO2i1LKvqwZHOx3szT4hF+44tNFzQIL1j+zF5Hrt2WOTnS5WXGgGRtfEd8F7fN
khQZOAdhwrnlY+yknruC8Y8Jm8vM57+KnPgBfvxuxzLX1XFTfTZCHXeUmwwu3mga
m+6WzckeBGBDHKK6GqwFoOAykTwjyqOZaty7DPHeoINc0tLMVr9Ks64DScf8bgh4
MkNonA0YhMQbkmwRc33APw441+/iLw5gqndQdX44kKqC71dG6LqanAOjD29Xj3JV
ZBsjg95Jrx7Sx+i/V0PUeaU9QjCT0Q1jEy1Bcs8NYtTJnpG+4oHYJ0pyiGxIquQH
V9E+hW6Qehx5DbsIXEvfeaBBHOfAHHOhUH14WK4bsJWm8wZ50XiYBZrNFOqzsm13
2STcY4VIoJp3Uw2qNyvZXQUhpndlfgQGO14CMSadzDn6Vts=
=hTe6
END PGP MESSAGE
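Copy this content from the email into a file, for example file.txt, and then run:
gpg --decrypt file.txt
You will be prompted for the key passphrase set earlier. After successful verification, the following message appears:
Please go here to finish adding the key to your Launchpad account:
[[https://launchpad.net/token/bP56TDKg8HXQbBs6LsN0]]
Open the link and choose Continue to finish uploading the GPG key.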

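6. Upload your SSH key

Open ~/.ssh/id_rsa.pub, copy its contents into the "Add an SSH key" text box of your account, and choose Import to finish the upload.

7. Configure Bazaar

(Bazaar is a version control system that can store code.) It is needed because Launchpad uses it by default and expects bzr to be used when building the code.
First tell Bazaar who you are: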

$ bzr whoami "LongSky <wangqiang1588@sina.com>"

$ bzr launchpad-login wangqiang1588    # my Launchpad account ID

Bazaar in five minutes

Official tutorial: http://packaging.ubuntu.com/html/packaging-new-software.html
Open source licenses: http://opensource.org/licenses

8. Configure your shell environment variables

Open ~/.bashrc and add the following at the top of the file:

export DEBFULLNAME="LongSky"

export DEBEMAIL="wangqiang1588@sina.com"

Then run:

$ source ~/.bashrc

9. Install the build tools

$ sudo apt-get install build-essential dh-make

10. Download the source code, using Openyoudao as the example

$ wget https://github.com/justzx2011/openyoudao/archive/beta0.2.tar.gz

11. Start packaging

a. Generate the packaging files from the template

$ bzr dh-make openyoudao 0.2 beta0.2.tar.gz

Fetching tarball                                                               
Looking for a way to retrieve the upstream tarball
Upstream tarball already exists in build directory, using that
                                                                               
Type of package: single binary, indep binary, multiple binary, library, kernel module, kernel patch?
 [s/i/m/l/k/n] s 

Maintainer name  : LongSky
Email-Address    : wangqiang1588@sina.com 
Date             : Sun, 24 Jan 2016 17:09:27 +0800
Package Name     : openyoudao
Version          : 0.2
License          : blank
Type of Package  : Single
Hit <enter> to confirm: 
Skipping creating ../openyoudao_0.2.orig.tar.gz because it already exists
Currently there is no top level Makefile. This may require additional tuning.
Done. Please edit the files in the debian/ subdirectory now. You should also
check that the openyoudao Makefiles install into $DESTDIR and not in / .
Package prepared in /home/longsky/openyoudao
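The output above mentions `openyoudao_0.2.orig.tar.gz`. Debian source packages expect the upstream tarball under the name `<package>_<version>.orig.tar.gz` (for a gzip-compressed tarball). The sketch below derives that name; `orig_tarball_name` is a hypothetical helper used only for illustration.

```shell
# orig_tarball_name is a hypothetical helper.
# $1 = source package name, $2 = upstream version
orig_tarball_name() {
    printf '%s_%s.orig.tar.gz\n' "$1" "$2"
}

orig_tarball_name openyoudao 0.2   # prints openyoudao_0.2.orig.tar.gz
```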

b. Commit the changes to the packaging branch

$ cd /home/longsky/openyoudao

$ bzr commit -m "Initial commit of Debian packaging."

c. Build the package in the current environment

$ bzr builddeb -- -us -uc

d. Check the package for problems

$ cd ..

$ lintian openyoudao_0.2-1.dsc
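lintian prefixes each finding with a severity letter, such as `E:` for errors and `W:` for warnings. When the report is long, a quick count is handy. The following is a rough sketch; `summarize_lintian` is a hypothetical helper, not part of lintian.

```shell
# summarize_lintian is a hypothetical helper. It reads lintian output on
# stdin and counts the lines tagged E: (error) and W: (warning).
summarize_lintian() {
    awk '/^E: /{e++} /^W: /{w++} END{printf "errors=%d warnings=%d\n", e, w}'
}

# Usage (assumes lintian is installed and the .dsc from the build exists):
#   lintian openyoudao_0.2-1.dsc | summarize_lintian
```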

e. Sign the package (you will be asked for the passphrase set earlier)

$ cd /home/longsky/openyoudao

$ debuild -S -k47EDFAD4

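12. Upload the package

a. Edit ~/.dput.cf

The steps above produce a set of digitally signed files. The last step is to upload them to the PPA FTP server with a tool called dput. Before using dput, edit ~/.dput.cf (create it if it does not exist) and define the Launchpad account to upload to. My ~/.dput.cf looks like this: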

[youdao-beta0.2]
	fqdn = ppa.launchpad.net
	method = ftp
	incoming = ~wangqiang1588/openyoudao-v0.2/ubuntu/
	login = anonymous
	allow_unsigned_uploads = 0
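If you maintain more than one PPA, the same stanza can be generated instead of typed by hand. The sketch below simply reproduces the fields shown above; `write_dput_stanza` and its argument order are assumptions made for illustration.

```shell
# write_dput_stanza is a hypothetical helper. Arguments:
#   $1 stanza name, $2 Launchpad user, $3 PPA name, $4 config file to append to
write_dput_stanza() {
    cat >> "$4" <<EOF
[$1]
fqdn = ppa.launchpad.net
method = ftp
incoming = ~$2/$3/ubuntu/
login = anonymous
allow_unsigned_uploads = 0
EOF
}

# Example:
#   write_dput_stanza youdao-beta0.2 wangqiang1588 openyoudao-v0.2 "$HOME/.dput.cf"
```

The helper appends rather than overwrites, so existing stanzas in the file are kept.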

b. Run the upload

$ dput youdao-beta0.2 openyoudao_0.2-1ubuntu1_source.changes
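dput looks up its first argument as a stanza name in the configuration file, so it is worth confirming that the name exists before uploading. A minimal sketch follows; `dput_host_defined` is a hypothetical helper, and it assumes the stanza header sits alone on its line.

```shell
# dput_host_defined is a hypothetical helper. It succeeds when the given
# name appears as a [section] header in the dput configuration file.
#   $1 = stanza name, $2 = config file (defaults to ~/.dput.cf)
dput_host_defined() {
    grep -qxF "[$1]" "${2:-$HOME/.dput.cf}" 2>/dev/null
}

# Usage:
#   dput_host_defined youdao-beta0.2 || echo "stanza missing in ~/.dput.cf"
```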

After a successful upload you will receive an email from the PPA with the following content:

Accepted:
OK: openyoudao_0.2.orig.tar.gz
OK: openyoudao_0.2-1ubuntu1.debian.tar.gz
OK: openyoudao_0.2-1ubuntu1.dsc
 -> Component: main Section: net
openyoudao (0.2-1ubuntu1) trusty; urgency=low
* fix issue#8, exception interrupt
https://launchpad.net/~wangqiang1588/+archive/openyoudao-v0.2
You are receiving this email because you are the uploader of the above
PPA package.

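Now just wait for the official build to run.
Once the build finishes, the package can be installed from the PPA.
Build progress can be checked at: http://ppa.launchpad.net/wangqiang1588/

References

How to build your own PPA package on Ubuntu

Ubuntu 15.10: Simplifying Android development environment setup with Ubuntu Make

On Ubuntu 15.10, Ubuntu Make can be used to simplify setting up an Android development environment. The steps are:

1. Install Ubuntu Make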

$ sudo apt-get install ubuntu-make

2. Set up the Android development environment

$ umake android

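Then follow the prompts step by step.

After a while, Android Studio will be installed.

It is that simple!

Creating an Ubuntu USB installation drive on Windows

To run Ubuntu from a USB drive, insert a drive with at least 2 GB of free space. The simplest way to put Ubuntu on the drive is the USB installer provided by pendrivelinux.com. Download Pen Drive Linux's USB installer, then install and run it. It can also be downloaded from this site.

Choose the Ubuntu desktop edition from the drop-down list, or download the Ubuntu ISO yourself with BitTorrent, Xunlei, or similar software.

Click "Browse" and open the downloaded ISO file.

Select a USB drive and click "Create". Back up the data on the drive first, because the creation process formats it.

Ubuntu itself ships with a small program, Startup Disk Creator, that creates a bootable Ubuntu USB drive directly. It can also be downloaded from this site.

References:

Creating a bootable Ubuntu USB drive to fix Windows problems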
