
mdrill's Introduction

Project Overview

Data volumes keep growing. Traditional relational databases can no longer keep up, and distributed data warehouses are very expensive. With billions, tens of billions, even hundreds of billions of rows, how can analysis stay efficient?
mdrill is a data analysis system open-sourced by Alimama. Targeting TB-scale data, it achieves second-level response on just 10 machines, imports data in real time, and supports grouping and filtering on arbitrary combinations of dimensions.
As an online analytical processing system, mdrill can analyze tens of billions of rows over any combination of dimensions within seconds to tens of seconds.
At Alibaba, 10 machines store 3 billion new rows per day, of which 1 billion are imported in real time and 2 billion offline. The cluster currently stores over 100 billion rows with 80-400 dimensions each.
It is currently used by Alibaba, Tencent, JD.com, Lenovo, Yihaodian, Meituan, Dajie.com, AsiaInfo, Henglongxing, and other companies.

mdrill's Features

1. Meets big-data query needs: adhoc takes in 3 billion rows per day, and the dataset grows steadily. mdrill uses columnar storage, indexing, distributed processing, and appropriate partitioning to keep this data analyzable online in real time.
2. Incremental updates: offline mdrill data supports partition-based incremental updates.
3. Real-time import: with only 10 machines, it sustains about 1 billion rows of real-time imports per day (peaking at 200 million per hour).
4. Fast response: columnar storage, inverted indexes, efficient compression, in-memory computation, layered caches, partitioning, and distributed processing together let mdrill analyze tens of billions of rows in seconds to tens of seconds.
5. Low cost: the Alibaba adhoc deployment runs on just 10 PCs with 48 GB of RAM each, yet stores over 100 billion rows.

Downloads

Resources

mdrill contributors

jstorm core contributors

Growth of mdrill's Data Volume

Time         | Data volume   | Event
Dec 2012     | < 0.2 billion | adhoc first went live
Jan 2013     | 2-3 billion   | scaled from 2 machines to 10
May 2, 2013  | 10 billion    | passed 10 billion for the first time
Jul 24, 2013 | 40 billion    | first open-source release
Nov 2013     | 100 billion   | full-text search mode ods_allpv_ad_d went live
Dec 2013     | 150 billion   | real-time and wireless data onboarded
Feb 2014     | 320 billion   | 11 machines, 30% disk utilization
Mar 28, 2014 | 490 billion   | 11 machines, 60% disk utilization

Other

mdrill's People

Contributors

bwzheng2010, muyannian


mdrill's Issues

Extract partitioning behind an interface, so new partition schemes can be added later

package com.alimama.mdrill.partion;

import java.util.HashMap;
import java.util.HashSet;

import org.apache.hadoop.fs.FileSystem;

public interface MdrillPartionsInterface {
    public void setPartionType(String parttype);
    public String[] SqlPartions(String queryStr) throws Exception;
    public String SqlFilter(String queryStr) throws Exception;

    public HashSet<String> getNameList(FileSystem fs, String inputBase, String startPoint, int dayDelay, int maxRunDays) throws Exception;
    public HashMap<String, HashSet<String>> indexPartions(HashSet<String> namelist, String startday, int dayDelay, int maxRunDays) throws Exception;
    public HashMap<String, String> indexVertify(HashMap<String, HashSet<String>> partions, int shards, String startday, int dayDelay, int maxRunDays) throws Exception;

    public StatListenerInterface getStatObj() throws Exception;
}

package com.alimama.mdrill.partion;

import java.net.MalformedURLException;
import java.util.HashMap;

import org.apache.solr.client.solrj.SolrServerException;

import com.alimama.mdrill.partion.GetPartions.TablePartion;
import com.alimama.mdrill.topology.SolrStartJetty;
import com.alipay.bluewhale.core.cluster.SolrInfo.ShardCount;

public interface StatListenerInterface {
    public void init();
    public void setPartionType(String parttype);
    public void syncClearPartions();
    public void addPartionStat(String partion);
    public void syncClearStat();
    public void fetchCount(SolrStartJetty solrservice, String tablename, TablePartion part) throws MalformedURLException, SolrServerException;
    public HashMap<String, ShardCount> getPartioncount();
    public HashMap<String, ShardCount> getExtaCount();
}
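
A rough, hypothetical sketch of one concrete scheme that could sit behind MdrillPartionsInterface: mapping a thedate range onto month partitions. The class name, the partition naming, and the parsing are assumptions for illustration, not mdrill's actual code.

public class DayPartitions {
    // thedate values like 20130812 -> month partitions like 201308 (format assumed).
    static String[] partitionsFor(String startDay, String endDay) {
        java.util.LinkedHashSet<String> parts = new java.util.LinkedHashSet<String>();
        int start = Integer.parseInt(startDay.substring(0, 6));
        int end = Integer.parseInt(endDay.substring(0, 6));
        for (int ym = start; ym <= end; ym = nextMonth(ym)) parts.add(String.valueOf(ym));
        return parts.toArray(new String[0]);
    }
    static int nextMonth(int ym) {
        int y = ym / 100, m = ym % 100;
        return m == 12 ? (y + 1) * 100 + 1 : ym + 1;
    }
    public static void main(String[] args) {
        // prints 201311 201312 201401 201402
        for (String p : partitionsFor("20131120", "20140215")) System.out.println(p);
    }
}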

Improving the un-inversion step

A sudden thought: most of our query time, over 50%, goes to the un-inversion done on a first query. "Un-inversion" means loading all values of one column of an index into memory, implemented via the inverted table. What ends up in memory is not the column's real values but ordinals standing in for them; for example, 9 might stand for the term 中华人民共和国. Yet when un-inverting through the inverted table, the real value is read every time, and the inverted table also carries extra baggage such as various file-pointer addresses. So un-inverting through the inverted table is not optimal and has a lot of headroom. With a small change, namely an extra pseudo inverted-table file that stores only the ordinals and omits the original values that are never needed here, IO drops a lot, which means faster queries; the change is cheap, and I know this code well.
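
A minimal sketch of the proposed pseudo inverted-table file, assuming a flat layout of one int ordinal per document; the file format and class name are made up for illustration.

public class OrdinalFile {
    // Write one term ordinal per document, in document order, so un-inversion
    // becomes a single sequential read that never touches term text.
    static void write(java.io.File f, int[] ordByDoc) throws java.io.IOException {
        java.io.DataOutputStream out = new java.io.DataOutputStream(
                new java.io.BufferedOutputStream(new java.io.FileOutputStream(f)));
        try {
            out.writeInt(ordByDoc.length);
            for (int ord : ordByDoc) out.writeInt(ord);
        } finally {
            out.close();
        }
    }

    // Load the whole column into memory as ordinals only.
    static int[] read(java.io.File f) throws java.io.IOException {
        java.io.DataInputStream in = new java.io.DataInputStream(
                new java.io.BufferedInputStream(new java.io.FileInputStream(f)));
        try {
            int[] ordByDoc = new int[in.readInt()];
            for (int i = 0; i < ordByDoc.length; i++) ordByDoc[i] = in.readInt();
            return ordByDoc;
        } finally {
            in.close();
        }
    }
}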

Why many metrics in rpt_p4padhoc_product come out too small: analysis and fix

Symptom: many metric values in rpt_p4padhoc_product are too small.
Cause:
After this table's data is generated, the upstream data gets one backfill about three days later, which means the files are modified.
If adhoc is building an index at that moment and the upstream files are deleted, some map tasks fail, so the job either fails or those maps are skipped (Hadoop tolerates some map failures).
So the failures come from the backfill job conflicting with the index-build job running at the same time.
Analysis:
Modifying or backfilling historical data is normal and routine (Yunti scheduling or failures may require backfill jobs over historical data).
adhoc's incremental indexing should detect such changes: if the data was modified, or the upstream data is abnormal (e.g. deleted), the index should be rebuilt.

Fix:
1. adhoc decides whether to rebuild an index by checking whether the vertify file still matches; if it changed, rebuild.
2. The vertify file used to record only the directory count and the build date; it lacked each directory's last-modified time and data size. With those added (a sketch follows this list), modifications to the files can be detected.
3. The input may change while the MapReduce index build is running (files added or deleted), so after the index is built and before it is distributed, re-check whether the source files changed; if they did, discard this MapReduce run and let the Tianwang scheduler re-run it.
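
A sketch of what the extended vertify signature could record, using the standard Hadoop FileSystem API; the exact record layout is an assumption, not mdrill's actual format.

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VertifySignature {
    // Besides directory count, record each input directory's last-modified
    // time and total byte size; a later backfill changes the signature and
    // forces an index rebuild.
    static String signature(FileSystem fs, Path[] inputDirs) throws java.io.IOException {
        StringBuilder sb = new StringBuilder("dirs=" + inputDirs.length);
        for (Path dir : inputDirs) {
            FileStatus st = fs.getFileStatus(dir);
            long bytes = fs.getContentSummary(dir).getLength();
            sb.append('|').append(dir.getName())
              .append(',').append(st.getModificationTime())
              .append(',').append(bytes);
        }
        return sb.toString();
    }
}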

Error while building the index

Exception in thread "Lucene Merge Thread #4" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: docs out of order (515 <= 569 ) (out: org.apache.lucene.store.RAMOutputStream@13c93f78)
at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:517)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: org.apache.lucene.index.CorruptIndexException: docs out of order (515 <= 569 ) (out: org.apache.lucene.store.RAMOutputStream@13c93f78)
at org.apache.lucene.index.FormatPostingsDocsWriter.addDoc(FormatPostingsDocsWriter.java:99)
at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:614)
at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:546)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:478)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:114)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4297)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3942)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)
Exception in thread "Lucene Merge Thread #5" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: docs out of order (304 <= 339 ) (out: org.apache.lucene.store.RAMOutputStream@49c03579)
at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:517)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: org.apache.lucene.index.CorruptIndexException: docs out of order (304 <= 339 ) (out: org.apache.lucene.store.RAMOutputStream@49c03579)
at org.apache.lucene.index.FormatPostingsDocsWriter.addDoc(FormatPostingsDocsWriter.java:99)
at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:614)
at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:546)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:478)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:114)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4297)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3942)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)
Exception in thread "Lucene Merge Thread #3" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: docs out of order (102 <= 155 ) (out: org.apache.lucene.store.RAMOutputStream@40542fa7)
at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:517)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(Concurren

Improving frq file storage

mdrill's columnar storage is built on Lucene's inverted index.
An inverted index is simply a mapping from a column value to the IDs of the documents containing it, for example:

张三 => 1,2,3,4,5,6,7,8,9,10,11,12,13
李四 => 14,15,17,19,21,24,28,29,30,31,32,33
王五 => 34,35,37,39,40,41

So the inverted index holds every value of every column plus each value's document ID list.
On disk the document ID list is delta-encoded: each ID is stored as its difference from the previous one. Applied to the example above:

张三 => 1,1,1,1,1,1,1,1,1,1,1,1,1
李四 => 14,1,2,2,2,3,4,1,1,1,1,1
王五 => 34,1,2,2,1,1

After subtraction the numbers are small, so they are written with a variable-length encoding and take relatively little space.
The document ID lists end up in the frq file.

In adhoc's usage:
1. There are an enormous number of rows.
2. Many real columns repeat heavily: dates, age, gender, category, and lots of money or count columns.
A single column value can therefore map to millions of document IDs, and after delta encoding those become long runs of tiny, highly repetitive values: 张三 above turns into a run of identical 1s. If 5 million records all delta-encode to 1, storing 5 million separate 1s is a poor choice; storing the count 5 million plus the value 1 shrinks the space many times over, and queries read that much less IO. So these runs are worth compressing, and query performance should improve a lot. A sketch follows.
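
A minimal run-length sketch for that case, collapsing runs of identical deltas into (count, value) pairs; Lucene's real frq encoding is more involved than this.

public class RunLength {
    // Collapse runs of identical deltas into (runLength, value) pairs.
    static java.util.List<long[]> encode(int[] deltas) {
        java.util.List<long[]> runs = new java.util.ArrayList<long[]>();
        int i = 0;
        while (i < deltas.length) {
            int j = i;
            while (j < deltas.length && deltas[j] == deltas[i]) j++;
            runs.add(new long[] { j - i, deltas[i] });
            i = j;
        }
        return runs;
    }

    public static void main(String[] args) {
        // 张三's deltas above: thirteen 1s collapse to the single pair {13, 1}.
        int[] deltas = new int[13];
        java.util.Arrays.fill(deltas, 1);
        for (long[] run : encode(deltas)) System.out.println(run[0] + " x " + run[1]);
    }
}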

mdrill's real-time data source: index structure

There are two kinds of index: local (on-disk) indexes and in-memory (RAM) indexes.
By whether they serve traffic, indexes fall into 3 types: type A is read-only, type B is writable, and type C is temporary.

Indexes serving reads:
Index 1 A
Index 2 A
Index N A
RAM A

Index serving writes:
RAM B

When RAM B grows past a size threshold or exceeds the time limit:
RAM A + RAM B => RAM C
then RAM B is cleared and RAM A = RAM C.

When RAM A grows past a certain size:
RAM A + INDEX N A => INDEX C
then RAM A and INDEX N A are cleared, and INDEX C becomes an INDEX A.

When the number of INDEX N segments exceeds the threshold, the two smallest indexes are merged into a new one via an INDEX C. A sketch of the RAM swap follows.
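
A minimal sketch of the RAM A/B/C swap described above, with a stand-in RamIndex class and an assumed flush threshold; neither is mdrill's actual code.

class RamIndex {
    final java.util.List<String> docs = new java.util.ArrayList<String>();
    void add(String doc) { docs.add(doc); }
    int size() { return docs.size(); }
    RamIndex mergeWith(RamIndex other) { // RAM A + RAM B => RAM C
        RamIndex c = new RamIndex();
        c.docs.addAll(this.docs);
        c.docs.addAll(other.docs);
        return c;
    }
}

class RealtimeShard {
    static final int FLUSH_DOCS = 100000; // assumed threshold
    RamIndex ramA = new RamIndex();       // type A: read-only, serves queries
    RamIndex ramB = new RamIndex();       // type B: write buffer

    void write(String doc) {
        ramB.add(doc);
        if (ramB.size() >= FLUSH_DOCS) {
            RamIndex ramC = ramA.mergeWith(ramB); // type C: temporary
            ramB = new RamIndex();                // clear RAM B
            ramA = ramC;                          // RAM A = RAM C
        }
    }
}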

merger server occasionally throws errors during queries

    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

2013-07-31 07:53:35 SolrCore [ERROR] org.apache.solr.common.SolrException: java.lang.Exception: 10.246.45.44:51166/solr/rpt_p4padhoc_cust@2013073@33
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:175)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1502)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:264)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:198)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.Exception: 10.246.45.44:51166/solr/rpt_p4padhoc_cust@2013073@33
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:316)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:290)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: this IndexReader is closed org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:297)
at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:268)
at org.apache.solr.search.SolrIndexReader.decRef(SolrIndexReader.java:434)
at org.apache.solr.search.SolrIndexSearcher.close(SolrIndexSearcher.java:92)
at org.apache.solr.core.SolrCore.clearPartion(SolrCore.java:944)
at org.apache.solr.core.SolrCore.DropSearch(SolrCore.java:980)
at org.apache.solr.core.SolrCore$2.removeEldestEntry(SolrCore.java:900)
at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:410)
at java.util.HashMap.put(HashMap.java:385)
at org.apache.lucene.util.cache.SimpleMapCache.put(SimpleMapCache.java:52)
at org.apache.lucene.util.cache.SimpleMapCache$SynchronizedSimpleMapCache.put(SimpleMapCache.java:94)
at org.apache.solr.core.SolrCore.getSearcherByPath(SolrCore.java:1064)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:997)
at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:220)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:206)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:102)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1502)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:264)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:198)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.han

2013-08-06 16:02:34 SolrCore [ERROR] java.lang.ArrayIndexOutOfBoundsException: 33554431

2013-08-06 16:02:34 ShardGroupByTermNumCompare [INFO] ####SortType [sortFieldNum=-1, typeNum=0, typeEnum=index]
2013-08-06 16:02:34 MdrillGroupBy [INFO] ##baseDocs.size## 24749@397300
2013-08-06 16:02:34 SolrCore [ERROR] java.lang.ArrayIndexOutOfBoundsException: 33554431
at org.apache.solr.request.uninverted.NumberedTermEnum.skipTo(NumberedTermEnum.java:140)
at org.apache.solr.request.uninverted.UnInvertedField.getTermText(UnInvertedField.java:432)
at org.apache.solr.request.uninverted.UnInvertedFieldTermNumRead$TermNumReadSingleNotNull.tNumToString(UnInvertedFieldTermNumRead.java:78)
at org.apache.solr.request.uninverted.UnInvertedFieldTermNumRead$TermNumReadSingle.tNumToString(UnInvertedFieldTermNumRead.java:193)
at org.apache.solr.request.uninverted.UnInvertedField.tNumToString(UnInvertedField.java:402)
at org.apache.solr.request.mdrill.MdrillPorcessUtils$TermNumToString.fetchValues(MdrillPorcessUtils.java:94)
at org.apache.solr.request.mdrill.MdrillGroupBy.prefetch(MdrillGroupBy.java:372)
at org.apache.solr.request.mdrill.MdrillGroupBy.transGroupValue(MdrillGroupBy.java:386)
at org.apache.solr.request.mdrill.MdrillGroupBy.execute(MdrillGroupBy.java:261)
at org.apache.solr.request.mdrill.MdrillGroupBy.getCross(MdrillGroupBy.java:110)
at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:314)
at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:201)
at org.apache.solr.request.mdrill.FacetComponent.process(FacetComponent.java:81)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:102)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1506)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:264)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:198)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

2013-08-06 16:02:34 SolrCore [INFO] [rpt_p4padhoc_product] webapp=/solr path=/select params={facet.cross=true&facet.cross.join=@&facet=true&hadoop.merger.hashindex=32&facet.cross.offset=0&facet.cross.sort.fl=higoempty_sort_s&indexpartion=2013073&join.path.31d5a2cf-4f5c-4eef-a86c-0298855340bf=/group/taobao/external/p4p/p4padhoc/download/offline/20130726/d38df7db-a176-4e5d-8c30-16ce31999507/index/part-00000&version=2&facet.cross.sort.desc=true&fl=higo_uuid,score&facet.cross.limit=10010&facet.field=thedate&facet.field=user_id&fsv=true&fq=thedate:20130724&join.rightkey.31d5a2cf-4f5c-4eef-a86c-0298855340bf=cols_1_s&facet.cross.sort.cp=string&join.fq.31d5a2cf-4f5c-4eef-a86c-0298855340bf=cols_1_s:99&join.fq.31d5a2cf-4f5c-4eef-a86c-0298855340bf=cols_1_s:9&join.leftkey.31d5a2cf-4f5c-4eef-a86c-0298855340bf=user_id&join.fl.31d5a2cf-4f5c-4eef-a86c-0298855340bf=cols_0_s&join.fl.31d5a2cf-4f5c-4eef-a86c-0298855340bf=cols_2_s&join.fl.31d5a2cf-4f5c-4eef-a86c-0298855340bf=cols_1_s&higo_ms_depth=3&wt=javabin&rows=0&facet.sort=index&start=0&q=:&join.tables=31d5a2cf-4f5c-4eef-a86c-0298855340bf&join.sort.31d5a2cf-4f5c-4eef-a86c-0298855340bf=cols_1_s+&facet.cross.sort.tp=index&higolockCount=10241&maxshards=8&facet.cross.fl=clickcount0&merger.shards.issub=true&mergeservers=10.246.45.24:51275/solr/rpt_p4padhoc_product,10.246.45.23:51274/solr/rpt_p4padhoc_product,10.246.45.42:51277/solr/rpt_p4padhoc_product,10.246.45.41:51276/solr/rpt_p4padhoc_product,10.246.45.21:51272/solr/rpt_p4padhoc_product,10.246.45.22:51273/solr/rpt_p4padhoc_product,10.246.45.43:51270/solr/rpt_p4padhoc_product,10.246.45.44:51271/solr/rpt_p4padhoc_product,10.246.45.24:51275/solr/rpt_p4padhoc_product,10.246.45.23:51274/solr/rpt_p4padhoc_product,&isShard=true} hits=397300 status=500 QTime=93
2013-08-06 16:02:34 SolrDispatchFilter [ERROR] java.lang.ArrayIndexOutOfBoundsException: 33554431
at org.apache.solr.request.uninverted.NumberedTermEnum.skipTo(NumberedTermEnum.java:140)
at org.apache.solr.request.uninverted.UnInvertedField.getTermText(UnInvertedField.java:432)
at org.apache.solr.request.uninverted.UnInvertedFieldTermNumRead$TermNumReadSingleNotNull.tNumToString(UnInvertedFieldTermNumRead.java:78)
at org.apache.solr.request.uninverted.UnInvertedFieldTermNumRead$TermNumReadSingle.tNumToString(UnInvertedFieldTermNumRead.java:193)
at org.apache.solr.request.uninverted.UnInvertedField.tNumToString(UnInvertedField.java:402)
at org.apache.solr.request.mdrill.MdrillPorcessUtils$TermNumToString.fetchValues(MdrillPorcessUtils.java:94)
at org.apache.solr.request.mdrill.MdrillGroupBy.prefetch(MdrillGroupBy.java:372)
at org.apache.solr.request.mdrill.MdrillGroupBy.transGroupValue(MdrillGroupBy.java:386)
at org.apache.solr.request.mdrill.MdrillGroupBy.execute(MdrillGroupBy.java:261)
at org.apache.solr.request.mdrill.MdrillGroupBy.getCross(MdrillGroupBy.java:110)
at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:314)
at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:201)
at org.apache.solr.request.mdrill.FacetComponent.process(FacetComponent.java:81)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:102)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1506)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:264)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:198)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Tracking down an error in the logs

java.lang.NullPointerException
at org.apache.lucene.index.SegmentReader.getQuickPos(SegmentReader.java:502)
at org.apache.solr.request.uninverted.UnInvertedField.uninvert(UnInvertedField.java:466)
at org.apache.solr.request.uninverted.UnInvertedField.<init>(UnInvertedField.java:88)
at org.apache.solr.request.uninverted.UnInvertedField.getUnInvertedField(UnInvertedField.java:720)
at org.apache.solr.request.mdrill.MdrillPorcessUtils$UnvertFields.<init>(MdrillPorcessUtils.java:324)
at org.apache.solr.request.mdrill.MdrillGroupBy.execute(MdrillGroupBy.java:230)
at org.apache.solr.request.mdrill.MdrillGroupBy.getCross(MdrillGroupBy.java:115)
at org.apache.lucene.index.SegmentReader.invertScan(SegmentReader.java:559)
at org.apache.lucene.index.DirectoryReader.invertScan(DirectoryReader.java:591)
at org.apache.lucene.index.FilterIndexReader.invertScan(FilterIndexReader.java:317)
at org.apache.solr.request.mdrill.FacetComponent.getByGroupby(FacetComponent.java:115)
at org.apache.solr.request.mdrill.FacetComponent.process(FacetComponent.java:88)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:101)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1510)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:264)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:198)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
2013-09-26 10:40:21 SegmentReader [INFO] ##getCount##p4p_e_gmv_direct_amtfileNum#-1#count#0
2013-09-26 10:40:21 UnInvertedField [INFO] setSingleValue p4p_e_gmv_direct_amt field false@768181715_8795

Exception when running ./bluewhale mdrill create ./create.sql

Exception in thread "main" java.io.IOException: Call to Master.Hadoop/192.168.1.131:9000 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at com.alimama.mdrill.topology.MdrillMain.createtable(MdrillMain.java:73)
at com.alimama.mdrill.topology.MdrillMain.main(MdrillMain.java:44)
at com.alimama.bluewhale.core.drpc.Mdrill.main(Mdrill.java:11)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:50

Heartbeat optimization

As tables multiply, the table heartbeat timeouts need rework, split into:
a fairly long per-table timeout, and
an overall timeout (no table at all has sent a successful heartbeat within the window), set much shorter. A sketch follows.
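
A sketch of the two-level timeout, with illustrative thresholds rather than mdrill's actual values: each table gets a long individual timeout, while a much shorter global check fires when no table at all has heartbeated recently.

public class HeartbeatMonitor {
    static final long TABLE_TIMEOUT_MS = 10 * 60 * 1000L; // per table: long
    static final long GLOBAL_TIMEOUT_MS = 60 * 1000L;     // whole process: short

    final java.util.concurrent.ConcurrentHashMap<String, Long> lastBeat =
            new java.util.concurrent.ConcurrentHashMap<String, Long>();

    void onHeartbeat(String table) { lastBeat.put(table, System.currentTimeMillis()); }

    boolean tableTimedOut(String table) {
        Long t = lastBeat.get(table);
        return t == null || System.currentTimeMillis() - t > TABLE_TIMEOUT_MS;
    }

    // No table at all heartbeated within the short window: the process itself is stuck.
    boolean globalTimedOut() {
        long newest = 0;
        for (long t : lastBeat.values()) newest = Math.max(newest, t);
        return System.currentTimeMillis() - newest > GLOBAL_TIMEOUT_MS;
    }
}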

SQL parse error caused by a trailing ";"

开心延年 15:14:36
select thedate, count(thedate) as cnt from rpt_hitfake_auctionall_d where thedate <='20130814' and thedate >= '20130812' group by thedate ;
Try removing that trailing ";"

That ";" character

@*-daemon
*-daemon(350385384) 15:15:44
It works now..
It's fine without the semicolon.
Probably a parser bug..
开心延年 15:16:00
Right, this is definitely a bug in the program.
*-daemon(350385384) 15:15:59
Thanks.
开心延年 15:16:08
I must not have handled ";" properly.
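
A minimal sketch of the fix being discussed, stripping a trailing ";" (and whitespace) before the statement reaches the parser.

public class SqlTrim {
    static String stripTrailingSemicolon(String sql) {
        String s = sql.trim();
        while (s.endsWith(";")) s = s.substring(0, s.length() - 1).trim();
        return s;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingSemicolon(
            "select thedate, count(thedate) as cnt from rpt_hitfake_auctionall_d group by thedate ;"));
    }
}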

Issue 1: in storm below 0.8.0, when zk is unstable a worker may not be killed completely, leaving multiple processes running in the same slot

封仲淹(纪君祥)(32147704) 11:16:37
Issue 1: in storm below 0.8.0, when zk is unstable a worker may not be killed completely, leaving multiple processes running in the same slot.
jstorm fixed this bug.
storm 0.8.2 fixed this bug.
开心延年 11:17:26
OK, the second one sounds more plausible; I'll go read the code.

How was it solved?
A forced kill?
封仲淹(纪君祥)(32147704) 11:18:07
The cause is that creating the worker's heartbeat node must be the very first step, rather than waiting until after connecting to zk.
It's not a matter of force-killing.
封仲淹(纪君祥)(32147704) 11:19:13
My zk was very unstable at the time, and I found many processes running on the same slot; debugging traced it to this cause.
开心延年 11:19:17
So the worker->supervisor heartbeat is created first,
and the zk heartbeat is created afterwards.
封仲淹(纪君祥)(32147704) 11:19:24

封仲淹(纪君祥)(32147704) 11:19:59
Actually it's not the heartbeat but the process pid file: creating the pid file is step one.
开心延年 11:20:31
Does the official storm project document this anywhere?
封仲淹(纪君祥)(32147704) 11:20:37
No.
开心延年 11:20:59
Then I'll go dig through your code.
封仲淹(纪君祥)(32147704) 11:21:10
The official 0.8.0 already fixed this problem.
开心延年 11:21:29
I'm still on a fairly old jstorm version here;
it hasn't been synced for a long time.
封仲淹(纪君祥)(32147704) 11:22:06
Just use the github one;
that's the latest.
开心延年 11:23:05
Will do.

Error when running a SQL query

DistributedClusterState [ERROR] get_children
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /higo
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1586)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:184)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:173)
at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:169)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:161)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:36)
at com.alipay.bluewhale.core.zk.Zookeeper.getChildren(Zookeeper.java:170)
at com.alipay.bluewhale.core.cluster.DistributedClusterState.get_children(DistributedClusterState.java:105)
at com.alipay.bluewhale.core.cluster.StormZkClusterState.higo_tableList(StormZkClusterState.java:433)
at com.alimama.web.TableList.getTablelist(TableList.java:22)
at org.apache.jsp.tablelist_jsp._jspService(org.apache.jsp.tablelist_jsp:60)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:109)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:389)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:486)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:380)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)

The fieldvaluecache for join did not take effect

lelist/r_rpt_tanx_adzone_total/index]



2013-08-02 14:33:56 SolrStartJetty [INFO] higolog heartbeat called:adhoc
2013-08-02 14:33:56 SolrStartJetty [INFO] higolog heartbeat called:adhoc
2013-08-02 14:34:05 SolrStartJetty [INFO] higolog heartbeat called:adhoc
2013-08-02 14:34:08 HigoJoinUtils [INFO] begin clean /disk10/taobao/bluewhile/higojoin_work,600000,1375425248391
2013-08-02 14:34:08 HigoJoinUtils [INFO] ###joinpath###/disk10/taobao/bluewhile/higojoin_work/31d5a2cf-4f5c-4eef-a86c-0298855340bf
2013-08-02 14:34:08 HigoJoinInvert [INFO] ##fqlist.size()##3
2013-08-02 14:34:08 HigoJoinInvert [INFO] ##joinright##31117
2013-08-02 14:34:08 HigoJoin [INFO] ###termok###10000621>>>>10000621,user_id,cols_1_s
2013-08-02 14:34:08 SolrCore [INFO] ####KeyInput#####=102400
2013-08-02 14:34:08 SolrCore [INFO] ####KeyInput#####=102400
2013-08-02 14:34:08 HigoJoin [INFO] ###termok###100006473>>>>100006473,user_id,cols_1_s
2013-08-02 14:34:08 HigoJoin [INFO] ###termok###100006522>>>>100006522,user_id,cols_1_s
2013-08-02 14:34:08 HigoJoin [INFO] ###termok###100007368>>>>100007368,user_id,cols_1_s
2013-08-02 14:34:08 HigoJoin [INFO] ###termok###100008692>>>>100008692,user_id,cols_1_s
2013-08-02 14:34:08 HigoJoin [INFO] ###termok###10000940>>>>10000940,user_id,cols_1_s
2013-08-02 14:34:08 HigoJoin [INFO] ###termok###10001114>>>>10001114,user_id,cols_1_s
2013-08-02 14:34:08 HigoJoin [INFO] ###termok###100012245>>>>100012245,user_id,cols_1_s
2013-08-02 14:34:08 HigoJoin [INFO] ###termok###100013143>>>>100013143,user_id,cols_1_s
2013-08-02 14:34:08 HigoJoin [INFO] ###termok###10001744>>>>10001744,user_id,cols_1_s
2013-08-02 14:34:24 ConnectionStateManager [INFO] State change: SUSPENDED
2013-08-02 14:34:24 ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2013-08-02 14:34:24 DistributedClusterState [WARN] Received event Disconnected:None:null with disconnected Zookeeper.
2013-08-02 14:34:25 ConnectionStateManager [INFO] State change: RECONNECTED
2013-08-02 14:34:25 ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
2013-08-02 14:34:25 HigoJoin [INFO] ###termbreak###null>>>>null,user_id,cols_1_s
2013-08-02 14:34:25 HigoJoin [INFO] ###join###370260,4204338
2013-08-02 14:34:26 SolrCore [INFO] ####fieldvaluecache####1563@1648mb,size=8721,mem=84mb,key rpt_p4padhoc_product@2013073@1375420712010@user_id@1483291153_1153331319_53@/disk10/taobao/bluewhile/higojoin_work/31d5a2cf-4f5c-4eef-a86c-0298855340bf@cols_1_s@436511011_3491129748_52

Error during join

ue} hits=397500 status=500 QTime=23058
2013-08-02 17:10:01 SolrDispatchFilter [ERROR] java.lang.NullPointerException
at org.apache.solr.request.mdrill.MdrillDetail.prefetchValues(MdrillDetail.java:198)
at org.apache.solr.request.mdrill.MdrillDetail.transGroupValue(MdrillDetail.java:229)
at org.apache.solr.request.mdrill.MdrillDetail.topRows(MdrillDetail.java:373)
at org.apache.solr.request.mdrill.MdrillDetail.execute(MdrillDetail.java:145)
at org.apache.solr.request.mdrill.MdrillDetail.getDetail(MdrillDetail.java:91)
at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:310)
at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:201)
at org.apache.solr.request.mdrill.FacetComponent.process(FacetComponent.java:78)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:102)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1506)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:264)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:198)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

2013-08-02 17:10:07 BlockBufferInput [INFO] ##buffer_close##@free:99#malloc:11189:2653@create:0
2013-08-02 17:10:07 BlockBufferInput [INFO] ##buffer_close##@free:99#malloc:11189:2653@create:0
2013-08-02 17:10:07 SolrStartJetty [INFO] higolog heartbeat called:adhoc
2013-08-02 17:10:20 SolrStartJetty [INFO] higolog heartbeat called:adhoc
2013-08-02 17:10:21 HigoJoin [INFO] ###termbreak###null>>>>null,user_id,cols_1_s
2013-08-02 17:10:21 HigoJoin [INFO] ###join###370421,4204405
2013-08-02 17:10:22 SolrCore [INFO] ####fieldvaluecache####1564@1648mb,size=7111,mem=84mb,key rpt_p4padhoc_product@2013073@1375420672264@user_id@1482458059_3267686919_53@/disk10/taobao/bluewhile/higojoin_work/31d5a2cf-4f5c-4eef-a86c-0298855340bf@cols_1_s@436511011_3491129748_52
2013-08-02 17:10:22 SolrCore [ERROR] java.lang.NullPointerException
at org.apache.solr.request.mdrill.MdrillDetail.prefetchValues(MdrillDetail.java:198)
at org.apache.solr.request.mdrill.MdrillDetail.transGroupValue(MdrillDetail.java:229)
at org.apache.solr.request.mdrill.MdrillDetail.topRows(MdrillDetail.java:373)
at org.apache.solr.request.mdrill.MdrillDetail.execute(MdrillDetail.java:145)
at org.apache.solr.request.mdrill.MdrillDetail.getDetail(MdrillDetail.java:91)
at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:310)
at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:201)
at org.apache.solr.request.mdrill.FacetComponent.process(FacetComponent.java:78)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:102)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1506)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:264)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:198)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

2013-08-02 17:10:22 SolrCore [INFO] [rpt_p4padhoc_product] webapp=/solr path=/select params={facet.cross=true&facet.cross.join=@&facet=true&hadoop.merger.hashindex=37&facet.cross.offset=0&facet.cross.sort.fl=higoempty_count_l&indexpartion=2013073&join.path.31d5a2cf-4f5c-4eef-a86c-0298855340bf=/group/taobao/external/p4p/p4padhoc/download/offline/20130726/d38df7db-a176-4e5d-8c30-16ce31999507/index/part-00000&version=2&facet.cross.sort.desc=true&fl=higo_uuid,score&facet.cross.limit=10010&facet.field=thedate&facet.field=user_id&fsv=true&fq=thedate:20130724&join.rightkey.31d5a2cf-4f5c-4eef-a86c-0298855340bf=cols_1_s&facet.cross.isdetail=true&facet.cross.sort.cp=tdouble&join.leftkey.31d5a2cf-4f5c-4eef-a86c-0298855340bf=user_id&join.fq.31d5a2cf-4f5c-4eef-a86c-0298855340bf=cols_1_s:99&join.fq.31d5a2cf-4f5c-4eef-a86c-0298855340bf=cols_1_s:9&join.fl.31d5a2cf-4f5c-4eef-a86c-0298855340bf=cols_0_s&join.fl.31d5a2cf-4f5c-4eef-a86c-0298855340bf=cols_2_s&join.fl.31d5a2cf-4f5c-4eef-a86c-0298855340bf=cols_1_s&higo_ms_depth=3&wt=javabin&rows=0&facet.sort=index&start=0&q=:&join.tables=31d5a2cf-4f5c-4eef-a86c-0298855340bf&join.sort.31d5a2cf-4f5c-4eef-a86c-0298855340bf=cols_1_s+&higolockCount=10241&maxshards=8&merger.shards.issub=true&mergeservers=10.246.45.24:51275/solr/rpt_p4padhoc_product,10.246.45.23:51274/solr/rpt_p4padhoc_product,10.246.45.42:51277/solr/rpt_p4padhoc_product,10.246.45.41:51276/solr/rpt_p4padhoc_product,10.246.45.21:51272/solr/rpt_p4padhoc_product,10.246.45.22:51273/solr/rpt_p4padhoc_product,10.246.45.43:51270/solr/rpt_p4padhoc_product,10.246.45.44:51271/solr/rpt_p4padhoc_product,10.246.45.24:51275/solr/rpt_p4padhoc_product,10.246.45.23:51274/solr/rpt_p4padhoc_product,&isShard=true} hits=397500 status=500 QTime=24632
2013-08-02 17:10:22 SolrDispatchFilter [ERROR] java.lang.NullPointerException
at org.apache.solr.request.mdrill.MdrillDetail.prefetchValues(MdrillDetail.java:198)
at org.apache.solr.request.mdrill.MdrillDetail.transGroupValue(MdrillDetail.java:229)
at org.apache.solr.request.mdrill.MdrillDetail.topRows(MdrillDetail.java:373)
at org.apache.solr.request.mdrill.MdrillDetail.execute(MdrillDetail.java:145)
at org.apache.solr.request.mdrill.MdrillDetail.getDetail(MdrillDetail.java:91)
at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:310)
at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:201)
at org.apache.solr.request.mdrill.FacetComponent.process(FacetComponent.java:78)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:102)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1506)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:264)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:198)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

2013-08-02 17:10:26 SolrStartJetty [INFO] higolog heartbeat called:adhoc
2013-08-02 17:10:36 SolrStartJetty [INFO] higolog heartbeat called:adhoc
2013-08-02 17:10:36 SolrStartTable [INFO] higolog heartbeat:r_rpt_cps_adhoc_pid,cop

zookeeper info missing when querying a table

2013-10-14 17:01:16 MdrillService [INFO] higorequest:mdrill_fact_ec_ref,0,20,[{"thedate":{"value":["20130901"],"operate":"1"}}],thedate,count(),thedate,null,null,null,
2013-10-14 17:01:16 MdrillService [INFO] query:mdrill_fact_ec_ref,0,20,[{"thedate":{"value":["20130901"],"operate":"1"}}],thedate,count(
),thedate,null,null,null,
2013-10-14 17:01:17 GetShards [INFO] sync from zookeeper mdrill_fact_ec_ref
2013-10-14 17:01:17 DistributedClusterState [ERROR] get_children
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /higo/mdrill_fact_ec_ref
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1586)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:184)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:173)
at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:169)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:161)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:36)
at com.alipay.bluewhale.core.zk.Zookeeper.getChildren(Zookeeper.java:170)
at com.alipay.bluewhale.core.cluster.DistributedClusterState.get_children(DistributedClusterState.java:105)
at com.alipay.bluewhale.core.cluster.StormZkClusterState.higo_ids(StormZkClusterState.java:416)
at com.alimama.mdrill.partion.GetShards$SolrInfoList.run(GetShards.java:65)
at com.alimama.mdrill.partion.GetShards$SolrInfoList.maybeRefresh(GetShards.java:46)
at com.alimama.mdrill.partion.GetShards.getSolrInfoList(GetShards.java:94)
at com.alimama.mdrill.ui.service.MdrillService.result(MdrillService.java:199)
at org.apache.jsp.result_jsp._jspService(org.apache.jsp.result_jsp:73)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:93)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:373)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:470)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:364)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Querying a table requires reading info from the zk path HIGO_ROOT/<table name>, but that node was never created; only the path defined by storm.zookeeper.root exists in zk. Neither creating a table nor starting it shows any behavior that writes under HIGO_ROOT/<table name>. Version 0.18.
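
A defensive sketch using the plain ZooKeeper client: check that the node exists before listing children instead of letting getChildren throw NoNodeException. (Whether table creation should write the HIGO_ROOT/<table> node in the first place is the underlying question.)

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class SafeChildren {
    static java.util.List<String> childrenOrEmpty(ZooKeeper zk, String path)
            throws KeeperException, InterruptedException {
        if (zk.exists(path, false) == null) {
            // node never created, e.g. /higo/mdrill_fact_ec_ref
            return java.util.Collections.emptyList();
        }
        return zk.getChildren(path, false);
    }
}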

In the index I stripped the whitespace from every column

子落 (16:27:55):
In the index I stripped the whitespace from every column.

hive did not.

Yeah, a BUG; I should not have touched the raw data.

子落 (16:28:55):
If the raw data has spaces, they should be kept.

GC problems under heavy concurrent queries after the lock was removed

2013-09-27 12:14:47 UnInvertedField [ERROR] readFail
java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.solr.request.BigReUsedBuffer$BlockInteger.<init>(BigReUsedBuffer.java:150)
at org.apache.solr.request.BigReUsedBuffer$2.create(BigReUsedBuffer.java:82)
at org.apache.solr.request.BigReUsedBuffer.calloc(BigReUsedBuffer.java:52)
at org.apache.solr.request.uninverted.UnInvertedField.setSingleValue(UnInvertedField.java:76)
at org.apache.solr.request.uninverted.UnInvertedField.uninvert(UnInvertedField.java:280)
at org.apache.solr.request.uninverted.UnInvertedField.<init>(UnInvertedField.java:495)
at org.apache.solr.request.uninverted.UnInvertedField.getUnInvertedField(UnInvertedField.java:532)
at org.apache.solr.request.mdrill.MdrillUtils$UnvertFields.<init>(MdrillUtils.java:168)
at org.apache.solr.request.mdrill.MdrillParseGroupby$fetchContaioner.<init>(MdrillParseGroupby.java:119)
at org.apache.solr.request.mdrill.MdrillParseGroupby.createContainer(MdrillParseGroupby.java:84)
at org.apache.solr.request.mdrill.MdrillGroupBy.get(MdrillGroupBy.java:65)
at org.apache.lucene.index.SegmentReader.invertScan(SegmentReader.java:559)
at org.apache.lucene.index.DirectoryReader.invertScan(DirectoryReader.java:591)
at org.apache.lucene.index.FilterIndexReader.invertScan(FilterIndexReader.java:317)
at org.apache.solr.request.mdrill.FacetComponent.getResult(FacetComponent.java:106)
at org.apache.solr.request.mdrill.FacetComponent.process(FacetComponent.java:81)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:101)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1510)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:264)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:198)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)

java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0 *.txt

4pvip string true false
p4p_e_alipay_outshop_cnt tlong true false
p4p_e_alipay_outshop_amt tdouble true false
p4p_e_alipay_direct_cnt tlong true false
p4p_e_alipay_direct_amt tdouble true false
p4p_e_alipay_indirect_cnt tlong true false
p4p_e_alipay_indirect_amt tdouble true false
p4p_e_gmv_indirect_cnt tlong true false
p4p_e_gmv_outshop_cnt tlong true false
higo_uuid tlong true true
13/08/19 03:27:07 INFO input.FileInputFormat: Total input paths to process : 1
13/08/19 03:27:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/08/19 03:27:07 WARN snappy.LoadSnappy: Snappy native library not loaded
13/08/19 03:27:08 INFO mapred.JobClient: Running job: job_201308181947_0428
13/08/19 03:27:09 INFO mapred.JobClient: map 0% reduce 0%
13/08/19 03:27:18 INFO mapred.JobClient: Task Id : attempt_201308181947_0428_m_000000_0, Status : FAILED
java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*.txt
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.sequence(Pattern.java:2090)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.lang.String.split(String.java:2313)
at com.alimama.mdrill.index.IndexMapper.line(IndexMapper.java:134)
at com.alimama.mdrill.index.IndexMapper.map(IndexMapper.java:205)
at com.alimama.mdrill.index.IndexMapper.map(IndexMapper.java:23)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInform

Is this a problem with my data format?
The data format is as follows:
20120512\00189\001军事\001飞机\001192.168.1.122\00123\00130\00189\00134\00198\00190\00190\00190
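
The failure is not the data format itself: String.split() treats its argument as a regex, and "*.txt" is not a valid regex because the '*' has nothing to repeat. A short demonstration; the \001 separator itself is a perfectly valid pattern.

import java.util.regex.Pattern;

public class SplitDemo {
    public static void main(String[] args) {
        // The \001-separated row format from the report.
        String line = "20120512\00189\001军事\001飞机";
        String[] cols = line.split("\001");   // "\001" is a literal char, valid regex
        System.out.println(cols.length);      // 4

        // A literal pattern like "*.txt" must be quoted before splitting:
        String[] ok = "a*.txtb".split(Pattern.quote("*.txt")); // ["a", "b"]
        System.out.println(ok.length);
        // "a".split("*.txt"); // would throw PatternSyntaxException, as in the log
    }
}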

How to create the test data?

Hi, I have installed mdrill following the installation doc.
I created data on HDFS partitioned by thedate, but when I execute a query I get no results and no error info in the log either, and I don't know why.
So, could you give me more details about how to create data on HDFS correctly and how to use it? The details given so far are too vague for me.
Just a small example would be better!
Expecting your reply, thanks!

merger server query IO optimization

Each mdrill shard returns its top 10,000 rows, which are merged layer by layer through the merger servers before reaching the user, yet each page displays only 20 rows. With many long-text columns, 60 shards means shipping 600,000 long-text rows to end up using 20 of them, a big waste of IO.

Change: each of the 10,000 rows a shard returns carries only the row's CRC32 value; only when rows are actually shown to the user is a second query issued to translate the CRC32 values back into real values, returning just those 20 rows and saving network IO. A sketch follows.
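
A minimal sketch of the CRC32 trick with java.util.zip.CRC32: each shard ships an 8-byte checksum per candidate row, and only the rows actually displayed are fetched in full in a second pass.

import java.util.zip.CRC32;

public class RowDigest {
    // Ship 8 bytes per candidate row instead of the full long-text record.
    static long crc32(String row) {
        CRC32 crc = new CRC32();
        try {
            crc.update(row.getBytes("UTF-8"));
        } catch (java.io.UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        System.out.println(crc32("some very long text columns ..."));
    }
}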

Detail query: sorting by a column of the joined small table throws an error

http://110.75.67.137:9999/result.jsp?project=rpt_p4padhoc_product&fl=thedate,user_id &groupby=thedate,user_id&q=[{%22thedate%22%3A{%22operate%22%3A9%2C%22value%22%3A[%2220130724%22%2C%2220130724%22]}}]&leftjoin=%5B%7Btablename%3A%2231d5a2cf-4f5c-4eef-a86c-0298855340bf%22%2Cfl%3A%22cols_0_s%2Ccols_2_s%2Ccols_1_s%22%2Cfq%3A+%5B%7B%22cols_1_s%22%3A%7B%22operate%22%3A%221%22%2C%22value%22%3A%5B%22_99_%22%5D%7D%7D%2C%7B%22cols_1_s%22%3A%7B%22operate%22%3A%221%22%2C%22value%22%3A%5B%22_9_%22%5D%7D%7D%5D%2Cleftkey%3A%22user_id%22%2Crightkey%3A%22cols_1_s%22%2Cprefix%3A%22kaixin%22%2Cpath%3A%22%2Fgroup%2Ftaobao%2Fexternal%2Fp4p%2Fp4padhoc%2Fdownload%2Foffline%2F20130726%2Fd38df7db-a176-4e5d-8c30-16ce31999507%2Findex%22%2C%22sort%22%3A%22cols_1_s+%22%7D%5D&rows=20&start=0&sort=sum(clickcount0)&order=desc&callback=123

org.apache.solr.common.SolrException: Index: 0, Size: 0
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.solr.request.compare.GroupbyRow.getStatVal(GroupbyRow.java:238)
at org.apache.solr.request.compare.ShardGroupByGroupbyRowCompare$CompareSum.getCompareValue(ShardGroupByGroupbyRowCompare.java:205)
at org.apache.solr.request.compare.ShardGroupByGroupbyRowCompare$CompareSum.compare(ShardGroupByGroupbyRowCompare.java:196)
at org.apache.solr.request.compare.ShardGroupByGroupbyRowCompare.compare(ShardGroupByGroupbyRowCompare.java:105)
at org.apache.solr.request.compare.ShardGroupByGroupbyRowCompare.compare(ShardGroupByGroupbyRowCompare.java:14)
at java.util.Arrays.mergeSort(Arrays.java:1270)
at java.util.Arrays.mergeSort(Arrays.java:1281)
at java.util.Arrays.mergeSort(Arrays.java:1281)
at java.util.Arrays.mergeSort(Arrays.java:1281)
at java.util.Arrays.mergeSort(Arrays.java:1281)
at java.util.Arrays.mergeSort(Arrays.java:1281)
at java.util.Arrays.mergeSort(Arrays.java:1281)
at java.util.Arrays.mergeSort(Arrays.java:1281)
at java.util.Arrays.mergeSort(Arrays.java:1281)
at java.util.Arrays.mergeSort(Arrays.java:1281)
at java.util.Arrays.mergeSort(Arrays.java:1281)
at java.util.Arrays.mergeSort(Arrays.java:1281)
at java.util.Arrays.sort(Arrays.java:1210)
at java.util.Collections.sort(Collections.java:159)
at org.apache.solr.request.mdrill.MdrillGroupBy.toNameList(MdrillGroupBy.java:158)
at org.apache.solr.request.mdrill.MdrillGroupBy.getCross(MdrillGroupBy.java:109)
at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:314)
at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:201)
at org.apache.solr.request.mdrill.FacetComponent.process(FacetComponent.java:78)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:102)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1506)
at org.apache


Outer-table join

In a detail query where both tables have a province column, the province data is lost.

Error reported in worker-6710.log: java.nio.channels.OverlappingFileLockException

apter.call(Executors.java:441) at java.uti
worker-6710.log:java.nio.channels.OverlappingFileLockException
worker-6710.log:java.nio.channels.OverlappingFileLockException
worker-6711.log:java.nio.channels.OverlappingFileLockException
worker-6711.log:java.nio.channels.OverlappingFileLockException
worker-6712.log:java.nio.channels.OverlappingFileLockException
worker-6712.log:java.nio.channels.OverlappingFileLockException
worker-6712.log:org.apache.hadoop.util.Shell$ExitCodeException: chmod: cannot access `/disk4/taobao/mdrill/higojoin_tmp/939c6239-6acd-4643-9e1d-bfbd3136956b/47/_m.tis': Not a directory
worker-6712.log:org.apache.hadoop.util.Shell$ExitCodeException: chmod: cannot access `/disk4/taobao/mdrill/higojoin_tmp/939c6239-6acd-4643-9e1d-bfbd3136956b/47': No such file or directory
worker-6713.log:java.nio.channels.OverlappingFileLockException
worker-6713.log:org.apache.hadoop.util.Shell$ExitCodeException: chmod: cannot access `/disk4/taobao/mdrill/higojoin_tmp/939c6239-6acd-4643-9e1d-bfbd3136956b/47/yn_segments_crc_1': No such file or directory
worker-6714.log:java.nio.channels.OverlappingFileLockException
worker-6714.log:java.nio.channels.OverlappingFileLockException

doubleValue need not be read during group-by aggregation

doubleValue is only used when computing distinct, sum, max, and min, so the quicktis file used during un-inversion should be split into 2 files to speed things up (see the sketch below).
Also, termNum2Text does not need to be read during un-inversion; reading only a stored position is enough, which saves IO for long titles.
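
A sketch of the two-file split; the layout and names are assumptions. The per-document term ordinals live in one file and the numeric doubleValues in another, so a plain group-by never opens the values file.

public class SplitColumnFiles {
    // Group-by reads only the ordinal file; the doubleValue file is opened
    // only when distinct/sum/max/min is actually requested.
    static void write(java.io.File ordFile, java.io.File valFile,
                      int[] ords, double[] vals) throws java.io.IOException {
        java.io.DataOutputStream ord = new java.io.DataOutputStream(
                new java.io.BufferedOutputStream(new java.io.FileOutputStream(ordFile)));
        java.io.DataOutputStream val = new java.io.DataOutputStream(
                new java.io.BufferedOutputStream(new java.io.FileOutputStream(valFile)));
        try {
            for (int o : ords) ord.writeInt(o);       // read for every group-by
            for (double v : vals) val.writeDouble(v); // read only for aggregates
        } finally {
            ord.close();
            val.close();
        }
    }
}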

Why must the disks be set up in RAID1 mode?

Following this project.
Reading the docs, I would like to understand: is requiring RAID1 a matter of sacrificing performance to guarantee data integrity?
It feels like RAID0 would do if my data is not that important and I can tolerate some machines going down.
A further question: if one storage server dies, does it affect queries against that table? Is there a related tolerance setting?

dhoc/shard_0_5/tablelist/quanjingmointor/solr/data/201310/realtime/1382511022060_2' does not exist

2013-10-23 16:40:42 SolrCore [ERROR] null:org.apache.lucene.store.NoSuchDirectoryException: directory '/home/taobao/mdrill/realtime/higo/adhoc/shard_0_5/tablelist/quanjingmointor/solr/data/201310/realtime/1382511022060_2' does not exist
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:218)
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:241)
at org.apache.lucene.store.LinkFSDirectory.listAll(LinkFSDirectory.java:236)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:675)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:81)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:509)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:355)
at org.apache.solr.core.SolrCore.getSearcherByPath(SolrCore.java:873)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:794)
at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:220)
at org.apache.solr.response.BinaryResponseWriter$Resolver.writeDocList(BinaryResponseWriter.java:119)
at org.apache.solr.response.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:87)
at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:144)
at org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:134)
at org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:222)
at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:139)
at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:87)
at org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:47)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:240)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:198)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

2013-10-23 16:40:42 SolrStartJetty [INFO] higolog checkSolr /solr/quanjingmointor,201310,20131011,result=0
2013-10-23 16:40:42 SolrStartTable [INFO] higolog zkHeatbeat quanjingmointor,info:false 0@1 [bo2.sds.cm6][10.246.45.82:51115][SERVICE] part-00005 [max:4091,total:4091,free:1167] [2013-10-23 16:40:42][2013-10-23 16:40:42

Compilation errors under 1.6.0_31 and 1.7.0_25

[ERROR] /home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillStatement.java:[12,7] error: MdrillStatement is not abstract and does not override abstract method isCloseOnCompletion() in Statement
[ERROR] /home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillConnection.java:[22,7] error: MdrillConnection is not abstract and does not override abstract method getNetworkTimeout() in Connection
[ERROR] /home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillPreparedStatement.java:[32,7] error: MdrillPreparedStatement is not abstract and does not override abstract method isCloseOnCompletion() in Statement
[ERROR] /home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillDriver.java:[11,7] error: MdrillDriver is not abstract and does not override abstract method getParentLogger() in Driver
[ERROR] /home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillQueryResultSet.java:[15,7] error: MdrillQueryResultSet is not abstract and does not override abstract method getObject(String,Class) in ResultSet
[ERROR]
T extends Object declared in method <T>getObject(String,Class<T>)
/home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillDatabaseMetaData.java:[16,7] error: MdrillDatabaseMetaData is not abstract and does not override abstract method generatedKeyAlwaysReturned() in DatabaseMetaData
[ERROR] /home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillDatabaseMetaData.java:[85,27] error: <anonymous com.alimama.mdrill.jdbc.MdrillDatabaseMetaData$1> is not abstract and does not override abstract method getObject(String,Class) in ResultSet
[ERROR]
T extends Object declared in method <T>getObject(String,Class<T>)
/home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillDatabaseMetaData.java:[483,56] error: <anonymous com.alimama.mdrill.jdbc.MdrillDatabaseMetaData$2> is not abstract and does not override abstract method getObject(String,Class) in ResultSet
[ERROR]
T extends Object declared in method <T>getObject(String,Class<T>)
/home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillDatabaseMetaData.java:[523,85] error: <anonymous com.alimama.mdrill.jdbc.MdrillDatabaseMetaData$3> is not abstract and does not override abstract method getObject(String,Class) in ResultSet
[ERROR]
T extends Object declared in method <T>getObject(String,Class<T>)
/home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillDatabaseMetaData.java:[596,21] error: <anonymous com.alimama.mdrill.jdbc.MdrillDatabaseMetaData$4> is not abstract and does not override abstract method getObject(String,Class) in ResultSet
[INFO] 10 errors
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Compilation failure

/home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillStatement.java:[12,7] error: MdrillStatement is not abstract and does not override abstract method isCloseOnCompletion() in Statement
/home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillConnection.java:[22,7] error: MdrillConnection is not abstract and does not override abstract method getNetworkTimeout() in Connection
/home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillPreparedStatement.java:[32,7] error: MdrillPreparedStatement is not abstract and does not override abstract method isCloseOnCompletion() in Statement
/home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillDriver.java:[11,7] error: MdrillDriver is not abstract and does not override abstract method getParentLogger() in Driver
/home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillQueryResultSet.java:[15,7] error: MdrillQueryResultSet is not abstract and does not override abstract method getObject(String,Class) in ResultSet

T extends Object declared in method <T>getObject(String,Class<T>)

/home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillDatabaseMetaData.java:[16,7] error: MdrillDatabaseMetaData is not abstract and does not override abstract method generatedKeyAlwaysReturned() in DatabaseMetaData
/home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillDatabaseMetaData.java:[85,27] error: <anonymous com.alimama.mdrill.jdbc.MdrillDatabaseMetaData$1> is not abstract and does not override abstract method getObject(String,Class) in ResultSet

T extends Object declared in method <T>getObject(String,Class<T>)

/home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillDatabaseMetaData.java:[483,56] error: <anonymous com.alimama.mdrill.jdbc.MdrillDatabaseMetaData$2> is not abstract and does not override abstract method getObject(String,Class) in ResultSet

T extends Object declared in method <T>getObject(String,Class<T>)

/home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillDatabaseMetaData.java:[523,85] error: <anonymous com.alimama.mdrill.jdbc.MdrillDatabaseMetaData$3> is not abstract and does not override abstract method getObject(String,Class) in ResultSet

T extends Object declared in method <T>getObject(String,Class<T>)

/home/rick/workspace/mdrill/adhoc-jdbc/src/main/java/com/alimama/mdrill/jdbc/MdrillDatabaseMetaData.java:[596,21] error: <anonymous com.alimama.mdrill.jdbc.MdrillDatabaseMetaData$4> is not abstract and does not override abstract method getObject(String,Class) in ResultSet
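These "is not abstract and does not override" errors come from the JDBC 4.1 methods that Java 7 added to the java.sql interfaces, so code written against Java 6 stops compiling under JDK 7. A sketch of one common workaround (not necessarily the project's official fix) is to stub each missing method in the classes the compiler names; the class name below is a hypothetical container for the signatures:

import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;

// Hypothetical stubs; in a real fix each method goes into the class javac
// lists (MdrillStatement, MdrillConnection, MdrillDriver, MdrillQueryResultSet,
// MdrillDatabaseMetaData and its anonymous ResultSet classes).
public class Jdbc41Stubs {
    public boolean isCloseOnCompletion() throws SQLException {        // Statement
        throw new SQLFeatureNotSupportedException("Method not supported");
    }
    public int getNetworkTimeout() throws SQLException {              // Connection
        throw new SQLFeatureNotSupportedException("Method not supported");
    }
    public java.util.logging.Logger getParentLogger() throws SQLFeatureNotSupportedException { // Driver
        throw new SQLFeatureNotSupportedException("Method not supported");
    }
    public boolean generatedKeyAlwaysReturned() throws SQLException { // DatabaseMetaData
        throw new SQLFeatureNotSupportedException("Method not supported");
    }
    public <T> T getObject(String columnLabel, Class<T> type) throws SQLException { // ResultSet
        throw new SQLFeatureNotSupportedException("Method not supported");
    }
}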

java.lang.NoSuchMethodError: org.eclipse.jdt.internal.compiler.CompilationResult.getProblems()[Lorg/eclipse/jdt/core/compiler/IProblem;

perfect5085-perfect5085 (15:33:16):
java.lang.NoSuchMethodError: org.eclipse.jdt.internal.compiler.CompilationResult.getProblems()[Lorg/eclipse/jdt/core/compiler/IProblem;
at org.apache.jasper.compiler.JDTJavaCompiler$2.acceptResult(JDTJavaCompiler.java:426)
at org.eclipse.jdt.internal.compiler.Compiler.compile(Compiler.java:474)
at org.apache.jasper.compiler.JDTJavaCompiler.compile(JDTJavaCompiler.java:487)
at org.apache.jasper.compiler.Compiler.generateClass(Compiler.java:342)
at org.apache.jasper.compiler.Compiler.compile(Compiler.java:411)
at org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:592)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:344)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:470)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:364)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
子落 (15:41:46):

子落 (2013-09-25 15:46:23):
Probably a jar conflict.
子落 (2013-09-25 15:56:42):

I searched on Baidu; it says to delete this one.

Want to give that a try?

perfect5085-perfect5085 from the Alibaba ** site (2013-09-25 15:57:23):
No luck. I deleted core and kept ecj; that didn't work.
I deleted ecj and kept core; that didn't work either.
I tried both.
子落 (2013-09-25 15:58:17):
How about using the package I built?

http://yunpan.cn/QXn4tRAzx8NIL
perfect5085-perfect5085 from the Alibaba ** site (2013-09-25 15:59:27):
OK, I'll give it a try. But I just compared all the jars under lib and they are all identical...
perfect5085-perfect5085 from the Alibaba ** site (2013-09-25 16:04:23):
No good, still the same error.
子落 (2013-09-25 16:06:15):
Sigh, I can't figure out which jar is conflicting either.

perfect5085-perfect5085 from the Alibaba ** site (2013-09-25 16:08:02):
Let me take another look.
子落 (2013-09-25 16:09:43):
The only way is to test it: add the jars back in one by one.

perfect5085-perfect5085 from the Alibaba ** site (2013-09-25 16:17:10):
I made a mistake just now: I removed the jars under lib, but they actually were not deleted from inside the war. I need to repackage and try again; it should be an ecj problem.
子落 (2013-09-25 16:17:50):

The new version no longer uses a war package.

The compiled war was far too big.
perfect5085-perfect5085 from the Alibaba ** site (2013-09-25 16:18:53):
Mm-hm...
perfect5085-perfect5085 from the Alibaba ** site (16:21:29):
OK... the problem is solved; it was the ecj jar.
子落 (16:21:41):
So it works after deleting it?

perfect5085-perfect5085 from the Alibaba ** site (16:21:45):

<dependency>
    <groupId>org.mortbay.jetty</groupId>
    <artifactId>jsp-2.1-jetty</artifactId>
    <version>6.1.26</version>
    <exclusions>
        <exclusion>
            <groupId>org.eclipse.jdt.core.compiler</groupId>
            <artifactId>ecj</artifactId>
        </exclusion>
    </exclusions>
</dependency>
perfect5085-perfect5085 from the Alibaba ** site (16:22:01):
Yes, I changed it in the Maven pom and repackaged.
Without ecj it works.
子落 (16:22:20):
Thanks! A lot of people have run into this problem.

perfect5085-perfect5085 from the Alibaba ** site (16:23:04):
Haha, looks like this solves a problem that has been tricky for everyone.
子落 (16:23:42):
Haha, yeah.
Once before, a conflict with a jar inside hadoop had us completely stumped.
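For a NoSuchMethodError like this one, a generic diagnostic (not from the original thread) is to ask the JVM which jar actually supplied the class; the class name below is hypothetical and it must be run on the same classpath as the webapp:

// Prints the jar that loaded CompilationResult at runtime; if it is not the
// expected ecj jar, that is the conflicting dependency. Assumes the class was
// loaded from a jar (getCodeSource() can be null for bootstrap classes).
public class WhichJar {
    public static void main(String[] args) {
        System.out.println(org.eclipse.jdt.internal.compiler.CompilationResult.class
                .getProtectionDomain().getCodeSource().getLocation());
    }
}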

Build error

mdrill/trunk/adhoc-mdrill/src/main/java/com/alimama/mdrill/topology/PartionStat.java:[268,50] error: cannot find symbol

Both the code in trunk and in 0.18.2-beta are broken.

Queries on certain fields (whose format is a time format) report errors.

2013-09-09 09:50:20 SegmentReader [INFO] ##getpos##p4pstartsfileNum#55pos#21814020
2013-09-09 09:50:20 SegmentReader [INFO] ##getpos##p4pstartsfileNum#55pos#21814020
2013-09-09 09:50:20 UnInvertedField [INFO] setSingleValue QuickNumberedTermEnum p4pstarts field false@1383465366_918094195_71
2013-09-09 09:50:20 BigReUsedBuffer [INFO] ####BigByteBuffer### calloc free:294,mallocTimes:21174,reusedTimes:52129
2013-09-09 09:50:20 SimpleFacets [ERROR] getFacetCounts
java.lang.ArrayIndexOutOfBoundsException: 282
at com.alimama.mdrill.buffer.Simple16.s16Decompress(Simple16.java:104)
at com.alimama.mdrill.buffer.PForDelta.decompressBlockByS16(PForDelta.java:323)
at com.alimama.mdrill.buffer.PForDelta.decompressOneBlock(PForDelta.java:150)
at org.apache.lucene.index.SegmentTermDocs$ReadCompress.readCompressblock(SegmentTermDocs.java:193)
at org.apache.lucene.index.SegmentTermDocs.readNoTf(SegmentTermDocs.java:289)
at org.apache.lucene.index.SegmentTermDocs.read(SegmentTermDocs.java:262)
at org.apache.solr.request.uninverted.UnInvertedField.setSingleValue(UnInvertedField.java:121)
at org.apache.solr.request.uninverted.UnInvertedField.uninvert(UnInvertedField.java:463)
at org.apache.solr.request.uninverted.UnInvertedField.<init>(UnInvertedField.java:88)
at org.apache.solr.request.uninverted.UnInvertedField.getUnInvertedField(UnInvertedField.java:700)
at org.apache.solr.request.mdrill.MdrillPorcessUtils$UnvertFields.<init>(MdrillPorcessUtils.java:324)
at org.apache.solr.request.mdrill.MdrillGroupBy.execute(MdrillGroupBy.java:283)
at org.apache.solr.request.mdrill.MdrillGroupBy.getCross(MdrillGroupBy.java:164)
at org.apache.lucene.index.SegmentReader.invertScan(SegmentReader.java:563)
at org.apache.lucene.index.DirectoryReader.invertScan(DirectoryReader.java:591)
at org.apache.lucene.index.FilterIndexReader.invertScan(FilterIndexReader.java:317)
at org.apache.solr.request.mdrill.MdrillGroupBy.getBySchemaReader(MdrillGroupBy.java:130)
at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:312)
at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:204)
at org.apache.solr.request.mdrill.FacetComponent.process(FacetComponent.java:79)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:101)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1506)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:264)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:198)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
2013-09-09 09:50:20 SolrCore [ERROR] org.apache.solr.common.SolrException: java.lang.ArrayIndexOutOfBoundsException: 282
at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:210)
at org.apache.solr.request.mdrill.FacetComponent.process(FacetComponent.java:79)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:101)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1506)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:264)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:198)
