
higo's Issues

How can the cluster scale to 1,000 machines?

    higo itself is built on top of Bluewhale (a Java port of Storm). For both Bluewhale and Storm, the largest clusters I have seen are only around 20 to 30 machines, and because of the frequent heartbeats, ZooKeeper and the Nimbus scheduler also become bottlenecks. Although I have not tested it, my feeling is that reaching 1,000 machines would be very hard.
The "HDFS Federation" idea from Hadoop/YARN gave me some inspiration: why must there be only one cluster? We could create many small clusters, say 50 small clusters of 20 machines each, with one master controller managing the state of all the small clusters.
With the current layout of 6 shards per machine, each small cluster holds 120 shards, and all the small clusters together hold 6,000 shards.
For each query, the work is dispatched to the different small clusters according to the shards the query touches, and the per-cluster results are then merged one more time (higo already supports multi-level merging, so this is not a problem).
The key point is that adapting higo along these lines requires only small changes and is easy to implement.
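
A minimal sketch of the federation idea above (all class and method names here are hypothetical, not higo's real API): a top-level controller groups the requested shards by the small cluster that owns them, queries each small cluster, and merges the per-cluster results, relying on higo's existing multi-level merge.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FederatedQuerySketch {
    interface SmallCluster {                       // one 20-machine / 120-shard cluster
        Result query(List<Integer> shards, String q);
    }
    interface Result {
        Result mergeWith(Result other);            // higo already supports multi-level merge
    }

    static Result query(Map<String, SmallCluster> clusters,
                        Map<Integer, String> shardToCluster,   // routing table kept by the master controller
                        List<Integer> shards, String q) {
        // Group the requested shards by the small cluster that owns them.
        Map<String, List<Integer>> perCluster = new HashMap<String, List<Integer>>();
        for (Integer shard : shards) {
            String clusterId = shardToCluster.get(shard);
            List<Integer> bucket = perCluster.get(clusterId);
            if (bucket == null) {
                bucket = new ArrayList<Integer>();
                perCluster.put(clusterId, bucket);
            }
            bucket.add(shard);
        }
        // Query each small cluster and merge the partial results one more time.
        Result merged = null;
        for (Map.Entry<String, List<Integer>> e : perCluster.entrySet()) {
            Result part = clusters.get(e.getKey()).query(e.getValue(), q);
            merged = (merged == null) ? part : merged.mergeWith(part);
        }
        return merged;
    }
}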

at org.apache.lucene.store.Lock.obtain(Lock.java:84)

r/data/2012103/workerspace_4@_0.tii 4
2013-03-20 14:45:33 ReplicationHandler [WARN] Unable to get IndexCommit on startup
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/disk6/taobao/bluewhile/higo/adhoc/16_15/tablelist/rpt_hitfake_auctionall_d/solr/data/index/lucene-1fef2e39-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1108)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:83)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:101)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:171)
at org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:375)
at org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.java:858)
at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:523)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:599)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:470)
at org.apache.solr.core.CoreContainer.createTableCore(CoreContainer.java:330)
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:598)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
2013-03-20 14:45:33 ReplicationHandler [INFO] Commits will be reserved for 10000

org.apache.jasper.JasperException: PWC6117: File "/tablelist.jsp" not found

After the web UI has been running for a while, it reports that the JSP file cannot be found.

Cause: the operating system periodically cleans up files under /tmp, and Jetty unpacks its WAR there, so a dedicated work directory must be specified instead of the default

-Djava.io.tmpdir=/tmp

Change it to some other folder since the OS will delete files in /tmp after a period of time.

Further reading
http://stackoverflow.com/questions/7124571/my-jetty-server-will-dead-after-a-long-time-why
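
A minimal sketch of the fix, assuming the embedded Jetty 6 API that appears in the stack traces above; the directory and WAR name below are made-up examples, not higo's actual configuration.

import java.io.File;
import org.mortbay.jetty.Server;
import org.mortbay.jetty.webapp.WebAppContext;

public class StartWebUiSketch {
    public static void main(String[] args) throws Exception {
        // Use a dedicated work directory that the OS will not clean up, instead of /tmp.
        File workDir = new File("/home/taobao/higo/jetty-work");   // made-up path
        workDir.mkdirs();
        System.setProperty("java.io.tmpdir", workDir.getAbsolutePath());

        Server server = new Server(8080);
        WebAppContext webapp = new WebAppContext();
        webapp.setContextPath("/higo");
        webapp.setWar("higo-web.war");                             // made-up WAR name
        webapp.setTempDirectory(workDir);   // keep the extracted JSPs out of /tmp
        server.setHandler(webapp);
        server.start();
        server.join();
    }
}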

Full error message

HTTP ERROR 500

Problem accessing /higo/tablelist.jsp. Reason:

PWC6117: File "/tablelist.jsp" not found

Caused by:

org.apache.jasper.JasperException: PWC6117: File "/tablelist.jsp" not found
at org.apache.jasper.compiler.DefaultErrorHandler.jspError(DefaultErrorHandler.java:73)
at org.apache.jasper.compiler.ErrorDispatcher.dispatch(ErrorDispatcher.java:359)
at org.apache.jasper.compiler.ErrorDispatcher.jspError(ErrorDispatcher.java:153)
at org.apache.jasper.compiler.JspUtil.getInputStream(JspUtil.java:894)
at org.apache.jasper.xmlparser.XMLEncodingDetector.getEncoding(XMLEncodingDetector.java:127)
at org.apache.jasper.compiler.ParserController.determineSyntaxAndEncoding(ParserController.java:360)
at org.apache.jasper.compiler.ParserController.doParse(ParserController.java:194)
at org.apache.jasper.compiler.ParserController.parse(ParserController.java:124)
at org.apache.jasper.compiler.Compiler.generateJava(Compiler.java:184)
at org.apache.jasper.compiler.Compiler.compile(Compiler.java:409)
at org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:592)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:344)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:470)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:364)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Powered by Jetty://

Errors in FindSegmentsFile and LinkFSDirectory.readOnlyOpen

2013-03-20 12:59:36 SolrCore [INFO] facet read fail from file 'thedate'
2013-03-20 12:59:36 SolrCore [INFO] getSearcher:rpt_p4padhoc_cust@2012113@1363755480106:/disk7/taobao/bluewhile/higo/adhoc/17_16/tablelist/rpt_p4padhoc_cust/solr/data/2012113
2013-03-20 12:59:36 SolrCore [ERROR] org.apache.lucene.util.ThreadInterruptedException: java.lang.InterruptedException: sleep interrupted
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:730)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:462)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:405)
at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1044)
at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:219)
at org.apache.solr.request.SimpleFacets.<init>(SimpleFacets.java:90)
at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:70)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody

41:51112/solr/rpt_p4padhoc_cust,10.246.45.22:48680/solr/rpt_p4padhoc_cust,10.246.45.23:51111/solr/rpt_p4padhoc_cust,&isShard=true&fsv=true&fq=thedate:[20121001+TO+20130318]} hits=186846 status=0 QTime=8630
2013-03-20 13:00:53 SolrQueryRequestBase [INFO] ref close rpt_p4padhoc_cust@2013032,0
2013-03-20 13:00:53 SolrCore [ERROR] null:org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/disk7/taobao/bluewhile/higo/adhoc/17_16/tablelist/rpt_p4padhoc_cust/solr/data/2013023/workerspace/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1108)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:989)
at org.apache.lucene.store.LinkFSDirectory.readOnlyOpen(LinkFSDirectory.java:170)
at org.apache.lucene.store.LinkFSDirectory.readOnlyOpen(LinkFSDirectory.java:185)
at org.apache.solr.core.LinksStandardDirectoryFactory.open(LinksStandardDirectoryFactory.java:33)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1043)
at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:219)
at org.apache.solr.request.SimpleFacets.<init>(SimpleFacets.java:90)
at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:70)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:196)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:358)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)

Rebuilding the index after data changes

Bug fix: after the adhoc data is produced, it is re-cleaned 3 to 7 days later to backfill data, so the index also needs to be rebuilt. This mechanism was lost when partitioning was added, which causes the statistics for the last day of each ten-day period (early/mid/late month) to disagree with Hive.

Remove the per-scan data-size cap that memory limits force on higo queries

Previously, higo requested every shard of every partition concurrently. With limited machine resources, a large number of partitions produces a large number of HTTP requests and puts too much pressure on the merger server.
For this reason the adhoc project has always limited a single higo scan to 1 billion rows, which clearly does not satisfy some needs, so we are improving it.

The current approach is to submit the work in multiple batches, each covering a fixed number of partitions (for example, 4 partitions at a time). When each shard finishes its computation, it dumps its data to HDFS.
Finally a merge job is submitted (its parallelism depends on the number of hash buckets), which merges all the data dumped to HDFS.
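
A minimal sketch of the batched-scan idea (the interfaces here are hypothetical, not higo's real API): submit a fixed number of partitions per batch, have each batch dump its shard results to HDFS, then run one final merge over everything dumped.

import java.util.List;

public class BatchedScanSketch {
    interface ShardClient {                     // hypothetical shard-query API
        void queryAndDumpToHdfs(List<String> partitions, String hdfsDir) throws Exception;
    }
    interface Merger {                          // hypothetical merge job; parallelism depends on the hash count
        void merge(String hdfsDir) throws Exception;
    }

    static void run(ShardClient shards, Merger merger,
                    List<String> allPartitions, int batchSize, String hdfsDir) throws Exception {
        // Submit a few partitions at a time instead of all of them at once,
        // so the merger server never sees too many concurrent HTTP requests.
        for (int i = 0; i < allPartitions.size(); i += batchSize) {
            List<String> batch =
                allPartitions.subList(i, Math.min(i + batchSize, allPartitions.size()));
            shards.queryAndDumpToHdfs(batch, hdfsDir);
        }
        // One final merge over all the data dumped to HDFS.
        merger.merge(hdfsDir);
    }
}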

Offline download: column headers should be displayed in Chinese

子落, has this part been discussed with 行咧?
Can the headers be displayed in Chinese?
子落 (15:33:53):
Yes, that works.

They have already been sent to me; I'll take care of it today or tomorrow.

张壮 (15:34:06):
OK

group by with no limit on the number of groups

Currently higo's group by + sort requires the number of groups to be under ten thousand, which is far too small.
This iteration should raise that limit: support one million groups with no loss of performance, and support tens of millions or even hundreds of millions of groups with minute-level response times.

Partition support for day and month

Partitioning: by default higo only supports splitting each month into early/mid/late ten-day partitions; add partitioning by day and by month as well.
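
A minimal sketch of how the three partition keys could be derived from a date (a hypothetical helper, not higo's real code); the ten-day key format matches the partition directory names that appear in the logs above, e.g. 2013032.

import java.util.Calendar;
import java.util.Date;

public class PartitionKeySketch {
    enum Granularity { TEN_DAY, DAY, MONTH }

    static String partitionKey(Date date, Granularity g) {
        Calendar c = Calendar.getInstance();
        c.setTime(date);
        String yyyymm = String.format("%04d%02d", c.get(Calendar.YEAR), c.get(Calendar.MONTH) + 1);
        int day = c.get(Calendar.DAY_OF_MONTH);
        switch (g) {
            case MONTH: return yyyymm;                               // e.g. 201303
            case DAY:   return yyyymm + String.format("%02d", day);  // e.g. 20130320
            case TEN_DAY:
            default:    // early = 1, mid = 2, late = 3: one partition per ten-day period
                        return yyyymm + (day <= 10 ? "1" : day <= 20 ? "2" : "3");  // e.g. 2013032
        }
    }
}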

distinct implementation

1. Convert each value to an integer, using MD5 or CRC32 or similar, as long as the hash spreads the data uniformly.
2. Use a bitset in which each bit marks whether the corresponding value is present.
3. If the data volume is extremely large, say hundreds of billions of values, use a partial bitset to estimate the whole.
For details see the slides (ppt download).

Suppose the bitset has 1 billion bits.
We actually store only 1% of it, i.e. 10 million bits.
Because the data is hashed uniformly, this 1% is as sparse as the other 99%,
so the final count is simply the count over the 1% multiplied by 100.

However, for fields with very high repetition, such as category, we do not estimate from a partial bitset; we compute the exact count instead.
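
A minimal sketch of the approach above (the class and its parameters are made up, not higo's real code): hash each value to a bit position; for exact counting keep the full bitset, and for very large cardinalities keep only the first 1% of positions and scale the count up.

import java.util.BitSet;
import java.util.zip.CRC32;

public class DistinctSketch {
    private final long totalBits;     // logical bitset size, e.g. 1,000,000,000
    private final long sampledBits;   // bits actually stored, e.g. totalBits / 100
    private final BitSet bits;

    DistinctSketch(long totalBits, long sampledBits) {
        this.totalBits = totalBits;
        this.sampledBits = sampledBits;
        this.bits = new BitSet((int) sampledBits);
    }

    void add(String value) {
        CRC32 crc = new CRC32();                 // any uniform hash works (MD5, CRC32, ...)
        crc.update(value.getBytes());
        long pos = crc.getValue() % totalBits;   // position in the logical bitset
        if (pos < sampledBits) {                 // only the sampled 1% is actually stored
            bits.set((int) pos);
        }
    }

    long estimate() {
        // The sample is as sparse as the rest, so scale by totalBits / sampledBits (x100 here).
        return bits.cardinality() * (totalBits / sampledBits);
    }
}

For exact counting (e.g. the category field), the same structure can be used with sampledBits equal to totalBits, so no scaling is applied.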

Second-pass computation

Save the results of the first query so they can be analyzed a second time.

Typical use case:
First, sum over some field:
select a,b,c,sum(xxx) from tbl group by a,b,c

Then split the summed values into ranges and count how many groups fall into each range.
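
A minimal sketch of the second pass (hypothetical types, not higo's real API): take the per-group sums produced by the first query and count how many groups fall into each range.

import java.util.Map;
import java.util.TreeMap;

public class SecondPassSketch {
    // sums: the sum(xxx) value of each group from the first query
    // boundaries: ascending range boundaries, e.g. {100, 1000, 10000}
    static Map<String, Integer> countGroupsPerRange(double[] sums, double[] boundaries) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (double s : sums) {
            String bucket = bucketOf(s, boundaries);
            Integer old = counts.get(bucket);
            counts.put(bucket, old == null ? 1 : old + 1);
        }
        return counts;
    }

    private static String bucketOf(double s, double[] boundaries) {
        for (double b : boundaries) {
            if (s < b) {
                return "< " + b;
            }
        }
        return ">= " + boundaries[boundaries.length - 1];
    }
}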

Errors when many partitions are scanned (a query spanning 6 months)

2013-03-20 09:59:22 SolrCore [ERROR] org.apache.solr.common.SolrException: java.lang.Exception: 10.246.45.43:51111/solr/rpt_p4padhoc_product
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:294)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:358)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.Exception: 10.246.45.43:51111/solr/rpt_p4padhoc_product
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:461)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:418)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: java.lang.Exception: 10.246.45.22:51160/solr/rpt_p4padhoc_product@2012111
org.apache.solr.common.SolrException: java.lang.Exception: 10.246.45.22:51160/solr/rpt_p4padhoc_product@2012111
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:294)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:358)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.Exception: 10.246.45.22:51160/solr/rpt_p4padhoc_product@2012111
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:461)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:418)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441

2013-03-20 14:45:05 SolrDispatchFilter [ERROR] org.mortbay.jetty.EofException

olr/data/2012102/workerspace_2@_0.tii
2013-03-20 14:45:05 SolrIndexSearcher [INFO] Opening Searcher@33b7b32c partion_rpt_p4padhoc_product@2012102@1363761822038
2013-03-20 14:45:05 SolrCore [INFO] ref clearpartion rpt_p4padhoc_product@2013032,1
2013-03-20 14:45:05 SolrCore [INFO] ref clearpartion rpt_p4padhoc_product@2012111,1
2013-03-20 14:45:05 SolrCore [INFO] BlockBufferInput close /disk6/taobao/bluewhile/higo/adhoc/16_15/tablelist/rpt_p4padhoc_product/solr/data/2012112/workerspace_2@_0.tis
2013-03-20 14:45:05 SolrCore [INFO] SolrIndexSearcher clear:rpt_p4padhoc_product@2012112@1363761822038
2013-03-20 14:45:05 SolrQueryRequestBase [INFO] ref create rpt_p4padhoc_product@2012102,1
2013-03-20 14:45:05 SolrCore [INFO] ref clearpartion rpt_p4padhoc_product@2013032,1
2013-03-20 14:45:05 SolrCore [INFO] ref clearpartion rpt_p4padhoc_product@2012111,1
2013-03-20 14:45:05 SolrCore [INFO] getSearcher:rpt_p4padhoc_product@2012122@1363761822038:/disk6/taobao/bluewhile/higo/adhoc/16_15/tablelist/rpt_p4padhoc_product/solr/data/2012122
2013-03-20 14:45:05 SolrDispatchFilter [ERROR] org.mortbay.jetty.EofException
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:634)
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580)
at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:184)
at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:89)
at org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:47)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:267)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

worker-6601.log:2013-03-25 09:52:34 SolrCore [ERROR] facet_counts is null 10.246.45.44:54363/solr/rpt_p4padhoc_product

[taobao@adhoc7 logs]$ grep ERROR worker-6*
worker-6601.log:2013-03-25 09:52:34 SolrCore [ERROR] facet_counts is null 10.246.45.44:54363/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:52:34 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:54:55 SolrCore [ERROR] facet_counts is null 10.246.45.42:51112/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:54:55 SolrCore [ERROR] facet_counts is null 10.246.45.22:23913/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:55:15 SolrCore [ERROR] facet_counts is null 10.246.45.41:51113/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:55:15 SolrCore [ERROR] facet_counts is null 10.246.45.21:51117/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:55:35 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:55:35 SolrCore [ERROR] facet_counts is null 10.246.45.21:51117/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:57:11 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:57:11 SolrCore [ERROR] facet_counts is null 10.246.45.42:51112/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:59:24 SolrCore [ERROR] facet_counts is null 10.246.45.24:51119/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:00:06 SolrCore [ERROR] facet_counts is null 10.246.45.43:51118/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:03:06 SolrCore [ERROR] facet_counts is null 10.246.45.21:51117/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:03:48 SolrCore [ERROR] facet_counts is null 10.246.45.44:54363/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:05:05 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:08:16 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:09:57 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:09:57 SolrCore [ERROR] facet_counts is null 10.246.45.21:51117/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:14:25 SolrCore [ERROR] facet_counts is null 10.246.45.43:51118/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:16:26 SolrCore [ERROR] facet_counts is null 10.246.45.44:54363/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:16:26 SolrCore [ERROR] facet_counts is null 10.246.45.41:51113/solr/rpt_p4padhoc_product

2013-03-21 21:33:52 SolrDispatchFilter [ERROR] java.lang.NullPointerException

2013-03-21 21:33:52 SolrDispatchFilter [ERROR] java.lang.NullPointerException
at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:777)
at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:626)
at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:605)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:404)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Memory management improvements

Currently higo uses memory in large blocks (roughly 40-50 MB per field). When a query covers a large time range or many columns, the memory limit forces constant LRU eviction of stale data.
The evicted data is reclaimed by GC, which means full GCs happen, and the program pauses during full GC.

The new idea is therefore to pre-allocate fixed-size memory blocks, as memcached does: whenever a block is needed, one is taken from the pool and marked as in use; when it is no longer needed it is returned and marked as free, so it can be reused by other objects.

However, an exception may cause the "return to pool" step to be skipped, so a WeakHashMap is still used as a safety net for reclamation.

Pre-allocating fixed-size blocks wastes some memory, but it reduces the number of full GCs.
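
A minimal sketch of the block pool idea, with made-up names (not higo's real code). The note above mentions WeakHashMap; this sketch uses a WeakReference plus a ReferenceQueue for the same safety net: if a lease is never returned because of an exception, the pool gets its block back once the lease object is garbage-collected.

import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

public class BlockPoolSketch {
    public static final int BLOCK_SIZE = 48 * 1024 * 1024;   // one ~40-50 MB field block

    public static final class Lease {
        public final byte[] block;
        Lease(byte[] block) { this.block = block; }
    }

    private final BlockingQueue<byte[]> free;
    private final ReferenceQueue<Lease> dead = new ReferenceQueue<Lease>();
    // Each outstanding weak reference maps to the block it guards.
    private final Map<Reference<Lease>, byte[]> inUse =
            new ConcurrentHashMap<Reference<Lease>, byte[]>();

    public BlockPoolSketch(int blocks) {
        free = new ArrayBlockingQueue<byte[]>(blocks);
        for (int i = 0; i < blocks; i++) {
            free.add(new byte[BLOCK_SIZE]);      // pre-allocate once, reuse forever
        }
    }

    public Lease acquire() throws InterruptedException {
        reclaim();                               // recycle anything leaked so far
        byte[] block = free.take();              // waits until a block is free
        Lease lease = new Lease(block);
        inUse.put(new WeakReference<Lease>(lease, dead), block);
        return lease;
    }

    public void release(Lease lease) {
        // Normal path: the caller marks the block as free again.
        inUse.values().remove(lease.block);
        free.offer(lease.block);
    }

    private void reclaim() {
        // Safety net: a Lease that was garbage-collected without release()
        // shows up on the reference queue, and its block goes back to the pool.
        Reference<? extends Lease> ref;
        while ((ref = dead.poll()) != null) {
            byte[] block = inUse.remove(ref);
            if (block != null) {
                free.offer(block);
            }
        }
    }
}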

org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out

2013-03-20 14:38:26 SolrStartTable [ERROR] org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:480)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:246)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:266)
at org.apache.solr.client.solrj.embedded.JettySolrRunner.checkSolrRecord(JettySolrRunner.java:218)
at com.alipay.bluewhale.core.higo.SolrStartJetty.checkSolr(SolrStartJetty.java:419)
at com.alipay.bluewhale.core.higo.SolrStartTable.checkSolr(SolrStartTable.java:453)
at com.alipay.bluewhale.core.higo.SolrStartTable.heartbeatExecute(SolrStartTable.java:401)
at com.alipay.bluewhale.core.higo.SolrStartTable.heartbeat(SolrStartTable.java:390)
at com.alipay.bluewhale.core.higo.SolrStart.heartbeat(SolrStart.java:90)
at com.alipay.bluewhale.core.higo.ShardsBolt.execute(ShardsBolt.java:86)
at com.alipay.bluewhale.core.task.executer.BoltExecutors.run(BoltExecutors.java:104)
at com.alipay.bluewhale.core.utils.AsyncLoopRunnable.run(AsyncLoopRunnable.java:54)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:77)
at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:105)
at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1115)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1373)
at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1832)
at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1590)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:995)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:397)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
... 13 more

adhoc offline download: sorting error

select thedate,category_level1_name,sum(e_alipay_direct_cnt),sum(e_alipay_direct_amt) from rpt_p4padhoc_auction where thedate=20130325 group by thedate,category_level1_name order by sum(e_alipay_direct_amt) desc

Detail query: date display bug

Tue May 01 08:21:39 CST 2007 Wed Nov 25 03:27:45 CST 2009
2013-04-05 Sat Aug 09 15:16:39 CST 2008 Thu Jun 10 02:32:15 CST 2010
2013-04-05 Mon Apr 16 05:16:02 CST 2007 Sun Apr 18 06:30:01 CST 2010
2013-04-05 Sat Jun 11 20:14:29 CST 2005 Wed Nov 25 04:32:03 CST 2009
2013-04-05 Wed Sep 17 00:25:02 CST 2008 Fri Sep 02 05:49:40 CST 2011
2013-04-05 Wed Apr 29 06:36:33 CST 2009 Sun Jul 18 20:45:26 CST 2010
2013-04-05 Thu Dec 25 23:02:52 CST 2008 Mon Jun 07 19:01:57 CST 2010
2013-04-05 Tue Aug 18 20:40:11 CST 2009 Mon Mar 29 06:55:06 CST 2010
2013-04-05 Wed Oct 24 23:08:49 CST 2007 Tue Dec 15 07:07:12 CST 2009
2013-04-05 Sat Jan 03 21:03:01 CST 2009 Mon Dec 07 04:24:30 CST 2009
2013-04-05 Fri Oct 26 23:48:36 CST 2007 Wed Mar 09 23:32:49 CST 2011
2013-04-05 Wed Feb 07 21:40:47 CST 2007 Mon Dec 07 20:51:52 CST 2009
2013-04-05 Wed Oct 14 08:17:24 CST 2009 Wed Jun 09 06:59:19 CST 2010

fieldValueCache could be stored in separate pieces by DOCID

Previously it was stored as one piece per partition; we could consider splitting the storage by docid instead.

For example, DOCIDs 0-500k stored together, 500k-1,000k together, 1,000k-1,500k together.

Whether this actually improves speed needs further study and testing.
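
A minimal sketch of the idea (a hypothetical cache, not Solr's real fieldValueCache): key the cache by (field, docid block) instead of by partition, so blocks of 500,000 docids can be loaded and evicted independently.

import java.util.HashMap;
import java.util.Map;

public class DocIdBlockCacheSketch {
    static final int BLOCK_DOCS = 500000;                // assumed block size

    private final Map<String, long[]> cache = new HashMap<String, long[]>();

    long[] valuesFor(String field, int docId) {
        int block = docId / BLOCK_DOCS;
        String key = field + "@" + block;                // one cache entry per (field, block)
        long[] values = cache.get(key);
        if (values == null) {
            values = loadBlockFromDisk(field, block);    // hypothetical loader
            cache.put(key, values);
        }
        return values;
    }

    private long[] loadBlockFromDisk(String field, int block) {
        return new long[BLOCK_DOCS];                     // placeholder for the real I/O
    }
}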

Speed optimizations

This round of testing exposed many problems; many fields were not covered by my earlier tests. Summary:

  1. Count has a great deal of room for optimization.
  2. When we only had 2 machines, memory was scarce; now with 10 machines there is plenty of memory. For numeric computations and all dist computations we can therefore drop the previous docid -> termNum -> (video-keyframe-style compression) -> termValue conversion and use a direct docid -> termNum -> termValue conversion instead. With the keyframe step removed, fields with low repetition such as creativeid speed up dist, sum, etc. by far more than 2x (see the sketch after this list).
  3. Since disk space is plentiful, the frq files no longer use zip compression; during testing CPU usage was high, and the main cause was decompressing the zipped frq files.
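
A minimal sketch of the direct lookup in item 2 (hypothetical arrays, not higo's real structures): two plain array hops, docid -> termNum -> termValue, with no keyframe-style decompression on the hot path.

public class DirectTermLookupSketch {
    private final int[] docToTermNum;     // one entry per docid
    private final long[] termNumToValue;  // one entry per distinct term

    DirectTermLookupSketch(int[] docToTermNum, long[] termNumToValue) {
        this.docToTermNum = docToTermNum;
        this.termNumToValue = termNumToValue;
    }

    long valueOf(int docId) {
        // Trades memory (the whole termNumToValue array stays resident) for speed:
        // no keyframe decoding when computing sum/dist over low-repetition fields.
        return termNumToValue[docToTermNum[docId]];
    }
}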

pt=20130401000000

Because of the 网销宝 integration, directory names in the pt=20130401000000 format must also be supported.

Data should not be escaped

Previously, for some reason, data was escaped on its way from Hive to higo, which is not acceptable.
In principle higo should not transform the raw data at all, so this escaping must be removed; otherwise it can easily confuse users.

mergeIds response is null

ust/index]



2013-03-25 14:01:40 SolrCore [ERROR] mergeIds response is null 172.24.195.154:51276/solr/rpt_p4padhoc_product
java.lang.Exception
at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:829)
at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:627)
at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:606)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:303)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:405)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:307)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
2013-03-25 14:01:40 SolrCore [ERROR] facet_counts is null 172.24.195.154:51276/solr/rpt_p4padhoc_product
java.lang.Exception
at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:324)
at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:286)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:303)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:405)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:307)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
2013-03-25 14:01:40 SolrCore [INFO] [rpt_p4padhoc_pr

Map-side join

Allow higo's large tables to be joined with small tables on the map side; the small data set being joined must be under one million rows,
and only one-to-one and one-to-many joins are supported, not many-to-many.
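
A minimal sketch of a map-side hash join under the constraints above (hypothetical types, not higo's real API): the small table (under ~1,000,000 rows) is loaded into an in-memory map keyed on the join column, and each large-table row is joined by a lookup, which covers the one-to-one and one-to-many cases.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapSideJoinSketch {
    // join key -> matching rows of the small table
    private final Map<String, List<String[]>> smallTable =
            new HashMap<String, List<String[]>>();

    void loadSmallTable(Iterable<String[]> rows, int keyColumn) {
        for (String[] row : rows) {
            String key = row[keyColumn];
            List<String[]> bucket = smallTable.get(key);
            if (bucket == null) {
                bucket = new ArrayList<String[]>();
                smallTable.put(key, bucket);
            }
            bucket.add(row);
        }
    }

    // Called for each large-table row inside the map task; returns the joined rows.
    List<String[]> join(String[] bigRow, int keyColumn) {
        List<String[]> out = new ArrayList<String[]>();
        List<String[]> matches = smallTable.get(bigRow[keyColumn]);
        if (matches != null) {
            for (String[] small : matches) {
                String[] joined = new String[bigRow.length + small.length];
                System.arraycopy(bigRow, 0, joined, 0, bigRow.length);
                System.arraycopy(small, 0, joined, bigRow.length, small.length);
                out.add(joined);
            }
        }
        return out;
    }
}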
