muyannian / higo
海狗 (Higo): a multi-dimensional online analysis system
r/data/2012103/workerspace_4@_0.tii 4
2013-03-20 14:45:33 ReplicationHandler [WARN] Unable to get IndexCommit on startup
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/disk6/taobao/bluewhile/higo/adhoc/16_15/tablelist/rpt_hitfake_auctionall_d/solr/data/index/lucene-1fef2e39-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1108)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:83)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:101)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:171)
at org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:375)
at org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.java:858)
at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:523)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:599)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:470)
at org.apache.solr.core.CoreContainer.createTableCore(CoreContainer.java:330)
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:598)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
2013-03-20 14:45:33 ReplicationHandler [INFO] Commits will be reserved for 10000
Add a mechanism that reclaims pre-allocated memory when it goes unused for a long time.
select abc from xxx where title contains xxx,xxx,xxx,xxx
With the Wangxiaobao (网销宝) integration, directories in the pt=20130401000000 format must be supported.
Monitoring statistics should be computed separately per partition.
[taobao@adhoc7 logs]$ grep ERROR worker-6*
worker-6601.log:2013-03-25 09:52:34 SolrCore [ERROR] facet_counts is null 10.246.45.44:54363/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:52:34 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:54:55 SolrCore [ERROR] facet_counts is null 10.246.45.42:51112/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:54:55 SolrCore [ERROR] facet_counts is null 10.246.45.22:23913/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:55:15 SolrCore [ERROR] facet_counts is null 10.246.45.41:51113/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:55:15 SolrCore [ERROR] facet_counts is null 10.246.45.21:51117/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:55:35 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:55:35 SolrCore [ERROR] facet_counts is null 10.246.45.21:51117/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:57:11 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:57:11 SolrCore [ERROR] facet_counts is null 10.246.45.42:51112/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:59:24 SolrCore [ERROR] facet_counts is null 10.246.45.24:51119/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:00:06 SolrCore [ERROR] facet_counts is null 10.246.45.43:51118/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:03:06 SolrCore [ERROR] facet_counts is null 10.246.45.21:51117/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:03:48 SolrCore [ERROR] facet_counts is null 10.246.45.44:54363/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:05:05 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:08:16 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:09:57 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:09:57 SolrCore [ERROR] facet_counts is null 10.246.45.21:51117/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:14:25 SolrCore [ERROR] facet_counts is null 10.246.45.43:51118/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:16:26 SolrCore [ERROR] facet_counts is null 10.246.45.44:54363/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:16:26 SolrCore [ERROR] facet_counts is null 10.246.45.41:51113/solr/rpt_p4padhoc_product
Save the result of the first query for secondary analysis.
Typical use case:
first, sum over some field:
select a,b,c,sum(xxx) from tbl group by a,b,c
then split the summed values into ranges and count the number of groups in each range.
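The two-phase flow above can be sketched as follows (function names, columns, and the bucket width are illustrative, not Higo's actual API):

```python
from collections import defaultdict

def sum_per_group(rows, key_cols, value_col):
    """Phase 1: select a,b,c,sum(x) ... group by a,b,c."""
    sums = defaultdict(float)
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        sums[key] += row[value_col]
    return sums

def histogram_of_sums(sums, bucket_width):
    """Phase 2: bucket the per-group sums into ranges and count the
    number of groups that fall in each range."""
    counts = defaultdict(int)
    for total in sums.values():
        counts[int(total // bucket_width)] += 1
    return counts

rows = [
    {"a": 1, "b": "x", "c": "u", "amt": 30},
    {"a": 1, "b": "x", "c": "u", "amt": 40},
    {"a": 2, "b": "y", "c": "v", "amt": 5},
]
sums = sum_per_group(rows, ["a", "b", "c"], "amt")
hist = histogram_of_sums(sums, bucket_width=50)
```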
This needs coordinated handling; it is a bug.
The purpose is to read field types; it is used by the adhoc project and will be needed elsewhere later.
2013-03-20 12:59:36 SolrCore [INFO] facet read fail from file 'thedate'
2013-03-20 12:59:36 SolrCore [INFO] getSearcher:rpt_p4padhoc_cust@2012113@1363755480106:/disk7/taobao/bluewhile/higo/adhoc/17_16/tablelist/rpt_p4padhoc_cust/solr/data/2012113
2013-03-20 12:59:36 SolrCore [ERROR] org.apache.lucene.util.ThreadInterruptedException: java.lang.InterruptedException: sleep interrupted
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:730)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:462)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:405)
at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1044)
at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:219)
at org.apache.solr.request.SimpleFacets.<init>(SimpleFacets.java:90)
at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:70)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody
41:51112/solr/rpt_p4padhoc_cust,10.246.45.22:48680/solr/rpt_p4padhoc_cust,10.246.45.23:51111/solr/rpt_p4padhoc_cust,&isShard=true&fsv=true&fq=thedate:[20121001+TO+20130318]} hits=186846 status=0 QTime=8630
2013-03-20 13:00:53 SolrQueryRequestBase [INFO] ref close rpt_p4padhoc_cust@2013032,0
2013-03-20 13:00:53 SolrCore [ERROR] null:org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/disk7/taobao/bluewhile/higo/adhoc/17_16/tablelist/rpt_p4padhoc_cust/solr/data/2013023/workerspace/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1108)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:989)
at org.apache.lucene.store.LinkFSDirectory.readOnlyOpen(LinkFSDirectory.java:170)
at org.apache.lucene.store.LinkFSDirectory.readOnlyOpen(LinkFSDirectory.java:185)
at org.apache.solr.core.LinksStandardDirectoryFactory.open(LinksStandardDirectoryFactory.java:33)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1043)
at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:219)
at org.apache.solr.request.SimpleFacets.<init>(SimpleFacets.java:90)
at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:70)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:196)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:358)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
Previously, for some reason, data was escaped on its way from Hive into Higo; this does not meet requirements.
In principle Higo should not transform the raw data, so this escaping must be removed; otherwise it can mislead users.
Partitioning: by default Higo only partitions each month into early/middle/late ten-day periods; add partitioning by day and by whole month.
Queries can be written as: where test fq username:张* and test2 fq type:1
The goal of this syntax is to let SQL use Solr's flexible query syntax.
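One way such a clause could be lowered onto Solr's fq filter parameters, shown as a hypothetical parser (this is not Higo's actual implementation; it assumes clauses are joined by " and " and each clause has the shape "column fq solr-query"):

```python
def parse_fq_clauses(where_clause):
    """Split 'col1 fq q1 and col2 fq q2' into a list of Solr fq strings.

    Hypothetical sketch: each clause's Solr query part is passed through
    verbatim, so wildcards like username:zhang* keep Solr semantics.
    """
    fq = []
    for clause in where_clause.split(" and "):
        col, _, query = clause.partition(" fq ")
        fq.append(query.strip())  # hand the raw Solr query to the fq parameter
    return fq

params = parse_fq_clauses("test fq username:zhang* and test2 fq type:1")
```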
Bug fix: 3 to 7 days after ad hoc data is produced, it is re-cleansed to backfill records, so the index must be refreshed as well (this mechanism was lost when partitioning was added); otherwise statistics for the last day of each ten-day period of the month disagree with Hive.
Used to track which queries fail each day, making future monitoring easier.
Tue May 01 08:21:39 CST 2007 Wed Nov 25 03:27:45 CST 2009
2013-04-05 Sat Aug 09 15:16:39 CST 2008 Thu Jun 10 02:32:15 CST 2010
2013-04-05 Mon Apr 16 05:16:02 CST 2007 Sun Apr 18 06:30:01 CST 2010
2013-04-05 Sat Jun 11 20:14:29 CST 2005 Wed Nov 25 04:32:03 CST 2009
2013-04-05 Wed Sep 17 00:25:02 CST 2008 Fri Sep 02 05:49:40 CST 2011
2013-04-05 Wed Apr 29 06:36:33 CST 2009 Sun Jul 18 20:45:26 CST 2010
2013-04-05 Thu Dec 25 23:02:52 CST 2008 Mon Jun 07 19:01:57 CST 2010
2013-04-05 Tue Aug 18 20:40:11 CST 2009 Mon Mar 29 06:55:06 CST 2010
2013-04-05 Wed Oct 24 23:08:49 CST 2007 Tue Dec 15 07:07:12 CST 2009
2013-04-05 Sat Jan 03 21:03:01 CST 2009 Mon Dec 07 04:24:30 CST 2009
2013-04-05 Fri Oct 26 23:48:36 CST 2007 Wed Mar 09 23:32:49 CST 2011
2013-04-05 Wed Feb 07 21:40:47 CST 2007 Mon Dec 07 20:51:52 CST 2009
2013-04-05 Wed Oct 14 08:17:24 CST 2009 Wed Jun 09 06:59:19 CST 2010
When the Hadoop cluster cannot provide service, Higo should not be affected.
Record front-end UI query failures.
Currently Higo uses memory in large chunks (roughly 40 to 50 MB per field). When a query covers a large time range or many columns, the memory cap forces constant LRU eviction of stale data.
Evicted data is then reclaimed by GC, which means full GCs happen, and the program pauses during a full GC.
So the new idea is memcache-style: pre-allocate fixed-size memory blocks; each use takes a block from the pool and marks it in use; when done, the block is returned and marked free so other objects can reuse it.
But an exception might cause the return to be skipped, so a WeakHashMap is still kept as a reclamation backstop.
Pre-allocating fixed blocks wastes some memory, but it reduces the number of full GCs.
Higo itself is built on Bluewhale (a Java port of Storm). Whether Bluewhale or Storm, the largest
clusters I have seen are around 20 to 30 machines, and with frequent heartbeats, ZooKeeper and the
Nimbus scheduler become bottlenecks; although I have not benchmarked it, I doubt it can reach a thousand machines.
The "HDFS federation" in Hadoop YARN gave me an idea: why insist on a single cluster? I can
create many small clusters, say 50 small clusters of 20 machines each, with one master controller
managing the state of these small clusters.
At Higo's current 6 shards per machine, each small cluster holds 120 shards, and all small clusters together hold 6,000 shards.
On each query, tasks are dispatched to the different small clusters according to the shards queried, and each
small cluster's results are merged a further level up (Higo already supports multi-level merging, so this is not a problem).
Crucially, retrofitting Higo this way is a small change and easy to implement.
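The scatter/merge flow described above can be sketched as follows (cluster layout, names, and the flat-dict aggregate shape are illustrative, not Higo's actual API):

```python
from collections import Counter

def route(shards, clusters):
    """Assign each requested shard to the small cluster that hosts it.
    `clusters` maps cluster id -> set of hosted shard ids."""
    plan = {}
    for cid, hosted in clusters.items():
        hit = hosted & shards
        if hit:
            plan[cid] = hit
    return plan

def merge(partials):
    """Second-level merge of per-cluster partial aggregates (mimics one
    level of the multi-level merging Higo already supports)."""
    total = Counter()
    for p in partials:
        total.update(p)
    return total

clusters = {"c1": {"s1", "s2"}, "c2": {"s3", "s4"}}
plan = route({"s1", "s3"}, clusters)       # dispatch only to clusters that matter
result = merge([{"a": 1}, {"a": 2, "b": 3}])
```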
After the web UI has run for a while, it reports that JSP files cannot be found.
Cause: the OS periodically cleans the tmp directory, and Jetty (running from a war package) extracts into it; a dedicated directory must be specified via
-Djava.io.tmpdir=/tmp
Change it to some other folder, since the OS will delete files in /tmp after a period of time.
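A sketch of the corresponding launch line (the target path and jar name are illustrative; only the -Djava.io.tmpdir flag itself is the fix):

```shell
# Point Jetty's work directory at a dedicated folder the OS never cleans,
# instead of the default /tmp where the extracted JSPs get deleted over time.
java -Djava.io.tmpdir=/home/taobao/higo/jetty-tmp -jar start.jar
```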
Further reading:
http://stackoverflow.com/questions/7124571/my-jetty-server-will-dead-after-a-long-time-why
Full error message:
HTTP ERROR 500
Problem accessing /higo/tablelist.jsp. Reason:
PWC6117: File "/tablelist.jsp" not found
Caused by:
org.apache.jasper.JasperException: PWC6117: File "/tablelist.jsp" not found
at org.apache.jasper.compiler.DefaultErrorHandler.jspError(DefaultErrorHandler.java:73)
at org.apache.jasper.compiler.ErrorDispatcher.dispatch(ErrorDispatcher.java:359)
at org.apache.jasper.compiler.ErrorDispatcher.jspError(ErrorDispatcher.java:153)
at org.apache.jasper.compiler.JspUtil.getInputStream(JspUtil.java:894)
at org.apache.jasper.xmlparser.XMLEncodingDetector.getEncoding(XMLEncodingDetector.java:127)
at org.apache.jasper.compiler.ParserController.determineSyntaxAndEncoding(ParserController.java:360)
at org.apache.jasper.compiler.ParserController.doParse(ParserController.java:194)
at org.apache.jasper.compiler.ParserController.parse(ParserController.java:124)
at org.apache.jasper.compiler.Compiler.generateJava(Compiler.java:184)
at org.apache.jasper.compiler.Compiler.compile(Compiler.java:409)
at org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:592)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:344)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:470)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:364)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Powered by Jetty://
olr/data/2012102/workerspace_2@_0.tii
2013-03-20 14:45:05 SolrIndexSearcher [INFO] Opening Searcher@33b7b32c partion_rpt_p4padhoc_product@2012102@1363761822038
2013-03-20 14:45:05 SolrCore [INFO] ref clearpartion rpt_p4padhoc_product@2013032,1
2013-03-20 14:45:05 SolrCore [INFO] ref clearpartion rpt_p4padhoc_product@2012111,1
2013-03-20 14:45:05 SolrCore [INFO] BlockBufferInput close /disk6/taobao/bluewhile/higo/adhoc/16_15/tablelist/rpt_p4padhoc_product/solr/data/2012112/workerspace_2@_0.tis
2013-03-20 14:45:05 SolrCore [INFO] SolrIndexSearcher clear:rpt_p4padhoc_product@2012112@1363761822038
2013-03-20 14:45:05 SolrQueryRequestBase [INFO] ref create rpt_p4padhoc_product@2012102,1
2013-03-20 14:45:05 SolrCore [INFO] ref clearpartion rpt_p4padhoc_product@2013032,1
2013-03-20 14:45:05 SolrCore [INFO] ref clearpartion rpt_p4padhoc_product@2012111,1
2013-03-20 14:45:05 SolrCore [INFO] getSearcher:rpt_p4padhoc_product@2012122@1363761822038:/disk6/taobao/bluewhile/higo/adhoc/16_15/tablelist/rpt_p4padhoc_product/solr/data/2012122
2013-03-20 14:45:05 SolrDispatchFilter [ERROR] org.mortbay.jetty.EofException
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:634)
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580)
at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:184)
at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:89)
at org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:47)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:267)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Split the source-code changes out on their own, in preparation for moving to Solr 4.
For fields with very high duplication, this saves a lot of memory.
select thedate,category_level1_name,sum(e_alipay_direct_cnt),sum(e_alipay_direct_amt) from rpt_p4padhoc_auction where thedate=20130325 group by thedate,category_level1_name order by sum(e_alipay_direct_amt) desc
1. Convert each value to an integer (MD5 or CRC32 will do), but the hash must spread the data evenly.
2. Use a bitset, with one bit marking whether each hash value is present.
3. If the data volume is very large, say hundreds of billions of rows, use a local (partial) bitset to estimate the whole.
For usage details, see the slides (PPT download).
Suppose the full bitset would have 1 billion bits;
we actually store only 1% of it, i.e. 10 million bits.
Because the data is spread uniformly, this 1% is as sparse as the other 99%,
so the final count is simply the count over the 1% multiplied by 100.
But for fields like category, whose duplication is very high, skip the local estimate and compute exactly.
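The partial-bitset estimate above can be sketched like this (a minimal sketch: CRC32 as the hash, and a power-of-two sample fraction standing in for the note's "store 1%, multiply by 100"; sizes are illustrative):

```python
import zlib

def estimate_distinct(values, space_bits=24, sample_shift=6):
    """Estimate a distinct count with a local (partial) bitset.

    Each value is hashed into a 2**space_bits space; only hashes landing
    in the first 1/2**sample_shift of that space are recorded. Because a
    good hash spreads values uniformly, the sampled region is as sparse
    as the rest, so the final answer is the sampled bit count scaled
    back up by the sample fraction.
    """
    space = 1 << space_bits
    sample_bits = space >> sample_shift      # the slice of the bitset we store
    bits = bytearray(sample_bits >> 3)
    for v in values:
        h = zlib.crc32(str(v).encode()) % space
        if h < sample_bits:
            bits[h >> 3] |= 1 << (h & 7)
    ones = sum(bin(b).count("1") for b in bits)
    return ones << sample_shift              # scale the sample back up
```

With very large cardinalities, hash collisions inside the sampled slice start to undercount, which is why a field like category with few distinct values is better served by exact counting.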
Investigate the cause:
2013-03-20 09:59:22 SolrCore [ERROR] org.apache.solr.common.SolrException: java.lang.Exception: 10.246.45.43:51111/solr/rpt_p4padhoc_product
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:294)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:358)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.Exception: 10.246.45.43:51111/solr/rpt_p4padhoc_product
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:461)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:418)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: java.lang.Exception: 10.246.45.22:51160/solr/rpt_p4padhoc_product@2012111
org.apache.solr.common.SolrException: java.lang.Exception: 10.246.45.22:51160/solr/rpt_p4padhoc_product@2012111
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:294)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:358)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.Exception: 10.246.45.22:51160/solr/rpt_p4padhoc_product@2012111
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:461)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:418)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441
Allow Higo's large tables to do a map-side join against small tables; the joined data set must be under one million rows,
and only one-to-one and one-to-many joins are supported, not many-to-many.
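A minimal sketch of such a map-side join (names and the row format are illustrative; the small side is held in a dict, which is why it must stay under the row cap, and duplicate keys on the small side are rejected because they would make the join many-to-many):

```python
def map_side_join(big_rows, small_rows, key):
    """Hash-join a streamed big table against a small in-memory table."""
    lookup = {}
    for row in small_rows:
        k = row[key]
        if k in lookup:
            raise ValueError("small side must have unique keys "
                             "(only 1:1 and 1:N joins are supported)")
        lookup[k] = row
    for row in big_rows:  # one-to-many: many big rows may hit one small row
        match = lookup.get(row[key])
        if match is not None:
            # prefix joined columns to avoid clobbering big-side columns
            yield {**row, **{f"s_{c}": v for c, v in match.items() if c != key}}

small = [{"id": 1, "name": "a"}]
big = [{"id": 1, "v": 10}, {"id": 2, "v": 5}, {"id": 1, "v": 7}]
joined = list(map_side_join(big, small, "id"))
```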
2013-03-20 14:38:26 SolrStartTable [ERROR] org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:480)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:246)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:266)
at org.apache.solr.client.solrj.embedded.JettySolrRunner.checkSolrRecord(JettySolrRunner.java:218)
at com.alipay.bluewhale.core.higo.SolrStartJetty.checkSolr(SolrStartJetty.java:419)
at com.alipay.bluewhale.core.higo.SolrStartTable.checkSolr(SolrStartTable.java:453)
at com.alipay.bluewhale.core.higo.SolrStartTable.heartbeatExecute(SolrStartTable.java:401)
at com.alipay.bluewhale.core.higo.SolrStartTable.heartbeat(SolrStartTable.java:390)
at com.alipay.bluewhale.core.higo.SolrStart.heartbeat(SolrStart.java:90)
at com.alipay.bluewhale.core.higo.ShardsBolt.execute(ShardsBolt.java:86)
at com.alipay.bluewhale.core.task.executer.BoltExecutors.run(BoltExecutors.java:104)
at com.alipay.bluewhale.core.utils.AsyncLoopRunnable.run(AsyncLoopRunnable.java:54)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:77)
at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:105)
at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1115)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1373)
at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1832)
at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1590)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:995)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:397)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
... 13 more
Bug fix: in the probe function that checks whether a disk is usable, fix the errors caused by contention when multiple processes probe at the same time.
1. By numeric range
2. By parsing timestamps into dates
3. By substring
4. Other
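The first three grouping transforms can be sketched as follows (function names and formats are illustrative, not Higo's rangeGroup/substring signatures):

```python
import time

def range_group(value, width):
    """1. Numeric range: bucket a value into [k*width, (k+1)*width)."""
    k = int(value // width)
    return f"{k * width}~{(k + 1) * width}"

def day_group(epoch_seconds):
    """2. Parse a timestamp into its calendar date (UTC here)."""
    return time.strftime("%Y-%m-%d", time.gmtime(epoch_seconds))

def substring_group(value, start, end):
    """3. Substring: group by a slice of the string value."""
    return str(value)[start:end]
```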
Wildcards may be written at the front (leading wildcards).
Try storing Higo's indexes in a small local HDFS cluster and evaluate the performance.
Apply urlencode.
Try increasing the number of shards each Higo machine runs,
to raise parallelism.
2013-03-21 21:33:52 SolrDispatchFilter [ERROR] java.lang.NullPointerException
at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:777)
at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:626)
at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:605)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:404)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
This round of testing exposed many problems; many fields had not been covered by my earlier tests. Summary:
子落, has this been discussed with 行咧?
Can it be displayed in Chinese?
子落 (15:33:53):
Yes.
It has already been sent to me; I'll handle it today or tomorrow.
张壮 (15:34:06):
OK
Bluewhale bolt: fix the bug where no ack is sent.
Functions such as rangeGroup, substring, and multi-column combined computation.
Higo's current group by + sort requires fewer than ten thousand groups, which is far too few.
This iteration should raise that limit: support one million groups without degrading performance, and at minute-level response times support tens of millions or even hundreds of millions of groups.
Previously Higo had all partitions concurrently query every shard. With limited machine resources,
too many partitions meant a flood of HTTP requests and excessive pressure on the merger server.
So on the adhoc project, a single Higo scan has been capped at 1 billion rows, which clearly cannot satisfy every need;
hence this improvement.
The current approach is to submit in multiple rounds, each round covering a fixed number of partitions (say 4); after each shard finishes computing, it dumps its data to HDFS,
and finally a merge job is submitted (with parallelism determined by the number of hash buckets) to merge everything dumped to HDFS.
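The round-based submission and the final merge over dumped results can be sketched as follows (a sketch only: dumps are modeled as in-memory sorted lists of (group key, partial sum) pairs rather than HDFS files, and all names are illustrative):

```python
import heapq
from collections import Counter

def rounds(partitions, per_round=4):
    """Submit partitions in fixed-size rounds instead of all at once,
    so the number of concurrent shard requests stays bounded."""
    for i in range(0, len(partitions), per_round):
        yield partitions[i:i + per_round]

def merge_dumps(dumps):
    """Final merge over everything the shards dumped.

    Each dump is assumed to be sorted by group key, so heapq.merge can
    stream them without loading every dump fully into memory."""
    merged = Counter()
    for key, partial in heapq.merge(*dumps):
        merged[key] += partial
    return merged

plan = list(rounds(["p1", "p2", "p3", "p4", "p5"], per_round=2))
total = merge_dumps([[("a", 1), ("c", 2)], [("a", 3), ("b", 1)]])
```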
Storage was originally per partition; consider splitting it by docid instead,
e.g. DOCIDs 1 to 500k stored together, 500k to 1M together, 1M to 1.5M together.
Whether this actually improves speed needs study and further testing.
Count statistics currently use the type in public Integer val=0;
this must be fixed.
Keep the logs to ease later analysis; manage them with log4j.
ust/index]
When Solr's conf files change, the new conf can be picked up by proactively restarting.
The data should support simple tokenization on delimiters such as commas and spaces.
tiansuan1 hit an OOM error.