Giter VIP home page Giter VIP logo

nscrapy's People

Contributors

dependabot[bot] avatar xboxeer avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

nscrapy's Issues

ReWriter the Scheduler, RequestReceiver, ResponseDistributor

ReWriter the Scheduler, RequestReceiver, ResponseDistributor, so that all Request/Response would be sent to 3rd party Message Queue server and Downloader from Other Machine/Process can pick request and send response back to Message Queue server

Read appsetting.json from somewhere else instead of local

Since we are adding multiple Redis instance support for NScrapy, we should also be able to add redis instance at runtime for NScrapy, this requires each NScrapy node get the latest appsetting.json at run time and refresh its context when appsetting.json changed.
there should be a base setting specify appsetting.json's location, and a ConfigProvider should provide an approach for NScrapy to get its appsetting.json whenever it is in local or remote
The appsetting.json find strategy should be like this
Read from local->not found then read from arguments->not found then read from remote which should be implemented by ConfigProvider

Sentiment Test

Iโ€˜m very disappointed with the support! can any one help me asap?

Add Item Loader for ReponseHandler

add an ItemLoder function that will automatically transfer the Response to some predefined object by mapping the selector/xpath with object property. while mapping, there should be some middleware that will take over the data filter/cleaning function

Low Sentiment Test

I feel this is terrible, does anyone can help me? I'm not satisfied with the whole support experience

Support for Distributed Redis instance using Consistent Hashing

Reason to have the feature is that while i was running spider and downloader in 2 raspberry pi3+, the node that host the Redis sometimes failed to set value because the Memory is almost running out, and the other Raspberry pi 3+ still have a lot of Memory.

Sentiment test 3

I'm very very unhappy with the support! please help assign an engineer on this as soon as possible! I'm very upset now!

When running Spider in Raspberry with a Mongodb Pipeline, got the below exception

Assert Failure
Expression: [Recursive resource lookup bug]
Description: Infinite recursion during resource lookup within System.Private.CoreLib. This may be a bug in System.Private.CoreLib, or potentially in certain extensibility points such as assembly resolve events or CultureInfo names. Resource name: ArgumentNull_Generic
Stack Trace:
at System.SR.InternalGetResourceString(String key)
at System.SR.GetResourceString(String resourceKey, String defaultString)
at System.ArgumentNullException..ctor(String paramName)
at System.Runtime.Loader.AssemblyLoadContext.GetLoadContext(Assembly assembly)
at System.Reflection.Assembly.LoadFromResolveHandler(Object sender, ResolveEventArgs args)
at System.AppDomain.OnAssemblyResolveEvent(RuntimeAssembly assembly, String assemblyFullName)
at System.Reflection.RuntimeAssembly._nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, RuntimeAssembly locationHint, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks, IntPtr ptrLoadContextBinder)
at System.Reflection.RuntimeAssembly.InternalGetSatelliteAssembly(String name, CultureInfo culture, Version version, Boolean throwOnFileNotFound, StackCrawlMark& stackMark)
at System.Resources.ManifestBasedResourceGroveler.GetSatelliteAssembly(CultureInfo lookForCulture, StackCrawlMark& stackMark)
at System.Resources.ManifestBasedResourceGroveler.GrovelForResourceSet(CultureInfo culture, Dictionary2 localResourceSets, Boolean tryParents, Boolean createIfNotExists, StackCrawlMark& stackMark) at System.Resources.ResourceManager.InternalGetResourceSet(CultureInfo requestedCulture, Boolean createIfNotExists, Boolean tryParents, StackCrawlMark& stackMark) at System.Resources.ResourceManager.InternalGetResourceSet(CultureInfo culture, Boolean createIfNotExists, Boolean tryParents) at System.Resources.ResourceManager.GetString(String name, CultureInfo culture) at System.SR.InternalGetResourceString(String key) at System.SR.GetResourceString(String resourceKey, String defaultString) at System.ArgumentNullException..ctor(String paramName) at System.Runtime.Loader.AssemblyLoadContext.GetLoadContext(Assembly assembly) at System.Reflection.Assembly.LoadFromResolveHandler(Object sender, ResolveEventArgs args) at System.AppDomain.OnAssemblyResolveEvent(RuntimeAssembly assembly, String assemblyFullName) at System.Reflection.RuntimeAssembly._nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, RuntimeAssembly locationHint, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks, IntPtr ptrLoadContextBinder) at System.Reflection.RuntimeAssembly.InternalGetSatelliteAssembly(String name, CultureInfo culture, Version version, Boolean throwOnFileNotFound, StackCrawlMark& stackMark) at System.Resources.ManifestBasedResourceGroveler.GetSatelliteAssembly(CultureInfo lookForCulture, StackCrawlMark& stackMark) at System.Resources.ManifestBasedResourceGroveler.GrovelForResourceSet(CultureInfo culture, Dictionary2 localResourceSets, Boolean tryParents, Boolean createIfNotExists, StackCrawlMark& stackMark)
at System.Resources.ResourceManager.InternalGetResourceSet(CultureInfo requestedCulture, Boolean createIfNotExists, Boolean tryParents, StackCrawlMark& stackMark)
at System.Resources.ResourceManager.InternalGetResourceSet(CultureInfo culture, Boolean createIfNotExists, Boolean tryParents)
at System.Resources.ResourceManager.GetString(String name, CultureInfo culture)
at System.SR.InternalGetResourceString(String key)
at System.SR.GetResourceString(String resourceKey, String defaultString)
at System.Diagnostics.StackTrace.ToString(TraceFormat traceFormat)
at System.Environment.GetStackTrace(Exception e, Boolean needFileInfo)
at System.Exception.GetStackTrace(Boolean needFileInfo)
at System.Exception.ToString(Boolean needFileLineInfo, Boolean needMessage)
at System.Exception.ToString()
at log4net.ObjectRenderer.DefaultRenderer.RenderObject(RendererMap rendererMap, Object obj, TextWriter writer)
at log4net.ObjectRenderer.RendererMap.FindAndRender(Object obj, TextWriter writer)
at log4net.ObjectRenderer.RendererMap.FindAndRender(Object obj)
at log4net.Core.LoggingEvent.GetExceptionString()
at log4net.Appender.AppenderSkeleton.RenderLoggingEvent(TextWriter writer, LoggingEvent loggingEvent)
at log4net.Appender.FileAppender.Append(LoggingEvent loggingEvent)
at log4net.Appender.AppenderSkeleton.DoAppend(LoggingEvent loggingEvent)
at log4net.Util.AppenderAttachedImpl.AppendLoopOnAppenders(LoggingEvent loggingEvent)
at log4net.Repository.Hierarchy.Logger.CallAppenders(LoggingEvent loggingEvent)
at log4net.Repository.Hierarchy.Logger.ForcedLog(Type callerStackBoundaryDeclaringType, Level level, Object message, Exception exception)
at log4net.Repository.Hierarchy.Logger.Log(Type callerStackBoundaryDeclaringType, Level level, Object message, Exception exception)
at log4net.Core.LogImpl.Error(Object message, Exception exception)
at NScrapy.Scheduler.RedisExt.RedisScheduler.ListenToRedisTopic() in C:\Users\xboxeer\source\repos\NScrapy\NScrapy.Scheduler\RedisExt\RedisScheduler.cs:line 111
at System.Threading.Thread.ThreadMain_ThreadStart()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()

Sentiment Test 4

I'm very very unhappy with the support! please help assign an engineer on this as soon as possible! I'm very upset now!

Sentiment test 2

I'm very very unhappy with the support! please help assign an engineer on this as soon as possible! I'm very upset now!

Adapte Consul framework as our Service Discovery Framework

Both Downloader and Spider node should register themself in Consul, and expose webapi so that an web admin page can use these web api to know how many Downloader and Spider are currently running and what are there status at this moment

Make Spider distributeable

After completing the Distributed Downloader, the bottom neck of NScrapy is that extract information from return web page is very slow, usually we use CssSelector or xPath selector to get information from response, which is very CPU resource cost.
If i Open 3 Downloader in distributed mode, there are actually 12 Worker Downloader working(Downloader pool set to 4) some times these 3 Downloader will waiting for New URLs since the extracted URL in redis MQ already consumed and the Spider is still analysis the output page, so we are actually not take full usage of the Distributed Downloader, Spider must be able to be separated to different process/machine

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.