Comments (2)
First proposal for how the elements could be structured. Extractor only provides a common interface on which all extractors are based.
classDiagram
class Project {
<< User facing, used in the CLI >>
url: str
path: str
clone(url) -> str
gather(sources: List[str]) -> str
}
class GitExtractor {
path: str
creator: str
authors: Tuple[Person]
creation_date: datetime
releases: Tuple[Release]
extract()
to_graph() -> Graph
}
class Person {
name: str
affiliation: Optional[str]
email: Optional[str]
givenName: Optional[str]
familyName: Optional[str]
}
class GithubExtractor {
name: str
url: str
author: Person
Maintainer: Person
codeRepository: str
dateCreated: datetime
keywords: Optional[List[str]]
license: Optional[str]
Organization: Optional[str]
description: Optional[str]
version: Optional[str]
extract()
to_graph() -> Graph
}
class GitlabExtractor {
<< ~ Same as github >>
extract()
to_graph() -> Graph
}
class Release {
date: datetime
tag: str
commit_hash: str
}
class Extractor {
<< Abstract class defining a standard interface >>
path: str
extract()
to_graph() -> Graph
serialize(format: str) -> str
}
class PypiExtractor {
identifier: str
name: str
runtimePlatform: Optional[str]
CommandLineApplication: Optional[str]
version: Optional[str]
applicationCategory: Optional[str]
executableName: Optional[str]
extract()
to_graph() -> Graph
}
Extractor <|-- GitExtractor: subclass
Extractor <|-- GithubExtractor: subclass
Extractor <|-- GitlabExtractor: subclass
Extractor <|-- PypiExtractor: subclass
Project --* Extractor: composed of
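To make the diagram more concrete, here is a minimal Python sketch of the abstract Extractor interface with one toy subclass. It is only an illustration under stated assumptions: `Graph` is a hypothetical stand-in (a plain set of triples) rather than a real RDF library type, `DummyExtractor` and its hard-coded author are invented for the example, and `serialize` only handles a naive N-Triples-like dump.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Set, Tuple

# Assumption: a set of (subject, predicate, object) triples stands in for a real graph type.
Triple = Tuple[str, str, str]
Graph = Set[Triple]


@dataclass
class Person:
    name: str
    email: Optional[str] = None


class Extractor(ABC):
    """Common interface shared by all extractors."""

    def __init__(self, path: str):
        self.path = path

    @abstractmethod
    def extract(self) -> None:
        """Populate the extractor's attributes from its source."""

    @abstractmethod
    def to_graph(self) -> Graph:
        """Return the extracted metadata as triples."""

    def serialize(self, format: str = "nt") -> str:
        # Only a naive N-Triples-like text dump is sketched here.
        if format != "nt":
            raise ValueError(f"Unsupported format: {format}")
        return "\n".join(
            f'<{s}> <{p}> "{o}" .' for s, p, o in sorted(self.to_graph())
        )


class DummyExtractor(Extractor):
    """Hypothetical subclass returning hard-coded values instead of reading a repository."""

    def __init__(self, path: str):
        super().__init__(path)
        self.authors: Tuple[Person, ...] = ()

    def extract(self) -> None:
        self.authors = (Person(name="Jane Doe"),)

    def to_graph(self) -> Graph:
        return {(self.path, "https://schema.org/author", p.name) for p in self.authors}
```

With this shape, each concrete extractor only has to implement `extract()` and `to_graph()`, while serialization lives once in the base class.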
- Example of what an extractor could look like: https://gist.github.com/cmdoret/90c95ee92bfc9b125f0e6acc5c0dba8a
- Example of how it is implemented in perceval for inspiration: https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/github.py
From the user's perspective, the different extractors could either be accessed directly:
from gimie.extractors import GithubExtractor, GitlabExtractor
gh = GithubExtractor(url='https://...')
gl = GitlabExtractor(url='https://...')
Or we could provide some helper to act as a single entry point:
from gimie.extractors import get_remote_extractor, get_local_extractor
gh = get_remote_extractor('https://..', source='github')
gl = get_remote_extractor('https://..', source='gitlab')
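One way such a helper could work is a small registry that maps source names to extractor classes. The sketch below is an assumption about the design, not existing gimie code: the two extractor classes are empty placeholders, and `REMOTE_EXTRACTORS` is a hypothetical name introduced for the example.

```python
from typing import Dict, Type


class RemoteExtractor:
    """Placeholder base class; a real extractor would query the remote API."""

    def __init__(self, url: str):
        self.url = url


class GithubExtractor(RemoteExtractor):
    pass


class GitlabExtractor(RemoteExtractor):
    pass


# Hypothetical registry mapping a source name to its extractor class.
REMOTE_EXTRACTORS: Dict[str, Type[RemoteExtractor]] = {
    "github": GithubExtractor,
    "gitlab": GitlabExtractor,
}


def get_remote_extractor(url: str, source: str) -> RemoteExtractor:
    """Instantiate the extractor registered under `source` for the given URL."""
    try:
        extractor_cls = REMOTE_EXTRACTORS[source]
    except KeyError:
        supported = ", ".join(sorted(REMOTE_EXTRACTORS))
        raise ValueError(
            f"Unknown source '{source}'. Supported sources: {supported}"
        ) from None
    return extractor_cls(url)
```

Supporting a new remote would then only require adding one entry to the registry, without touching the calling code.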
This might also make it easy to call multiple extractors in an orchestrator function or a Project class.
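As a rough illustration of that orchestration, a Project could loop over the requested sources, run each extractor, and merge the results. Everything here is hypothetical: `FakeExtractor` returns a fixed triple instead of contacting a remote, the `extractedFrom` predicate is made up, and `gather` returns the merged set of triples, whereas the diagram above has it return a string.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


class FakeExtractor:
    """Hypothetical extractor that yields one fixed triple tagged with its source."""

    def __init__(self, url: str, source: str):
        self.url = url
        self.source = source
        self.extracted = False

    def extract(self) -> None:
        self.extracted = True

    def to_graph(self) -> Set[Triple]:
        # Nothing is available until extract() has been called.
        if not self.extracted:
            return set()
        return {(self.url, "extractedFrom", self.source)}


class Project:
    """User-facing entry point that combines several extractors."""

    def __init__(self, url: str):
        self.url = url

    def gather(self, sources: List[str]) -> Set[Triple]:
        merged: Set[Triple] = set()
        for source in sources:
            extractor = FakeExtractor(self.url, source)
            extractor.extract()
            # Union the triples so duplicates across sources collapse.
            merged |= extractor.to_graph()
        return merged
```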