onclave commented on June 11, 2024

@RJoshlan It means that, with binary encoding on your chromosomes, you would rarely put your dataset values directly into a chromosome's genetic code. In the example above, the data from the dataset was never fed directly into the NSGA-II algorithm. Instead, a reference to it was kept, and the only place the dataset interacted with the algorithm was at the objective function level.

from nsga-ii.

onclave commented on June 11, 2024

I'm closing this issue, as I hope I was able to resolve your problem. Reopen it if you have more questions.

onclave commented on June 11, 2024

For the jFree error, refer to issue #8.

onclave commented on June 11, 2024

I shall provide detailed documentation on how to use external datasets with this library shortly. I'm working on it.

RJoshlan commented on June 11, 2024

Thanks.

onclave commented on June 11, 2024

Hello @RJoshlan, refer to the documentation here under the Getting Started section to understand how you can use your own custom datasets with the library. Let me know if you face any issues.

RJoshlan commented on June 11, 2024

I am using this dataset (CICIDS2017), and I am not sure how to read it using GeneticCodeProducer. I tried, but the code doesn't even compile. Can you help me with this? I don't know where I am going wrong. I also tried permutation-based encoding, and it still doesn't work.

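```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
// (plus the nsga-ii imports for GeneticCodeProducer and BooleanAllele)

public static GeneticCodeProducer geneticCodeProducerFromDataset(String path) {
    return (length) -> {
        List<BooleanAllele> geneticCode = new ArrayList<>();

        try {
            // Load the dataset with Weka (note: this re-reads the file for every chromosome produced)
            DataSource dataSource = new DataSource(path);
            Instances dataset = dataSource.getDataSet();

            // calculateGeneSize, returnBinaryValueFromInt and returnBooleanValueFromChar
            // are my own helper methods (not shown here)
            String geneFormat = "%0" + calculateGeneSize(path) + "d";

            // Fill the chromosome up to the length requested by the library
            while (geneticCode.size() < length) {
                // Pick a random row; the bound is exclusive, so rows 0..size-1 are all possible
                int row = ThreadLocalRandom.current().nextInt(dataset.size());

                // numAttributes() is the same for every row, so every gene here encodes the same value
                String gene = String.format(geneFormat, returnBinaryValueFromInt(dataset.get(row).numAttributes()));

                for (char alleleChar : gene.toCharArray()) {
                    geneticCode.add(new BooleanAllele(returnBooleanValueFromChar(alleleChar)));
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        return geneticCode;
    };
}
```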

onclave commented on June 11, 2024

@RJoshlan I shall need more information about your work before I can help you. First, share the dataset you are working with so that I can take a look at it. Second, give me a brief idea of how you want to encode your chromosomes with your dataset. Third, let me know what kind of encoding you want to use for your chromosomes.

I see you are using BooleanAllele, from which I assume you are trying binary encoding. Keep in mind that with binary encoding you usually do not encode your dataset directly into the chromosome; rather, you keep a reference to it.

RJoshlan commented on June 11, 2024

@onclave Thanks for replying.

This is the dataset that I am using. It has 81 variables and 25,000 instances. (The original has around 200,000 instances and 81 variables, but I've uploaded a smaller one because of the file size.)
Dataset.zip

The chromosome encoding I am trying to achieve depends directly on the dataset: for example, producing values that represent the dataset's variable indices from 1 to 80, where a chromosome might be 6 alleles chosen from the 80 variables.
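
For the index-based encoding described above, a chromosome could hold attribute indices directly instead of 0/1 flags. The sketch below is a hypothetical illustration in plain Java, not part of the nsga-ii library: it draws 6 distinct indices out of 80 by shuffling the index list. Indices are 0-based here (0..79 for variables 1 to 80), and crossover and mutation for such chromosomes would need their own logic to keep the indices distinct.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: an index-encoded chromosome holding k distinct attribute indices
class IndexEncodedChromosome {

    // Returns k distinct indices drawn uniformly from 0..numAttributes-1, in ascending order
    static List<Integer> randomIndices(int numAttributes, int k, Random random) {
        if (k < 0 || k > numAttributes) {
            throw new IllegalArgumentException("k must be between 0 and numAttributes");
        }
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < numAttributes; i++) {
            all.add(i);
        }
        // A random permutation; its first k entries form a uniform random k-subset
        Collections.shuffle(all, random);
        List<Integer> chosen = new ArrayList<>(all.subList(0, k));
        Collections.sort(chosen);
        return chosen;
    }

    public static void main(String[] args) {
        // e.g. 6 of the 80 candidate variables
        System.out.println(randomIndices(80, 6, new Random(42)));
    }
}
```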

Also, you were right that I used binary encoding, but I am getting the logic wrong when trying to keep the dataset as a reference for the chromosomes, which is why I contacted you for help.

Thanks.

onclave commented on June 11, 2024

If I may make a guess, you basically have a 2D dataset with 81 columns (attributes) and 25k rows (samples), and you would probably want to create a population out of it. Since I don't know what your work is or what you are trying to achieve, I shall take an example problem and explain how to solve it using this library; you can then use that knowledge to see how it fits your problem.

Problem: Let's say that, given the samples, you want to do feature selection among the 81 attributes, trying to select 5 marker attributes.

Solution:

Each of your chromosomes shall be binary encoded with length 81. In the beginning, randomly generate a population of N chromosomes. The genetic code of each chromosome represents a probable solution: indices with allele value 1 are considered selected attributes, and indices with 0 are considered not selected. This is how you keep a reference to your dataset with the chromosome.
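
The binary encoding above can be sketched with plain `boolean[]` arrays. The class and method names are hypothetical and stand in for the library's own chromosome and allele types: each chromosome has one bit per attribute, and decoding it yields the column indices it selects, so the dataset itself is never copied into the chromosome.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: binary-encoded chromosomes for feature selection.
// Bit i == true means attribute i is selected; the dataset is never copied in.
class BinaryPopulation {

    // Generates populationSize random chromosomes, each with numAttributes bits
    static List<boolean[]> randomPopulation(int populationSize, int numAttributes, Random random) {
        List<boolean[]> population = new ArrayList<>(populationSize);
        for (int c = 0; c < populationSize; c++) {
            boolean[] geneticCode = new boolean[numAttributes];
            for (int i = 0; i < numAttributes; i++) {
                geneticCode[i] = random.nextBoolean();
            }
            population.add(geneticCode);
        }
        return population;
    }

    // Decodes a chromosome into the column indices it selects (its "reference" into the dataset)
    static List<Integer> selectedAttributes(boolean[] geneticCode) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < geneticCode.length; i++) {
            if (geneticCode[i]) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<boolean[]> population = randomPopulation(10, 81, new Random(7));
        System.out.println(selectedAttributes(population.get(0)));
    }
}
```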

Prepare your own objective functions against your dataset. They can be maximization problems or minimization problems. This library considers all objective functions to be maximization problems. Hence, for any minimization problem, take its inverse.
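
One way to "take the inverse" of a non-negative minimization objective is 1 / (1 + x), which maps smaller raw values to larger fitness and stays finite at x = 0; negating the value is another common option. The helper below is a hypothetical sketch using `java.util.function.ToDoubleFunction`, not the library's own objective-function type.

```java
import java.util.function.ToDoubleFunction;

// Hypothetical helper: turns a minimization objective into a maximization one
class MinToMax {

    // Assumes the raw objective is non-negative (e.g. an error or a count),
    // so 1 / (1 + x) lies in (0, 1] and larger is better
    static <T> ToDoubleFunction<T> asMaximization(ToDoubleFunction<T> minimize) {
        return candidate -> {
            double raw = minimize.applyAsDouble(candidate);
            if (raw < 0) {
                throw new IllegalArgumentException("expected a non-negative objective value");
            }
            return 1.0 / (1.0 + raw);
        };
    }

    public static void main(String[] args) {
        // Raw objective to minimize: distance from 5 selected features
        ToDoubleFunction<Integer> distanceFromFive = count -> Math.abs(count - 5);
        ToDoubleFunction<Integer> fitness = asMaximization(distanceFromFive);
        System.out.println(fitness.applyAsDouble(5)); // 1.0
        System.out.println(fitness.applyAsDouble(8)); // 0.25
    }
}
```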

For each chromosome, based on its genetic code, prepare a subset of your dataset selecting only those attributes which are "1". Again, this is how you keep a reference to your dataset in your NSGA-II code. NSGA-II will run the objective functions for you, and the objective functions will work with your dataset to provide objective values, or "fitness", for your chromosomes. NSGA-II will then use these values for each chromosome to perform non-dominated sorting, rank assignment and crowding-distance assignment. After G generations, NSGA-II will return the Pareto Front.
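
Building the per-chromosome subset could look like the following, assuming (hypothetically) that the dataset has already been loaded into a `double[][]` with one row per sample and one column per attribute. `FeatureSubset.selectColumns` is an illustrative name, not a library method.

```java
// Hypothetical sketch: dataset as rows (samples) x columns (attributes)
class FeatureSubset {

    // Keeps every row but only the columns whose bit is set in geneticCode
    static double[][] selectColumns(double[][] dataset, boolean[] geneticCode) {
        int selectedCount = 0;
        for (boolean bit : geneticCode) {
            if (bit) selectedCount++;
        }
        double[][] subset = new double[dataset.length][selectedCount];
        for (int row = 0; row < dataset.length; row++) {
            if (dataset[row].length != geneticCode.length) {
                throw new IllegalArgumentException("row " + row + " does not match chromosome length");
            }
            int target = 0;
            for (int col = 0; col < geneticCode.length; col++) {
                if (geneticCode[col]) {
                    subset[row][target++] = dataset[row][col];
                }
            }
        }
        return subset;
    }

    public static void main(String[] args) {
        double[][] dataset = { {1, 2, 3}, {4, 5, 6} };
        boolean[] geneticCode = { true, false, true };
        double[][] subset = selectColumns(dataset, geneticCode);
        System.out.println(java.util.Arrays.deepToString(subset)); // [[1.0, 3.0], [4.0, 6.0]]
    }
}
```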

All of this will be managed by NSGA-II, and you do not have to change any code within the library. All you have to do is write your own objective functions and provide them to NSGA-II. You usually do not need to directly feed your dataset to the GeneticCodeProducer.

Each objective function takes a chromosome. So, given a chromosome, you write your own logic for how that chromosome is used to prepare a subset of your original dataset according to its genetic code, and what operations to perform on this subset to return a double value.
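
As a sketch of such objective functions, the hypothetical factories below capture a reference to the dataset once and afterwards receive only the chromosome's genetic code. The two objectives (mean variance of the selected columns, and closeness to a target of 5 markers) are purely illustrative choices, and `ToDoubleFunction<boolean[]>` stands in for whatever objective-function type the library actually expects.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: objective functions that keep a reference to the dataset
// and only receive the chromosome (here a boolean[] genetic code)
class FeatureSelectionObjectives {

    // Objective A (maximize): mean variance of the selected columns, 0 if none selected
    static ToDoubleFunction<boolean[]> meanSelectedVariance(double[][] dataset) {
        return geneticCode -> {
            double total = 0;
            int selected = 0;
            for (int col = 0; col < geneticCode.length; col++) {
                if (!geneticCode[col]) continue;
                total += columnVariance(dataset, col);
                selected++;
            }
            return selected == 0 ? 0.0 : total / selected;
        };
    }

    // Objective B (maximize): closeness to a target number of markers, in (0, 1]
    static ToDoubleFunction<boolean[]> closenessToTarget(int target) {
        return geneticCode -> {
            int selected = 0;
            for (boolean bit : geneticCode) if (bit) selected++;
            return 1.0 / (1.0 + Math.abs(selected - target));
        };
    }

    // Population variance of one column
    static double columnVariance(double[][] dataset, int col) {
        double mean = 0;
        for (double[] row : dataset) mean += row[col];
        mean /= dataset.length;
        double sumSq = 0;
        for (double[] row : dataset) sumSq += (row[col] - mean) * (row[col] - mean);
        return sumSq / dataset.length;
    }

    public static void main(String[] args) {
        double[][] dataset = { {1, 10, 0}, {3, 10, 0} };
        boolean[] geneticCode = { true, true, false };
        List<ToDoubleFunction<boolean[]>> objectives =
                List.of(meanSelectedVariance(dataset), closenessToTarget(5));
        for (ToDoubleFunction<boolean[]> objective : objectives) {
            System.out.println(objective.applyAsDouble(geneticCode));
        }
    }
}
```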

Once you have your Pareto Front, you use your own logic to select one chromosome as your final solution. This is not part of the NSGA-II package.
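
Selecting one solution from the Pareto Front is left to you. One simple, hypothetical rule is sketched below: normalise each objective to [0, 1] across the front and pick the member with the largest equal-weight sum, assuming all objectives are maximised. `ParetoPick` is an illustrative name, and other rules (weighted sums, knee-point selection, domain constraints) are equally valid.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical decision rule for picking one solution from a Pareto front:
// min-max normalise each objective over the front, then take the largest equal-weight sum.
// All objectives are assumed to be maximised.
class ParetoPick {

    static int pickIndex(List<double[]> frontObjectives) {
        if (frontObjectives.isEmpty()) {
            throw new IllegalArgumentException("empty front");
        }
        int numObjectives = frontObjectives.get(0).length;
        double[] min = new double[numObjectives];
        double[] max = new double[numObjectives];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] values : frontObjectives) {
            for (int m = 0; m < numObjectives; m++) {
                min[m] = Math.min(min[m], values[m]);
                max[m] = Math.max(max[m], values[m]);
            }
        }
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < frontObjectives.size(); i++) {
            double score = 0;
            for (int m = 0; m < numObjectives; m++) {
                double range = max[m] - min[m];
                // An objective that is constant over the front does not discriminate
                score += range == 0 ? 0 : (frontObjectives.get(i)[m] - min[m]) / range;
            }
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> front = List.of(
                new double[] {0.9, 0.1},
                new double[] {0.6, 0.7},
                new double[] {0.2, 0.9});
        System.out.println(pickIndex(front));
    }
}
```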

Once you have your selected solution, you use your own logic to select the 5 best markers as your resulting biomarkers. This is not part of the NSGA-II package.
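
Likewise, the final marker selection might rank the attributes chosen by that solution according to some per-attribute score and keep the top 5. The score array here is an assumption (it could be anything you compute from your data), and `TopMarkers` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical post-processing step: from the attributes a chosen chromosome selects,
// keep the k with the highest per-attribute score (the score itself is your own choice)
class TopMarkers {

    static List<Integer> topK(boolean[] geneticCode, double[] attributeScores, int k) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < geneticCode.length; i++) {
            if (geneticCode[i]) selected.add(i);
        }
        // Highest score first; ties broken by the lower attribute index
        selected.sort(Comparator.comparingDouble((Integer i) -> attributeScores[i]).reversed()
                .thenComparingInt(i -> i));
        return new ArrayList<>(selected.subList(0, Math.min(k, selected.size())));
    }

    public static void main(String[] args) {
        boolean[] geneticCode = { true, true, false, true, true, true, true };
        double[] scores = { 0.3, 0.9, 1.0, 0.1, 0.8, 0.5, 0.7 };
        System.out.println(topK(geneticCode, scores, 5)); // [1, 4, 6, 5, 0]
    }
}
```

Attribute 2 has the highest score but is excluded because the chosen chromosome did not select it.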

I hope this is explanatory enough to understand how to use your own dataset and work with this package.

RJoshlan commented on June 11, 2024

Thanks, it's clearer now. I do have the objective functions, but I was getting it wrong when trying to encode using the dataset.

One point: you mentioned

"You usually do not need to directly feed your dataset to the GeneticCodeProducer."

What do you mean by this?

RJoshlan commented on June 11, 2024

@onclave Thanks for your help.
