pedrobcst / xerus
XRay Estimation and Refinement Using Similarity (XERUS)
License: MIT License
Currently, the OptimadeQuery interface does not use the REQUESTS_TIMEOUT value predefined in settings.py.
Pass this timeout value to the requests.get call in the OptimadeQuery interface.
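A minimal sketch of the fix, assuming the constant is imported from settings.py (the value and wrapper name below are illustrative, not the actual Xerus code):

```python
import requests

# Stand-in for the REQUESTS_TIMEOUT constant defined in settings.py;
# in Xerus it would be imported from there rather than redefined here.
REQUESTS_TIMEOUT = 30  # seconds

def optimade_get(url, params=None):
    """requests.get wrapper that always applies the shared timeout."""
    # Without `timeout`, requests can block indefinitely on a stalled server.
    return requests.get(url, params=params, timeout=REQUESTS_TIMEOUT)
```

Threading the constant through one wrapper keeps every OPTIMADE call consistent instead of patching each requests.get site individually.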
Update the codebase to be compatible with the newest versions of pymatgen and bump the pinned pymatgen version.
Materials Project has released a new default interface and API. Currently, API keys from the newest version of Materials Project do not work in Xerus; only the legacy ones work.
TODO:
As of now, we have a possible new 'cif' testing method that is much faster and does not do any refinement. After the changes to how the default 'refinement' is done, and after modifying the simulation tests to be handled in place (when the data is being simulated), Xerus became much more stable. The only 'test' now required is whether the GSASII engine can actually parse a CIF coming from a provider. The new test simply tries to open the CIF file with GSASII in a dummy project. This leads to a large increase in the number of usable structures that were previously considered "system breaking".
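The dummy-project check described above could look roughly like this (a sketch only: the function name is invented, and the GSASIIscriptable import path depends on the local GSAS-II installation — this is not the actual tcif.py implementation):

```python
import os

def cif_parses(cif_path):
    """Return True if GSAS-II can parse the CIF into a throwaway project.

    Sketch of the 'open a CIF with a dummy project' test; any failure
    while adding the phase marks the structure as unusable.
    """
    if not os.path.isfile(cif_path):
        return False
    try:
        import GSASIIscriptable as G2sc  # requires a GSAS-II installation
        gpx = G2sc.G2Project(newgpx="dummy_test.gpx")
        gpx.add_phase(cif_path, phasename="cif_check", fmthint="CIF")
        return True
    except Exception:
        return False
```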
This has to be implemented before #22.
TODO:
I am new to the Xerus package, and I have successfully installed it using the provided instructions. I have already set up the MongoDB server and the materials project API key.
I ran the tests and noticed that some of them were failing. Upon debugging, I discovered that Xerus calls deprecated functions from the dependency versions that were installed alongside it.
I have downgraded to the following packages:
Doing this caused some tests to pass, but now the test session gets stuck at "test_solvers.py::test_boxauto".
When I tried running the Examples.ipynb, I get the following error:
FileNotFoundError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_18928\2514958583.py in <module>
15
16 # Analyze and store.
---> 17 run.analyze(n_runs="auto")
18
19 # Save object into memory?
~\Xerus\Xerus\__init__.py in analyze(self, n_runs, grabtop, delta, combine_filter, select_cifs, plot_all, ignore_provider, ignore_comb, ignore_ids, solver, group_method, auto_threshold, r_ori, n_jobs)
545
546 # Get the cifs, simulate the patterns, run correlation (first phase)
--> 547 self.get_cifs(
548 ignore_provider=ignore_provider,
549 ignore_comb=ignore_comb,
~\Xerus\Xerus\__init__.py in get_cifs(self, ignore_provider, ignore_comb, ignore_ids)
223 self
224 """
--> 225 cif_meta, cif_notran, cif_notsim = LocalDB().get_cifs_and_write(
226 element_list=self.elements,
227 outfolder=self.working_folder,
~\Xerus\Xerus\db\localdb.py in get_cifs_and_write(self, element_list, name, outfolder, maxn, max_oxy)
238 final_path = os.path.join(outfolder, folder_to_write)
239 queries = make_system_types(element_list, maxn)
--> 240 self.check_all(queries, name = name)
241
242 # check oxygen limit
~\Xerus\Xerus\db\localdb.py in check_all(self, system_types, name)
198 else:
199 print("Checking the following combination:{}".format(combination))
--> 200 self.check_and_download(combination, name = name)
201 return self
202
~\Xerus\Xerus\db\localdb.py in check_and_download(self, system_type, name)
174 if not self.check_system(system_type):
175 elements = system_type.split("-")
--> 176 multiquery(elements, max_num_elem=len(elements), name = name)
177 return self
178
~\Xerus\Xerus\queriers\multiquery.py in multiquery(element_list, max_num_elem, name, resync)
160 # ## UPDATE DB ##
161 print("Uploading database with cifs..")
--> 162 data = load_json(os.path.join(test_folder, 'cif.json'))
163 print(len(data))
164 if len(data) == 0:
~\Xerus\Xerus\queriers\multiquery.py in load_json(path)
77 """
78
---> 79 with open(path, "r") as fp:
80 return json.load(fp)
81
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Ali Bhatti\\Xerus\\Xerus\\queriers\\LiMn2O4+Li2MnO3_parsed.csv_Mn_cifs\\cif.json'
As of the last release, Xerus is locally installable via pip, removing the need for path hacks. The notebooks have to be updated to reflect this change.
Currently there are a few bottlenecks that can make a first-run analysis slow. Investigate how to solve these bottlenecks to make things smoother:
CIF test: Currently, all structures downloaded from a provider go through a simple test by GSAS II to check whether they will work (some structures can literally break GSASII). Investigate the following:
Caching simulation: Currently, every time an analysis is run, Xerus will simulate the patterns for all structures. Even if the analysis is redone for the same sample with just different parameters (i.e., increased n_runs, changed box width, an ignored structure, etc.), all the simulations will be redone. When there are many structures to simulate, this can take a while. Check the following options:
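One way caching could work is to key each simulated pattern on the CIF contents plus the simulation parameters, so a rerun with unchanged inputs reuses the old pattern. A minimal in-memory sketch (the `simulate` callable stands in for whatever routine Xerus actually uses, and the cache layout is an assumption):

```python
import hashlib
import json

_pattern_cache = {}

def _cache_key(cif_text, **sim_params):
    """Hash the CIF contents together with the simulation parameters."""
    payload = cif_text + json.dumps(sim_params, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def simulate_cached(cif_text, simulate, **sim_params):
    """Return a cached pattern when the same CIF/parameters were seen before.

    `simulate` is a placeholder for the real pattern-simulation routine;
    it is only called on a cache miss.
    """
    key = _cache_key(cif_text, **sim_params)
    if key not in _pattern_cache:
        _pattern_cache[key] = simulate(cif_text, **sim_params)
    return _pattern_cache[key]
```

A persistent variant could pickle `_pattern_cache` to the working folder so caching survives across sessions.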
Change the input parameters to st.form so we don't always reload the app.
Currently, after a search is done, all the results are saved in a dataframe and exported to CSV. However, there is no option to load this CSV to re-visualize past results; the only way is to rerun the search. Solve this so it is possible to use Xerus' plotting functions to quickly re-visualize results, and also to allow further optimization if necessary.
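A reload helper could be as simple as reading the exported CSV back into a dataframe before handing it to the plotting functions. A sketch under the assumption that the export contains an `rwp` column (the real schema should be checked first):

```python
import pandas as pd

def load_results(csv_path):
    """Reload a past Xerus search export so plots can be redrawn.

    The 'rwp' column name is an assumption about the export schema;
    sorting is just an example of post-load processing.
    """
    df = pd.read_csv(csv_path)
    if "rwp" in df.columns:
        df = df.sort_values("rwp").reset_index(drop=True)
    return df
```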
As of PR #24, we no longer do test refinements. Since things became much more stable and there are no longer errors that break and require a full script rerun, the functionality of tcif.py can be moved elsewhere.
TODO:
Hello,
Thank you for the nice work!
While using Xerus, I feel there might be many repeated queries to the database.
For instance, for a system with elements A, B, C, and D,
the program tries to query the combinations A, B, C, D, (AB), (AC), (AD), (BC), (CD)...
However, if you just query for (ABCD), wouldn't you get all the combinations from the database? Or am I totally wrong?
To increase ease of use, add functionality to support the creation of a Streamlit interface.
TODO:
The Xerus-streamlit project will be done in a different repo.
Currently, when a provider sends a large amount of data back from an OPTIMADE structure query, it results in a 503/504 error and the querier stops; an example is querying COD for the ['Si', 'O'] system.
For example, this query URL will return 504:
Example of extremely large file:
COD ID 1552091
Investigate how to handle this.
Ideas:
Currently, the CI is failing due to a connection issue with AFLOW.
GitHub Actions fails to connect to AFLOW using urllib, even though the connection to the COD seems to work fine. This leads to an error in the CI tests, since the structures cannot be downloaded for testing the solvers. If the tests are run on a local machine (i.e., my machine), they pass successfully.
We have to find a way to fix this. For the moment, I will create a separate branch where AFLOW is disabled and confirm the tests pass there.
As of the latest version, the "dummy" entry is no longer being added to the database when no structures exist for a given element combination in any of the database providers (this entry is what avoids continuously requerying that combination). This probably appeared after the testing method changed. Fix this.
Currently, as designed, Xerus can only handle one user per installation when querying for missing CIFs for a given chemical space.
With the development of the Streamlit beta interface, one installation may be used by many users at once.
In this scenario, Xerus cannot handle concurrent queries for different chemical spaces, as it always saves and tests the CIFs using fixed names.
Change this to support simultaneous queries by multiple users.
Ideas:
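One possible idea (an assumption about the fix, not the project's decided approach) is to namespace each query's working files with a unique per-query folder, so the fixed names can never collide:

```python
import uuid
from pathlib import Path

def query_workdir(base, chemical_space):
    """Create a collision-free working folder for one query.

    Fixed file names are what breaks concurrent use; suffixing the
    chemical-space name with a UUID keeps simultaneous queries apart.
    The folder layout is a sketch, not Xerus' real one.
    """
    folder = Path(base) / f"{chemical_space}_{uuid.uuid4().hex}"
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```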
TODO:
Currently, every time the analyze function is run (even with the same parameters), Xerus will re-simulate and re-query the database. This can be time-consuming, and it slows down the sometimes necessary iterative process of hyperparameter tuning (i.e., g, delta, n_runs, provider settings, and so on). In light of this problem, the following changes are needed:
It is usually very nice when software identifies itself in the HTTP User-Agent header. Web browsers almost always present themselves in a verbose but neat way. This way, server-side developers are informed about what software makes requests to their servers, and may occasionally forward common issues to the client developers. Moreover, usage statistics could be drawn on the server side, which is nice.
Xerus seems to use the Python requests package for HTTP requests. By default, its User-Agent is python-requests/<version>. Changing this seems rather simple; it would be nice to see at least Xerus/<version>.
If user privacy is a concern, it should not be forgotten that this is F/LOSS; anyone concerned with their privacy is free to patch their copy of the client to their liking.
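Setting the header once on a shared session is enough; the version string below is a placeholder (in practice it would be read from the package metadata):

```python
import requests

XERUS_VERSION = "1.1b"  # placeholder; read the real version from the package

# Every request made through this session identifies the client as
# "Xerus/<version>" instead of the default "python-requests/<version>".
session = requests.Session()
session.headers["User-Agent"] = f"Xerus/{XERUS_VERSION}"
```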
Hi there, very interesting package!
I am just wondering if you would be interested in implementing an OPTIMADE API querier class, which would provide unified access to all crystal structure databases in the OPTIMADE consortium. I would be interested in helping with the implementation!
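A unified querier could be quite small, since the OPTIMADE specification fixes the endpoint shape (`/v1/structures`) and filter grammar across all providers. A sketch of what such a class might look like (the class itself is a proposal, not an existing Xerus interface):

```python
import requests

class OptimadeQuerier:
    """Minimal sketch of a provider-agnostic OPTIMADE structure querier.

    Any OPTIMADE base URL works, because the spec standardizes the
    /v1/structures endpoint and the filter language.
    """

    def __init__(self, base_url, timeout=30.0):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def filter_for_elements(self, elements):
        """Build the standard OPTIMADE filter for a chemical system."""
        quoted = ",".join(f'"{el}"' for el in elements)
        return f"elements HAS ALL {quoted}"

    def structures(self, elements):
        """Fetch one page of matching structure entries."""
        resp = requests.get(
            f"{self.base_url}/v1/structures",
            params={"filter": self.filter_for_elements(elements)},
            timeout=self.timeout,
        )
        resp.raise_for_status()
        return resp.json()["data"]
```

Pagination (following `links.next`) would be needed for large result sets, which also ties into the 503/504 issue above.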
Hi all, I get this error message trying to install xerus in a conda environment.
With pip install -e . I get this error:
ImportError: Error importing numpy: you should not try to import numpy from
its source directory; please exit the numpy source tree, and relaunch
your python interpreter from there.
I have no clue what I can do about this message.
Thank you
Klemens
Regarding CI, I've looked around but could not find a way to set it up, especially how to automatically start mongo, create a conda env, run pip, and set up all the config files. If you have any clues on how to do it, let me know. I was also planning to create a Docker environment to deploy it easily, but I got stuck on the same issue.
Originally posted by @pedrobcst in #4 (comment)
As far as I see it, there are a few packaging/testing things that could be added to the package for better re-use:
pip install'ed
I can spend an hour or so on this here and there to get the ball rolling, as these are all requirements for me to use the package elsewhere. Perhaps the overall containerization can wait until we have had a discussion in the future.
Hi
when running the Examples notebook, I encounter this issue when getting to the optimization step (both on Windows and Mac):
mixture.run_optimizer(n_trials = 200, # How many runs to try
n_startup = 20, # How many trials to start the search with
allow_pref_orient = True, # To allow pref. orientation to be refined
allow_atomic_params = True, # Allows atomic parameters (X, U) to be considered
allow_broad = True, # Allow broadening terms (check GSAS II doc.) (test)
allow_strain = True, # Allow strain terms (test)
allow_angle = True, # To allow acute angle refinement first.
force_ori = False, # To always consider pref. orientation
verbose = 'silent', # Verbose
param = 'rwp', # Objective goal
n_jobs = -1, # Number of cores to use
plot_best = True, # Plot best result after opt.
show_lattice = True, # Prints obtained lattice parameters
random_state = 71, # Random state
)
[W 2023-06-26 21:38:25,378] Trial 5 failed because of the following error: AttributeError("Can't pickle local object 'BlackboxOptimizer.objective_mp.<locals>.evaluate'")
Traceback (most recent call last):
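This class of error is not specific to Xerus: with `n_jobs = -1`, the work is distributed across processes, and Python's pickle cannot serialize a function defined inside another function (a "local object"). A self-contained demonstration, with the standard workaround of defining the callable at module level (the names here are illustrative, not the BlackboxOptimizer internals):

```python
import pickle

def make_objective():
    # Nested function: exists only inside make_objective's scope,
    # so pickle cannot locate it by module-level name.
    def evaluate(x):
        return x * x
    return evaluate

def evaluate_global(x):
    """Module-level functions pickle fine and can cross process boundaries."""
    return x * x

try:
    pickle.dumps(make_objective())
except (AttributeError, pickle.PicklingError) as err:
    print("cannot pickle nested function:", err)

# The module-level version round-trips without trouble.
restored = pickle.loads(pickle.dumps(evaluate_global))
print(restored(3))
```

Accordingly, running with `n_jobs = 1` may sidestep the crash, at the cost of parallelism.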
Recently the CI job is failing at:
Unable to locate package libgfortran4
This is probably (?) due to the newest version of Ubuntu (i.e., ubuntu-latest). Think about how to fix this.
Any ideas @ml-evs ?
Write test coverage for the following cases:
As of the possible new release (1.1b), we might support all OSes. In light of this, it might be necessary to update the CI to test on all OSes.
This (hypothetically) might work (basically, move to conda for environment management in CI):
Hi
I have an issue on the Linux version (through a virtual box) with the GSASII scriptable installation. When I run the tests, I get an error for test_gsas2.py:
ERROR GSAS-II binary libraries not found.
I also installed it on Windows and it worked fine. Any advice?
Currently, one of the main issues when querying the COD is the lack of control over which structures we obtain. As discussed in the paper, one of the main sources of misclassification is when a distorted low-temperature structure (which usually comes from the COD) is matched instead of the room-temperature one.
In this situation, one possibility is to implement an extra filter in the OPTIMADE querier for the COD (_cod_celltemp) to restrict structures to around room temperature only (maybe 293 +- 5 K?).
This information does not seem to be available in the COD REST API, so the OPTIMADE querier should become the main one.
To do this, evaluate:
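As a concrete starting point, the restriction could be expressed as an OPTIMADE filter string combining the element query with the COD-specific temperature field. A sketch (one open question to evaluate is whether entries without a recorded `_cod_celltemp` would be silently excluded by such a filter):

```python
def room_temp_filter(elements, t_center=293.0, t_width=5.0):
    """OPTIMADE filter restricting COD hits to near room temperature.

    `_cod_celltemp` is the COD-specific field discussed above; the
    default 293 +- 5 K window mirrors the proposal in the issue.
    """
    quoted = ",".join(f'"{el}"' for el in elements)
    return (
        f"elements HAS ALL {quoted}"
        f" AND _cod_celltemp>={t_center - t_width}"
        f" AND _cod_celltemp<={t_center + t_width}"
    )
```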
When we are doing powder matching for ceramics, it would be ideal to have a check button that restricts the search to oxygen-containing spaces only.
TODO: