1. I have not received the email with the results link.
  2. First, check that the email address you used to submit your calculations is valid. If it is, check the spam folder of that account. Please be patient, as calculations can take time. If more than 1 day has passed since you submitted your calculations and you still have no results, please contact us, because an error may have occurred.


  3. How long will my calculations take? Can you show me an estimate?
  4. That is difficult to predict. Computation time depends on many factors, such as the size of the dataset, the workload of the cluster, and the number of models being applied. A simple model, such as DT, can take just a few minutes. Other configurations can take longer.


  5. What models can I train?
  6. Several models are available for training:

  7. DT: Decision Tree
  8. RF: Random Forest
  9. SVM: Support Vector Machines
  10. XGBOOST: Extreme Gradient Boosting Machines
  11. KNN: K-Nearest Neighbors
  12. ANN: Artificial Neural Networks
  13. RP: Repeated Incremental Pruning to Produce Error Reduction (RIPPER)
  14. RLF: RuleFit

  15. ANN is not achieving good results. Can I train my own topology?
  16. No. User-defined topologies are not allowed because it is difficult to implement them in a way that is suitable for ML SERVER. If you want to explore new topologies, please contact us at hperez@ucam.edu or ajbanegas@ucam.edu.


  17. How do the models find the best set of parameters?
  18. All the models, except ANN, use a grid search: they evaluate a large set of parameter configurations and return the best-performing one.
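
As a rough illustration of the grid search idea, the following sketch tries every combination of values in a parameter grid and keeps the one with the highest score. It is not the ML SERVER implementation: the function `grid_search`, the grid `example_grid`, and the scoring function `toy_score` are hypothetical names chosen for this example, and the toy score merely stands in for a real validation score.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # product() enumerates the full Cartesian product of the candidate values.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid for a decision tree (assumed parameter names).
example_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 10]}


def toy_score(params):
    # Stand-in for a validation score; it peaks at max_depth=4, min_samples_leaf=5.
    return -abs(params["max_depth"] - 4) - abs(params["min_samples_leaf"] - 5)


best_params, best_score = grid_search(example_grid, toy_score)
print(best_params, best_score)
```

The number of configurations grows multiplicatively with each added parameter, which is one reason computation time is hard to estimate in advance.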


  19. How large can my dataset be?
  20. Because multiple users may submit tasks at the same time, input datasets are limited to 100 features and 300 samples. Keep in mind that the 100 features must include the record ID and the target class.


  21. How should I prepare my input data?
  22. Input datasets should be submitted in either CSV or PKL format. Note that, although PKL is a binary format, the underlying structure of the data is the same as in a CSV file.
    Datasets must be structured like this:

      a) The first column must be the sample ID. An integer value is recommended.
      b) The last column must be the output class. It can be a discrete value for classification datasets or a continuous value for regression datasets.
      c) All other columns must contain only numeric values. Pre-processing and formatting the dataset is up to you.
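
Before submitting, it may help to check a CSV file against the rules above and the size limits stated in this FAQ (100 features including the ID and target class, 300 samples). The following is a minimal sketch, not the validation ML SERVER itself performs: `validate_dataset` is a hypothetical helper, and the presence of a header row is an assumption that can be switched off with `has_header=False`.

```python
import csv
import io

MAX_COLUMNS = 100  # FAQ limit; includes the ID and target columns
MAX_SAMPLES = 300  # FAQ limit on the number of samples


def validate_dataset(csv_text, has_header=True):
    """Return a list of problems found in csv_text; an empty list means none found."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if has_header:  # assumption: the first line holds column names
        rows = rows[1:]
    if not rows:
        return ["dataset has no samples"]

    problems = []
    n_cols = len(rows[0])
    if n_cols < 2:
        problems.append("dataset needs at least an ID column and a target column")
    if n_cols > MAX_COLUMNS:
        problems.append(f"too many columns: {n_cols} (limit {MAX_COLUMNS})")
    if len(rows) > MAX_SAMPLES:
        problems.append(f"too many samples: {len(rows)} (limit {MAX_SAMPLES})")

    for i, row in enumerate(rows, start=1):
        if len(row) != n_cols:
            problems.append(f"row {i}: expected {n_cols} columns, found {len(row)}")
            continue
        # Columns between the ID (first) and the target (last) must be numeric.
        for j, cell in enumerate(row[1:-1], start=2):
            try:
                float(cell)
            except ValueError:
                problems.append(f"row {i}, column {j}: non-numeric value {cell!r}")
    return problems


sample = "id,f1,f2,target\n1,0.5,2,yes\n2,abc,3,no\n"
for problem in validate_dataset(sample):
    print(problem)
```

The sketch leaves the ID and target columns unchecked, since the FAQ only recommends an integer ID and allows either discrete or continuous targets.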


  23. I have a problem or question that is not listed in this FAQ.
  24. Contact the administrator (hperez@ucam.edu or ajbanegas@ucam.edu).
