Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Computer Vision pipelines #560

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from
Draft

ENH: Computer Vision pipelines #560

wants to merge 9 commits into from

Conversation

radheyakale
Copy link
Contributor

@radheyakale radheyakale commented Jun 21, 2022

No description provided.

* TODO: Need a robust way to find if the MLHandler expects image data for training/testing
* TODO: Separate class is required for training and testing CV models
* TODO: Code has to be made generic to dynamically select Resnet50/VGG, etc. models
* TODO: Model training parameters viz. epochs, batch size have to be taken from the user with defaults set
* TODO: setup and get_model functions have to be made more scalable
* TODO: ModelStore __init__ has to be made scalable
@radheyakale radheyakale requested review from sanand0 and jaidevd June 21, 2022 07:45
@radheyakale
Copy link
Contributor Author

radheyakale commented Jun 21, 2022

  • TODO: Need a robust way to find if the MLHandler expects image data for training/testing
  • TODO: Separate class is required for training and testing CV models
  • TODO: Code has to be made generic to dynamically select Resnet50/VGG, etc. models
  • TODO: Model training parameters viz. epochs, batch size have to be taken from the user with defaults set
  • TODO: setup and get_model functions have to be made more scalable
  • TODO: ModelStore init has to be made scalable

@jaidevd @sanand0 Please add/edit/delete any more TODOs that you feel may be required.

@radheyakale radheyakale changed the title Computer Vision pipelines ENH: Computer Vision pipelines Jun 21, 2022
@jaidevd
Copy link
Member

jaidevd commented Jun 21, 2022

@radheyakale

  1. Start by creating a wrapper class for Keras apps which inherits from this. Refer to the HFTransformer class for an example.
  2. Move as much code away from MLHandler as possible. It's okay if the example only supports ResNet50 for now.
  3. Provide a sample gramex.yaml which shows the usage.

@radheyakale
Copy link
Contributor Author

radheyakale commented Jun 24, 2022

@radheyakale

  1. Start by creating a wrapper class for Keras apps which inherits from this. Refer to the HFTransformer class for an example.
  2. Move as much code away from MLHandler as possible. It's okay if the example only supports ResNet50 for now.
  3. Provide a sample gramex.yaml which shows the usage.

@jaidevd Please look at the recent commit fe2c85e. Is this the correct approach to implement it?

* Predict is working with default VGG16 and Resnet models.
* TODO: Test and add all other models supported by Keras
* TODO: Implement training functionality
* TODO: Implement functionality to predict from trained models provided by user/trained in gramex
if op.exists(cls.store.model_path): # If the pkl exists, load it
if op.isdir(cls.store.model_path):
mclass, wrapper = ml.search_modelclass(mclass)
cls.model = locate(wrapper).from_disk(mclass, cls.store.model_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition is required for loading transformers. Why is this removed?

gramex/ml_api.py Outdated
@@ -46,6 +46,10 @@
"statsmodels.tsa.statespace.sarimax",
],
"gramex.ml_api.HFTransformer": ["gramex.transformers"],
"gramex.ml_api.KerasApplications": [
"tensorflow.keras.applications.vgg16",
"tensorflow.keras.applications.resnet50"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just leave this at tensorflow.keras.applications, because

from tensorflow.keras.applications import *

covers everything

gramex/ml_api.py Outdated Show resolved Hide resolved
gramex/ml_api.py Show resolved Hide resolved
gramex/ml_api.py Outdated
@@ -426,3 +435,49 @@ def _predict(
):
text = X["text"]
return self.model.predict(text)


class KerasApplications(AbstractModel):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this KerasApplication - singular.

gramex/handlers/mlhandler.py Outdated Show resolved Hide resolved
data = imutils.resize(cv2.imdecode(np.fromstring(
self.request.files['image'][0].body, np.uint8), cv2.IMREAD_UNCHANGED),
width=224)
data = cv2.resize(data, (224, 224))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. All of this logic should be in the wrapper class.
  2. OpenCV should not be a dependency. For loading images from files / streams, use PIL.Image.open
  3. For resizing, use skimage.transform.resize or tf.image.resize or PIL.Image.resize.
  4. The target size after resizing should not be hardcoded to [224, 224] - there are other sizes in Keras apps too. For this, you can check the shape of the input tensor in the corresponding model with model.input_shape.

self.set_header('Content-Disposition',
f'attachment; filename={op.basename(self.store.model_path)}')
with open(self.store.model_path, 'rb') as fout:
self.write(fout.read())
elif '_model' in self.args:
self.write(json.dumps(self.model.get_params(), indent=2))
elif len(self.request.files.keys()) and \
self.request.files['image'][0].content_type in \
['image/jpeg', 'image/jpg', 'image/png']:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should not happen here. As far as possible, send the request payload blindly to the _fit or _predict methods. The wrapper class should take care of everything if it is written correctly.

Also note that there are two ways one can send images in a request.

  1. Send files (multipart form data) - in which case you look at the mimetypes of the files received and open them accordingly. For this, take a look at the definition of self._parse_multipart_form_data
  2. Send the raw bytestream of an image with a Content-Type: image/whatever header. In this case, write functions called _parse_image_jpeg, _parse_image_png, etc. The handler will automatically call them.

Basically the handler knows how to parse data given the content type of the request.

if 'training_data' in data.keys():
training_results = yield gramex.service.threadpool.submit(
self._train, data=data['training_data'].iloc[0])
self.write(json.dumps(training_results, indent=2, cls=CustomJSONEncoder))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do?

json.dump(class_names, fout)
keras_model.save(config_dir)
return class_names

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this and move it to the wrapper class.

* Training keras models would be done in ml_api in KerasApplication wrapper
* _parse_multipart_form_data used for parsing images
@@ -184,7 +186,10 @@ def _transform(self, data, **kwargs):
return data

def _predict(self, data=None, score_col=''):
self._check_model_path()
import io
if type(data) == io.BytesIO:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use isinstance for checking types.

data = self.args['training_data']
training_results = yield gramex.service.threadpool.submit(
self._train, data=data)
self.write(json.dumps(training_results, indent=2, cls=CustomJSONEncoder))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Training should not happen in GET. Only in POST or PUT.

self.store.load('class'), self.store.load('params'),
data=data, target_col=target_col,
nums=self.store.load('nums'), cats=self.store.load('cats')
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaidevd - find a way to remove dataframe-specific code from MLHandler - it should deal only with train / predict semantics.

gramex/ml_api.py Outdated
input_tensor=None,
input_shape=None,
pooling=None,
classes=1000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to __init__.

gramex/ml_api.py Outdated
input_tensor=None,
input_shape=None,
pooling=None,
classes=1000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should do only self.model.predict.

* Model initialisation code moved to __init__
* Training added to POST request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants