Backpropagation
Considering a cost functional with the same characteristics as the one used in the SGD case, the tasks are to:
Modify the SGD routine so as to integrate the backpropagation algorithm.
Test this modification on a classification problem solved by an architecture that includes one hidden layer.
As detailed in the ELM simulation, the softmax function along with a hidden layer can be summarized by
\[ \mathbf{Z}_1 = \varphi_1(\mathbf{W}_0\mathbf{Z}_0), \qquad \mathbf{S} = \mbox{softmax}\!\left(-\mathbf{Z}_1^T\mathbf{W}_1,\ \mbox{axis=1}\right), \]
where
\(\mathbf{Z}_0 = \mathbf{X} \in \mathbb{R}^{r_0 \times N}\) represents the input dataset (\(N\) samples of dimension \(r_0\)), which has \(L\) classes.
\(\mathbf{Z}_1 = \varphi_1(\mathbf{W}_0\mathbf{Z}_0) \in \mathbb{R}^{r_1 \times N}\) represents a dense hidden layer.
\(\mathbf{W}_0 \in \mathbb{R}^{r_1 \times r_0}\): the hidden layer's weights.
\(\varphi_1(\cdot)\) is a non-linear activation function.
\(\mathbf{W}_1 \in \mathbb{R}^{r_1 \times L}\): the softmax layer's weights.
Typically \(r_1 \ll r_0\) (at least \(r_1 < r_0\)).
and the vectors
\(\mathbf{c}_n^{(\mathbf{z})}\) represents a canonical vector of length \(\mathbf{Z}.\mbox{shape}[1]\),
\(\mathbf{c}_l\) represents a canonical vector of length \(L\),
\(\mathbf{h}_n \in \mathbb{R}^L\) is the one-hot representation of the \(n^{th}\) training element's class,
are used to extract a particular column of a given matrix. A quick shape check of this architecture is sketched below.
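The following sketch verifies the shapes above with illustrative sizes (not the lab's data); the negative inner products in the softmax follow the cost convention used throughout this lab.

import numpy as np
from scipy.special import softmax

r0, r1, L, N = 784, 512, 10, 32           # input dim, hidden width, classes, batch (illustrative)
Z0 = np.random.randn(r0, N)               # Z_0 = X
W0 = 0.01*np.random.randn(r1, r0)         # hidden layer's weights
W1 = 0.01*np.random.randn(r1, L)          # softmax layer's weights
Z1 = np.maximum(W0 @ Z0, 0)               # Z_1 = phi_1(W_0 Z_0), here phi_1 = ReLU
S  = softmax(-Z1.T @ W1, axis=1)          # one row of class probabilities per sample
print(Z1.shape, S.shape)                  # (512, 32) (32, 10)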
Using this notation, the backpropagation algorithm applied to this case (for a full explanation see Lecture 3b's Introduction & BP: softmax+HL sections) may be summarized by
\[ \mathbf{Z}_1 = \varphi_1(\mathbf{W}_0\mathbf{Z}_0) \tag{2.1} \]
\[ \mathbf{S} = \mbox{softmax}\!\left(-\mathbf{Z}_1^T\mathbf{W}_1,\ \mbox{axis=1}\right) \tag{2.2} \]
\[ \nabla_{\mathbf{W}_1} = \tfrac{1}{N}\,\mathbf{Z}_1\!\left(\mathbf{H}^T - \mathbf{S}\right), \qquad \nabla_{\mathbf{Z}_1} = \tfrac{1}{N}\,\mathbf{W}_1\!\left(\mathbf{H} - \mathbf{S}^T\right) \tag{2.3} \]
\[ \nabla_{\mathbf{W}_0} = \left(\varphi_1'(\mathbf{W}_0\mathbf{Z}_0) \odot \nabla_{\mathbf{Z}_1}\right)\mathbf{Z}_0^T \tag{2.4} \]
where (2.1)-(2.3) are part of the forward pass, and (2.4) is the backward pass; here \(\odot\) denotes the elementwise product and \(N\) is the number of columns (samples) passed to the routine, i.e. the mini-batch size during training. Furthermore, it is relevant to highlight that
\(\mathbf{H} \in \mathbb{R}^{L \times N}\) represents the one-hot matrix associated with the training dataset.
\(\mbox{softmax}(\mathbf{V}, \mbox{axis=1}) = \mathbf{S}\), where:
\(\mathbf{V}, \mathbf{S} \in \mathbb{R}^{\kappa \times \eta}\)
\(\mathbf{S}[k,:] = \frac{\mbox{e}^{\mathbf{V}[k,:]}}{\sum_{n=1}^\eta \mbox{e}^{\mathbf{V}[k,n]} }\)
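This is exactly the behavior of scipy.special.softmax, on which the routines below rely; a quick check of the row-wise definition:

import numpy as np
from scipy.special import softmax

V = np.random.randn(5, 7)                         # kappa x eta
S = softmax(V, axis=1)
k = 2
row = np.exp(V[k, :]) / np.sum(np.exp(V[k, :]))   # S[k,:] as defined above
print(np.allclose(S[k, :], row), np.allclose(S.sum(axis=1), 1.))   # True True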
Proposed solution
Among other plausible solutions, the proposed one is detailed below.
Differences between SGD and SGD+BP
The listings below highlight the key differences between the original SGD routine and the one that integrates backpropagation and trains a hidden layer.
SGD

def hosgdSGD(OptProb, nEpoch, blkSize, alpha0, hyperFun):
    W = OptProb.randSol()
    stats = np.zeros( [nEpoch, OptProb.Nstats] )
    nBlk = np.floor_divide(OptProb.X.shape[1], blkSize)
    if np.remainder(OptProb.X.shape[1], blkSize) > 0:
        nBlk += 1
    k = 0
    for e in range(nEpoch):
        # Generate permutation
        blkInd = 0
        perm = np.random.permutation(OptProb.X.shape[1])
        for b in range(nBlk):
            # general iteration counter
            k += 1
            # select indexes
            if b <= nBlk-2:
                n = perm[blkInd:blkInd+blkSize]
            else:
                n = perm[blkInd:OptProb.X.shape[1]]
            blkInd += blkSize
            # Compute the gradient
            g = OptProb.gradFun(W, n)
            alpha = hyperFun.LR(W, alpha0, e, None)
            W = W - alpha*g
        # _END_ for(b)
        stats[e,:] = OptProb.computeStats(W, alpha, e, g)
        OptProb.printStats(e, nEpoch, stats)
    return W, stats
SGD+BP

def hosgdSGDHLSover(OptProb, sgdVariant, nEpoch, blkSize):
    W = []
    OptProb.setShapes()
    for l in range(len(OptProb.HL)):
        W.append( OptProb.HL[l].randSol( OptProb.zShape[l] ) )
    stats = np.zeros( [nEpoch, OptProb.Nstats] )
    nBlk = np.floor_divide(OptProb.X.shape[1], blkSize)
    if np.remainder(OptProb.X.shape[1], blkSize) > 0:
        nBlk += 1
    # pass OptProb's basic info to sgdVariant
    sgdVariant.getArchConf( OptProb.HL )
    k = 0
    for e in range(nEpoch):
        # Generate permutation
        blkInd = 0
        perm = np.random.permutation(OptProb.X.shape[1])
        for b in range(nBlk):
            # general iteration counter
            k += 1
            # select indexes
            if b <= nBlk-2:
                n = perm[blkInd:blkInd+blkSize]
            else:
                n = perm[blkInd:OptProb.X.shape[1]]
            blkInd += blkSize
            x = OptProb.X[:, np.ix_(n)].squeeze(1).copy()
            y = OptProb.Y[n]
            # -- fw operator, i.e. z_l = \phi_l( W[l-1], z[l-1] ) --
            OptProb.fwOperator(W, x, y)
            # Compute the gradient -- (fwPass, bwPass) --
            for l in range(len(OptProb.HL)-1, -1, -1):
                # fw Pass
                OptProb.fwPass(W, y, l)
                # bw Pass
                g = OptProb.bwPass(W, y, l)
                # Update the layer's weights
                W[l] = sgdVariant.update( W[l].copy(), g, e, l )
        stats[e,:] = OptProb.computeStats(W, sgdVariant.alpha0, e, g)
        OptProb.printStats(e, nEpoch, stats)
    return W, stats
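Note that the SGD+BP routine no longer updates the weights directly; it delegates each layer's update to sgdVariant.update(...). The sketch below illustrates the interface the driver expects (getArchConf(), update(), and an alpha0 attribute read by computeStats); it is a plain-SGD stand-in for illustration, not the lab's hosgdBBSGD or hosgdBBSGDmtm classes.

class plainSGDVariant(object):
    """Minimal stand-in for the sgdVariant objects used by hosgdSGDHLSover."""
    def __init__(self, alpha0, hyperFun):
        self.alpha0 = alpha0          # read by OptProb.computeStats
        self.hyperFun = hyperFun      # learning-rate policy (see Lab 1a)
    def getArchConf(self, HL):
        self.nLayers = len(HL)        # architecture info; plain SGD needs nothing more
    def update(self, W, g, e, l):
        alpha = self.hyperFun.LR(W, self.alpha0, e, None)
        return W - alpha*g            # per-layer SGD step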
New structure for the OptProb class
class hosgdHLOptProb(hosgdOptProb):
    r"""
    F(W) = (1/N) \sum_n f_n(W)
    f_n(W) = < Xn, Wl_n > + log( \sum_l e^{ -< Xn, Wl > } )
    """
    def __init__(self, X, Y, nClass, HL, Xtest=None, Ytest=None, lmbd=0,
                 gNormOrd=2, Nstats=4, verbose=10):
        self.X = X
        self.Y = Y
        self.nClass = nClass
        self.Xtest = Xtest
        self.Ytest = Ytest
        self.lmbd = lmbd
        self.Nstats = Nstats
        self.gNormOrd = gNormOrd
        self.verbose = verbose
        self.HL = HL
        self.fwOpFlag = False
        self.fwPassFlag = False
    # ---------
    def setShapes(self):
        self.zShape = np.zeros( len(self.HL), dtype=int )
        # input layer
        self.zShape[0] = self.X.shape[0]
        for l in range( 1, len(self.HL) ):
            self.zShape[l] = self.HL[l-1].nShp
    # ---------
    def fwOperator(self, W, x, y):
        if self.fwOpFlag:
            #self.z[0] = self.HL[0].fwd(W[0], x)
            self.z[0] = x
            for l in range( 1, len(self.HL) ):
                self.z[l] = self.HL[l-1].fwd( W[l-1], self.z[l-1] )
        else:   # first call
            self.z = []
            self.z.append( x )
            for l in range( 1, len(self.HL) ):
                self.z.append( self.HL[l-1].fwd( W[l-1], self.z[l-1] ) )
            self.fwOpFlag = True
    # ---------
    def fwPass(self, W, y, l):
        if self.fwPassFlag:
            if l == len(self.HL)-1:
                self.grad[l] = self.HL[l].gradFun( W[l], self.z[l], y )
            else:
                self.nabZ[l+1] = self.HL[l+1].gradFunZVar( W[l+1], self.z[l+1], y )
                self.nabW[l] = self.HL[l].gradFunWVar( W[l], self.z[l], y )
        else:   # first call
            if l == len(self.HL)-1:
                self.grad = [None]*len(self.HL)
                self.nabZ = [None]*len(self.HL)
                self.nabW = [None]*len(self.HL)
                self.grad[l] = self.HL[l].gradFun( W[l], self.z[l], y )
            else:
                self.nabZ[l+1] = self.HL[l+1].gradFunZVar( W[l+1], self.z[l+1], y )
                self.nabW[l] = self.HL[l].gradFunWVar( W[l], self.z[l], y )
            self.fwPassFlag = True
    # ---------
    def bwPass(self, W, y, l):
        if l == len(self.HL)-1:
            # do nothing: the 1st level (output's perspective) gradient
            # was already computed by fwPass
            pass
        else:
            self.grad[l] = self.HL[l].gradFun( self.nabW[l], self.nabZ[l+1], self.z[l] )
        return self.grad[l]
    # ---------
    def computeStats(self, W, alpha, k, g):
        cost = self.costFun(W)
        success = self.computeSuccess(W)
        return np.array([k, cost, alpha, success])
    # ---------
    def costFun(self, W):
        for l in range( len(self.HL) ):
            if l == 0:
                Z = self.HL[l].fwd(W[l], self.X)
            else:
                Z = self.HL[l].fwd(W[l], Z)
        fCost = self.HL[-1].costFun(W[-1], Z, self.Y)
        return( fCost )
    # ---------
    def computeSuccess(self, W):
        if self.Xtest is not None and self.Ytest is not None:
            for l in range( len(self.HL) ):
                if l == 0:
                    Z = self.HL[l].fwd(W[l], self.Xtest)
                else:
                    Z = self.HL[l].fwd(W[l], Z)
            success = self.HL[-1].computeSuccess(W[-1], Z, self.Ytest)
            return( success )
        else:
            return None
    # ---------
    def printStats(self, k, nEpoch, v):
        if k == 0:
            print('\n')
        if self.verbose > 0:
            if np.remainder(k, self.verbose) == 0 or k == nEpoch-1:
                if v[k,3] is None:
                    print('{:>3d}\t {:.3e}\t {:.2e}'.format(int(v[k,0]), v[k,1], v[k,2]))
                else:
                    print('{:>3d}\t {:.3e}\t {:.2e}\n \t \
  success rate (training): {:.3e}'.format(int(v[k,0]), v[k,1], v[k,2], v[k,3]))
        return
    # ---------
    def gradNorm(self, g, ord=2):
        pass
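Since the class above only wires the passes together, it is worth verifying the math independently. The self-contained snippet below (illustrative sizes, ReLU activation; it does not import the lab's classes) checks that the hidden-layer gradient assembled by bwPass, i.e. (2.4), matches a central finite difference of the averaged cost:

import numpy as np
from scipy.special import softmax, logsumexp

rng = np.random.default_rng(0)
r0, r1, L, N = 6, 4, 3, 5
W0 = 0.1*rng.normal(size=(r1, r0))
W1 = 0.1*rng.normal(size=(r1, L))
X = rng.normal(size=(r0, N))
y = rng.integers(0, L, size=N)

def cost(W0):
    # (1/N) sum_n  <z_n, w_{y_n}> + log sum_l exp(-<z_n, w_l>)
    Z1 = np.maximum(W0 @ X, 0.)                    # (2.1) with phi_1 = ReLU
    V = Z1.T @ W1
    return np.mean(V[np.arange(N), y] + logsumexp(-V, axis=1))

# analytic gradient w.r.t. W0, i.e. (2.3)-(2.4)
Z1 = np.maximum(W0 @ X, 0.)
S = softmax(-Z1.T @ W1, axis=1)                    # (2.2)
H = np.eye(L)[y].T                                 # one-hot matrix, L x N
nabZ1 = W1 @ (H - S.T) / N                         # (2.3)
G = ((W0 @ X > 0.) * nabZ1) @ X.T                  # (2.4), phi_1' for ReLU

# central finite difference on a single entry of W0
eps, (i, j) = 1e-6, (2, 3)
E = np.zeros_like(W0); E[i, j] = eps
print(np.isclose((cost(W0 + E) - cost(W0 - E))/(2.*eps), G[i, j]))   # True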
Layer (softmax and dense) routines that are BP compatible
class hosgdHLFunSoftMax(object):
    def __init__(self, nShp, lmbd=0, initSol=None):
        self.nShp = nShp
        self.lmbd = lmbd
        self.initSol = initSol   # if not None --> reproducible initial solution
    # ---------
    def gradFun(self, W, z, y):
        ##z = self.X[:,np.ix_(n)].squeeze(1) # select batch elements
        g = -z.dot( softmax(-z.transpose().dot(W), axis=1) )
        for k in range(z.shape[1]):
            g[:, y[k]] += z[:, k]
        g /= float(g.shape[1])
        if self.lmbd > 0:
            g += self.lmbd*W
        return g
    # ---------
    def gradFunZVar(self, W, z, y):
        ## see notes: WH - (1/s)*WE, E = exp(-W^T Z), s = sum(E, axis=0)
        #g = softmax(-z.transpose().dot(W),axis=1).dot( W.transpose() ).transpose()
        g = -W.dot( softmax(-z.transpose().dot(W), axis=1).transpose() )   # -(1/s)*WE
        for k in range(z.shape[1]):
            # g += WH
            g[:, k] += W[:, y[k]]
        return g
    # ---------
    def costFun(self, W, Z, Y):
        # cost: (1/N) sum_n  <z_n, w_{y_n}> + log sum_l e^{-<z_n, w_l>},
        # with the minus inside exp() so that it is consistent with gradFun
        fCost = np.sum(np.log(np.sum(np.exp(-Z.transpose().dot(W)), axis=1)))
        fCost += np.sum( Z*W[:, Y] )
        fCost /= float(Z.shape[1])
        if self.lmbd > 0:
            fCost += self.lmbd*(sum( np.power(W.ravel(), 2) ))/float(Z.shape[1])
        return fCost
    # ---------
    def randSol(self, zShape):
        if self.initSol is None:
            rng = np.random.default_rng()
        else:
            rng = np.random.default_rng(self.initSol)
        return 0.01*rng.normal(size=[zShape, self.nShp])
    # ---------
    def fwd(self, A, z, flag=True):
        return( z )
    # ---------
    def computeSuccess(self, W, Xtest, Ytest):
        classVal = -np.matmul(Xtest.transpose(), W)
        success = sum( np.argmax(classVal, axis=1) == Ytest )
        return float(success)/float(Xtest.shape[1])
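The same finite-difference test can be applied to gradFun above. This standalone snippet (illustrative sizes, lmbd = 0) repeats the routine's exact steps and compares one entry against the numerical derivative of the cost:

import numpy as np
from scipy.special import softmax, logsumexp

rng = np.random.default_rng(1)
r1, L, N = 4, 3, 6
W = 0.1*rng.normal(size=(r1, L))
z = rng.normal(size=(r1, N))
y = rng.integers(0, L, size=N)

def cost(W):
    V = z.T @ W                                   # <z_n, w_l>
    return np.mean(V[np.arange(N), y] + logsumexp(-V, axis=1))

g = -z @ softmax(-z.T @ W, axis=1)                # same steps as gradFun
for k in range(N):
    g[:, y[k]] += z[:, k]
g /= float(N)

eps, (i, j) = 1e-6, (2, 1)
E = np.zeros_like(W); E[i, j] = eps
print(np.isclose((cost(W + E) - cost(W - E))/(2.*eps), g[i, j]))   # True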
class hosgdHLFunDense(object):
    def __init__(self, nShp, hlFun=hosgdDef.hlFun.dense, actFun=None, par1=None,
                 par2=None, initSol=None):
        self.nShp = nShp
        self.initSol = initSol   # if not None --> reproducible initial solution
        self.par1 = par1
        self.par2 = par2
        self.hlFun = hlFun
        self.AFclass = hosgdFunAF(actFun=actFun, par1=par1, par2=par2)
        self.actFun = self.AFclass.sel_AF()
    # ---------
    def gradFunWVar(self, W, z, y):
        # Note: nabla actFun( W.dot(z) ) = actFun( W.dot(z), True ).dot( z.transpose() );
        # nonetheless, since z is accessible, here only actFun( W.dot(z), True ) is returned
        return self.actFun( W.dot(z), True )
    # ---------
    def gradFunZVar(self, W, z, y):
        return 1.
    # ---------
    def gradFun(self, nabW, nabZ, z):
        return (nabW*nabZ).dot( z.transpose() )
    # ---------
    def costFun(self, W, Z, Y):
        pass
    # ---------
    def fwd(self, A, z, flag=True):
        if flag:
            return( self.actFun( A.dot(z), False ) )
        else:
            return( z )
    # ---------
    def randSol(self, zShape):
        if self.initSol is None:
            rng = np.random.default_rng()
        else:
            rng = np.random.default_rng(self.initSol)
        return 0.01*rng.normal(size=[self.nShp, zShape])
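Both fwd() and gradFunWVar() assume the calling convention actFun(v, False) = \(\varphi(v)\) and actFun(v, True) = \(\varphi'(v)\) for the callable returned by hosgdFunAF.sel_AF(). A minimal ReLU stand-in with that convention (an assumption for illustration, not the lab's implementation):

import numpy as np

def reluAF(v, deriv):
    if deriv:
        return (v > 0).astype(v.dtype)   # phi'(v), elementwise
    return np.maximum(v, 0)              # phi(v)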
Full solution
Routines to execute the example and plot the results
def exHLMultiClass(nEpoch, blkSize, alpha, dataset='mnist', actFun=hosgdDef.actFun.ReLU,
                   lrPolicy=hosgdDef.lrSGD.Cte, lrSDecay=20, lrTau=0.5, verbose=5):
    # Examples
    # HL.exHLMultiClass(20, 128, np.array([0.001,0.00025]), dataset='cifar10', verbose=10, actFun=hosgdDef.actFun.ReLU)
    # sol = HL.exHLMultiClass(50, 128, np.array([0.001,0.00025]), dataset='cifar10', verbose=10,
    #                         actFun=hosgdDef.actFun.ReLU, lrPolicy=hosgdDef.lrSGD.StepDecay, lrSDecay=10, lrTau=0.5)
    # w, stats = H.exMultiClass(25, 64, 0.02, dataset='cifar10', lrPolicy=hosgdDef.lrSGD.StepDecay, lrSDecay=10, lrTau=0.5)
    # --- Load data
    # -------------------
    flagDataset = False
    if dataset.lower() == 'mnist':
        (X, Y), (Xtest, Ytest) = mnist.load_data()
        flagDataset = True
    if dataset.lower() == 'cifar10':
        (X, Y), (Xtest, Ytest) = cifar10.load_data()
        flagDataset = True
    if flagDataset is False:
        print('Dataset {} is not valid'.format(dataset))
        return
    if verbose == 1:
        print('Shape (training data):', X.shape)
        print('Shape (valid/testing data):', Xtest.shape)
        print('\n')
    # pre-processing
    X = stdPreproc(X)
    Xtest = stdPreproc(Xtest)
    # Data vectorization
    X = np.transpose( X.reshape(X.shape[0], np.array(X.shape[1::]).prod()) )
    Xtest = np.transpose( Xtest.reshape(Xtest.shape[0], np.array(Xtest.shape[1::]).prod()) )
    Y = Y.ravel()
    Ytest = Ytest.ravel()
    # --- Layers
    # ----------------------
    HL = []
    HL.append( hosgdHLFunDense(nShp=512, actFun=actFun) )
    HL.append( hosgdHLFunSoftMax(nShp=10) )
    # --- Optimization model
    # ----------------------
    OptProb = hosgdHLOptProb(X, Y, 10, HL, verbose=verbose, Xtest=Xtest, Ytest=Ytest)
    # --- HyperPar (kept in order to have a similar structure to the SD case; see Lab 1a)
    # ------------
    hyperP = hosgdHyperFunSGD(lrPolicy=lrPolicy, lrSDecay=lrSDecay, lrTau=lrTau)
    # --- Select SGD variant
    # ----------------------
    if isinstance(alpha, float):
        sgdVar = hosgdBBSGD(alpha, hyperP)
        mtmVar = hosgdBBSGDmtm(alpha, hyperP, gamma=0.9)
    else:
        sgdVar = hosgdBBSGD(alpha[0], hyperP)
        mtmVar = hosgdBBSGDmtm(alpha[1], hyperP, gamma=0.9)
    # --- Call SGD
    # -----------
    W = []
    stats = []
    nameVar = []
    # SGD
    sol = hosgdSGDHLSover(OptProb, sgdVar, nEpoch, blkSize)
    W.append(sol[0])
    stats.append(sol[1])
    nameVar.append('SGD')
    # Momentum
    sol = hosgdSGDHLSover(OptProb, mtmVar, nEpoch, blkSize)
    W.append(sol[0])
    stats.append(sol[1])
    nameVar.append('SGD-MTM')
    # --- Plot results
    # ----------------
    plotSGDStatsList(stats, nameVar)
    return W, stats
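The driver relies on stdPreproc, which is defined elsewhere in the lab; a plausible stand-in (an assumption, not the lab's routine) would cast the images to float and standardize them:

import numpy as np

def stdPreproc(X):
    X = X.astype(np.float64)/255.        # pixel values to [0, 1]
    return (X - X.mean())/X.std()        # zero mean, unit variance overall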
Simple Simulation (CIFAR-10)
import numpy as np
import HoSGDdefs as hosgdDef
import HoSGDlabHL as HL
sol = HL.exHLMultiClass(50, 128, np.array([0.001,0.00025]), dataset='cifar10', verbose=10, actFun=hosgdDef.actFun.ReLU)