9.4 Correct Gradient Estimators with DiCE

In this section we introduce the Infinitely Differentiable Monte-Carlo Estimator (DiCE), which resolves all of these issues: a practical estimator that correctly computes derivatives of any order in an SCG. The simplest way to obtain a derivative of a particular order would be to apply the procedure of 9.3.1 recursively, but this has two drawbacks. First, gradients defined this way are hard to implement with auto-diff libraries. Second, a naively constructed gradient estimator has $\nabla_{\theta}f(x;\theta) \neq g(x;\theta)$, so the updates are not computed correctly.

Before starting, as defined earlier, we take $\mathcal{L} = \mathbb{E}\left[\sum_{c\in\mathcal{C}}c\right]$ as the objective of the SCG. A gradient estimator that respects all of the dependencies can then be written as follows.

$$\nabla_\theta\mathcal{L} = \mathbb{E}\left[\sum_{c\in\mathcal{C}}\left(c\sum_{w\in \mathcal{W}_c}\nabla_\theta\log p(w\mid \mathrm{DEPS}_w)+ \nabla_\theta c(\mathrm{DEPS}_c)\right)\right] \quad\cdots(1)$$

$\mathcal{W}_c$ denotes the set of nodes that are stochastic, influence the cost node $c$, and themselves depend on $\theta$. Assuming everything is properly conditioned on its ancestor nodes, we omit the $\mathrm{DEPS}$ notation from here on.
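
For example, in the simplest SCG with a single stochastic node $w \sim p(w;\theta)$ and a single cost $c(w,\theta)$, we have $\mathcal{W}_c = \{w\}$ and (1) reduces to the familiar score-function (REINFORCE) estimator:

$$\nabla_\theta\mathcal{L} = \mathbb{E}\big[\,c(w,\theta)\,\nabla_\theta\log p(w;\theta) + \nabla_\theta c(w,\theta)\,\big]$$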

As introduced earlier, DiCE computes higher-order derivatives exactly by means of the MagicBox operator $\square$. It takes a set of stochastic nodes $\mathcal{W}$ as input and has the following two properties:

  • $\square(\mathcal{W}) \rightarrow 1$

  • $\nabla_{\theta}\square(\mathcal{W}) = \square(\mathcal{W})\sum_{w \in \mathcal{W}}\nabla_{\theta}\log p(w;\theta)$

The $\rightarrow$ in the first property means "evaluates to", in contrast to full equality ($=$), which would also require all derivatives to be equal. In auto-diff terms, $\rightarrow$ is the value produced by the forward pass.

๋‘๋ฒˆ์งธ ์„ฑ์งˆ์€ โ–ก\squareโ–ก๋ฅผ ์‚ฌ์šฉํ•ด์„œ sample์ด ์–ด๋””์„œ sampling๋๋Š”์ง€ ๊ทธ ๋ถ„ํฌ์— ๋Œ€ํ•œ ์˜์กด์„ฑ์„ ๋ณด์ž…๋‹ˆ๋‹ค.(www์— ๋Œ€ํ•œ ํ™•๋ฅ  ํ•ฉ ํ˜•ํƒœ๊ฐ€ ๋ฉ๋‹ˆ๋‹ค.) ๊ทธ๋ฆฌ๊ณ  ๋ฏธ๋ถ„ํ•˜๋ฉด log likelihood trick์„ ์ด์šฉํ•ด logํ˜•ํƒœ๋กœ ๋‚˜ํƒ€๋‚œ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด๋Š” ์ด ์„ฑ์งˆ์„ ๋งŒ์กฑํ•˜๋ฉด ์ฒซ๋ฒˆ์งธ ์„ฑ์งˆ์€ ์‰ฝ๊ฒŒ ๋งŒ์กฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. (์ด ํ™•๋ฅ  ํ•ฉ์ด 1์ด๋ฏ€๋กœ)

Given the second property, the objective $\mathcal{L} = \mathbb{E}\left[\sum_{c\in\mathcal{C}}c\right]$ can be rewritten as follows.

$$\mathcal{L}_\square = \sum_{c\in\mathcal{C}}\square(\mathcal{W}_c)\,c \qquad (\because\ \square(\mathcal{W}_c) \rightarrow 1)$$
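
Before the proof, a sketch of how $\mathcal{L}_\square$ is assembled in code may be helpful. The snippet below reuses the `magic_box` sketch from above and assumes a hypothetical two-step Bernoulli episode in which the cost $c_t$ is influenced by the actions $a_0,\dots,a_t$, so that $\mathcal{W}_{c_t} = \{a_0,\dots,a_t\}$.

```python
import torch
import torch.nn.functional as F

def magic_box(sum_logp):
    return torch.exp(sum_logp - sum_logp.detach())

def log_prob(action, t):
    # log p(a_t; theta_t) for a_t ~ Bernoulli(sigmoid(theta_t)),
    # written with logsigmoid so that it stays smooth in theta
    return (action * F.logsigmoid(theta[t])
            + (1 - action) * F.logsigmoid(-theta[t]))

theta = torch.zeros(2, requires_grad=True)     # per-step logits of a Bernoulli policy

# one sampled "episode", with actions and costs treated as fixed data
actions = [torch.tensor(1.0), torch.tensor(0.0)]
costs = [torch.tensor(1.0), torch.tensor(-2.0)]

# L_box = sum_c box(W_c) * c, where W_{c_t} = {a_0, ..., a_t}
L_box = sum(
    magic_box(sum(log_prob(actions[k], k) for k in range(t + 1))) * costs[t]
    for t in range(2)
)
print(L_box.item())   # evaluates to the plain sum of costs (-1.0), per the first property

# differentiating once gives the estimator (1); differentiating again remains valid
g1 = torch.autograd.grad(L_box, theta, create_graph=True)[0]
g2 = torch.autograd.grad(g1.sum(), theta)[0]   # Hessian-vector product with a ones vector
```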

We now prove that exact higher-order derivatives can be obtained from this $\mathcal{L}_{\square}$.

$$\mathbf{Theorem\ 1.}\quad \mathbb{E}[\nabla^n_{\theta}\mathcal{L}_\square] \rightarrow \nabla^n_\theta\mathcal{L}, \quad \forall n \in \{0,1,2,\cdots\}$$

For every cost node $c \in \mathcal{C}$, define the following.

$$c^0 = c, \qquad \mathbb{E}[c^{n+1}] = \nabla_\theta\,\mathbb{E}[c^n]$$

That is, $\mathbb{E}[c^n]$ is the $n$-th derivative of the objective $\mathbb{E}[c]$.

๋‹ค์Œ์œผ๋กœ cโ–กnc^n_{\square} cโ–กnโ€‹์€ cnโ–ก(Wcn)c^n\square(\mathcal{W}_{c^n})cnโ–ก(Wcnโ€‹)์ธ๋ฐ, magicbox operator์˜ ์ฒซ๋ฒˆ์งธ ํŠน์„ฑ์œผ๋กœ ์ธํ•ด, โ–กWcn \square \mathcal{W}_c^nโ–กWcnโ€‹์€ 1์ด ๋˜์–ด, cโ–กnโ†’cnc^n_\square \rightarrow c^ncโ–กnโ€‹โ†’cn์ž„์„ ์•Œ์•˜์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด, cโ–กnc^n_\squarecโ–กnโ€‹๋˜ํ•œ objective์˜ n๋ฒˆ์งธ ๋ฏธ๋ถ„๊ฐ’์ด๋ž‘ ๊ฐ™๋‹ค๋Š” ์˜๋ฏธ๊ฐ€ ๋ฉ๋‹ˆ๋‹ค. ๊ทธ๋ ‡๋‹ค๋ฉด, ๋งˆ์ง€๋ง‰์œผ๋กœ โˆ‡ฮธcโ–กn=cโ–กn+1\nabla_{\theta}c^n_\square = c^{n+1}_\squareโˆ‡ฮธโ€‹cโ–กnโ€‹=cโ–กn+1โ€‹์ž„์„ ๋ณด์ด๋ฉด n์ฐจ ๋ฏธ๋ถ„ ์ „์ฒด์— ๋Œ€ํ•ด magicbox operator๋กœ ๊ตฌํ•  ์ˆ˜ ์žˆ๊ณ , ๊ทธ๊ฒƒ์ด ์‹ค์ œ ๋ฏธ๋ถ„๊ฐ’๊ณผ ๊ฐ™๋‹ค๋Š” ์˜๋ฏธ๊ฐ€ ๋ฉ๋‹ˆ๋‹ค.

$$\begin{aligned}
\nabla_\theta c^n_\square &= \nabla_\theta\big(c^n\,\square(\mathcal{W}_{c^n})\big) \\
&= c^n\,\nabla_\theta\square(\mathcal{W}_{c^n}) + \square(\mathcal{W}_{c^n})\,\nabla_\theta c^n \\
&= c^n\,\square(\mathcal{W}_{c^n})\sum_{w\in\mathcal{W}_{c^n}}\nabla_\theta\log p(w;\theta) + \square(\mathcal{W}_{c^n})\,\nabla_\theta c^n \\
&= \square(\mathcal{W}_{c^n})\Big(\nabla_\theta c^n + c^n\sum_{w\in\mathcal{W}_{c^n}}\nabla_\theta\log p(w;\theta)\Big) \quad\cdots(9.4.4) \\
&= \square(\mathcal{W}_{c^{n+1}})\,c^{n+1} = c^{n+1}_\square \quad\cdots(9.4.5)
\end{aligned}$$

Two steps are needed to go from (9.4.4) to (9.4.5). First, rewrite $\mathcal{L} = \mathbb{E}[c^n]$ in the form of equation (1) above. This gives the following.

$$c^{n+1} = \nabla_{\theta}c^n + c^n \sum_{w \in \mathcal{W}_{c^n}}\nabla_\theta \log p(w;\theta)$$

์ด๋ฅผ ์ž์„ธํžˆ ๋ณด๋ฉด (9.4.4)์˜ ํ‘œํ˜„๊ณผ ๊ฐ™์Œ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‘˜ ์งธ๋กœ, Wcn\mathcal{W}_{c^n}Wcnโ€‹๊ณผ Wcn+1\mathcal{W}_{c^{n+1}}Wcn+1โ€‹์€ ๊ฐ™์€ stochastic nodes๋ฅผ ๊ฐ€๋ฆฌํ‚ค๊ณ ์žˆ์„ ๊ฒƒ์ด๋ฏ€๋กœ, Wcn=Wcn+1\mathcal{W}_{c^n} =\mathcal{W}_{c^{n+1}}Wcnโ€‹=Wcn+1โ€‹์ด ์ž๋ช…ํ•ฉ๋‹ˆ๋‹ค.
